Relationship between Broad and Sanger models

I’m confused about

  1. The relationship between DepMap, CCLE, and Cell Model Passports
  2. Why Broad and Sanger are using different IDs for the cell lines (models)

I downloaded a file with a list of models from both Sanger and Broad. Then I joined them and performed some counts. There are models that seem to be in the Broad system and NOT the Sanger system, as well as visa versa.

As a result, I am not free to pick one or the other to use as the standard.

Broad vs. Sanger Cell Line IDs

Ariel Balter

30 May, 2022

DepMap Project

Broad and Sanger are both part of the DepMap project.

Sanger

Sanger has a web page for its DepMap
Models

Under this section is a link to the Cell Model
Passports
section which

provides a single location where information on Sanger DepMap cell
models is available in a user-friendly environment.

Cell Model Passports has a download
page
which provides

Stable
link
that always points to the latest version.

Broad

Broad hosts data for the DepMap project at a dedicated portal:

https://depmap.org/portal/download/

Broad also has a seemingly-related project called the [Cancer Cell Line
Encyclopedia (CCLE)](https://sites.broadinstitute.org/ccle. The CCLE
Datasets page
has a
link for an annotated list of cell lines, however, that link is dead.
The link for Processed Data leads to the DepMap download portal.

That portal lists a file called
sample_info.csv
which could very well be the annotated cell line information.

Download Sanger Model List

model_list =
  read_csv("https://cog.sanger.ac.uk/cmp/download/model_list_latest.csv.gz") %>%
  select(
    sanger_model_id = model_id,
    depmap_id = BROAD_ID,
    sanger_sample_id = sample_id,
    sanger_patient_id = patient_id,
    model_type,
    cell_line_name = model_name,
    ccle_id = CCLE_ID,
    tissue,
    cancer_type,
    cancer_subtype = cancer_type_detail,
    sample_site
  )
Rows: 1984 Columns: 51
-- Column specification -----------------------------------------------
Delimiter: ","
chr (36): model_id, model_name, synonyms, model_type, growth_proper...
dbl  (7): pmed, mutational_burden, ploidy, age_at_sampling, samplin...
lgl  (8): mutation_data, methylation_data, expression_data, cnv_dat...

i Use `spec()` to retrieve the full column specification for this data.
i Specify the column types or set `show_col_types = FALSE` to quiet this message.

Download BROAD DepMap “Sample Info”

sample_info =
  read_csv("https://ndownloader.figshare.com/files/35020903") %>%
  select(
    depmap_id = DepMap_ID,
    sanger_model_id = Sanger_Model_ID,
    ccle_id = CCLE_Name,
    cell_line_name,
    stripped_cell_line_name,
    tissue = sample_collection_site,
    cancer_type = primary_disease,
    cancer_subtype = Subtype,
    lineage,
    lineage_subtype
    )
Rows: 1840 Columns: 29
-- Column specification -----------------------------------------------
Delimiter: ","
chr (27): DepMap_ID, cell_line_name, stripped_cell_line_name, CCLE_...
dbl  (2): COSMICID, WTSI_Master_Cell_ID

i Use `spec()` to retrieve the full column specification for this data.
i Specify the column types or set `show_col_types = FALSE` to quiet this message.

Joined

joined  = full_join(
  sample_info,
  model_list,
  by = c("sanger_model_id", "depmap_id"),
  suffix = c("_broad", "_sanger")
)

sorted_colnames =
  colnames(joined) %>%
  sort() %>%
  setdiff(., c("sanger_model_id", "depmap_id")) %>%
  c(c("sanger_model_id", "depmap_id"), .)

joined = joined %>% select(!!sorted_colnames)

Some Counts

joined %>%
  mutate(
    has_depmap_id = !is.na(depmap_id),
    has_sanger_id = !is.na(sanger_model_id)
  ) %>%
  count(has_depmap_id, has_sanger_id) %>%
  kable()
  has_depmap_id   has_sanger_id        n
  --------------- --------------- ------
  FALSE           TRUE               269
  TRUE            FALSE              687
  TRUE            TRUE              1730
joined %>%
  mutate(
    has_depmap_id = !is.na(depmap_id),
    has_sanger_id = !is.na(sanger_model_id)
  ) %>%
  group_by(model_type) %>%
  count(has_depmap_id, has_sanger_id) %>%
  kable()
  model_type   has_depmap_id   has_sanger_id        n
  ------------ --------------- --------------- ------
  Cell Line    FALSE           TRUE               195
  Cell Line    TRUE            TRUE              1715
  Organoid     FALSE           TRUE                74
  NA           TRUE            FALSE              687
  NA           TRUE            TRUE                15
joined %>%
  mutate(
    has_depmap_id = !is.na(depmap_id),
    has_sanger_id = !is.na(sanger_model_id),
    has_ccle_id_broad = !is.na(ccle_id_broad),
    has_ccle_id_sanger = !is.na(ccle_id_sanger),
    has_cell_line_name_broad = !is.na(cell_line_name_broad),
    has_cell_line_name_sanger = !is.na(cell_line_name_sanger)
  ) %>%
  count(has_depmap_id, has_sanger_id, has_ccle_id_broad, has_ccle_id_sanger, has_cell_line_name_broad, has_cell_line_name_sanger) %>%
  arrange(!has_depmap_id, !has_sanger_id, !has_ccle_id_broad, !has_ccle_id_sanger, !has_cell_line_name_broad, !has_cell_line_name_sanger) %>%
  kable()
 has_depmap_id   has_sanger_id   has_ccle_id_broad   has_ccle_id_sanger   has_cell_line_name_broad   has_cell_line_name_sanger        n
 --------------- --------------- ------------------- -------------------- -------------------------- --------------------------- ------
 TRUE            TRUE            TRUE                TRUE                 TRUE                       TRUE                          1108

 TRUE            TRUE            TRUE                TRUE                 FALSE                      TRUE                            28

 TRUE            TRUE            TRUE                FALSE                TRUE                       TRUE                             2

 TRUE            TRUE            TRUE                FALSE                TRUE                       FALSE                           14

 TRUE            TRUE            TRUE                FALSE                FALSE                      FALSE                            1

 TRUE            TRUE            FALSE               TRUE                 FALSE                      TRUE                           575

 TRUE            TRUE            FALSE               FALSE                FALSE                      TRUE                             2

 TRUE            FALSE           TRUE                FALSE                TRUE                       FALSE                          620

 TRUE            FALSE           TRUE                FALSE                FALSE                      FALSE                           63

 TRUE            FALSE           FALSE               FALSE                TRUE                       FALSE                            4

 FALSE           TRUE            FALSE               TRUE                 FALSE                      TRUE                             1

 FALSE           TRUE            FALSE               FALSE                FALSE                      TRUE                           268