I’m confused about:
- The relationship between DepMap, CCLE, and Cell Model Passports
- Why Broad and Sanger are using different IDs for the cell lines (models)
I downloaded a list of models from each of Sanger and Broad, joined the two lists, and performed some counts. Some models appear to be in the Broad system and NOT the Sanger system, and vice versa.
As a result, I am not free to pick one or the other to use as the standard.
Broad vs. Sanger Cell Line IDs
Ariel Balter
30 May, 2022
DepMap Project
Broad and Sanger are both part of the DepMap project.
Sanger
Sanger has a web page for its DepMap
Models
Under this section is a link to the Cell Model Passports section, which
provides a single location where information on Sanger DepMap cell
models is available in a user-friendly environment.
Cell Model Passports has a download page, which provides a stable link
that always points to the latest version.
Broad
Broad hosts data for the DepMap project at a dedicated portal:
https://depmap.org/portal/download/
Broad also has a seemingly related project called the [Cancer Cell Line
Encyclopedia (CCLE)](https://sites.broadinstitute.org/ccle). The CCLE
Datasets page has a link for an annotated list of cell lines; however,
that link is dead.
The link for Processed Data leads to the DepMap download portal.
That portal lists a file called sample_info.csv, which could very well
be the annotated cell line information.
Download Sanger Model List
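# Packages used throughout: readr (read_csv), dplyr (select, joins, count), knitr (kable)
library(readr)
library(dplyr)
library(knitr)
model_list =
  read_csv("https://cog.sanger.ac.uk/cmp/download/model_list_latest.csv.gz") %>%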
  select(
    sanger_model_id = model_id,
    depmap_id = BROAD_ID,
    sanger_sample_id = sample_id,
    sanger_patient_id = patient_id,
    model_type,
    cell_line_name = model_name,
    ccle_id = CCLE_ID,
    tissue,
    cancer_type,
    cancer_subtype = cancer_type_detail,
    sample_site
  )
Rows: 1984 Columns: 51
-- Column specification -----------------------------------------------
Delimiter: ","
chr (36): model_id, model_name, synonyms, model_type, growth_proper...
dbl  (7): pmed, mutational_burden, ploidy, age_at_sampling, samplin...
lgl  (8): mutation_data, methylation_data, expression_data, cnv_dat...
i Use `spec()` to retrieve the full column specification for this data.
i Specify the column types or set `show_col_types = FALSE` to quiet this message.
Download Broad DepMap “Sample Info”
sample_info =
  read_csv("https://ndownloader.figshare.com/files/35020903") %>%
  select(
    depmap_id = DepMap_ID,
    sanger_model_id = Sanger_Model_ID,
    ccle_id = CCLE_Name,
    cell_line_name,
    stripped_cell_line_name,
    tissue = sample_collection_site,
    cancer_type = primary_disease,
    cancer_subtype = Subtype,
    lineage,
    lineage_subtype
    )
Rows: 1840 Columns: 29
-- Column specification -----------------------------------------------
Delimiter: ","
chr (27): DepMap_ID, cell_line_name, stripped_cell_line_name, CCLE_...
dbl  (2): COSMICID, WTSI_Master_Cell_ID
i Use `spec()` to retrieve the full column specification for this data.
i Specify the column types or set `show_col_types = FALSE` to quiet this message.
Joined
joined  = full_join(
  sample_info,
  model_list,
  by = c("sanger_model_id", "depmap_id"),
  suffix = c("_broad", "_sanger")
)
sorted_colnames =
  colnames(joined) %>%
  sort() %>%
  setdiff(., c("sanger_model_id", "depmap_id")) %>%
  c(c("sanger_model_id", "depmap_id"), .)
joined = joined %>% select(all_of(sorted_colnames))
Some Counts
joined %>%
  mutate(
    has_depmap_id = !is.na(depmap_id),
    has_sanger_id = !is.na(sanger_model_id)
  ) %>%
  count(has_depmap_id, has_sanger_id) %>%
  kable()
  has_depmap_id   has_sanger_id        n
  --------------- --------------- ------
  FALSE           TRUE               269
  TRUE            FALSE              687
  TRUE            TRUE              1730
joined %>%
  mutate(
    has_depmap_id = !is.na(depmap_id),
    has_sanger_id = !is.na(sanger_model_id)
  ) %>%
  group_by(model_type) %>%
  count(has_depmap_id, has_sanger_id) %>%
  kable()
  model_type   has_depmap_id   has_sanger_id        n
  ------------ --------------- --------------- ------
  Cell Line    FALSE           TRUE               195
  Cell Line    TRUE            TRUE              1715
  Organoid     FALSE           TRUE                74
  NA           TRUE            FALSE              687
  NA           TRUE            TRUE                15
joined %>%
  mutate(
    has_depmap_id = !is.na(depmap_id),
    has_sanger_id = !is.na(sanger_model_id),
    has_ccle_id_broad = !is.na(ccle_id_broad),
    has_ccle_id_sanger = !is.na(ccle_id_sanger),
    has_cell_line_name_broad = !is.na(cell_line_name_broad),
    has_cell_line_name_sanger = !is.na(cell_line_name_sanger)
  ) %>%
  count(has_depmap_id, has_sanger_id, has_ccle_id_broad, has_ccle_id_sanger, has_cell_line_name_broad, has_cell_line_name_sanger) %>%
  arrange(!has_depmap_id, !has_sanger_id, !has_ccle_id_broad, !has_ccle_id_sanger, !has_cell_line_name_broad, !has_cell_line_name_sanger) %>%
  kable()
 has_depmap_id   has_sanger_id   has_ccle_id_broad   has_ccle_id_sanger   has_cell_line_name_broad   has_cell_line_name_sanger        n
 --------------- --------------- ------------------- -------------------- -------------------------- --------------------------- ------
 TRUE            TRUE            TRUE                TRUE                 TRUE                       TRUE                          1108
 TRUE            TRUE            TRUE                TRUE                 FALSE                      TRUE                            28
 TRUE            TRUE            TRUE                FALSE                TRUE                       TRUE                             2
 TRUE            TRUE            TRUE                FALSE                TRUE                       FALSE                           14
 TRUE            TRUE            TRUE                FALSE                FALSE                      FALSE                            1
 TRUE            TRUE            FALSE               TRUE                 FALSE                      TRUE                           575
 TRUE            TRUE            FALSE               FALSE                FALSE                      TRUE                             2
 TRUE            FALSE           TRUE                FALSE                TRUE                       FALSE                          620
 TRUE            FALSE           TRUE                FALSE                FALSE                      FALSE                           63
 TRUE            FALSE           FALSE               FALSE                TRUE                       FALSE                            4
 FALSE           TRUE            FALSE               TRUE                 FALSE                      TRUE                             1
 FALSE           TRUE            FALSE               FALSE                FALSE                      TRUE                           268
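One possible reading of the last table, offered as a hypothesis rather than a confirmed explanation: the 575 rows that have both IDs but no Broad-side CCLE ID or cell line name came from the Sanger file only. That would happen if the Broad file has a missing or different Sanger_Model_ID for those models, because a join on both keys cannot pair such rows, and the same model then also shows up among the 687 Broad rows without a Sanger ID. The sketch below uses made-up data (every ID and name is invented for illustration) to show how joining on both keys splits one model into two rows, while joining on depmap_id alone pairs it.

```r
library(dplyr)

# Toy stand-ins for sample_info (Broad) and model_list (Sanger).
# All IDs and names here are hypothetical, not real models.
broad <- tibble(
  depmap_id       = c("ACH-A", "ACH-B", "ACH-C"),
  sanger_model_id = c("SIDM-1", NA, NA),        # Broad lacks the Sanger ID for ACH-B
  cell_line_name  = c("line1", "line2", "line3")
)
sanger <- tibble(
  sanger_model_id = c("SIDM-1", "SIDM-2", "SIDM-4"),
  depmap_id       = c("ACH-A", "ACH-B", NA),    # Sanger knows the Broad ID for SIDM-2
  cell_line_name  = c("line1", "line2", "line4")
)

# Join on both keys, as in the analysis above: ACH-B fails to pair up.
two_key <- full_join(
  broad, sanger,
  by = c("sanger_model_id", "depmap_id"),
  suffix = c("_broad", "_sanger")
)

# Join on depmap_id only, after setting aside Sanger rows with no Broad ID.
one_key <- full_join(
  broad,
  filter(sanger, !is.na(depmap_id)),
  by = "depmap_id",
  suffix = c("_broad", "_sanger")
)

# How many rows does each join give, and how often does ACH-B appear?
nrow(two_key)
nrow(one_key)
count(two_key, depmap_id)
```

If the real data behave like this toy example, repeating the counts after joining on depmap_id alone should shrink both the 575 and the 687 groups. Whether it does is something to check against the actual files, not something this sketch establishes.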