Announcing the 24Q4 Release

24Q4 Public DepMap Release Notes

We’re excited to announce new and expanded datasets in the 24Q4 DepMap Release! This release adds 35 new genome-wide CRISPR screens and Omics data for 98 new models. It also marks the start of an effort to resequence CRISPR-screened cell lines with whole-genome sequencing (WGS) to complete our dataset.

This release also contains important updates to the reprocessing of Humagne screens and to pDNA batch metadata, so please read the CRISPR Pipeline Updates section carefully.

Additionally, DepMap data is now available on dbGaP under accession phs003444.v1.p1!


MODEL ANNOTATIONS

  • We’ve added new metadata columns!
    • New Columns
      • Model.csv
        • PatientSubtypeFeatures [text]: Aggregated features known for the patient tumor
        • ModelSubtypeFeatures [text]: Curated list of confirmed molecular features seen in the model
        • ModelType [controlled vocab]: Type of model at onboarding based on model derivation technique (e.g. Cell Line, Organoid)
        • ModelTreatment [controlled vocab]: Indicates how a cell line was transformed, if applicable (e.g., SV40, EBV)
        • SerumFreeMedia [boolean]: Indicates whether serum-free media are used
        • EngineeredModelDetails [text]: Details of the genetic modification in knockdown/knockout models
        • CulturedResistanceDrug [controlled vocab]: Drug to which the model was cultured to resistance (cultured-to-resistance models only)
        • ModelAvailableInDbgap [controlled vocab]: Indicates the availability of data for the model on dbGaP. Refer to the “SharedToDbgap” column in OmicsProfiles.csv to see which specific Omics profiles are available
      • ModelCondition.csv
        • SerumFreeMedia [boolean]: Indicates whether serum-free media are used
  • We updated our annotations for 12 matched normal models to reflect lineage and sample collection more precisely.
    • For these matched normal cell lines, we changed DepMapModelType from ‘ZMNOR’ to ‘ZMNORBL’ and SampleCollectionSite from ‘matched_normal_tissue’ to ‘B lymphoblastoid cells’ to better categorize these models.
      • The models updated in this release include: ACH-002384, ACH-002382, ACH-002378, ACH-002376, ACH-002375, ACH-002374, ACH-002372, ACH-002370, ACH-002368, ACH-002367, ACH-002362, ACH-002357.

NEW DEPMAP DATA

Standard Release Data

We’ve added 98 new paired WGS/RNA profiles and 35 genome-wide CRISPR KO screens, including:

  • 9 new adult models, including a NUT midline carcinoma, a glioblastoma, and a lung neuroendocrine tumor.
  • 26 new models of pediatric cancers as part of our Pediatric Cancer Dependency (PedDep) Accelerator initiative (see PedDep.org for details of this collaboration).

WGS resequencing effort

Additionally, we have started an effort to sequence the full genomes of models with CRISPR screens, and we are releasing the first set of 281 new WGS profiles to the portal. These DepMap WGS profiles will become the default genomic profiles for their models in the portal. To identify which profiles are default, use the OmicsDefaultModelProfiles.csv and OmicsDefaultModelConditionProfiles.csv mapping files in the current release dataset.
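As a rough sketch of how these mapping files might be used, the Python below groups default profile IDs by model and filters a profile-level table down to default profiles only. It assumes the mapping file has `ModelID` and `ProfileID` columns and that profile-level tables carry a `ProfileID` column; check the headers in the release files before relying on these names. The function names are hypothetical.

```python
from typing import Dict, Iterable, List, Mapping, Set

Row = Mapping[str, str]


def load_default_profiles(rows: Iterable[Row]) -> Dict[str, Set[str]]:
    """Group the default ProfileIDs in a mapping file by ModelID.

    Assumes 'ModelID' and 'ProfileID' columns (hypothetical; verify the header).
    """
    defaults: Dict[str, Set[str]] = {}
    for row in rows:
        defaults.setdefault(row["ModelID"], set()).add(row["ProfileID"])
    return defaults


def keep_default_profiles(
    rows: Iterable[Row], defaults: Dict[str, Set[str]], id_column: str = "ProfileID"
) -> List[Row]:
    """Keep only rows of a profile-level table whose profile is a default."""
    # Flatten the per-model sets into one set of default profile IDs.
    default_ids: Set[str] = set().union(*defaults.values())
    return [row for row in rows if row[id_column] in default_ids]
```

The rows can come straight from the standard-library `csv.DictReader` over the downloaded OmicsDefaultModelProfiles.csv file.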

CRISPR PIPELINE UPDATES

Reprocessing Humagne Readcounts

We discovered that the sequencing quality of our HiSeq runs gradually deteriorated over the last year. This led to low read recovery for many guides (particularly those starting with base A, see figure below) and, as a result, erroneously low gene effect estimates for several genes across several screens. This drove a large number of false co-dependencies, particularly involving genes that only have guides in the Humagne-CD library. Because this effect appeared to be largely due to single-nucleotide sequencing errors, we reprocessed the FASTQ files for all Humagne HiSeq sequences, allowing a 1 bp mismatch, to recover the dropped reads. This “fuzzy” matching strategy recovered over 1 billion reads, increased total HiSeq read counts by about 10%, and improved all quality metrics for these screens. As expected, it also substantially changed the correlation structure of many genes in the Humagne data, especially genes with little or no viability signal.
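The notes do not describe the matching code itself, so the following is only a minimal Python sketch of one common way to count reads with up to one mismatch: index every sequence within Hamming distance 1 of each guide, then look reads up in that index. It assumes the guide-matching portion of each read has already been extracted at the same length as the guides, lets exact matches take precedence, and discards reads that are one mismatch away from two different guides. The function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Optional

# "N" is included so a read with a single no-call base can also be rescued.
ALPHABET = "ACGTN"


def build_one_mismatch_index(guides: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each guide, and every sequence 1 mismatch away from it, to that guide.

    Exact guide sequences always map to themselves. A variant reachable from
    two different guides is ambiguous and maps to None.
    """
    guide_list = list(guides)
    exact = set(guide_list)
    index: Dict[str, Optional[str]] = {g: g for g in guide_list}
    for guide in guide_list:
        for i, original in enumerate(guide):
            for base in ALPHABET:
                if base == original:
                    continue
                variant = guide[:i] + base + guide[i + 1:]
                if variant in exact:
                    continue  # an exact match to another guide takes precedence
                if variant in index and index[variant] != guide:
                    index[variant] = None  # within 1 nt of two guides: ambiguous
                else:
                    index[variant] = guide
    return index


def count_reads(reads: Iterable[str], index: Dict[str, Optional[str]]) -> Counter:
    """Count reads per guide, skipping unmatched and ambiguous reads."""
    counts: Counter = Counter()
    for read in reads:
        guide = index.get(read)
        if guide is not None:
            counts[guide] += 1
    return counts
```

The index holds about 4 × L extra keys per guide of length L (four alternative characters per position), which trades memory for a single dictionary lookup per read.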

Change to SequenceMaxCorr Metric

The purpose of the SequenceMaxCorrelation quality control metric is to ensure that biological replicates within a screen show similar signals in high-variance genes, which indicates reproducible results and enables detection of sample swaps. However, we found many cases where a sequence scored highly because its highest correlation was with a sequence from the same replicate at a different time point. Because this does not reflect consistency between biological replicates, we updated the metric to consider only correlations between sequences from different replicates within a screen.

This update does not result in any changes to QC outcomes.
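The updated rule can be sketched in Python as follows. This is an illustration, not the DepMap QC code: it assumes Pearson correlation over a shared vector of per-gene values (for example, log fold changes for the selected high-variance genes) and represents each sequence as a hypothetical (replicate, values) pair.

```python
import math
from typing import Dict, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; NaN if either vector has zero variance."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return math.nan
    return sxy / math.sqrt(sxx * syy)


def sequence_max_corr(
    sequences: Dict[str, Tuple[str, Sequence[float]]]
) -> Dict[str, float]:
    """For each sequence, the max correlation with sequences from *other* replicates.

    Sequences from the same replicate (e.g., another time point) are skipped.
    The result is NaN if no other replicate is available.
    """
    result: Dict[str, float] = {}
    for seq_id, (replicate, values) in sequences.items():
        corrs = [
            pearson(values, other_values)
            for other_id, (other_replicate, other_values) in sequences.items()
            if other_id != seq_id and other_replicate != replicate
        ]
        corrs = [c for c in corrs if not math.isnan(c)]
        result[seq_id] = max(corrs) if corrs else math.nan
    return result
```

For instance, two time points from replicate A with identical profiles no longer score each other; each is scored only against sequences from other replicates.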

pDNA batch metadata updates

  1. We corrected the pDNA batch (day 0 reference) for one Humagne-CD screen (NRHUPS2). The incorrect pDNA reference used previously led to erroneously low gene effects for numerous genes.

  2. The following screen and its corresponding sequence IDs have been updated:

  • SC-003180.CD01 (NRHUPS2)
    • SC-003180.CD01_CD-HiSeq_21_A
      • Previous Sequence ID: SC-003180.CD01_CD-NovaSeq_21_
    • SC-003180.CD01_CD-HiSeq_21_B
      • Previous Sequence ID: SC-003180.CD01_CD-NovaSeq_21_B
  3. This update will be reflected in ScreenSequenceMap.csv.
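To illustrate why the pDNA assignment matters, here is a simplified Python sketch of a guide-level log2 fold change against a pDNA (day 0) reference. DepMap's gene effects come from a more involved pipeline; the reads-per-million scaling, the pseudocount of 1, and the function name are illustrative assumptions. The point is only that each guide's estimate is measured relative to its reference, so a mismatched reference biases it.

```python
import math
from typing import Dict


def log2_fold_change(
    sample: Dict[str, int], pdna: Dict[str, int], pseudocount: float = 1.0
) -> Dict[str, float]:
    """log2 fold change of each guide's abundance vs. its pDNA reference.

    Both libraries are scaled to reads per million (RPM) before comparison,
    and a pseudocount avoids taking the log of zero.
    """
    sample_total = sum(sample.values())
    pdna_total = sum(pdna.values())
    lfc: Dict[str, float] = {}
    for guide, pdna_reads in pdna.items():
        sample_rpm = sample.get(guide, 0) / sample_total * 1e6
        pdna_rpm = pdna_reads / pdna_total * 1e6
        lfc[guide] = math.log2((sample_rpm + pseudocount) / (pdna_rpm + pseudocount))
    return lfc
```

With identical sample and reference abundances every guide gets 0; if the reference overstates a guide's starting abundance, that guide looks depleted even when nothing happened to it.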

OMICS PIPELINE UPDATES

Pipeline updates

  • New Columns
    • We have added three new columns to OmicsSomaticMutations and OmicsSomaticMutationsProfile:
      • Intron [string]: intron number
      • Exon [string]: exon number
      • RescueReason [string]: supporting evidence for why each variant is rescued
    • A new profile-level dbGaP indicator column in OmicsProfiles:
      • SharedTodbGaP [boolean]: Indicates whether a specific profile is available in dbGaP
  • Pipeline changes
    • We have updated our strategy for generating the relative and absolute gene-level copy number matrices (OmicsCNGene and OmicsAbsoluteCNGene) from segment-level data. Genomic coordinates are first mapped to ENSG IDs using BioMart (nov2020), and to HUGO symbols and Entrez IDs using Gene.csv.
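The notes don't spell out how segment values are aggregated to genes, so the Python below is only one plausible sketch: each gene receives the overlap-length-weighted mean copy number of the segments covering it, with gene coordinates assumed to come from the ENSG mapping described above. The `Segment` type, the half-open coordinates, and the weighting rule are all assumptions.

```python
from typing import Iterable, NamedTuple, Optional


class Segment(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive
    copy_number: float


def gene_copy_number(
    chrom: str, start: int, end: int, segments: Iterable[Segment]
) -> Optional[float]:
    """Overlap-weighted mean copy number across segments covering a gene.

    Returns None when no segment overlaps the gene's interval.
    """
    weighted_sum = 0.0
    covered = 0
    for seg in segments:
        if seg.chrom != chrom:
            continue
        overlap = min(end, seg.end) - max(start, seg.start)
        if overlap > 0:
            weighted_sum += overlap * seg.copy_number
            covered += overlap
    return weighted_sum / covered if covered else None
```

A linear scan keeps the sketch short; over thousands of genes and segments, sorting the segments or using an interval index would be the natural next step.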

New Files

  • Gene.csv file
    • This file contains metadata for all genes loaded into the DepMap portal’s database. Most pipelines now use this table to map between gene and protein identifiers, and we are working to switch the remaining pipelines over. It will be used to ensure that all files have consistent gene symbols and other gene identifiers.
    • This table is taken from the quarterly export of gene information published by HGNC, downloaded from the HGNC quarterly tab-separated data archive (HUGO Gene Nomenclature Committee).
  • Microsatellite Repeats
    • OmicsMicrosatelliteRepeatsProfile contains the weighted mean of the number of motif repeats predicted by MSIsensor2 at each microsatellite site.
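As a small illustration of the summary statistic, the sketch below computes a weighted mean repeat count for one site from a hypothetical mapping of repeat count to supporting reads. Weighting by read support is an assumption, and this input shape is not MSIsensor2's output format.

```python
from typing import Mapping, Optional


def weighted_mean_repeats(read_support: Mapping[int, int]) -> Optional[float]:
    """Weighted mean number of motif repeats at one microsatellite site.

    read_support maps a repeat count to the number of reads supporting it
    (hypothetical input shape). Returns None if no reads support the site.
    """
    total_reads = sum(read_support.values())
    if total_reads == 0:
        return None
    return sum(n * reads for n, reads in read_support.items()) / total_reads
```

For example, 30 reads supporting 10 repeats and 10 reads supporting 12 repeats give (10 × 30 + 12 × 10) / 40 = 10.5.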

PORTAL UPDATES

We’ve launched a new Resources page! We have curated some DepMap resources to get you started, and we will continue to build them out in future releases. As you explore the new Resources page, we welcome any feedback!

We’ve added a new feature to the Downloads page: you can now select any dataset listed in the chart on the Downloads Overview page to see a breakdown of the cancer lineages and subtypes available in that dataset.

Proteomics dataset alignment

  • We harmonized the proteomics datasets available in Data Explorer 2 by mapping their features to UniProt IDs, so all proteomics datasets in the portal are now indexed by the same identifiers.

Changes to datasets in Data Explorer 2

  • We have removed “Copy Number (Absolute)” (CN calls made with the ABSOLUTE method on the original CCLE lines) and added “Omics Absolute CN Gene Internal 24Q4” in its place, which contains CN calls for additional DepMap lines. This dataset is computed with PureCN instead of ABSOLUTE.
  • “Aneuploidy” under CN has been removed because a regularly updated version now exists: new aneuploidy scores can be found under “Global genomics” as a feature in the “Omics Signatures” dataset.
  • “ssGSEA Public 2024” will continue to be available in Data Explorer, but this dataset is no longer being updated with new models. We are working to bring this dataset back with an updated method.

Changes to Repurposing screen organization in Downloads

  • Previously, PRISM’s Repurposing Screen datasets were organized by release date under “Drug Screens” in the Downloads section. We’ve now grouped them into two entries, “PRISM Repurposing Primary Screen” and “PRISM Repurposing Secondary Screen”. Once you select either the Primary or Secondary screen, you’ll be able to choose which version you’re interested in:

Previously:

Now: