Announcing the 25Q2 Release

25Q2 Public DepMap Release Notes

In this release, we have added new CRISPR and Omics data, more WGS data for lines that previously had WES, and important updates to our Omics and CRISPR pipelines.

Please read the release notes below for more information.

MODEL ANNOTATIONS

  • We’ve added new metadata columns:
    • New Columns
      • Model.csv
        • ModelIDAlias (Freetext): Previously known Model IDs (ACH-XXXXXX) used to reference the model, in cases when models are merged due to possible fingerprint swaps, duplication, etc.
        • PediatricModelType (Boolean): Indicates if this model represents a pediatric cancer subtype based on molecular features rather than age alone.
      • Modified Columns
        • ModelAvailableInDbgap (Controlled Vocab): terms have been updated to reflect the appropriate level of access across databases where sequencing data may be available.

NEW DEPMAP DATA

Standard Release Data

We’ve added 11 new paired WGS/RNA profiles and 5 new genome-wide CRISPR KO screens, including new models of pediatric cancers as part of our Pediatric Cancer Dependency (PedDep) Accelerator initiative (see PedDep.org for details of this collaboration).

WGS sequencing effort

We are releasing the next set of 155 WGS profiles, which are now available on the portal. These DepMap WGS profiles will become the default genomic profile for DepMap models in the portal.

CRISPR PIPELINE UPDATES

  • Reprocessing Humagne Readcounts

    • FASTQ files for all Humagne HiSeq sequences were re-processed last release to allow a 1-base mismatch, recovering reads that were dropped as HiSeq sequencing quality gradually deteriorated. Since that reprocessing, we have observed a batch effect between the two sequencing technologies, HiSeq and NovaSeq. To remove possible sources of this batch effect, we re-processed the FASTQ files for all Humagne NovaSeq sequences with a 1-base mismatch allowed as well. This led to the recovery of over 650 million reads, increasing the total NovaSeq readcounts by about 3%. However, we suspect that the batch effect between sequencing technologies is still present, and we are working to address it. In the meantime, genes that appear only in the Humagne/KY library will remain dropped in the following files: CRISPRGeneEffect/Uncorrected.csv, CRISPRGeneDependency.csv, and CRISPRInferredCommonEssential.csv.
  • Changes to Humagne Guides

    • We removed 9 guides from the Humagne library to mitigate the sequencing technology batch effect. We noticed that some guides showed abnormally high log-fold change in NovaSeq sequences compared to HiSeq, and discovered that this is caused by the references used for each sequencing technology. Due to biases between gDNA and pDNA, we now use cell lines infected with the Humagne library but without asCas12a to measure the library abundance of sgRNAs. In the NovaSeq readcounts for these lines, these 9 guides dropped out; however, they appear to be present in asCas12a-positive screens, which gives these guides a high log-fold change. We are working on resolving this issue. In the meantime, the “UsedByChronos” column is marked as False for the 9 guides in HumagneGuideMap.csv.
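
To make the effect of allowing a 1-base mismatch concrete, here is a minimal Python sketch of assigning reads to guides by Hamming distance. It is an illustration only, not the DepMap read-counting code: the guide IDs, sequences, reads, and the rule of discarding ambiguous reads are all assumptions made for this example.

```python
from collections import Counter
from typing import Dict, Iterable, Optional


def hamming(a: str, b: str) -> int:
    """Count mismatching positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def assign_read(read: str, guides: Dict[str, str], max_mismatch: int = 1) -> Optional[str]:
    """Return the ID of the closest guide within max_mismatch, or None.

    Reads that are equally close to two guides are treated as ambiguous
    and left unassigned (an assumption of this sketch).
    """
    best_id = None
    best_dist = max_mismatch + 1
    tie = False
    for guide_id, seq in guides.items():
        if len(seq) != len(read):
            continue
        dist = hamming(read, seq)
        if dist < best_dist:
            best_id, best_dist, tie = guide_id, dist, False
        elif dist == best_dist and dist <= max_mismatch:
            tie = True
    return None if tie else best_id


def count_reads(reads: Iterable[str], guides: Dict[str, str], max_mismatch: int = 1) -> Counter:
    """Tally reads per guide, skipping reads that cannot be assigned."""
    counts: Counter = Counter()
    for read in reads:
        guide_id = assign_read(read, guides, max_mismatch)
        if guide_id is not None:
            counts[guide_id] += 1
    return counts


# Hypothetical guides and reads, for illustration only.
guides = {"g1": "ACGTACGTAC", "g2": "TTTTGGGGCC"}
reads = ["ACGTACGTAC", "ACGTACGTAT", "TTTTGGGGCC", "TTTAGGGGCC", "GGGGGGGGGG"]

strict = count_reads(reads, guides, max_mismatch=0)   # exact matches only
relaxed = count_reads(reads, guides, max_mismatch=1)  # 1-base mismatch allowed
```

In this toy input, the relaxed setting recovers the two reads that differ from a guide by a single base, while the read that matches no guide stays unassigned in both settings.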

OMICS PIPELINE UPDATES

We’ve made significant updates across our Omics pipelines.

Mapping files changes

  • OmicsProfiles.csv [master mapping file]

    • A new column “is_default_entry” (Binary) has been added to this mapping file. This flag indicates whether the model condition and omics dataset in a given row are the default shown on the DepMap portal for that model_id.
    • For each combination of model_id and data type, there will be 1 row where “is_default_entry” is True and 0 or more rows where “is_default_entry” is False. For instance, if a model has two datasets, one with drug treatment and one without, the one without drug treatment will have “is_default_entry” set to True and the other dataset for that model will have “is_default_entry” set to False.
  • OmicsDefaultModelProfiles.csv

    • This file previously indicated which profiles were default for a model. This is now captured in OmicsProfiles.csv (as described above), so this file is being retired. Please use OmicsProfiles.csv instead.
  • OmicsDefaultModelConditionProfiles.csv

    • This file will be retired in the next release but is still available as a supplemental file for this release.
  • Synchronization of genome annotation files

    • All gene-level Omics outputs will now uniformly use Gencode V38 as annotation.
  • Mutation Pipeline:

    • The minimum depth threshold for variants is now 5 instead of 2. As a result, some variants in previous releases have now been excluded due to insufficient read support. This has a larger impact on WES-based mutation calls, specifically variants outside of the exonic targeted regions with fewer than 5 reads covering the site. WES-based mutation calls with less read support remain available on the portal under past releases.
    • VEP has been updated from v110 to v113, which includes the latest version of gnomAD (v4.1). Because of the updated population allele frequency information, the inferred somatic/germline status has changed for some variants.
    • We discovered that some variants were previously assigned to incorrect Hugo Symbols. For example, damaging mutations located on one gene would be labeled as “upstream variants” of the gene downstream and consequently filtered out. As a result, VEP is now run with a custom --pick_order flag that enforces gene mapping prioritizing higher impact: “mane_select,mane_plus_clinical,canonical,ccds,biotype,rank”.
    • Variants flagged by Mutect2 as “clustered_events” used to be excluded as QC failures regardless of our rescue criteria. This resulted in certain cancer-relevant mutations, involving genes such as IDH2, being dropped. This flag can now be overridden when a variant meets the rescue criteria.
    • The oncogene and tumor suppressor gene lists have been updated to a more recent version of OncoKB (current version downloaded from OncoKB™ - MSK's Precision Oncology Knowledge Base on 2/5/2025).
    • For more details on the DepMap mutation calling and annotation pipeline, please refer to our documentation: https://storage.googleapis.com/shared-portal-files/Tools/25Q2_Mutation_Pipeline_Documentation.pdf.
  • Copy Number Pipeline:

    • Quantification of WGS-derived relative copy number has been upgraded to a more accurate and sensitive pipeline, with gene-level resolution to identify potential focal amplifications and deletions.
    • The improved WGS CN pipeline is not compatible with WES. We therefore strongly recommend not analyzing the WES and WGS datasets together. To make this explicit, we have separated CN output into two sets.
      • The primary table will contain a model x gene matrix of copy numbers derived from the new pipeline, and thus will only contain data from WGS.
      • WES derived copy number calls (from the older pipeline) will be available as a supplementary file
    • PureCN-based absolute CN calling remains unchanged for this release. While a new pipeline is being developed, absolute copy number values, LoH, ploidy, and arm-level gain/loss information will be available only for samples released through 24Q4 (available under past releases).
    • Biomart has been updated to version may2021 (corresponding to GENCODE v38).
  • Expression Pipeline:

    • STAR alignment pipeline has been upgraded to version STAR 2.7.11b
    • Quantification of expression is now performed using Salmon v1.10.0 (Patro et al., 2017).
    • Raw read counts for uniquely aligned reads are now derived from STAR
    • RNA-seq output at the profile level will contain an additional column (model_id) and a flag (is_default_entry) indicating whether the profile is the default for its parent model.
    • OmicsExpressionProteinCodingGenesTPMLogp1.csv will now have one row per model (similar to 24Q4)
    • OmicsExpressionRNASeQCGeneCount.csv is now called OmicsExpressionRawReadCount.csv and contains gene-level counts for uniquely aligned reads output from STAR version 2.7.11b
    • While we continue to release expression datasets run in both stranded and unstranded settings, we are exploring options for detecting and correcting for factors contributing to potential batch effect beyond strandedness. As a result, OmicsExpressionProteinCodingGenesTPMLogp1BatchCorrected.csv will be removed from the release while we explore a more comprehensive batch-correction method.
  • Fusion Pipeline:

    • Fusion quantification has been upgraded to use Arriba version 2.5.0 (Uhrig et al., 2021).
    • Arriba output contains 1 row per unique breakpoint and fusion orientation. The full fusion output table (OmicsFusionFiltered_supplementary.csv) contains the following columns. DepMap-specific columns are annotated with “*” (please refer to the “05 Output files” page of the suhrig/arriba wiki on GitHub for full documentation of Arriba output columns).
    • Fusion output on the portal (OmicsFusionFiltered.csv) will be summarized at a gene fusion level and will
  • The file “OmicsFusionUnfilteredProfile.csv” will no longer be released and all qualified fusions will be present in OmicsFusionFiltered_supplementary.csv
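
As an illustration of the “is_default_entry” flag described under OmicsProfiles.csv above, the following Python sketch selects the default profile for each model and data type. The column names profile_id and datatype, the ID values, and the "True"/"False" text encoding are assumptions made for this example; check the header of the released file before adapting it.

```python
import csv
import io

# Hypothetical excerpt in the shape of OmicsProfiles.csv; the real file has
# more columns, and profile_id / datatype are assumed names.
SAMPLE = """profile_id,model_id,datatype,is_default_entry
PR-000001,ACH-000001,rna,True
PR-000002,ACH-000001,rna,False
PR-000003,ACH-000001,wgs,True
PR-000004,ACH-000002,rna,True
"""


def load_default_profiles(handle):
    """Map (model_id, datatype) to the profile_id flagged as default."""
    defaults = {}
    for row in csv.DictReader(handle):
        if row["is_default_entry"].strip().lower() != "true":
            continue
        key = (row["model_id"], row["datatype"])
        if key in defaults:
            # The release notes state there is 1 default row per combination.
            raise ValueError(f"more than one default entry for {key}")
        defaults[key] = row["profile_id"]
    return defaults


defaults = load_default_profiles(io.StringIO(SAMPLE))
print(defaults[("ACH-000001", "rna")])
```

To use this with the downloaded file, pass an open file handle in place of the in-memory sample, for example `open("OmicsProfiles.csv", newline="")`.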
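
The pick order described under the Mutation Pipeline above can be read as a sequence of tie-breakers: each criterion is only consulted when all earlier ones fail to separate the candidates. The Python sketch below is a simplified illustration of that idea and is not VEP's implementation; the dictionary fields, gene names, and consequence rank numbers are hypothetical.

```python
PICK_ORDER = "mane_select,mane_plus_clinical,canonical,ccds,biotype,rank".split(",")


def criterion_score(name, annotation):
    """Score one criterion; lower is better."""
    if name == "biotype":
        return 0 if annotation["biotype"] == "protein_coding" else 1
    if name == "rank":
        # Smaller rank = more severe consequence (hypothetical numbering).
        return annotation["consequence_rank"]
    return 0 if annotation[name] else 1


def pick(annotations, order=PICK_ORDER):
    """Choose one annotation; earlier criteria dominate later ones."""
    return min(annotations, key=lambda a: tuple(criterion_score(n, a) for n in order))


# One variant annotated against two neighbouring genes (hypothetical values).
candidates = [
    {"gene": "GENE_B", "consequence": "upstream_gene_variant", "consequence_rank": 30,
     "mane_select": True, "mane_plus_clinical": False, "canonical": True,
     "ccds": True, "biotype": "protein_coding"},
    {"gene": "GENE_A", "consequence": "missense_variant", "consequence_rank": 12,
     "mane_select": True, "mane_plus_clinical": False, "canonical": True,
     "ccds": True, "biotype": "protein_coding"},
]

chosen = pick(candidates)
print(chosen["gene"], chosen["consequence"])
```

In this toy case both annotations tie on every transcript-level criterion, so the final “rank” criterion decides and the missense annotation on GENE_A is kept instead of the upstream annotation on GENE_B.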

PORTAL UPDATES

  • Context Manager reorganization
    • The Context Manager has been reorganized as we prepare to incorporate more metadata into the tool and update it to use nomenclature consistent with the metadata column names that you’ll find in the downloadable files. In this release, you can still find the old names listed under “Legacy Annotation”, while the new names for these properties, which match the column names in downloads, are listed under “Annotations”.
  • Bug fixes and minor updates:
    • Previously, the “common essential” badge shown on the gene page was based on a separate implementation of the same methodology that Chronos is using. However, due to small differences in implementation, some genes were inconsistent with the set of genes that Chronos reported. The portal has been updated to use the set of genes Chronos reports instead of recomputing the common essential genes.
    • Custom analysis previously reported an internal error if there was no overlap between the in or out groups and the dataset interrogated. This has been fixed: the portal now shows an error message explaining that the specified groups cannot be used with the dataset due to lack of overlap.
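
The kind of check behind the custom analysis fix above can be sketched as follows. This is a hypothetical Python illustration of validating group overlap before running an analysis, not the portal's code; the function name, message wording, and model IDs are assumptions.

```python
def validate_groups(in_group, out_group, dataset_models):
    """Return the in/out groups restricted to models present in the dataset.

    Raises ValueError with a readable message when either group shares no
    models with the dataset.
    """
    dataset_models = set(dataset_models)
    in_overlap = set(in_group) & dataset_models
    out_overlap = set(out_group) & dataset_models
    empty = [name for name, overlap in (("in", in_overlap), ("out", out_overlap)) if not overlap]
    if empty:
        raise ValueError(
            "The " + " and ".join(empty) + " group(s) cannot be used with this "
            "dataset because none of their models are present in it."
        )
    return in_overlap, out_overlap


# Hypothetical model IDs, for illustration only.
dataset = ["ACH-000001", "ACH-000002", "ACH-000003"]

ok_in, ok_out = validate_groups(["ACH-000001"], ["ACH-000002", "ACH-999999"], dataset)

try:
    validate_groups(["ACH-888888"], ["ACH-000002"], dataset)
    message = None
except ValueError as err:
    message = str(err)
```

Failing early with a specific message, instead of letting the downstream computation hit an empty group, is what turns an internal error into something the user can act on.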