OVERVIEW
We’re excited to release new Omics and CRISPR data to the DepMap portal, as well as updated portal tools, like Data Explorer 2, and a new data hub where you can learn more about DepMap’s data structure and how to map data files.
Citing DepMap Datasets
As DepMap data grows, we want to ensure that the community is aware of how best to cite data found in the public DepMap portal. Data in the DepMap portal is available for the community to use. The DepMap team does not need to be included as authors should you seek to publish on the data, but we do ask that you use the following citation for citing current DepMap Release data, including CRISPR Screens, PRISM Drug Screens, Copy Number, Mutation, Expression, and Fusions:
DepMap, Broad (2024). DepMap 24Q2 Public. Figshare+. Dataset. DepMap 24Q2 Public
There are also other datasets available in the DepMap portal. You can find the appropriate citation to use for each dataset in the dropdown on the All Data Downloads page.
Take a look at our new data hub for more information.
Connect With Us on Social Media!
Follow us on X (formerly Twitter) or LinkedIn for DepMap updates. This is also where you can find information about our biannual releases, workshops, and other DepMap activities.
- X (Twitter): @CancerDepMap
- LinkedIn:
Subscribe to our Forum to provide feedback, get your questions answered and learn more about updates to our pipelines.
- Subscribe to the DepMap Community Forum
We want to engage with the DepMap community on all things data and portal related!
Resources
We are working hard to update our portal resources. Keep checking back with us for new tutorials and videos.
Recent Publications
Qin Q, Popic V, Yu H, White E, Khorgade A, Shin A, Wienand K, Dondi A, Beerenwinkel N, Vazquez F, Al’Khafaji AM, Haas BJ. CTAT-LR-fusion: accurate fusion transcript identification from long and short read isoform sequencing at bulk or single cell resolution. bioRxiv [Preprint]. 2024 Feb 28:2024.02.24.581862. doi: 10.1101/2024.02.24.581862. PMID: 38464114; PMCID: PMC10925146.
de Matos Simoes R, Shirasaki R, Downey-Kopyscinski SL, Matthews GM, Barwick BG, Gupta VA, Dupéré-Richer D, Yamano S, Hu Y, Sheffer M, Dhimolea E, Dashevsky O, Gandolfi S, Ishiguro K, Meyers RM, Bryan JG, Dharia NV, Hengeveld PJ, Brüggenthies JB, Tang H, Aguirre AJ, Sievers QL, Ebert BL, Glassner BJ, Ott CJ, Bradner JE, Kwiatkowski NP, Auclair D, Levy J, Keats JJ, Groen RWJ, Gray NS, Culhane AC, McFarland JM, Dempster JM, Licht JD, Boise LH, Hahn WC, Vazquez F, Tsherniak A, Mitsiades CS. Genome-scale functional genomics identify genes preferentially essential for multiple myeloma cells compared to other neoplasias. Nat Cancer. 2023 May;4(5):754-773. doi: 10.1038/s43018-023-00550-x. Epub 2023 May 26. PMID: 37237081; PMCID: PMC10918623.
Continue reading the release notes below to find:
- New Data
- Data Explorer 2.0
- PRISM Updates
- Portal Updates
- Pipeline Updates
  - CRISPR pipeline updates
  - Omics pipeline updates
NEW DATA
We’re excited to bring you 26 Standard CRISPR KO screens and 39 new Omics profiles for 39 new models in the 24Q2 DepMap release!
We are also releasing some new genomic features you’ll find in the following files:
- PureCN absolute copy number data:
  - OmicsAbsoluteCNGene.csv: gene-level absolute copy number matrix, indexed by ModelID
  - OmicsAbsoluteCNSegmentsProfile.csv: segment-level absolute copy number, indexed by ProfileID
  - OmicsLoH.csv: gene-level loss of heterozygosity (LoH) status matrix, indexed by ModelID
- OmicsSignatures.csv (indexed by ModelID) and OmicsSignaturesProfile.csv (indexed by ProfileID), containing a selection of genomic signatures, including:
  - MSIScore (float, from MSISensor2)
  - Ploidy (float, from PureCN)
  - CIN (chromosomal instability; float, from PureCN)
  - WGD (whole genome doubling status; binary: 1 indicates the presence of WGD, 0 otherwise; from PureCN)
  - LoHFraction (float, from PureCN)
  - Aneuploidy (int; method from “Aneuploidy renders cancer cells vulnerable to mitotic checkpoint inhibition”, Nature)
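As a rough illustration of the file layout described above, the sketch below reads an OmicsSignatures.csv-style table with Python’s standard csv module and lists models flagged with whole genome doubling. It assumes, without confirmation from these notes, that the first column holds the ModelID and that every other column is numeric; the model IDs and values in the example are made-up placeholders.

```python
import csv
import io
from typing import Dict, List, TextIO


def load_signatures(handle: TextIO) -> Dict[str, Dict[str, float]]:
    """Read an OmicsSignatures-style CSV into {ModelID: {column: value}}.

    Assumption: the first column is the ModelID index and every other
    column is numeric (ints such as Aneuploidy are read as floats).
    Empty cells are stored as NaN.
    """
    reader = csv.reader(handle)
    header = next(reader)
    columns = header[1:]  # skip the index column, whatever its header text is
    table: Dict[str, Dict[str, float]] = {}
    for row in reader:
        model_id, values = row[0], row[1:]
        table[model_id] = {
            col: float(v) if v != "" else float("nan")
            for col, v in zip(columns, values)
        }
    return table


def wgd_models(table: Dict[str, Dict[str, float]]) -> List[str]:
    """ModelIDs whose WGD flag is 1 (whole genome doubling present)."""
    return sorted(m for m, sig in table.items() if sig.get("WGD") == 1.0)


# Made-up rows in the assumed layout, for illustration only.
example = io.StringIO(
    ",MSIScore,Ploidy,CIN,WGD,LoHFraction,Aneuploidy\n"
    "MODEL_A,0.5,3.8,0.4,1,0.3,12\n"
    "MODEL_B,1.2,2.0,0.1,0,0.1,3\n"
)
signatures = load_signatures(example)
print(wgd_models(signatures))  # ['MODEL_A']
```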
DATA EXPLORER 2.0
We’re thrilled to bring you a big update to the way you can visualize data in the portal: Data Explorer 2.0!
Data Explorer 2.0 was built from the ground up to replace our long-standing Data Explorer application, drawing on lessons learned from the original. This new version has a more polished interface and many more plotting and configuration options.
In addition to the original Data Explorer’s violin and scatter plots across models, Data Explorer 2 supports waterfall plots and correlation heatmaps. Most importantly, Data Explorer 2 now lets you compare features between contexts.
The original Data Explorer application will be retired at some point in the future, but for now you can still access it by clicking the “Go back to the original” button at the top of the page when you first open Data Explorer. Also, if you have created a plot in the original Data Explorer, you can see how to generate the same plot in Data Explorer 2 by clicking the “Open in data explorer 2” button.
To help you get familiar with the new interface and to illustrate its capabilities, we have provided several example plots, shown in the center of the UI before you make any selections. Each example is clickable: selecting one populates the Data Explorer controls with the settings that generate that figure. From there, feel free to experiment with the options to get a sense of how the new UI works.
A key new capability in Data Explorer 2 is the ability to define “contexts” (most commonly, cohorts of cell lines that share some properties) and to aggregate measurements across the models in a context for use in plots. To make these cohorts easier to define, this release also includes a new tool, Context Manager, which lets you define rules that determine which models belong to a context.
Prior to Context Manager, you could use the Cell Line Selector tool to define contexts. You can still do so, but now Cell Line Selector is embedded within Context Manager. After opening Context Manager, click the “Create new with Cell line selector” button to use the Cell Line Selector UI to define a context.
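The idea of a rule-defined context and a context-versus-rest comparison can be sketched in a few lines of Python. This is a hypothetical illustration, not Context Manager’s implementation or data model: the annotation fields (`lineage`, `subtype`), the model IDs, and the gene effect values are all made up.

```python
from statistics import mean
from typing import Callable, Dict, List, Tuple

# Hypothetical model annotations; field names are illustrative, not the portal's schema.
Model = Dict[str, str]
Rule = Callable[[Model], bool]


def equals(field: str, value: str) -> Rule:
    """Rule: the annotation `field` equals `value`."""
    return lambda model: model.get(field) == value


def all_of(*rules: Rule) -> Rule:
    """Rule: every sub-rule holds (logical AND)."""
    return lambda model: all(rule(model) for rule in rules)


def context_members(models: Dict[str, Model], rule: Rule) -> List[str]:
    """IDs of the models that satisfy the context's rule."""
    return sorted(mid for mid, model in models.items() if rule(model))


def compare_context(feature: Dict[str, float], members: List[str]) -> Tuple[float, float]:
    """Mean of a per-model feature inside the context vs. all other models with data."""
    member_set = set(members)
    inside = [v for mid, v in feature.items() if mid in member_set]
    outside = [v for mid, v in feature.items() if mid not in member_set]
    return mean(inside), mean(outside)


models = {
    "MODEL_A": {"lineage": "Lung", "subtype": "NSCLC"},
    "MODEL_B": {"lineage": "Lung", "subtype": "SCLC"},
    "MODEL_C": {"lineage": "Breast", "subtype": "Luminal"},
}
nsclc = all_of(equals("lineage", "Lung"), equals("subtype", "NSCLC"))
members = context_members(models, nsclc)  # ['MODEL_A']
gene_effect = {"MODEL_A": -1.0, "MODEL_B": -0.25, "MODEL_C": -0.75}
print(compare_context(gene_effect, members))  # (-1.0, -0.5)
```

Composing small predicates keeps each rule readable and lets new conditions be combined without changing the comparison code.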
Check out our resources over the upcoming weeks to learn more.
PRISM UPDATES
The PRISM Repurposing dataset now contains two screens: Repurposing-1M and Repurposing-300! All Repurposing-1M [REP1M] and Repurposing-300 [REP300] compounds (1522 = 1280 REP1M + 242 REP300) were screened in the PRISM assay at a dose of 2.5 μM with a 5-day treatment against 906 cancer cell lines (859 of these passed quality control (QC) for all tested compounds with two high-quality replicates). Two PRISM cell line collections were used in the assay: PR500A, which includes only adherent cell lines, and PR500B, which includes both adherent and suspension cell lines. Together, these collections form the PR1000 cell line collection. All compounds were run in triplicate, and each plate contained positive (bortezomib, 20 μM) and negative (DMSO) controls. The screen can be considered an extension of the PRISM Repurposing Primary Screen: PR500A mainly covers the existing cell line panel, while PR500B extends the collection with new subtypes and lineages.
For assay details, please refer to Corsello et al., 2020 (“Discovering the anticancer potential of non-oncology drugs by systematic viability profiling”, Nature Cancer) and https://www.theprismlab.org.
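The per-plate controls give a built-in quality check. One common way to quantify how well negative and positive controls separate is the strictly standardized mean difference (SSMD); the sketch below computes it for one cell line on one plate. SSMD is named here as an illustrative choice only: these notes do not say which statistic or cutoff the PRISM pipeline applies, and the signal values are made up.

```python
from math import sqrt
from statistics import mean, variance
from typing import Sequence


def ssmd(negative: Sequence[float], positive: Sequence[float]) -> float:
    """Strictly standardized mean difference between two control groups.

    Large positive values mean the negative controls (e.g., DMSO wells)
    are well separated from the positive controls (e.g., bortezomib wells).
    Uses sample variances of each group.
    """
    return (mean(negative) - mean(positive)) / sqrt(variance(negative) + variance(positive))


# Toy per-well viability signals for one cell line on one plate (made up).
dmso = [10.0, 10.5, 9.5, 10.0]
bortezomib = [2.0, 2.5, 1.5, 2.0]
score = ssmd(dmso, bortezomib)
print(round(score, 2))  # 13.86
```

A cell line whose controls barely separate on a plate (low SSMD) would be a natural candidate for QC exclusion.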
PORTAL UPDATES
We’ve launched a new centralized “Downloads” page in the portal that serves as a one-stop shop for DepMap data. There you’ll find descriptions of all data in the portal, the structure of each release (including file mapping), and an easier-to-navigate layout for finding files and their descriptions.
PIPELINE UPDATES
CRISPR Pipeline Updates
For the Humagne library, we found that some guides differ substantially in relative abundance when sequenced directly from plasmid pools versus after integration into cells. As a result, we infected three cell lines with the library pool but without enCas12 and now use these as the “pDNA” reference for Humagne screens. This change was made in place, with no impact on the format of the corresponding data files. We see improved screen quality across all metrics.
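Why the reference matters: log2 fold change (LFC) is measured against the reference, so a guide whose abundance shifts after integration into cells would show a spurious change if compared with plasmid counts. The sketch below computes guide-level LFC against a reference averaged over several no-nuclease infections. It is a generic calculation, not the DepMap pipeline; the counts-per-million normalization, the pseudocount of 1, the guide IDs, and the counts are assumptions for illustration.

```python
import math
from statistics import mean
from typing import Dict, List

Counts = Dict[str, int]  # guide ID -> read count


def log2_cpm(counts: Counts, pseudocount: float = 1.0) -> Dict[str, float]:
    """Depth-normalize to counts per million, then take log2 with a pseudocount."""
    total = sum(counts.values())
    return {g: math.log2(c / total * 1e6 + pseudocount) for g, c in counts.items()}


def reference_profile(references: List[Counts]) -> Dict[str, float]:
    """Average log2 CPM per guide across several reference samples,
    e.g. cell lines infected with the library but lacking the nuclease."""
    normalized = [log2_cpm(r) for r in references]
    return {g: mean(n[g] for n in normalized) for g in normalized[0]}


def guide_lfc(screen: Counts, reference: Dict[str, float]) -> Dict[str, float]:
    """Log2 fold change of each guide relative to the reference profile."""
    screen_norm = log2_cpm(screen)
    return {g: screen_norm[g] - reference[g] for g in reference}


# Toy counts (made up): three reference infections at different depths
# but identical guide proportions, and one screened sequence.
references = [
    {"g1": 100, "g2": 100, "g3": 200},
    {"g1": 200, "g2": 200, "g3": 400},
    {"g1": 50, "g2": 50, "g3": 100},
]
screen = {"g1": 50, "g2": 150, "g3": 200}
lfc = guide_lfc(screen, reference_profile(references))
print({g: round(v, 2) for g, v in lfc.items()})
```

Here g1 roughly halves relative to the reference (LFC near -1), g3 is unchanged, and g2 increases.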
A portion of the Humagne screens are now screened with a split library, meaning the set C and set D guides were sequenced separately. A new column, “SequenceFracReadsFromOtherLibrarySubset”, has been added to AchillesSequenceQCReport.csv to report the fraction of reads originating from the library subset other than the annotated one. For example, if more than 10% of the reads from a sequence screened with the set C library map to the set D library, the sequence is likely contaminated and is excluded from the dataset. This change affects the QC status of 2 screens.
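The contamination metric reduces to a read-fraction calculation. In this sketch the denominator is all reads assigned to library guides, which is an assumption; the >10% cutoff comes from the example above, and the guide IDs and counts are made up.

```python
from typing import Dict


def frac_reads_from_other_subset(
    guide_counts: Dict[str, int], guide_subset: Dict[str, str], annotated_subset: str
) -> float:
    """Fraction of mapped reads whose guide belongs to a subset other than the annotated one."""
    total = sum(guide_counts.values())
    if total == 0:
        return 0.0
    other = sum(c for g, c in guide_counts.items() if guide_subset[g] != annotated_subset)
    return other / total


def likely_contaminated(fraction: float, threshold: float = 0.10) -> bool:
    """Flag a sequence when more than `threshold` of its reads come from the other subset."""
    return fraction > threshold


# Toy sequence annotated as set C (guide IDs and counts are made up).
subset_of = {"c1": "C", "c2": "C", "d1": "D", "d2": "D"}
counts = {"c1": 900, "c2": 850, "d1": 150, "d2": 100}
frac = frac_reads_from_other_subset(counts, subset_of, "C")
print(frac, likely_contaminated(frac))  # 0.125 True
```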
We have refined our sequence correlation QC metric to better measure whether two replicates reflect the same biology and to detect potential contamination. These QC metrics (replicate sequence correlation and random sequence correlation) are now computed using the residual log2 fold change (LFC). Residual LFC is calculated for each library and replicate by fitting a linear model of the replicate’s LFC for high-variance genes as a function of the mean LFC of those genes across replicates. Only sequences that show strong control separation and pass the null-normalized median difference (NNMD) threshold are considered when calculating the residual LFC. The residuals of the model are then correlated between replicates, which accounts for mean essentiality and screen-quality bias when comparing sequences. Additionally, we now compute the same correlation with all other replicates; unexpectedly high correlations indicate potential contamination or a sample swap.
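The residual-LFC step described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the DepMap implementation: it fits ordinary least squares of each replicate’s LFC on the mean LFC across all supplied replicates and correlates the residuals with a hand-written Pearson function. The NNMD helper assumes the common definition (difference of the positive- and negative-control medians divided by the unscaled median absolute deviation of the negative controls). The toy LFC values are contrived so that the shared profile cancels exactly.

```python
from math import sqrt
from statistics import mean, median
from typing import List, Sequence, Tuple

Vector = List[float]


def nnmd(positive: Sequence[float], negative: Sequence[float]) -> float:
    """Null-normalized median difference: (median(pos) - median(neg)) / MAD(neg).

    Assumption: unscaled MAD; more negative values mean better control separation.
    """
    neg_median = median(negative)
    mad = median(abs(x - neg_median) for x in negative)
    return (median(positive) - neg_median) / mad


def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares fit y = a + b * x; returns (a, b)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def residual_lfc(replicates: List[Vector]) -> List[Vector]:
    """Residuals of each replicate's LFC regressed on the mean LFC across all replicates.

    Each vector holds LFCs for the same high-variance genes, in the same order.
    """
    consensus = [mean(values) for values in zip(*replicates)]
    residuals = []
    for rep in replicates:
        a, b = fit_line(consensus, rep)
        residuals.append([y - (a + b * x) for x, y in zip(consensus, rep)])
    return residuals


def pearson(u: Sequence[float], v: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length vectors."""
    mu, mv = mean(u), mean(v)
    num = sum((x - mu) * (y - mv) for x, y in zip(u, v))
    den = sqrt(sum((x - mu) ** 2 for x in u) * sum((y - mv) ** 2 for y in v))
    return num / den


# Toy LFCs for four high-variance genes: two replicates each of models A and B.
a1 = [0.0, -4.0, -2.0, -2.0]
a2 = [0.0, -4.0, -2.0, -2.0]
b1 = [0.0, -2.0, -4.0, -2.0]
b2 = [0.0, -2.0, -4.0, -2.0]
res = residual_lfc([a1, a2, b1, b2])
print(round(pearson(res[0], res[1]), 3), round(pearson(res[0], res[2]), 3))  # 1.0 -1.0
print(nnmd([-3.0, -2.0, -2.5], [0.0, 0.5, -0.5]))  # -5.0
```

Raw LFCs of A and B are positively correlated because both share the same essentiality profile; after removing it, only the model-specific signal remains, so replicates of the same model agree and different models do not.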
- AchillesSequenceQCReport.csv has been updated:
  - “SequenceMaxCorrelation” (changed) reports the maximum residual LFC correlation between replicate sequences. This metric indicates how well replicate sequences agree with each other after accounting for mean essentiality and screen-quality bias.
  - “UnexpectedHighCorrelationPartners” and “UnexpectedHighCorrelation” (new columns) report highly correlated random sequences and their correlation coefficients. Both are indicative of potential sample swaps. Sequences reported in these columns do not share the same ScreenID or DepMapModelType (cancer lineage).
- Changes in screen QC status did not substantially change gene effect estimates in CRISPRGeneEffect.csv.
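Flagging unexpected correlation partners could look like the following sketch, which computes a Pearson correlation on residual LFC vectors and, mirroring the description of AchillesSequenceQCReport.csv, only considers pairs that share neither ScreenID nor DepMapModelType. The 0.8 cutoff, the sequence and screen IDs, and the residual values are hypothetical; the notes do not state the threshold used.

```python
from math import sqrt
from statistics import mean
from typing import Dict, List, Sequence, Tuple


def pearson(u: Sequence[float], v: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length vectors."""
    mu, mv = mean(u), mean(v)
    num = sum((x - mu) * (y - mv) for x, y in zip(u, v))
    den = sqrt(sum((x - mu) ** 2 for x in u) * sum((y - mv) ** 2 for y in v))
    return num / den


def unexpected_partners(
    residuals: Dict[str, Sequence[float]],
    screen_id: Dict[str, str],
    model_type: Dict[str, str],
    threshold: float,
) -> Dict[str, List[Tuple[str, float]]]:
    """For each sequence, list sequences from a different screen AND a different
    model type whose residual-LFC correlation exceeds `threshold`."""
    flagged: Dict[str, List[Tuple[str, float]]] = {s: [] for s in residuals}
    names = sorted(residuals)
    for i, s in enumerate(names):
        for t in names[i + 1:]:
            if screen_id[s] == screen_id[t] or model_type[s] == model_type[t]:
                continue  # similarity is expected here, so it is not a swap signal
            r = pearson(residuals[s], residuals[t])
            if r > threshold:
                flagged[s].append((t, r))
                flagged[t].append((s, r))
    return flagged


# Hypothetical residual LFC vectors and annotations.
residuals = {
    "seq1": [0.0, -1.0, 1.0, 0.0],
    "seq2": [0.0, -1.0, 1.0, 0.5],  # same screen as seq1, so skipped
    "seq3": [0.0, -1.0, 1.0, 0.0],  # different screen and type: suspicious
    "seq4": [1.0, 0.0, 0.0, -1.0],  # different screen and type, uncorrelated
}
screen_id = {"seq1": "SC1", "seq2": "SC1", "seq3": "SC2", "seq4": "SC3"}
model_type = {"seq1": "LUAD", "seq2": "LUAD", "seq3": "BRCA", "seq4": "COAD"}
result = unexpected_partners(residuals, screen_id, model_type, threshold=0.8)
print([p for p, _ in result["seq1"]])  # ['seq3']
```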
Omics Pipeline Updates
Copy Number
Previously, we masked all genes in the copy number matrices (relative and absolute) that overlap segmentally duplicated or RepeatMasker-flagged regions. To avoid dropping cancer-relevant genes, we now rescue all genes on OncoKB’s oncogene list, even if they fall in low-mapping-quality regions.
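The masking-with-rescue rule amounts to an interval-overlap test followed by a set difference. The sketch assumes half-open genomic intervals and uses placeholder gene names and coordinates; it is not the DepMap pipeline code, and in practice the flagged regions and the OncoKB oncogene list would come from their respective sources.

```python
from typing import Dict, List, NamedTuple, Optional, Set


class Interval(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive (assumed convention)
    end: int    # exclusive (assumed convention)


def overlaps(a: Interval, b: Interval) -> bool:
    """Half-open interval overlap on the same chromosome."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def genes_to_mask(genes: Dict[str, Interval], flagged: List[Interval], rescued: Set[str]) -> Set[str]:
    """Genes overlapping any flagged region, minus the rescued set (e.g. oncogenes)."""
    return {
        name for name, loc in genes.items()
        if name not in rescued and any(overlaps(loc, region) for region in flagged)
    }


def apply_mask(cn: Dict[str, float], masked: Set[str]) -> Dict[str, Optional[float]]:
    """Replace masked genes' copy number values with None (missing)."""
    return {g: (None if g in masked else v) for g, v in cn.items()}


# Placeholder genes, regions, and copy number values.
genes = {
    "GENE_A": Interval("chr1", 100, 200),
    "GENE_B": Interval("chr1", 500, 600),
    "ONCOGENE_X": Interval("chr2", 1000, 1100),
}
flagged_regions = [Interval("chr1", 150, 250), Interval("chr2", 1050, 1060)]
oncogenes = {"ONCOGENE_X"}
masked = genes_to_mask(genes, flagged_regions, oncogenes)
print(sorted(masked))  # ['GENE_A']
print(apply_mask({"GENE_A": 2.1, "GENE_B": 1.0, "ONCOGENE_X": 6.0}, masked))
```

ONCOGENE_X overlaps a flagged region but keeps its value because it is in the rescued set.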
To be consistent with absolute copy number data, the relative copy number matrix is no longer log2 transformed.
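If you compare this release against earlier ones, you may need to move between scales. Assuming the earlier relative copy number matrix used a log2(x + 1) transform (check the file description for the release you are working with), the conversion in each direction is:

```python
import math


def old_to_linear(log2_value: float) -> float:
    """Invert an assumed log2(x + 1) transform: x = 2**v - 1."""
    return 2.0 ** log2_value - 1.0


def linear_to_old(relative_cn: float) -> float:
    """Apply the assumed log2(x + 1) transform to a linear relative copy number."""
    return math.log2(relative_cn + 1.0)


print(old_to_linear(1.0), linear_to_old(1.0))  # 1.0 1.0
print(linear_to_old(3.0))  # 2.0
```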
Mutation
- AlphaMissense has been added to our small variant annotation pipeline resulting in the addition of two columns: “AMClass” (categorical) and “AMPathogenicity” (float).
- Two TERT promoter mutations, C228T and C250T, have been added to our list of hotspots.
- “Hotspot” is now a boolean column in the mutation table.
- Pipeline documentation (for reference)
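With “Hotspot” now a boolean and AlphaMissense scores available, selecting candidate functional variants becomes a simple row filter. In the sketch, the `HugoSymbol` and `ProteinChange` column names, the True/False serialization, and the gene rows are assumptions; the default cutoff of 0.564 mirrors the likely-pathogenic boundary reported in the AlphaMissense paper, and you may prefer to filter on `AMClass` instead.

```python
import csv
import io
from typing import List, TextIO


def parse_bool(text: str) -> bool:
    """Interpret a serialized boolean cell such as 'True' or 'False'."""
    return text.strip().lower() in {"true", "1"}


def select_variants(handle: TextIO, am_cutoff: float = 0.564) -> List[str]:
    """Return 'Gene ProteinChange' labels for hotspot variants or variants whose
    AlphaMissense pathogenicity exceeds `am_cutoff`.

    Assumptions: column names 'HugoSymbol' and 'ProteinChange' are illustrative;
    'Hotspot' is serialized as True/False; AMPathogenicity is empty for
    variants AlphaMissense does not score.
    """
    selected = []
    for row in csv.DictReader(handle):
        am = row["AMPathogenicity"]
        likely_pathogenic = am != "" and float(am) > am_cutoff
        if parse_bool(row["Hotspot"]) or likely_pathogenic:
            selected.append(f"{row['HugoSymbol']} {row['ProteinChange']}".strip())
    return selected


# Made-up rows; TERT stands in for a promoter hotspot with no protein change.
example = io.StringIO(
    "HugoSymbol,ProteinChange,Hotspot,AMClass,AMPathogenicity\n"
    "TERT,,True,,\n"
    "GENE_1,p.A10V,False,likely_benign,0.12\n"
    "GENE_2,p.R20W,False,likely_pathogenic,0.91\n"
)
selected = select_variants(example)
print(selected)  # ['TERT', 'GENE_2 p.R20W']
```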
Expression
We have added a new gene-level read count matrix generated using RNASeQC2 as a downloadable file: OmicsExpressionRNASeQCGeneCountProfile.csv.
DepMap’s RNA-seq data contains a mixture of stranded and non-stranded sequencing protocols generated over more than 10 years. In past releases, all samples were processed uniformly with the non-stranded setting in RSEM, regardless of the sequencing protocol. Starting with the 24Q2 release, stranded samples are also processed with the stranded setting in RSEM, and the resulting stranded expression data for this subset is released as an additional downloadable file; non-stranded samples continue to be processed with the unstranded setting. To determine whether an RNA-seq profile was generated with the stranded protocol, refer to the “Stranded” column in OmicsProfiles.csv.
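To separate profiles by protocol before choosing which expression file to use, you can group ProfileIDs on the “Stranded” column. The sketch assumes the column holds True/False text and is empty where strandedness does not apply; the ProfileIDs are placeholders.

```python
import csv
import io
from typing import Dict, Set, TextIO


def split_by_strandedness(profiles: TextIO) -> Dict[str, Set[str]]:
    """Group ProfileIDs from an OmicsProfiles-style CSV by the 'Stranded' column.

    Assumptions: the column holds 'True'/'False' for RNA-seq profiles and is
    empty (or absent) for profiles where strandedness does not apply.
    """
    groups: Dict[str, Set[str]] = {"stranded": set(), "unstranded": set(), "unknown": set()}
    for row in csv.DictReader(profiles):
        value = (row.get("Stranded") or "").strip().lower()
        if value == "true":
            groups["stranded"].add(row["ProfileID"])
        elif value == "false":
            groups["unstranded"].add(row["ProfileID"])
        else:
            groups["unknown"].add(row["ProfileID"])
    return groups


example = io.StringIO("ProfileID,Stranded\nPR-TOY1,True\nPR-TOY2,False\nPR-TOY3,\n")
groups = split_by_strandedness(example)
print(sorted(groups["stranded"]))  # ['PR-TOY1']
```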
For this release, the primary expression file has not changed, but you can optionally use a preliminary batch-corrected gene expression matrix, produced by applying ComBat to log2(TPM+1)-transformed expression.
We provide a full detailed analysis here. We aim to replace the main expression matrix with a batch-corrected version before the 24Q4 release.
24q2 Strandedness review.pdf (1.4 MB)
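ComBat adjusts each gene for per-batch location and scale and then shrinks those batch estimates with empirical Bayes. The sketch below shows only the location/scale part, applied to log2(TPM + 1) values, to make the idea concrete. It is a simplified stand-in, not ComBat and not the DepMap correction, and the batch labels and TPM values are hypothetical.

```python
import math
from statistics import mean, pstdev
from typing import Dict, List, Sequence


def log2_tpm1(tpm: Sequence[float]) -> List[float]:
    """log2(TPM + 1) transform applied before batch correction."""
    return [math.log2(t + 1.0) for t in tpm]


def location_scale_adjust(values: Sequence[float], batches: Sequence[str]) -> List[float]:
    """Align one gene's per-batch mean and spread to the pooled values.

    Simplified stand-in for ComBat: same location/scale idea, but without
    covariates or empirical Bayes shrinkage of the batch parameters.
    """
    grand_mean = mean(values)
    by_batch: Dict[str, List[float]] = {}
    for v, b in zip(values, batches):
        by_batch.setdefault(b, []).append(v)
    batch_mean = {b: mean(vs) for b, vs in by_batch.items()}
    batch_sd = {b: pstdev(vs) for b, vs in by_batch.items()}
    # Pooled within-batch SD, so between-batch shifts do not inflate the target spread.
    pooled_sd = math.sqrt(
        sum((v - batch_mean[b]) ** 2 for v, b in zip(values, batches)) / len(values)
    )
    adjusted = []
    for v, b in zip(values, batches):
        sd = batch_sd[b] or 1.0  # constant batch: numerator is 0, so any nonzero SD works
        adjusted.append((v - batch_mean[b]) / sd * pooled_sd + grand_mean)
    return adjusted


expr = log2_tpm1([1.0, 3.0, 7.0, 1023.0, 2047.0, 4095.0])  # [1, 2, 3, 10, 11, 12]
batches = ["batch1"] * 3 + ["batch2"] * 3  # hypothetical batch labels
adjusted = location_scale_adjust(expr, batches)
print([round(x, 6) for x in adjusted])  # [5.5, 6.5, 7.5, 5.5, 6.5, 7.5]
```

After adjustment both batches share the same mean and within-batch spread for this gene; ComBat additionally pools information across genes when estimating the batch parameters.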