Omics Characterization and Pipelines

DepMap provides comprehensive genomic characterizations of human cancer cell lines, a rich resource for studying cancer biology. The datasets cover gene expression, copy number alterations, mutations, and gene fusions, enabling researchers to explore genetic vulnerabilities and potential therapeutic targets across genetically diverse cancer types.

Working with the Broad Institute’s Broad Clinical Labs, DepMap sequences cancer models and processes the resulting genomic data with published pipelines to generate high-quality Omics data.

Sequencing

  • 30X short-read whole-genome sequencing (WGS)
  • 50X whole-exome sequencing (WES)
  • 100X short-read RNA sequencing (RNA-seq)
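
The depth figures for the DNA assays are mean coverages. As a rough illustration, the Lander-Waterman relation C = N × L / G links mean coverage C to the number of reads N, the read length L, and the size G of the sequenced target (the whole genome for WGS, the capture target for WES). The sketch below assumes 150 bp reads and a ~3.1 Gb human genome; both values are illustrative assumptions, not DepMap parameters, and real runs need extra reads to offset duplicates and unmapped reads.

```python
import math


def mean_coverage(n_reads: int, read_length_bp: int, target_bp: int) -> float:
    """Lander-Waterman mean coverage: C = N * L / G."""
    return n_reads * read_length_bp / target_bp


def reads_needed(coverage: float, read_length_bp: int, target_bp: int) -> int:
    """Invert C = N * L / G, rounding up to a whole number of reads."""
    return math.ceil(coverage * target_bp / read_length_bp)


def fraction_covered(coverage: float, min_depth: int = 1) -> float:
    """Poisson approximation of the fraction of bases covered at >= min_depth."""
    p_below = sum(
        math.exp(-coverage) * coverage**k / math.factorial(k) for k in range(min_depth)
    )
    return 1.0 - p_below


if __name__ == "__main__":
    # Illustrative assumptions only: ~3.1 Gb genome, 150 bp reads.
    genome_bp = 3_100_000_000
    read_len = 150
    n = reads_needed(30, read_len, genome_bp)
    print(f"reads for 30X WGS: {n:,}")  # reads for 30X WGS: 620,000,000
    print(f"fraction of bases at >=10X: {fraction_covered(30, 10):.6f}")
```

The Poisson estimate is idealized; real coverage is less uniform because of GC bias and mappability.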

Alignment/Processing

  • All data are aligned to the hg38 reference genome
  • Initial quality checks are run on file size and read count
  • Germline structural variants are filtered out using gnomAD v4.1
  • STR and SNP profiles are checked to confirm model identity
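
One common way to compare an STR profile against a reference is the Tanabe percent-match score, 2 × shared alleles / (alleles in query + alleles in reference) × 100, where scores of roughly 80% or higher are usually treated as a match. DepMap's exact metric and threshold are not specified here, so the sketch below only illustrates the general technique; the profile format (a dict mapping locus name to a set of alleles) and the example alleles are hypothetical.

```python
from typing import Dict, FrozenSet

# Hypothetical representation: STR locus name -> set of allele calls.
# A homozygous locus is stored as a single allele.
StrProfile = Dict[str, FrozenSet[str]]


def tanabe_score(query: StrProfile, reference: StrProfile) -> float:
    """Tanabe percent match, computed over loci typed in both profiles."""
    loci = query.keys() & reference.keys()
    shared = sum(len(query[locus] & reference[locus]) for locus in loci)
    total = sum(len(query[locus]) + len(reference[locus]) for locus in loci)
    return 100.0 * 2 * shared / total if total else 0.0


def is_str_match(query: StrProfile, reference: StrProfile, threshold: float = 80.0) -> bool:
    """Assumed cutoff: >= 80% Tanabe match is commonly read as the same origin."""
    return tanabe_score(query, reference) >= threshold


if __name__ == "__main__":
    query = {
        "TH01": frozenset({"6", "9.3"}),
        "D5S818": frozenset({"11", "12"}),
        "TPOX": frozenset({"8"}),
    }
    reference = {
        "TH01": frozenset({"6", "9.3"}),
        "D5S818": frozenset({"11"}),
        "TPOX": frozenset({"8", "11"}),
    }
    print(tanabe_score(query, reference))  # 80.0
```

Scoring only loci typed in both profiles keeps a missing locus from counting as a mismatch; other conventions (for example, the Masters algorithm) normalize differently.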

Standard Data Generation

Mutation pipeline

Mutation calls are generated with Mutect2, then annotated and filtered downstream. Variant coordinates are reported on hg38.
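
Mutect2's companion tool FilterMutectCalls records its verdict in the VCF FILTER column, so a downstream filter typically starts by keeping PASS records. The minimal sketch below does that with the standard library and then applies depth and allele-fraction cutoffs computed from the FORMAT AD field. The thresholds (min_depth, min_vaf) and the example records are hypothetical; DepMap's actual annotation and filtering steps are described in the pipeline documentation.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Variant:
    chrom: str
    pos: int
    ref: str
    alt: str
    depth: int
    vaf: float


def filter_calls(
    vcf_lines: Iterable[str], min_depth: int = 10, min_vaf: float = 0.05
) -> List[Variant]:
    """Keep PASS calls whose first-sample AD gives enough depth and allele fraction.

    Thresholds are hypothetical, chosen only for illustration.
    """
    kept: List[Variant] = []
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos, _id, ref, alt, _qual, filt, _info, fmt, sample = fields[:10]
        if filt != "PASS":
            continue  # FilterMutectCalls flagged this record
        values = dict(zip(fmt.split(":"), sample.split(":")))
        allele_depths = [int(x) for x in values["AD"].split(",")]
        depth = sum(allele_depths)
        if depth < min_depth:
            continue
        vaf = allele_depths[1] / depth  # first ALT allele only
        if vaf >= min_vaf:
            kept.append(Variant(chrom, int(pos), ref, alt, depth, vaf))
    return kept


if __name__ == "__main__":
    records = [
        "##fileformat=VCFv4.2",
        "\t".join(["chr1", "1000", ".", "C", "T", ".", "PASS", ".", "GT:AD:DP", "0/1:30,20:50"]),
        "\t".join(["chr1", "2000", ".", "G", "A", ".", "weak_evidence", ".", "GT:AD:DP", "0/1:40,3:43"]),
        "\t".join(["chr2", "3000", ".", "A", "G", ".", "PASS", ".", "GT:AD:DP", "0/1:98,2:100"]),
    ]
    for v in filter_calls(records):
        print(v)  # only the chr1:1000 call survives
```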

For detailed documentation, see: https://storage.googleapis.com/shared-portal-files/Tools/26Q1_Mutation_Pipeline_Documentation.pdf.

Copy Number pipelines

Relative Copy Number data from WGS is generated from 1,000 bp bins using an HMM-based caller. See the workflow and R script on GitHub for more information.
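
The general idea of fixed-width binning followed by an HMM can be sketched as follows. This is not the DepMap workflow or R script; it is a minimal illustration that counts read starts in 1,000 bp bins, converts counts to log2 ratios against the median bin, and decodes a three-state (loss/neutral/gain) Gaussian HMM with the Viterbi algorithm. The state means, emission SD, and transition probability are hypothetical choices.

```python
import math
from typing import Dict, List, Sequence

BIN_SIZE = 1000  # bp, the bin width described above


def bin_counts(read_starts: Sequence[int], chrom_length: int, bin_size: int = BIN_SIZE) -> List[int]:
    """Count 0-based read start positions falling into fixed-width bins."""
    counts = [0] * math.ceil(chrom_length / bin_size)
    for pos in read_starts:
        counts[pos // bin_size] += 1
    return counts


def log2_ratios(counts: Sequence[int], pseudocount: float = 0.5) -> List[float]:
    """Relative copy number per bin as log2(count / median count)."""
    median = sorted(counts)[len(counts) // 2]  # upper median; adequate for a sketch
    return [math.log2((c + pseudocount) / (median + pseudocount)) for c in counts]


# Hypothetical states: expected log2 ratio for 1, 2 and 3 copies in a diploid genome.
STATE_MEANS: Dict[str, float] = {"loss": math.log2(1 / 2), "neutral": 0.0, "gain": math.log2(3 / 2)}


def viterbi(obs: Sequence[float], means: Dict[str, float] = STATE_MEANS,
            sd: float = 0.3, stay: float = 0.99) -> List[str]:
    """Most likely state path under Gaussian emissions and a sticky transition matrix."""
    if not obs:
        return []
    states = list(means)
    switch = (1.0 - stay) / (len(states) - 1)

    def log_emit(state: str, x: float) -> float:
        z = (x - means[state]) / sd
        return -0.5 * z * z - math.log(sd * math.sqrt(2 * math.pi))

    def log_trans(a: str, b: str) -> float:
        return math.log(stay if a == b else switch)

    # Uniform initial state probabilities.
    score = {s: -math.log(len(states)) + log_emit(s, obs[0]) for s in states}
    backpointers: List[Dict[str, str]] = []
    for x in obs[1:]:
        prev, score, ptr = score, {}, {}
        for s in states:
            best = max(states, key=lambda r: prev[r] + log_trans(r, s))
            score[s] = prev[best] + log_trans(best, s) + log_emit(s, x)
            ptr[s] = best
        backpointers.append(ptr)
    # Trace back from the best final state.
    path = [max(states, key=lambda s: score[s])]
    for ptr in reversed(backpointers):
        path.append(ptr[path[-1]])
    return path[::-1]
```

Usage would chain the steps as `viterbi(log2_ratios(bin_counts(starts, length)))`. Production callers typically also correct for GC content and mappability and normalize against a reference before segmentation.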

Legacy relative Copy Number data from WES was generated with the GATK copy number pipeline described here: https://software.broadinstitute.org/gatk/documentation/article?id=11682, https://software.broadinstitute.org/gatk/documentation/article?id=11683.

Absolute Copy Number data from WGS/WES is generated using PureCN. Detailed documentation can be found in the PureCN paper: "PureCN: copy number calling and SNV classification using targeted short read sequencing" (Source Code for Biology and Medicine).

Please note that WGS and WES absolute copy number data are now provided as separate datasets, both derived from the output of the legacy GATK-based copy number pipeline described above for WES. Absolute copy numbers are not currently generated for new WGS data while we evaluate alternatives to PureCN.
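
Absolute callers such as PureCN estimate tumor purity and ploidy so that relative copy ratios can be converted to absolute copy numbers. Under the common assumption that contaminating normal cells are diploid, a segment with absolute copy number C in a tumor of purity p and ploidy P has expected copy ratio R = (p·C + 2(1 − p)) / (p·P + 2(1 − p)). The sketch below simply inverts that relationship for known p and P. It is a simplified illustration, not PureCN's model, which searches over candidate purity/ploidy solutions and also uses allelic information; the example purity and ploidy are hypothetical.

```python
def expected_copy_ratio(copy_number: float, purity: float, ploidy: float) -> float:
    """R = (p*C + 2*(1-p)) / (p*P + 2*(1-p)), assuming diploid normal contamination."""
    normal = 2.0 * (1.0 - purity)
    return (purity * copy_number + normal) / (purity * ploidy + normal)


def absolute_copy_number(copy_ratio: float, purity: float, ploidy: float) -> float:
    """Invert expected_copy_ratio for C given purity p and ploidy P."""
    if not 0.0 < purity <= 1.0:
        raise ValueError("purity must be in (0, 1]")
    normal = 2.0 * (1.0 - purity)
    return (copy_ratio * (purity * ploidy + normal) - normal) / purity


def from_log2(log2_ratio: float) -> float:
    """Convert a log2 copy ratio (as in relative copy number data) to a linear ratio."""
    return 2.0 ** log2_ratio


if __name__ == "__main__":
    # Hypothetical values: 80% purity, diploid tumor, 3-copy segment.
    r = expected_copy_ratio(3, purity=0.8, ploidy=2.0)
    print(round(r, 3))                                  # 1.4
    print(round(absolute_copy_number(r, 0.8, 2.0), 3))  # 3.0
```

Lower purity compresses the observed ratios toward 1, which is why an accurate purity estimate matters before rounding to integer copy numbers.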

Expression pipeline

RNA-seq expression data for genes and transcripts (TPM) is quantified using Salmon v1.10.0 (Patro et al., April 2017).
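
TPM normalizes each transcript's read count by its effective length and rescales so that values sum to one million: TPM_i = (N_i / l_i) / Σ_j (N_j / l_j) × 10^6. Salmon reports these values directly (the TPM column of its quant.sf output), so the sketch below only illustrates the formula. Summing transcript TPMs per gene with a transcript-to-gene map is one common way to get gene-level values; that aggregation, the transcript names, and the counts are assumptions for illustration.

```python
from typing import Dict, Mapping


def tpm(num_reads: Mapping[str, float], effective_length: Mapping[str, float]) -> Dict[str, float]:
    """TPM_i = (N_i / l_i) / sum_j (N_j / l_j) * 1e6."""
    rates = {tx: num_reads[tx] / effective_length[tx] for tx in num_reads}
    total = sum(rates.values())
    return {tx: rate / total * 1e6 for tx, rate in rates.items()}


def gene_tpm(transcript_tpm: Mapping[str, float], tx2gene: Mapping[str, str]) -> Dict[str, float]:
    """Sum transcript TPMs per gene (one common aggregation; assumed here)."""
    genes: Dict[str, float] = {}
    for tx, value in transcript_tpm.items():
        gene = tx2gene[tx]
        genes[gene] = genes.get(gene, 0.0) + value
    return genes


if __name__ == "__main__":
    # Hypothetical transcripts, read counts and effective lengths.
    reads = {"tx1": 100, "tx2": 300, "tx3": 100}
    eff_len = {"tx1": 1000.0, "tx2": 1000.0, "tx3": 500.0}
    t = tpm(reads, eff_len)
    # tx3 is half as long as tx1, so its 100 reads yield twice tx1's TPM.
    print({tx: round(v) for tx, v in t.items()})
    print(gene_tpm(t, {"tx1": "geneA", "tx2": "geneA", "tx3": "geneB"}))
```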

Raw read counts are quantified with STAR v2.7.11b.
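
Assuming the raw counts come from STAR's --quantMode GeneCounts option (an assumption for this sketch), STAR writes a ReadsPerGene.out.tab file whose first four rows are summary counters (N_unmapped, N_multimapping, N_noFeature, N_ambiguous) and whose remaining rows give, per gene, the unstranded count followed by the counts for each strand orientation. The sketch below parses that layout and selects the column that matches the library's strandedness, which is left as a required argument because it depends on the library prep. The gene names and values in the example are hypothetical.

```python
from typing import Dict, Tuple

# Column index in ReadsPerGene.out.tab for each library type.
STRAND_COLUMN = {
    "unstranded": 1,  # column 2
    "forward": 2,     # column 3: read 1 on the RNA strand (htseq-count -s yes)
    "reverse": 3,     # column 4: read 2 on the RNA strand (htseq-count -s reverse)
}


def parse_reads_per_gene(text: str, strandedness: str) -> Tuple[Dict[str, int], Dict[str, int]]:
    """Return (gene counts, N_* summary counters) from STAR GeneCounts output."""
    col = STRAND_COLUMN[strandedness]
    counts: Dict[str, int] = {}
    summary: Dict[str, int] = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        fields = line.split("\t")
        # Summary rows start with "N_"; GENCODE gene IDs (ENSG...) never do.
        target = summary if fields[0].startswith("N_") else counts
        target[fields[0]] = int(fields[col])
    return counts, summary


if __name__ == "__main__":
    example = "\n".join([  # hypothetical values
        "N_unmapped\t120\t120\t120",
        "N_multimapping\t300\t300\t300",
        "N_noFeature\t50\t400\t60",
        "N_ambiguous\t20\t5\t8",
        "geneA\t500\t10\t490",
        "geneB\t0\t0\t0",
    ])
    counts, summary = parse_reads_per_gene(example, "reverse")
    print(counts)  # {'geneA': 490, 'geneB': 0}
```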

For code and configuration details, see the quantify_sr_rna workflow (depmap-omics-rna/workflows/quantify_sr_rna, main branch) in the broadinstitute/depmap-omics-rna repository on GitHub.

Gene and transcript annotations are based on GENCODE v38.

Fusion pipeline

RNA-seq-based fusion calls are generated using Arriba v2.4.0 (Uhrig et al., March 2021).
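
Arriba writes its calls to a tab-separated fusions file with a header row; among other columns it reports the two partner genes (gene1, gene2), the breakpoints, and a confidence rating of high, medium, or low. The sketch below keeps calls at or above a chosen confidence level. The column names should be verified against the Arriba 2.4.0 output documentation, and the medium-or-better cutoff and the truncated example rows are illustrative assumptions, not DepMap's filter.

```python
from typing import Dict, List

CONFIDENCE_RANK = {"low": 0, "medium": 1, "high": 2}


def read_fusions(tsv_text: str) -> List[Dict[str, str]]:
    """Parse a fusions TSV whose first line is a '#'-prefixed header."""
    lines = [line for line in tsv_text.splitlines() if line.strip()]
    header = lines[0].lstrip("#").split("\t")
    return [dict(zip(header, line.split("\t"))) for line in lines[1:]]


def confident_fusions(rows: List[Dict[str, str]], min_confidence: str = "medium") -> List[Dict[str, str]]:
    """Keep rows whose confidence is at least min_confidence (assumed cutoff)."""
    cutoff = CONFIDENCE_RANK[min_confidence]
    return [row for row in rows if CONFIDENCE_RANK.get(row.get("confidence", ""), -1) >= cutoff]


if __name__ == "__main__":
    # Hypothetical, heavily truncated example (real files have many more columns).
    example = "\n".join([
        "#gene1\tgene2\tbreakpoint1\tbreakpoint2\tconfidence",
        "geneA\tgeneB\tchr1:1000\tchr2:5000\thigh",
        "geneC\tgeneD\tchr3:2000\tchr3:9000\tlow",
    ])
    for row in confident_fusions(read_fusions(example)):
        print(row["gene1"], row["gene2"], row["confidence"])  # geneA geneB high
```

Reading columns by header name rather than position keeps the filter working if extra columns are present.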

For code and configuration details, see the call_sr_rna_fusions workflow (depmap-omics-rna/workflows/call_sr_rna_fusions, main branch) in the broadinstitute/depmap-omics-rna repository on GitHub.

Gene annotations are based on GENCODE v38.