How are the gene counts obtained for the mRNA expression datasets of DepMap 22Q2?

The “Expression_22Q2_Public.csv” file under the “Custom Downloads” tab seems to contain the same information as the “CCLE_expression.csv” file from the “File Downloads” tab under DepMap Public 22Q2 All Files (which, by the way, does not list Expression_22Q2_Public). From inspecting the values, these seem to be normalized counts, presumably adjusted for gene length and GC content? Where can we find out exactly what these values are: are they log TPM or log RPKM? And what is the difference between CCLE_expression_full.csv and CCLE_expression.csv?

There are other datasets to download, such as “CCLE_RNAseq_reads.csv”. These seem to be non-log-transformed counts; are they RPKM, TPM, or something else? In general, where can we find information about what kind of reads these are, how they differ from the other datasets, and what normalization steps they underwent, if any?

Can we also access the raw integer counts for these CCLE_RNAseq/expression files? I do not see any among the 42 files of DepMap 22Q2. Let me know if I missed an obvious FAQ, info sheet, or similar on the nature of these mRNA expression datasets and the differences between them.

Hello mtruica,

If you click on the filename on the download page, you should see more information about the file itself. Moreover, the full pipeline used to process the data is available on our GitHub, in broadinstitute/depmap_omics (“What you need to process the Quarterly DepMap-Omics releases from Terra”), with tags for the different releases.
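If you want a quick sanity check on a downloaded file before digging through the pipeline, here is a minimal sketch that guesses the scale of one sample's values. It is not part of the DepMap pipeline: the function name guess_scale and the tolerance are made up for illustration, and it relies only on the general fact that TPM values for one sample sum to about 1e6 across all genes. A file restricted to a subset of genes will sum to less than that, so treat "unknown" as "check the file description", not as a verdict.

```python
import math


def guess_scale(values, tol=0.05):
    """Heuristically guess the scale of one sample's expression values.

    Hypothetical helper for illustration only. `values` holds the
    expression values of a single sample across genes.
    """
    vals = [float(v) for v in values]
    if not vals or any(v < 0 for v in vals):
        return "unknown"

    # TPM for one sample sums to 1e6 when all genes are present.
    if abs(sum(vals) - 1e6) <= tol * 1e6:
        return "TPM"

    # Undo log2(x + 1); skip large values to avoid float overflow.
    if max(vals) < 64:
        back = sum(2.0 ** v - 1.0 for v in vals)
        if abs(back - 1e6) <= tol * 1e6:
            return "log2(TPM+1)"

    # Whole numbers only suggests raw read counts.
    if all(v.is_integer() for v in vals):
        return "integer counts"

    return "unknown"


# Toy sample: four made-up "genes" whose TPM values sum to 1e6.
tpm = [250000.0, 500000.0, 249999.5, 0.5]
log_tpm = [math.log2(t + 1) for t in tpm]
counts = [10, 0, 532, 7]
```

On a real file you would pass one row (one cell line) at a time, for example the values of a single line read with the csv module. Estimated counts from a quantification tool can be non-integer, in which case the sketch returns "unknown".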

Hope this helps and let me know if you have further questions.
Best,

Hi, I have the same question. What is “Expression Public XXQX” on the “Custom Downloads” page? Is there any information about how this file was created?