Normalization of DepMap CCLE RNAseq Public files

Chess · August 7, 2022, 11:07pm

I want to use the DepMap Public CCLE RNAseq files. One type of file is CCLE_RNAseq_reads.CSV. I understand this to contain raw, non-normalized read counts. Is this correct?

The other type of file is CCLE_RNAseq_transcripts.CSV. I understand this data to be in TPM and to be normalized for read depth and gene length. Is this a correct understanding?

Is it valid to compare gene counts from one cell line with another cell line in the CCLE_RNAseq_transcripts.CSV data without further normalization?
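
For readers unsure what "normalized for read depth and gene length" means in practice, here is a minimal sketch of the standard TPM calculation. It is not the DepMap pipeline; the function name `counts_to_tpm` and the toy numbers are made up for illustration. It shows that two samples with the same composition but different sequencing depth end up with the same TPM values. Note that TPM removes depth and length effects only, so differences in overall transcriptome composition between cell lines are not corrected.

```python
def counts_to_tpm(counts, lengths_bp):
    """Convert read counts for ONE sample to TPM (transcripts per million).

    counts     -- reads (or expected counts) per gene/transcript
    lengths_bp -- matching feature lengths in base pairs
    Both names are hypothetical; this is not DepMap code.
    """
    if len(counts) != len(lengths_bp):
        raise ValueError("counts and lengths_bp must have the same length")
    # Step 1: length normalization -> reads per kilobase (RPK)
    rpk = [c / (length / 1000.0) for c, length in zip(counts, lengths_bp)]
    # Step 2: depth normalization -> rescale so the sample sums to one million
    total = sum(rpk)
    if total == 0:
        return [0.0] * len(rpk)
    return [r / total * 1e6 for r in rpk]


# Hypothetical toy data: three genes, two samples with different depth
lengths = [1000, 2000, 500]
sample_a = [100, 200, 50]
sample_b = [1000, 2000, 500]  # same composition, sequenced 10x deeper

tpm_a = counts_to_tpm(sample_a, lengths)
tpm_b = counts_to_tpm(sample_b, lengths)
```

Within each sample the TPM values sum to one million, which is why a raw count file and a TPM file for the same cell line look so different.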

Thanks very much.

jkobject · August 22, 2022, 1:25pm

Hi Chess,

If you click on the file, a documentation pop-up should appear.

Best,

Pedro_Sanchez · August 23, 2022, 3:42pm

Hi Jeremie,

I am wondering much the same thing. I want to download the data from CCLE_expression_transcripts_expected_count.csv, which is described as “RNAseq read count data from RSEM”. However, I have not found an explanation of that in the Nature 2019 paper, since RSEM (version 1.2.22) was only used there for quantification of isoform-level expression in TPM (transcripts per million).

Do you know how the count data from RSEM were generated, or where I can find that information? For example, are they upper-quartile-normalized RSEM data?

Thank you!

Pedro

Chess · August 25, 2022, 9:47am

Hi Jkobject,

I am confused because when I click on the file, it says gene-level counts were obtained with RSEM. However, the Supplemental materials of Ghandi et al., Nature 2019 (where I think these data came from) say gene-level counts were obtained as follows: “Gene level RPKM and read count values were calculated using RNA-SeQC.”

jkobject · August 25, 2022, 1:34pm

I think this is where you got confused.
The data do not come from the CCLE2 paper (unless you have selected the CCLE 2019 dataset). Since then the pipeline has gone through many iterative updates, and the processing is quite different from what was described in the 2019 paper.

Hope it helps,

Chess · August 25, 2022, 9:03pm

OK, thanks.


Pedro_Sanchez · August 26, 2022, 2:46pm

Thanks for the explanation, Jeremie!

Could you then share the pipeline, in terms of the steps you followed to obtain those gene-level counts?

I am trying to compare these data with other data that are upper-quartile-normalized RSEM counts, and would like to know how you obtained CCLE_expression_transcripts_expected_count.csv.
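
For context, upper-quartile normalization usually means dividing each sample's counts by that sample's 75th percentile. A minimal sketch follows, under clearly labeled assumptions: zero counts are excluded before taking the quartile, and the result is multiplied by 1000, a convention used by some pipelines. Nothing in this thread confirms that DepMap applies this step; the function names are hypothetical.

```python
def percentile(sorted_vals, q):
    """Linear-interpolation percentile of an already sorted, non-empty list."""
    pos = (len(sorted_vals) - 1) * q
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] * (1 - frac) + sorted_vals[hi] * frac


def upper_quartile_normalize(counts, scale=1000.0):
    """Scale ONE sample's counts by its 75th percentile.

    Assumptions (not taken from DepMap documentation):
    - zero counts are dropped before computing the quartile
    - results are multiplied by `scale` (1000 here)
    """
    nonzero = sorted(c for c in counts if c > 0)
    if not nonzero:
        return [0.0] * len(counts)
    q3 = percentile(nonzero, 0.75)
    return [c / q3 * scale for c in counts]


# Hypothetical expected counts for six genes in one sample
example_counts = [0, 10, 20, 30, 40, 50]
normalized = upper_quartile_normalize(example_counts)
```

Raw expected counts from RSEM can be non-integer, so the presence of decimals alone does not indicate that this kind of normalization was applied.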

Thanks a lot!

Pedro

jkobject · August 26, 2022, 3:30pm

Hi,

It is and will always be in

Hope it helps,
