I want to use the DepMap Public CCLE RNAseq files. One type of file is CCLE_RNAseq_reads.CSV. I understand this to contain raw, non-normalized read counts. Is this correct?
The other type of file is CCLE_RNAseq_transcripts.CSV. I understand this data to be in TPM and to be normalized for read depth and gene length. Is this a correct understanding?
Is it valid to compare gene counts from one cell line with another cell line in the CCLE_RNAseq_transcripts.CSV data without further normalization?
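For reference, here is a minimal sketch of what "normalized for read depth and gene length" means for TPM in general. It is not the DepMap pipeline; the function name `counts_to_tpm` and the toy counts and lengths are made up for illustration.

```python
import numpy as np

def counts_to_tpm(counts, lengths_kb):
    """Convert a genes x samples count matrix to TPM.

    counts     : array of shape (n_genes, n_samples), raw read counts
    lengths_kb : array of shape (n_genes,), feature length in kilobases
    """
    counts = np.asarray(counts, dtype=float)
    lengths_kb = np.asarray(lengths_kb, dtype=float)
    # Step 1: divide by length -> reads per kilobase (corrects for gene length)
    rpk = counts / lengths_kb[:, None]
    # Step 2: rescale each sample so it sums to one million (corrects for depth)
    return rpk / rpk.sum(axis=0, keepdims=True) * 1e6

# Hypothetical toy data: 3 genes, 2 samples.
# Sample 2 is sample 1 sequenced exactly twice as deep.
counts = np.array([[100, 200], [300, 600], [50, 100]])
lengths_kb = np.array([1.0, 3.0, 0.5])
tpm = counts_to_tpm(counts, lengths_kb)
print(tpm.sum(axis=0))  # every sample sums to one million
```

In this toy case the two samples end up with identical TPM values, because they differ only in depth. Note that TPM rescales every sample to the same total, so by itself it does not correct for differences in RNA composition between samples.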
I am wondering much the same thing. I want to download the data from CCLE_expression_transcripts_expected_count.csv, which is described as “RNAseq read count data from RSEM”. However, I have not found an explanation of that in the Nature 2019 paper, where RSEM (version 1.2.22) was only used for quantification of isoform-level expression in TPM (transcripts per million).
Do you know how the count data from RSEM were generated or where to find it? I mean, are they, for example, upper quartile normalized RSEM data?
I am confused because when I click on the file, it says gene-level counts were obtained with RSEM; however, the Ghandi et al. Nature 2019 Supplemental materials (where I think these data came from) say gene-level counts were obtained as follows: “Gene level RPKM and read count values were calculated using RNA-SeQC.”
I think this is where you got confused.
The data does not come from the CCLE2 paper (unless you have selected the CCLE 2019 dataset). Since then the pipeline has gone through many iterative updates, and the processing is quite different from what was described in the 2019 paper.
Could you then share the pipeline, that is, the steps you followed to obtain those gene-level counts?
I am trying to compare these data with other data that are upper quartile normalized RSEM values, and would like to know how CCLE_expression_transcripts_expected_count.csv was generated.
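To make the comparison concrete, this is a rough sketch of upper quartile normalization applied to a matrix of expected counts. It is not the DepMap pipeline. Scaling the 75th percentile of nonzero counts to 1000 is one common convention and is an assumption here, as are the function name and the toy numbers; the other dataset may have used different choices.

```python
import numpy as np

def upper_quartile_normalize(counts, target=1000.0):
    """Scale each sample so the 75th percentile of its nonzero counts equals target.

    counts : array of shape (n_features, n_samples), e.g. RSEM expected counts
    target : value the upper quartile is scaled to (assumed convention)
    Assumes every sample has at least one nonzero count.
    """
    counts = np.asarray(counts, dtype=float)
    normalized = np.empty_like(counts)
    for j in range(counts.shape[1]):
        col = counts[:, j]
        # Upper quartile computed over nonzero entries only (assumption)
        uq = np.percentile(col[col > 0], 75)
        normalized[:, j] = col / uq * target
    return normalized

# Hypothetical toy data: 5 features, 2 samples.
# Sample 2 is sample 1 at twice the depth.
expected_counts = np.array([[0, 0], [10, 20], [20, 40], [30, 60], [40, 80]])
uq_norm = upper_quartile_normalize(expected_counts)
```

After this scaling the two toy samples are identical, since they differ only in depth. Whether values produced this way are directly comparable with another upper quartile normalized dataset still depends on that dataset using the same aligner, annotation, and quantification settings.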