Need File with gene size or FPKM for CCLE RNA expression

mjfreder · July 13, 2023, 10:59pm

I would like to find the CCLE RNA expression file that has either effective gene sizes or FPKM/RPKM (computed from the RSEM expected counts) so we can do our own normalizations of CCLE gene expression. I don’t like the way the protein-coding TPM files were generated: the values for the protein-coding subset appear to have been extracted as is from the larger TPM files covering 53,000+ analytes. The RSEM expected counts should first be filtered to protein-coding genes only, and TPM should then be recalculated over those genes. That gives a different result, in which the protein-coding TPMs from each sample all add up to the same value of 1 million. To me it looks like this may not have been done. Therefore, I would like to perform my own normalization using only protein-coding genes. I can see gene count and RPKM files under CCLE 2019, but those gene counts are not RSEM expected counts, and it is unclear whether the RPKM values were calculated with effective or constant gene sizes, and from the RSEM output or just the gene counts file.
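For reference, the recalculation described above can be sketched as follows: divide each expected count by its effective length, then rescale only over the chosen genes so they sum to 1 million. This is a minimal sketch, not the DepMap pipeline; the function name `tpm`, the toy gene names and values, and the choice to treat a non-positive effective length as zero expression are all assumptions for illustration.

```python
def tpm(counts, eff_lengths, genes=None):
    """Compute TPM for one sample from expected counts and effective lengths.

    counts, eff_lengths: dicts mapping gene -> value.
    genes: optional subset (e.g. protein-coding genes) to normalize over;
           defaults to every gene in `counts`.
    """
    keep = list(counts) if genes is None else [g for g in genes if g in counts]
    # Read rate per gene: expected count divided by effective length.
    # A non-positive effective length is treated as zero expression (assumption).
    rates = {
        g: counts[g] / eff_lengths[g] if eff_lengths[g] > 0 else 0.0
        for g in keep
    }
    total = sum(rates.values())
    if total == 0:
        return {g: 0.0 for g in keep}
    # Rescale so the selected genes sum to one million in this sample.
    return {g: 1e6 * r / total for g, r in rates.items()}


# Toy sample: two "protein-coding" genes (A, B) and one non-coding gene (LNC1).
counts = {"A": 100.0, "B": 200.0, "LNC1": 300.0}
eff_lengths = {"A": 1000.0, "B": 2000.0, "LNC1": 500.0}

all_genes = tpm(counts, eff_lengths)                      # normalized over all genes
coding_only = tpm(counts, eff_lengths, genes=["A", "B"])  # normalized over A and B only
```

In the toy sample, A, B and LNC1 get about 125,000, 125,000 and 750,000 TPM when all genes are normalized together, while A and B each get about 500,000 when only they are normalized. For TPM specifically, rescaling the A and B values from the all-gene table so they sum to 1 million gives the same numbers, provided the same effective lengths were used.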

Alvin_Qin · July 31, 2023, 9:04pm

Hi mjfreder,
Thank you for your feedback!
We haven’t released the effective gene sizes (or FPKM) from the RSEM output to the DepMap portal yet; we plan to provide them as two cell line by gene tables in the next release.
For our latest release (23Q2), we provide expected read counts at both the gene and transcript level (OmicsExpressionGenesExpectedCountProfile.csv, OmicsExpressionTranscriptsExpectedCountProfile.csv) and TPM values (OmicsExpressionTranscriptsTPMLogp1Profile.csv), which are normalized with the effective gene sizes.
Best,
Alvin

mjfreder · August 3, 2023, 4:02pm

Thank you, Alvin, for your reply and clarification. Could you also address how the TPM values are normalized? It seems to me that all RNA species (~50,000), rather than just protein-coding genes, were used in that normalization, because when I sum the file, that is the only way the sums come out to 100% (1 million TPM). In the absence of gene sizes, it is nearly impossible to normalize the protein-coding genes from the released expected count data file. I have seen lots of posts on Biostars and here from people who want to do their own normalizations, and given the caveat with the TPM normalization, I think releasing the needed data should be a very high priority. This is an awesome database that thousands of scientists use frequently. What are your thoughts on expediting this release, or at least making the gene sizes available somewhere else in the meantime? We just need the effective gene sizes. Thank you.
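The sum check described above can be reproduced with a short script. This is a minimal sketch that assumes the values are stored as log2(TPM + 1), as the "Logp1" in the file names suggests; the helper names, gene names, and toy values are hypothetical. The key step is undoing the log transform before summing each cell line's row.

```python
import math


def to_log2p1(tpm):
    """Apply the log2(TPM + 1) transform."""
    return math.log2(tpm + 1.0)


def linear_tpm_sum(log_tpm_row, genes=None):
    """Sum linear TPM for one cell line over `genes` (default: every gene).

    log_tpm_row: dict mapping gene -> log2(TPM + 1) for one row of the file.
    """
    keep = log_tpm_row.keys() if genes is None else genes
    # Undo log2(x + 1) before summing; summing the logged values is meaningless.
    return sum(2.0 ** log_tpm_row[g] - 1.0 for g in keep)


def sums_to_million(total, rel_tol=1e-3):
    """True if a per-sample TPM total is within rel_tol of 1e6 (allows rounding in the file)."""
    return abs(total - 1e6) <= rel_tol * 1e6


# Hypothetical row: TPM was normalized over all three genes, then A and B were kept.
row = {"A": to_log2p1(500000.0), "B": to_log2p1(250000.0), "LNC1": to_log2p1(250000.0)}
print(sums_to_million(linear_tpm_sum(row)))                    # True: all genes
print(sums_to_million(linear_tpm_sum(row, genes=["A", "B"])))  # False: protein-coding subset only
```

If a protein-coding-only table had been renormalized, its rows would pass this check on their own; if the values were only subset from the all-gene table, the rows sum to less than 1 million.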
Best,
Mitch

Alvin_Qin · August 8, 2023, 3:59pm

Hi Mitch,
We release data twice a year; the next release will be around mid-September, and we plan to include those data in it. Thank you for your feedback!
Best,
Alvin

mjfreder · September 25, 2024, 5:32pm

I was glad to see that we can now find the effective gene sizes corresponding to the unstranded raw count data, but where can we find the gene sizes for the stranded RNA data? Is there a place to find the effective gene sizes for the stranded data set as well? There are roughly 528 cell lines with stranded RNA, and about 512 have only stranded RNA counts. Again, we only see TPM and not FPKM, so it is impossible to impute their gene sizes back. It would be greatly appreciated if you could release either the gene sizes for this data set as well, or at least the FPKM data (from which the gene sizes can be reverse-imputed). Thanks
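To illustrate why FPKM plus the expected counts would be enough: under the usual definition FPKM = 1e9 · c / (l · N), the effective length is l = 1e9 · c / (FPKM · N). The sketch below assumes N is the sample's total expected count; the denominator actually used by RSEM or the DepMap pipeline may differ, so treat it as an assumption. Function names and toy values are hypothetical, and genes with zero counts cannot have their length recovered.

```python
def fpkm_from_counts(counts, eff_lengths, total_reads=None):
    """FPKM = 1e9 * c / (l * N) for one sample; N defaults to the summed expected counts."""
    n = sum(counts.values()) if total_reads is None else total_reads
    return {
        g: 1e9 * c / (eff_lengths[g] * n) if eff_lengths[g] > 0 else 0.0
        for g, c in counts.items()
    }


def lengths_from_fpkm(counts, fpkm, total_reads=None):
    """Reverse-impute effective lengths l = 1e9 * c / (FPKM * N) for one sample.

    Genes with zero count or zero FPKM get None: their length cannot be recovered.
    """
    n = sum(counts.values()) if total_reads is None else total_reads
    lengths = {}
    for g, c in counts.items():
        f = fpkm.get(g, 0.0)
        lengths[g] = 1e9 * c / (f * n) if c > 0 and f > 0 else None
    return lengths


# Toy round trip: effective lengths -> FPKM -> recovered effective lengths.
counts = {"A": 100.0, "B": 200.0, "C": 0.0}
true_lengths = {"A": 1000.0, "B": 2000.0, "C": 1500.0}
recovered = lengths_from_fpkm(counts, fpkm_from_counts(counts, true_lengths))
```

The round trip recovers about 1000 and 2000 for A and B and None for C, which has no reads.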
