I first downloaded the log TPM values, plotted them for a specific gene, and found that it is expressed. I then needed to use a different normalization method, so I downloaded the raw counts. When I checked the raw counts for this same gene in the same cell line, the value was zero.
How can this be explained?
The files I downloaded are OmicsExpressionTPMLogp1HumanAllGenes.csv and OmicsExpressionRawReadCountHumanAllGenesStranded.csv.
In my opinion, this is because the expression data were calculated using Salmon 1.10.0 (please refer to the 25Q2 release notes), which employs a probabilistic model to assign reads. My explanation is based on the latest DepMap release, though, so it would help if you could clarify which release your files came from.
I checked the release notes for the current and previous releases, but I could not find a clear explanation of how the TPM values were generated. The current release notes state only how the read counts were generated: with STAR. Do you think the TPM values are based on Salmon quantification?
As of 25Q2, our TPM values are generated using Salmon. The discrepancy you are seeing may arise because Salmon allocates ambiguous reads across their candidate targets using expectation-maximization (EM), while the STAR raw counts ignore them, since they include only uniquely mapping reads.
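To make the difference concrete, here is a toy sketch in Python. It is not Salmon's or STAR's actual implementation: it ignores transcript lengths, bias models, and everything else the real tools account for, and the gene names and read sets are invented for illustration. It only shows how a gene whose reads are all ambiguous can end up with zero under unique-only counting while still receiving a share under EM-style allocation.

```python
# Toy illustration only: gene names and reads are made up, and the EM here
# ignores effective lengths and bias corrections that real tools apply.

def unique_counts(reads, genes):
    """Count only reads compatible with exactly one gene."""
    counts = {g: 0 for g in genes}
    for compatible in reads:
        if len(compatible) == 1:
            counts[compatible[0]] += 1
    return counts


def em_counts(reads, genes, n_iter=50):
    """Split each read across its compatible genes in proportion to the
    current abundance estimates, then re-estimate abundances (simple EM)."""
    abundance = {g: 1.0 / len(genes) for g in genes}
    expected = {g: 0.0 for g in genes}
    for _ in range(n_iter):
        expected = {g: 0.0 for g in genes}
        for compatible in reads:
            total = sum(abundance[g] for g in compatible)
            if total == 0:
                continue  # no compatible gene has support; skip the read
            for g in compatible:
                expected[g] += abundance[g] / total
        n_assigned = sum(expected.values())
        abundance = {g: expected[g] / n_assigned for g in genes}
    return expected


genes = ["A", "B", "C"]
# 10 reads map equally well to A and B (e.g. near-identical paralogs),
# 5 reads map uniquely to C.
reads = [("A", "B")] * 10 + [("C",)] * 5

unique = unique_counts(reads, genes)
em = em_counts(reads, genes)
print("unique-only:", unique)
print("EM-allocated:", em)
```

In this toy case, genes A and B get zero under unique-only counting, while the EM allocation gives each of them half of the ten shared reads.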
We are happy to look into this further if you’d be willing to share some of the genes that show this kind of behavior.
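If it helps with collecting examples, a rough pandas sketch along these lines could list the model/gene pairs that look expressed in the log TPM table but have zero raw counts. The function name, the threshold, and the toy tables are hypothetical. It assumes both tables have already been loaded with models as rows and genes as columns, with matching labels and numeric values only, so any extra metadata columns in the downloaded CSVs would need to be dropped or moved into the index first.

```python
import numpy as np
import pandas as pd


def find_discrepant_genes(log_tpm, raw_counts, min_log_tpm=1.0):
    """Return (model, gene) pairs with log TPM >= min_log_tpm and raw count == 0.

    Assumes both DataFrames are numeric, indexed by model, with one column
    per gene, and that the labels match between the two tables.
    """
    # Restrict the comparison to models and genes present in both tables.
    rows = log_tpm.index.intersection(raw_counts.index)
    cols = log_tpm.columns.intersection(raw_counts.columns)
    tpm = log_tpm.loc[rows, cols]
    counts = raw_counts.loc[rows, cols]

    mask = (tpm >= min_log_tpm) & (counts == 0)
    row_pos, col_pos = np.nonzero(mask.to_numpy())
    return [(rows[i], cols[j]) for i, j in zip(row_pos, col_pos)]


# Toy tables with made-up labels and values, just to show the call.
toy_tpm = pd.DataFrame(
    [[2.5, 0.0], [0.0, 3.1]],
    index=["MODEL_1", "MODEL_2"],
    columns=["GENE1", "GENE2"],
)
toy_counts = pd.DataFrame(
    [[0, 0], [12, 40]],
    index=["MODEL_1", "MODEL_2"],
    columns=["GENE1", "GENE2"],
)
print(find_discrepant_genes(toy_tpm, toy_counts))
```

The `min_log_tpm` cutoff is arbitrary; raising it would narrow the list to genes where the disagreement is most striking.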