Clarification on Negative Values in Log-Transformed Gene Expression Data

I am working with the DepMap expression datasets, and I have a question regarding the log-transformed values in the OmicsExpressionProteinCodingGenesTPMLogp1.csv file.

As per the dataset documentation, the expression values are inferred using RSEM (unstranded mode) and reported after log2 transformation with a pseudo-count of 1. However, I have noticed that some of the log-transformed gene expression values are negative, which seems inconsistent with typical log transformation expectations.

With a pseudo-count of 1, the transformed value log2(TPM + 1) should be non-negative for any TPM >= 0. Could you kindly clarify why negative values appear in the log-transformed expression data? Are there any additional steps or adjustments performed during the transformation that would explain this?
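
To make my expectation concrete, here is a minimal sketch of the transformation as I understand it from the documentation, assuming it has the form log2(TPM + 1). The function name and the sample TPM values are my own and are for illustration only; they are not taken from the DepMap file.

```python
import math


def log2p1(tpm: float) -> float:
    """Log2 transform with a pseudo-count of 1 (assumed form: log2(TPM + 1))."""
    if tpm < 0:
        raise ValueError("TPM cannot be negative")
    return math.log2(tpm + 1.0)


# Illustrative TPM values only (not read from the dataset).
for tpm in (0.0, 0.5, 1.0, 100.0):
    print(tpm, log2p1(tpm))
```

Since TPM + 1 is at least 1 whenever TPM is non-negative, the output should never fall below 0, which is why the negative entries surprised me.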

I would greatly appreciate any insights you can provide on this matter.

Thank you for your time and assistance.

Hi,

Please correct me if I’m missing something, but I don’t see any negative values in OmicsExpressionProteinCodingGenesTPMLogp1.csv. However, there are negative values in OmicsExpressionProteinCodingGenesTPMLogp1BatchCorrected.csv, which is our batch-corrected expression matrix.
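
If it helps to double-check on your side, a rough sketch like the one below counts negative entries in either file. It assumes the CSV has a header row of gene names and that the first column holds the model/sample identifier; the function name is hypothetical, and the path should point to wherever you downloaded the file.

```python
import csv


def count_negative_values(path: str) -> int:
    """Count strictly negative numeric cells in an expression CSV.

    Assumptions (not confirmed against the release): the first row is a
    header and the first column is an identifier, so both are skipped.
    """
    negatives = 0
    with open(path, newline="") as handle:
        reader = csv.reader(handle)
        next(reader, None)  # skip the header row
        for row in reader:
            for cell in row[1:]:  # skip the identifier column
                try:
                    value = float(cell)
                except ValueError:
                    continue  # ignore empty or non-numeric cells
                if value < 0:
                    negatives += 1
    return negatives


# Example call (adjust the path to your local copy):
# print(count_negative_values("OmicsExpressionProteinCodingGenesTPMLogp1.csv"))
```

The file is read row by row, so the whole matrix never has to fit in memory at once.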

We are aware that the negative values are an undesirable byproduct of the batch-correction tool, ComBat, and we are in the process of evaluating other batch-correction tools to address this issue. In the meantime, please refer to the non-batch-corrected expression matrix (OmicsExpressionProteinCodingGenesTPMLogp1.csv) if needed.

Thanks!
Simone