Precomputed correlations feature was ruined in the last release

gabesimon · December 21, 2025, 1:47pm

In the recent 25Q3 release, many new genes were added that only appear in ~50 cell lines. These new genes have corrupted the precomputed correlations feature in the data explorer: their Pearson correlations are ranked alongside those of the existing genes with data across 1,100 cell lines, and the new genes with less data get artifactually better scores. For example, when I look at the profile for CNOT9, the top ~30-40 correlated genes have very sparse data across ~50 cell lines (e.g. NCAM1, SHOX, PCNX4, etc.) and the correlations are poor. I have to click on each one going down the list until I get to the first real correlation, with RNASEK.
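To illustrate why sparse genes float to the top, here is a minimal simulation sketch. It assumes purely random, unrelated profiles (not real data from the release); the function name and the gene count are hypothetical. For uncorrelated data the spread of Pearson r is on the order of 1/sqrt(n), so the best chance correlation among many genes is much larger with 50 cell lines than with 1,100.

```python
import numpy as np

def max_null_correlation(n_lines, n_genes=2000, seed=0):
    """Largest |Pearson r| between one random query profile and
    n_genes unrelated random profiles, each measured in n_lines cell lines.

    All profiles are simulated noise, so any correlation here is spurious.
    """
    rng = np.random.default_rng(seed)
    query = rng.standard_normal(n_lines)
    others = rng.standard_normal((n_genes, n_lines))

    # Center each profile, then compute Pearson r as a normalized dot product.
    q = query - query.mean()
    o = others - others.mean(axis=1, keepdims=True)
    r = (o @ q) / (np.linalg.norm(o, axis=1) * np.linalg.norm(q))
    return float(np.abs(r).max())

# Hypothetical sizes mirroring the thread: ~50 vs 1,100 cell lines.
sparse_best = max_null_correlation(50)
dense_best = max_null_correlation(1100)
print(f"best spurious |r| with 50 lines:   {sparse_best:.2f}")
print(f"best spurious |r| with 1100 lines: {dense_best:.2f}")
```

In a ranked list that mixes both kinds of genes, the spurious hits from the 50-line genes can therefore outrank genuine but more moderate correlations measured across 1,100 lines.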

New data for more genes is great. But this precomputed correlation feature should be revised to either correct the correlation scores somehow or remove the sparse-data genes from the analysis. It used to be a great feature and now it’s very broken.
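One possible way to "correct the correlation scores" is to rank by a sample-size-aware statistic instead of raw r. The sketch below uses the Fisher z-transformation, a standard approximation; this is only an assumption about what a correction could look like, not what the portal does, and the r and n values in the example are made up.

```python
import math

def fisher_z_score(r, n):
    """Approximate z-score for a Pearson correlation r computed from
    n paired observations, under the null hypothesis of no correlation.

    Uses the Fisher transformation: atanh(r) is roughly normal with
    standard deviation 1 / sqrt(n - 3).
    """
    if n <= 3:
        raise ValueError("need more than 3 observations")
    # Clamp r so atanh stays finite for perfect correlations.
    r = max(min(r, 0.999999), -0.999999)
    return math.atanh(r) * math.sqrt(n - 3)

# Hypothetical values: a high r from 50 lines vs. a moderate r from 1,100 lines.
sparse_z = fisher_z_score(0.45, 50)
dense_z = fisher_z_score(0.25, 1100)
print(sparse_z < dense_z)  # the well-supported correlation ranks higher
```

Ranking by such a score would push the sparse-data genes down the list without removing them outright.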

Thanks

pmontgom · January 8, 2026, 3:15pm

Hello, I believe you’re correct in your diagnosis of the problem. This morning we rolled out a change which filters the genes that only had coverage in the ~50 cell lines out of the correlation analysis.
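For anyone computing correlations locally from downloaded data, a coverage filter of this kind can be sketched as follows. This is not the portal's actual implementation: the function name, the gene labels, and the `min_lines` threshold are all hypothetical, and missing measurements are assumed to be stored as NaN in a genes x cell lines matrix.

```python
import numpy as np

def filter_by_coverage(scores, gene_names, min_lines):
    """Keep only genes measured in at least min_lines cell lines.

    scores: 2D float array, rows = genes, columns = cell lines,
            with NaN marking a missing measurement (an assumption).
    Returns the filtered matrix and the matching list of gene names.
    """
    coverage = np.sum(~np.isnan(scores), axis=1)  # non-missing values per gene
    keep = coverage >= min_lines
    kept_names = [name for name, k in zip(gene_names, keep) if k]
    return scores[keep], kept_names

# Tiny made-up example: GENE_B has data in only 2 of 5 cell lines.
nan = np.nan
scores = np.array([
    [0.1, -0.4, 0.3, 0.0, -0.2],   # GENE_A: full coverage
    [nan, nan, 0.9, nan, -0.8],    # GENE_B: sparse coverage
    [0.5, 0.2, nan, -0.1, 0.4],    # GENE_C: one missing value
])
filtered, names = filter_by_coverage(scores, ["GENE_A", "GENE_B", "GENE_C"], min_lines=4)
print(names)
```

The right threshold depends on the dataset; the filter simply has to run before the correlations are ranked.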

(See also @CRISPR co-depency top hits obscured by newly added screens for more information on this change)

Thanks,

Phil
