Confounders in the Predictability Tab

grands06 · November 11, 2022, 9:03pm

Hello,
I want to understand whether the Predictability tab accounts for known confounders when reporting correlations with other feature types. For example, if the top feature (highest importance score) for geneA’s RNAi is SSMD (a known confounder metric), can we assume that the importance score of the next best hit in the list (such as expression or copy number of geneB) is independent of the effect of SSMD on RNAi sensitivity to geneA?
Thanks in advance!

pmontgom · November 14, 2022, 1:37pm

No, I don’t think you can make a statement as strong as, “the second most important feature is independent from the most important feature”.

The models are random forest models, and I don’t believe that property holds true. For example, if you have two correlated features, it’s not uncommon to see the feature importance distributed across the two features.

The random forest method trains a “forest” of decision trees, each on a random subset of the data, so in practice one of the two features will be strongest in some subsets and the other will be strongest in others. The importances we report are averaged across all trees, which has the effect of dividing the weight between the two features.
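
To make the weight-splitting effect concrete, here is a rough, standard-library-only sketch. It is not the code behind the Predictability tab. It is a toy "forest" of one-split trees (stumps) on synthetic data, and all names and parameters (`feat_a`, `feat_b`, `noise`, `simulate`, the subset size, and so on) are made up for illustration. `feat_a` and `feat_b` are two noisy copies of the same hidden signal, so they are correlated with each other and equally predictive, while `noise` is unrelated to the target.

```python
import random


def best_split_gain(xs, ys):
    """Largest drop in squared error achievable with one threshold split on xs."""
    pairs = sorted(zip(xs, ys))
    n = len(pairs)
    total = sum(y for _, y in pairs)
    total_sq = sum(y * y for _, y in pairs)
    parent_sse = total_sq - total * total / n
    best = 0.0
    left_sum = 0.0
    for i in range(1, n):
        left_sum += pairs[i - 1][1]
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot place a threshold between equal feature values
        right_sum = total - left_sum
        child_sse = total_sq - left_sum ** 2 / i - right_sum ** 2 / (n - i)
        best = max(best, parent_sse - child_sse)
    return best


def simulate(n_rows=2000, subset_size=50, n_trees=300, seed=0):
    """Share of stumps in which each (hypothetical) feature gives the best split."""
    rng = random.Random(seed)
    data = {"feat_a": [], "feat_b": [], "noise": []}
    target = []
    for _ in range(n_rows):
        signal = rng.gauss(0, 1)  # hidden driver shared by feat_a and feat_b
        data["feat_a"].append(signal + rng.gauss(0, 0.5))
        data["feat_b"].append(signal + rng.gauss(0, 0.5))
        data["noise"].append(rng.gauss(0, 1))
        target.append(signal + rng.gauss(0, 0.5))

    wins = {name: 0 for name in data}
    for _ in range(n_trees):
        # Each stump sees only a random subset of the rows.
        rows = rng.sample(range(n_rows), subset_size)
        ys = [target[i] for i in rows]
        gains = {
            name: best_split_gain([column[i] for i in rows], ys)
            for name, column in data.items()
        }
        wins[max(gains, key=gains.get)] += 1

    # Averaging over stumps: each stump gives all of its weight to one feature.
    return {name: count / n_trees for name, count in wins.items()}


importances = simulate()
print(importances)
```

Under these assumptions, the averaged importance should end up shared between `feat_a` and `feat_b` rather than concentrated on one of them, even though both carry the same underlying signal. A real random forest grows deeper trees and typically also subsamples features at each split, so the numbers will differ, but the splitting of weight across correlated features arises the same way.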

I can imagine this property could arise in some situations, but in general, I don’t think you can assume that’s the case.

Thanks,
Phil
