Confounders in the Predictability Tab

Hello,
I want to understand whether the Predictability tab accounts for known confounders when describing correlations with other feature types. For example, if geneA’s RNAi reveals the top feature (highest importance score) to be SSMD (a known confounder metric), can we assume that the importance score of the next-best hit in the list (such as expression or copy number of geneB) is independent of the effect of SSMD on RNAi sensitivity to geneA?
Thanks in advance!

No, I don’t think you can make a statement as strong as “the second most important feature is independent of the most important feature.”

The models are random forest models, and I don’t believe that property holds true. For example, if you have two correlated features, it’s not uncommon to see the feature importance distributed across the two features.

The random forest method trains a “forest” of decision trees, each on a random subset of the data, so in practice one of the two features will be strongest in some subsets and the other strongest in other subsets. The importances we report are averaged across all trees, which has the effect of dividing the weight between the two features.
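
To make the splitting effect concrete, here is a minimal sketch using scikit-learn. It is not the portal's actual pipeline, and everything in it is an assumption for illustration: the synthetic data, the names feat_a, feat_b and noise, the function correlated_importance_demo, and the forest settings (in particular max_features=1, which makes each split consider one randomly chosen feature, on top of the per-tree bootstrap sample). It fits one forest with a single copy of a predictive feature and another with two nearly identical copies, so the importances can be compared.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


def correlated_importance_demo(n_samples=500, n_trees=200, seed=0):
    """Compare importances with one vs. two copies of a predictive feature.

    Hypothetical illustration only: synthetic data, made-up feature names.
    """
    rng = np.random.default_rng(seed)

    # A hidden signal drives the target; feat_a and feat_b are noisy copies of it.
    signal = rng.normal(size=n_samples)
    feat_a = signal + 0.1 * rng.normal(size=n_samples)
    feat_b = signal + 0.1 * rng.normal(size=n_samples)
    noise = rng.normal(size=n_samples)  # unrelated to the target
    y = signal + 0.1 * rng.normal(size=n_samples)

    X_single = np.column_stack([feat_a, noise])        # only one copy present
    X_both = np.column_stack([feat_a, feat_b, noise])  # both correlated copies

    def fit_importances(X):
        # max_features=1 is an assumed setting: one random candidate feature per split.
        model = RandomForestRegressor(
            n_estimators=n_trees, max_features=1, random_state=seed
        )
        model.fit(X, y)
        # Impurity-based importances, averaged over all trees and summing to 1.
        return model.feature_importances_

    return fit_importances(X_single), fit_importances(X_both)


if __name__ == "__main__":
    imp_single, imp_both = correlated_importance_demo()
    print("feat_a, noise         :", np.round(imp_single, 3))
    print("feat_a, feat_b, noise :", np.round(imp_both, 3))
```

In the second forest, the importance that feat_a had on its own should end up shared with feat_b, so neither score can be read as the effect left over after accounting for the other.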

I can imagine that kind of independence holding in some situations, but in general I don’t think you can assume it.

Thanks,
Phil