Question on unreleased metrics: "Gene Confidence" and "Predictability"


I was reading a really helpful blog post written by @Joshua_Dempster on “gene confidence scores” to help assess the reliability / utility of the DepMap data for a gene of interest:
Assessing Confidence in Achilles Gene Profiles

The approach made a lot of sense, and I would like to try to replicate it to gain insight into a few genes of interest. However, I noticed that two of the data sources are not publicly available and were instead pulled from internal files via taiga:

(1) NormLRT scores
gene_summary = tc.get(name="summary-table-0720", file='Target-Discovery-20Q2-internal')

(2) Gene “predictability”
predictions_full = tc.get(name='predictability-d5b9', file='ensemble-regression-complete')

In a previous post, I was able to figure out how to calculate (1), the LRT scores, and my results agreed with some internal code posted by a DepMap team member (NormLRT code availability).

However, I’m not sure how to compute (2), gene “predictability”. I’m assuming this corresponds to the performance of an unbiased model that uses multiple feature types (RNA expression, mutations, CNA, GE scores, etc.) to predict a given gene’s GE score, implemented either with a Random Forest, as in Josh’s preprint, or perhaps by running ATLANTIS.

I will try generating a model using ATLANTIS or RF (via scikit-learn), but I am curious what the structure of the underlying data is and how it is generated. Can you say whether the “predictability” scores you use are generated from ATLANTIS or RF?
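For concreteness, here is a minimal sketch of the kind of RF-based score I have in mind. It assumes "predictability" means the Pearson correlation between out-of-fold Random Forest predictions and the observed scores for one gene; that definition, the function name `predictability_score`, the hyperparameters, and the synthetic data are all my own assumptions for illustration, not the DepMap pipeline.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold, cross_val_predict


def predictability_score(features, gene_effect, n_splits=5, seed=0):
    """Pearson correlation between out-of-fold RF predictions and observed scores.

    Hypothetical helper: the definition of "predictability" used here is an
    assumption, not a confirmed description of the internal DepMap files.
    """
    model = RandomForestRegressor(n_estimators=100, random_state=seed)
    cv = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    # Each cell line is predicted by a model that never saw it during training.
    predicted = cross_val_predict(model, features, gene_effect, cv=cv)
    return float(np.corrcoef(predicted, gene_effect)[0, 1])


# Synthetic stand-in data: 200 "cell lines" x 20 "features".
rng = np.random.default_rng(0)
features = rng.normal(size=(200, 20))

# One target driven by the first feature, one that is pure noise.
gene_effect = 2.0 * features[:, 0] + rng.normal(scale=0.5, size=200)
noise_only = rng.normal(size=200)

score_signal = predictability_score(features, gene_effect)
score_noise = predictability_score(features, noise_only)
print(f"signal-driven target: {score_signal:.2f}")
print(f"noise-only target:    {score_noise:.2f}")
```

With real data, `features` would be the combined feature matrix (expression, mutations, CNA, etc.) and `gene_effect` the column of scores for the gene of interest, restricted to the cell lines present in both.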

Of course, it would make life a lot easier if either of these files were made publicly available. Are there plans to release “gene predictability” scores (along with feature importance scores) or “gene confidence scores” any time soon?



Yes, you’re right: the NormLRT scores are computed with the code that was posted earlier, and the predictability is computed using the methodology from Josh’s preprint.

We’re in the process of preparing the predictability results to be shown in the portal. Once that is added, those results will be downloadable from the portal.



Good to know! I will play around with Josh’s method to see if I can get it working, but I definitely look forward to the publicly available version.

Thank you, Phil.