Question on unreleased metrics: "Gene Confidence" and "Predictability"

pathogen623 · March 24, 2021, 10:09pm

Hi,

I was reading a really helpful blog post written by @Joshua_Dempster on “gene confidence scores” to help assess the reliability / utility of the DepMap data for a gene of interest:
Assessing Confidence in Achilles Gene Profiles

The approach made a lot of sense, and I would like to replicate it to gain insight into a few genes of interest. However, I noticed that two of the data sources are not publicly available and were instead pulled from internal files via taiga:

(1) NormLRT scores
gene_summary = tc.get(name="summary-table-0720", file='Target-Discovery-20Q2-internal')

(2) Gene “predictability”
predictions_full = tc.get(name='predictability-d5b9', file='ensemble-regression-complete')

In a previous post, I was able to figure out how to calculate (1), the NormLRT scores, and my calculation agreed with some internal code posted by a DepMap team member (NormLRT code availability).

However, I’m not sure how to compute (2), gene “predictability”. I’m assuming it corresponds to the performance of an unbiased model that uses multiple feature types (RNA expression, mutations, CNA, GE scores, etc.) to predict a given gene’s GE score, implemented either with a Random Forest as in Josh’s preprint or perhaps by running ATLANTIS.

I will try generating a model using ATLANTIS or RF (via scikit-learn), but I am curious how the underlying data are structured and how they are generated. Can you say whether the “predictability” scores come from ATLANTIS or RF?
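To make the RF route concrete, here is a minimal sketch of one way such a score could be computed with scikit-learn: the Pearson correlation between out-of-fold Random Forest predictions and the observed gene effect. This is an assumption about what "predictability" means, not the DepMap pipeline. The function name `predictability_score`, the per-fold correlation filter, the hyperparameters, and the synthetic data are all illustrative placeholders.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold


def predictability_score(features, gene_effect, n_splits=5, top_k=50, seed=0):
    """Pearson r between out-of-fold RF predictions and observed gene effect.

    Hypothetical helper: all defaults are illustrative, not DepMap's settings.
    """
    features = np.asarray(features, dtype=float)
    gene_effect = np.asarray(gene_effect, dtype=float)
    predictions = np.empty_like(gene_effect)

    folds = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    for train_idx, test_idx in folds.split(features):
        x_train, y_train = features[train_idx], gene_effect[train_idx]

        # Rank features by |Pearson r| with the target using training rows
        # only, so the held-out cell lines never influence feature selection.
        xc = x_train - x_train.mean(axis=0)
        yc = y_train - y_train.mean()
        denom = np.sqrt((xc ** 2).sum(axis=0) * (yc ** 2).sum())
        corr = np.abs(xc.T @ yc) / np.where(denom > 0, denom, np.inf)
        keep = np.argsort(corr)[-top_k:]

        model = RandomForestRegressor(
            n_estimators=100, max_depth=8, min_samples_leaf=5, random_state=seed
        )
        model.fit(x_train[:, keep], y_train)
        predictions[test_idx] = model.predict(features[test_idx][:, keep])

    return float(np.corrcoef(predictions, gene_effect)[0, 1])


# Synthetic stand-in for a (cell lines x features) matrix and two gene profiles.
rng = np.random.default_rng(0)
n_lines, n_features = 200, 300
X = rng.normal(size=(n_lines, n_features))
y_signal = 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(scale=0.5, size=n_lines)
y_noise = rng.normal(size=n_lines)  # unrelated to any feature

score_signal = predictability_score(X, y_signal)
score_noise = predictability_score(X, y_noise)
print(f"signal gene: r = {score_signal:.2f}, noise gene: r = {score_noise:.2f}")
```

With real data, `X` would be the combined feature matrix (expression, mutations, CNA, and so on) and `gene_effect` one gene's column from the gene effect matrix, both restricted to the same cell lines in the same order. Feature importances could then be read from `model.feature_importances_` on a model fit to all lines.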

Of course, if you plan to make either of these files publicly available, it would make life a lot easier. Are there plans to release “gene predictability” scores (along with feature importance scores) or “gene confidence scores” any time soon?

Thanks,

Dylan

pmontgom · March 25, 2021, 8:35pm

Yes, you’re right: the NormLRT scores are computed with the code that was posted earlier, and the predictability uses the methodology from Josh’s preprint.

We’re in the process of preparing the predictability results to be shown in the portal, and once they are added, they will be downloadable from there.

Thanks,
Phil

pathogen623 · March 25, 2021, 10:11pm

Good to know! I will play around with Josh’s method to see if I can get it working, but I definitely look forward to the publicly available version.

Thank you Phil

Dylan
