Dear DepMap developers,
Is there a reference explaining why 0.5 and -1 are the threshold values for the probability of dependency and gene effect, respectively? What is the mathematical background to these values? I understand the processing steps, but I still cannot make sense of why these particular values are used as thresholds.
Thank you!
Best wishes
We try to explain this under the Dependent Cell lines tile on the gene page, and so I’ve taken some quotes from that in my explanation below.
We describe the 0.5 threshold as:
A cell line is considered dependent if it has a probability of dependency greater than 0.5.
Note that this is a threshold on the measurement “Probability of Dependency”, which we define as:
Probabilities of dependency are calculated for each gene score in a cell line as the probability that score arises from the distribution of essential gene scores rather than nonessential gene scores. See here for details.
“Probability of dependency” essentially models each gene effect as having been sampled either from the distribution of “essential” gene scores or from the distribution of “non-essential” gene scores. Because it is a probability, its values range from 0 to 1.
The reason for the 0.5 threshold is that if there is more than a 50% chance that a gene's score was sampled from the “essential” distribution rather than the “non-essential” distribution, we call that gene “essential” in that cell line. Since the two probabilities sum to 1, 0.5 is simply the point at which the essential explanation becomes more likely than the non-essential one.
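To make the 0.5 cutoff concrete, here is a minimal Python sketch of a two-component posterior probability. It assumes, purely for illustration, that essential and non-essential gene scores are each normally distributed with made-up means and spreads and an equal prior; the names `probability_of_dependency` and `is_dependent` and all parameter values are hypothetical, and the actual DepMap calculation (see the details link quoted earlier) may estimate these distributions differently.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution N(mean, sd^2) evaluated at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def probability_of_dependency(score,
                              ess_mean=-1.0, ess_sd=0.3,
                              noness_mean=0.0, noness_sd=0.3,
                              prior_essential=0.5):
    """Posterior probability that `score` came from the essential distribution.

    Hypothetical model: both distributions are assumed normal, with
    illustrative parameters (essential centered at -1, non-essential at 0).
    """
    # Prior-weighted likelihood under each component.
    weighted_ess = prior_essential * normal_pdf(score, ess_mean, ess_sd)
    weighted_non = (1.0 - prior_essential) * normal_pdf(score, noness_mean, noness_sd)
    # Bayes' rule: P(essential | score); P(non-essential | score) is 1 minus this.
    return weighted_ess / (weighted_ess + weighted_non)


def is_dependent(score, threshold=0.5, **model_params):
    """Call a dependency when the essential explanation is more likely."""
    return probability_of_dependency(score, **model_params) > threshold


for s in (-1.2, -0.5, 0.1):
    print(s, round(probability_of_dependency(s), 3), is_dependent(s))
```

With these symmetric assumptions the probability is exactly 0.5 at the midpoint -0.5, so the cutoff lands halfway between the two distributions; with real, asymmetric score distributions the score at which the probability crosses 0.5 shifts accordingly.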
As for what a gene effect of -1 represents: that value is not intended to be used as a threshold, but rather as a reference point for comparing gene scores. To put gene effect scores on a comparable scale, we've rescaled the values so that 0 and -1 serve as anchors. This normalization is described on the page as follows:
Outcome from DEMETER2 or CERES. A lower score means that a gene is more likely to be dependent in a given cell line. A score of 0 is equivalent to a gene that is not essential whereas a score of -1 corresponds to the median of all common essential genes.
Again, we’re using the distributions of essential and non-essential genes: gene effects are scaled so that 0 is placed at the median of the non-essential distribution and -1 at the median of the essential distribution.
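As a rough illustration of that anchoring, the following Python sketch applies a linear shift-and-scale to one cell line's raw scores so that the non-essential median maps to 0 and the essential median maps to -1. The function name, gene labels, and numbers are hypothetical, and this is not the exact CERES or DEMETER2 processing; it only shows the arithmetic of the two anchors.

```python
from statistics import median


def scale_gene_effects(raw_scores, essential_genes, nonessential_genes):
    """Rescale one cell line's raw scores to the 0 / -1 anchors.

    scaled = (raw - median_nonessential) / (median_nonessential - median_essential)
    so median_nonessential -> 0 and median_essential -> -1.
    Assumes essential genes score lower than non-essential ones (spread > 0).
    """
    med_ess = median(raw_scores[g] for g in essential_genes)
    med_non = median(raw_scores[g] for g in nonessential_genes)
    spread = med_non - med_ess
    if spread <= 0:
        raise ValueError("expected essential genes to score below non-essential genes")
    return {gene: (score - med_non) / spread for gene, score in raw_scores.items()}


# Hypothetical raw scores for a single cell line.
raw = {"ESS1": -3.0, "ESS2": -2.0, "ESS3": -1.0,
       "NON1": 0.5, "NON2": 1.0, "NON3": 1.5,
       "GENE_X": -0.5}
scaled = scale_gene_effects(raw, ["ESS1", "ESS2", "ESS3"], ["NON1", "NON2", "NON3"])
print(scaled["ESS2"], scaled["NON2"], scaled["GENE_X"])
```

Here `GENE_X` ends up at -0.5, i.e. halfway between a typical non-essential gene and a typical essential gene, which is the kind of comparison the -1 reference point is meant to support rather than a cutoff.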
I hope that helps!
Thanks,
Phil
This really helps!
Thank you very much Phil and DepMap team!