Conserved dropout frequency of genes in cell lines derived from different tissues

Aslihan · February 4, 2023, 7:58pm

Hi,

Thank you for this valuable platform. I am trying to understand a biologically interesting observation I made using CRISPR_gene_dependency.csv from the 22Q1 release.

Briefly, I binarized the probability matrix using a threshold of 0.8 and then calculated drop-out frequencies for genes in colorectal cancer, lung cancer and lymphoma cell lines. I observe that the drop-out frequencies across cell lines from different tissues correlate very well, which is unexpected. I can’t really explain biologically why a gene that drops out in 20% of the colorectal cancer cell lines is also likely to drop out in 20% of the lung cancer cell lines. Could this be an artifact of the model used in the probability calculations? Thanks.
Drop-out frequency plots are below:
[Plot: lung_vs_colon]

[Plot: lymphoma_vs_colon]
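
For readers who want to reproduce the calculation described in this post, here is a minimal sketch of the binarize-then-count step. It is not the poster's actual script. The nested-dictionary layout, the `dropout_frequency` helper, and the toy probabilities are all assumptions made for illustration; the real matrix would first have to be loaded from CRISPR_gene_dependency.csv and joined to a lineage annotation.

```python
from collections import defaultdict

def dropout_frequency(prob, lineage, threshold=0.8):
    """Fraction of cell lines per lineage in which each gene exceeds the threshold.

    prob:    {cell_line: {gene: probability of dependency}}   (assumed layout)
    lineage: {cell_line: lineage label}                       (assumed layout)
    Returns  {lineage: {gene: fraction of lines with probability > threshold}}
    """
    hits = defaultdict(lambda: defaultdict(int))
    n_lines = defaultdict(int)
    for line, gene_probs in prob.items():
        group = lineage[line]
        n_lines[group] += 1
        for gene, p in gene_probs.items():
            # Binarize: True counts as 1, False as 0.
            hits[group][gene] += p > threshold
    return {g: {gene: hits[g][gene] / n_lines[g] for gene in hits[g]} for g in hits}

# Toy values invented for illustration only (not DepMap data).
prob = {
    "line_A": {"GENE1": 0.95, "GENE2": 0.85, "GENE3": 0.10},
    "line_B": {"GENE1": 0.99, "GENE2": 0.40, "GENE3": 0.05},
    "line_C": {"GENE1": 0.90, "GENE2": 0.81, "GENE3": 0.20},
    "line_D": {"GENE1": 0.97, "GENE2": 0.30, "GENE3": 0.02},
}
lineage = {"line_A": "colorectal", "line_B": "colorectal",
           "line_C": "lung", "line_D": "lung"}

freq = dropout_frequency(prob, lineage)
for group, per_gene in freq.items():
    print(group, per_gene)
```

Note that the denominator is every line in the lineage, so a missing value (NaN) would silently count as "no drop-out" in this sketch; real data may need explicit handling.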

Joshua_Dempster · February 6, 2023, 2:35pm

Hi Aslihan,

This is a good question that I think raises a few important things to keep in mind in CRISPR analysis:

The great majority of dependencies in any one cell line are common to most or all cell lines.
There’s a wide range in mean strength for these common dependencies.
Probability of dependency (i.e., the confidence with which you can call a knockout depleting) is inevitably related to the strength of the dependency, along with the quality of the screen.

On that last point, if you plotted mean gene_effect against the fraction of cell lines with gene_dependency > 0.8, you would see a very strong relationship. So we shouldn’t interpret your plot axes as literally saying “this gene is a dependency in X% of cell lines.” Rather, they mean “this gene dependency is strong enough to be identified with 80% confidence in X% of cell lines.” Most of these genes with dropout fraction > 0.25 probably have some genuine viability phenotype in most or all cell lines regardless of lineage. But the weaker the phenotype, the fewer the lines in which we will be able to call the dependency. Which specific lines we can detect weak dependencies in is more a function of random noise and screen quality than of tissue biology. This is why I generally advocate against thresholding and binarizing. Many apparent differences are just the effect of values landing just under or just over the cutoff.
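
The argument above can be illustrated with a small simulation. This is a sketch under stated assumptions, not DepMap's model: every gene is given the same true effect in two hypothetical lineages, Gaussian noise is added per cell line, and a fixed effect cutoff stands in for the probability > 0.8 call. The noise level, cutoff, gene count and line count are all invented for illustration.

```python
import random

def pearson(x, y):
    """Pearson correlation of two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

rng = random.Random(0)               # fixed seed so the run is repeatable
N_GENES, N_LINES = 200, 50           # assumed sizes
NOISE_SD, CALL_CUTOFF = 0.3, -0.5    # assumed noise and stand-in call threshold

def called_fraction(true_effect):
    """Fraction of simulated lines in which the noisy effect passes the cutoff."""
    observed = [rng.gauss(true_effect, NOISE_SD) for _ in range(N_LINES)]
    return sum(v < CALL_CUTOFF for v in observed) / N_LINES

# Same true effect per gene in both lineages: no tissue-specific biology at all.
true_effects = [rng.uniform(-1.5, 0.0) for _ in range(N_GENES)]
frac_a = [called_fraction(mu) for mu in true_effects]
frac_b = [called_fraction(mu) for mu in true_effects]

r_lineages = pearson(frac_a, frac_b)        # dropout fraction, lineage A vs B
r_strength = pearson(true_effects, frac_a)  # effect strength vs dropout fraction
print(f"lineage A vs B: r = {r_lineages:.2f}")
print(f"true effect vs dropout fraction: r = {r_strength:.2f}")
```

In this toy setup the dropout fractions of the two lineages track each other closely even though no lineage biology was simulated, because both are driven by how far each gene's effect sits from the cutoff.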

A side note: the dependency probability is called individually in each cell line using the distribution of unexpressed genes and prior common essential gene effects in that line; no information is shared across lines.

Hope that helps,

Josh

Aslihan · February 6, 2023, 9:20pm

Hi Joshua,

Thank you for your answer and for pointing out that larger gene effects are easier to detect in more cell lines.

Best,
Asli
