How comparable are the Achilles and CRISPR "gene dependency" files?

abalter · May 20, 2022, 4:21am

Similar to this question: What's the difference between Achilles_gene_effect.csv and CRISPR_gene_effect.csv?

The current README file says:

Achilles_gene_dependency.csv
Pipeline: Achilles
*Post-Chronos* Probability that knocking out the gene has a real depletion effect using gene_effect.
- Columns: genes in the format “HUGO (Entrez)”
- Rows: cell lines (Broad IDs)

and

CRISPR_gene_dependency.csv

Pipeline: Achilles

Gene Dependency Probabilities represent the likelihood that knocking out the gene has a cell growth inhibition or death effect. These probabilities are derived from the scores in CRISPR_gene_effect.csv as described here: https://doi.org/10.1101/720243
- Columns: genes in the format “HUGO (Entrez)”
- Rows: cell lines (Broad IDs)

So the Achilles data was processed with Chronos, but the CRISPR data wasn’t? At least the README doesn’t say it was.

What is usually considered the “best” gene dependency data set?

Joshua_Dempster · May 20, 2022, 8:53pm

Hi abalter,

CRISPR_gene_dependency is created from CRISPR_gene_effect in the same way that Achilles_gene_dependency is created from Achilles_gene_effect. CRISPR_gene_effect is generated from separate Chronos matrices using Harmonia, as described in the README:

CRISPR_gene_effect.csv

Pipeline: Achilles

Gene Effect scores derived from CRISPR knockout screens published by Broad’s Achilles and Sanger’s SCORE projects.

Negative scores imply cell growth inhibition and/or death following gene knockout. Scores are normalized such that nonessential genes have a median score of 0 and independently identified common essentials have a median score of -1.

Gene Effect scores were inferenced by Chronos ( https://genomebiology.biomedcentral.com/articles/10.1186/s13059-021-02540-7 )

Integration of the Broad and Sanger datasets was performed as described in https://doi.org/10.1038/s41467-021-21898-7, except that quantile normalization was not performed.
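The scaling convention in the README excerpt above (nonessential median at 0, common-essential median at -1) can be illustrated with a small sketch. This is not the Chronos or DepMap pipeline code; it only shows the affine rescaling the README describes. The function name, the dict input, and the toy values are all assumptions made for illustration, and the excerpt does not say whether the medians are taken per cell line or across the whole matrix.

```python
from statistics import median


def rescale_gene_effect(scores, nonessential, common_essential):
    """Affinely rescale raw scores so the nonessential median maps to 0
    and the common-essential median maps to -1.

    scores: dict mapping gene label -> raw score (hypothetical input).
    nonessential, common_essential: iterables of control gene labels.
    """
    ne_median = median(scores[g] for g in nonessential if g in scores)
    ce_median = median(scores[g] for g in common_essential if g in scores)
    span = ne_median - ce_median
    if span == 0:
        raise ValueError("control sets have identical medians; cannot scale")
    # Shift by the nonessential median, then divide by the gap between controls.
    return {gene: (value - ne_median) / span for gene, value in scores.items()}


# Toy numbers, invented purely for illustration.
raw = {"NE1": 0.2, "NE2": 0.4, "CE1": -1.6, "CE2": -2.0, "OTHER": -0.5}
scaled = rescale_gene_effect(raw, ["NE1", "NE2"], ["CE1", "CE2"])
```

After this rescaling, a score near 0 behaves like a typical nonessential gene and a score near -1 like a typical common essential, which is what makes values comparable across files that follow the same convention.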

abalter · May 20, 2022, 9:48pm

The two datasets almost completely overlap in terms of both cell lines and genes. So if I’m doing a naive statistical analysis and want a single number to represent gene essentiality for a given cell line, what is the right approach?

My analysis involves many other datasets such as GDSC drug sensitivity, TCGA expression, etc. So I’m less concerned with the details of each pipeline than with what experts consider the most definitive measure of essentiality (or dependency).
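For anyone who wants to check the overlap claim on their own download, here is a minimal sketch using only the standard library. It assumes the layout the README describes (first row holds gene labels, first column holds cell line IDs); the function names and the toy tables are made up for illustration.

```python
import csv
import io


def read_labels(handle):
    """Return (cell_lines, genes) from a CSV with genes as columns and
    cell lines as rows. The first header cell (index name) is ignored."""
    reader = csv.reader(handle)
    header = next(reader)
    genes = set(header[1:])
    cell_lines = {row[0] for row in reader if row}
    return cell_lines, genes


def jaccard(a, b):
    """Share of labels present in both sets, out of all labels seen."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def overlap_report(handle_a, handle_b):
    lines_a, genes_a = read_labels(handle_a)
    lines_b, genes_b = read_labels(handle_b)
    return {
        "cell_lines": jaccard(lines_a, lines_b),
        "genes": jaccard(genes_a, genes_b),
        "only_in_b_cell_lines": sorted(lines_b - lines_a),
    }


# Toy tables, invented for illustration; real files are far larger.
table_a = io.StringIO("DepMap_ID,A1BG (1),A1CF (29974)\nACH-000001,0.1,0.2\nACH-000002,0.3,0.4\n")
table_b = io.StringIO("DepMap_ID,A1BG (1),TP53 (7157)\nACH-000001,0.1,0.9\nACH-000003,0.2,0.8\n")
report = overlap_report(table_a, table_b)
```

With the real downloads you would pass `open("Achilles_gene_dependency.csv", newline="")` and `open("CRISPR_gene_dependency.csv", newline="")` instead of the toy tables. The `only_in_b_cell_lines` entry lists lines that appear in the second file but not the first.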

Joshua_Dempster · May 31, 2022, 1:02pm

I would use CRISPR_gene_effect.
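As a rough sketch of what "a single number per gene per cell line" looks like in practice, the helper below pulls one score out of a file laid out like CRISPR_gene_effect.csv (genes as “HUGO (Entrez)” columns, cell lines as rows). The function name, the handling of empty or "NA" cells, and the toy table are assumptions for illustration, not part of any DepMap tooling.

```python
import csv
import io


def gene_effect_for(handle, cell_line, gene_symbol):
    """Look up one score by cell line ID and HUGO symbol.

    Returns a float, or None when the cell is empty or "NA"
    (treating those as missing is an assumption of this sketch).
    """
    reader = csv.reader(handle)
    header = next(reader)
    # Column labels look like "TP53 (7157)"; match on the symbol part only.
    matches = [i for i, name in enumerate(header)
               if i > 0 and name.split(" (")[0] == gene_symbol]
    if not matches:
        raise KeyError(f"gene not found: {gene_symbol}")
    column = matches[0]
    for row in reader:
        if row and row[0] == cell_line:
            value = row[column]
            return None if value in ("", "NA") else float(value)
    raise KeyError(f"cell line not found: {cell_line}")


# Toy table, invented for illustration.
demo_text = "DepMap_ID,A1BG (1),TP53 (7157)\nACH-000001,0.05,-1.2\nACH-000002,0.10,NA\n"
score = gene_effect_for(io.StringIO(demo_text), "ACH-000001", "TP53")
```

Per the README excerpt quoted earlier, more negative values imply stronger growth inhibition or death after knockout, so this score can be joined to other per-cell-line data by the Broad ID.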

abalter · June 29, 2022, 3:56pm

Hi @Joshua_Dempster. I don’t think I noticed this page on the integrated datasets before, or maybe it’s new. It has a link to the integrated datasets. This dataset contains two data files: CERES_FC.txt and CRISPRcleanR_FC.txt.

The stated goal of the second paper is:

Here, we investigate the integrability of the full Broad/Sanger gene-dependency datasets, yielding the most comprehensive cancer dependency resource to date, encompassing dependency profiles of 17,486 genes across 908 different cell lines that span 26 tissues and 42 different cancer types.

Also, the pipeline diagram appears to have three inputs and one output. So, I was expecting a single integrated dataset. Instead there is still a Broad (CERES) and Sanger (CRISPR) version.

Would I be correct to assume that 1) the files on the DepMap download site represent the most up-to-date versions, and 2) the CRISPR essentiality scores would be the ones to use if I only pick one?

Joshua_Dempster · June 29, 2022, 7:44pm

Yes, the portal contains the up-to-date versions of the data, and those are what you should use. The dataset generated in support of the publication includes many forms of the data, not just the ones we recommend. If you were only going to use one, the integrated (CRISPR) files have the benefit of spanning more cell lines.

abalter · June 29, 2022, 8:56pm

Thanks again. That’s really helpful.
