Identify conditionally essential genes in clinical subgroups of a specific cancer type

Dear DepMap community,

First of all, I would like to wish everyone a happy new year, with health above all!

Concerning my post: as part of an ongoing collaborative project within a specific cancer type, we have ranked and scored specific mutated genes in distinct clinical subgroups, which are defined by the mutational status of known cancer driver genes (i.e. GeneA_mutated_group, GeneB_mutated_group and WT).

Our ultimate goal is to exploit and validate these findings further:

  1. Use the DepMap portal genetic screens to investigate whether any of the top-ranked genes in each category could be considered “selective essential genes” and interesting putative drug targets for further exploitation and hypothesis generation in each clinical subgroup;

  2. Use the DepMap portal resources to identify any perturbagens that might target these genes of interest.

In this direction, one direct approach would be to use the custom analysis to select the cancer cell lines (for example, in colorectal cancer) that harbor mutations in gene A versus cell lines of the same cancer type that harbor mutations in gene B (the clinical subgroups mentioned above), and perform a two-group comparison:

Based also on my attached example volcano plot for some random cell line comparisons, my major questions are the following:

A) For my research goal and hypothesis, would the appropriate dataset be the selected CRISPR (DepMap 21Q4 + Score, Chronos)?

B) For the genes that show a q value <= 0.05, can this be interpreted as these genes showing a “preferential essentiality” in the compared cell lines, based on the genetic screen dependencies in the latest CRISPR dataset? Could we then directly compare these hits with our top prioritized mutated genes to highlight common targets that are “clinical group/subtype” specific?

C) In addition to step B, is there a way to identify and remove any resulting targets that are “pan-cancer” or generally essential in most cancer cell lines and might therefore produce a “toxic effect”, in order to balance efficacy with essentiality?

Thank you in advance for your time and consideration:)



Hi Efstathios,

I've added some thoughts on the specific questions below. For some of these questions you might want/need to download the relevant data and do the analyses outside the portal.

A) Yes, we generally recommend using the combined Sanger+Broad Chronos dataset as a default, with the latest version of the DepMap data, which you have selected here.

B) Yes, I think that’s right. Those genes could be interpreted as being significantly more essential in the set of cell lines “New List 1” compared to the cell lines in the set “New List 2”, where the values being compared are the gene KO effects measured in the CRISPR dataset selected above. Note that this tool uses empirical-Bayes moderated effect size estimates (see here), which I believe could explain why these effect size estimates seem ‘one sided’. You might want to compare against basic per-gene t-tests if you want to keep things simpler and more intuitive.
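For anyone wanting to try the simpler per-gene t-test comparison outside the portal, here is a minimal sketch. It assumes the gene effect matrix has already been loaded into a pandas DataFrame with cell lines as rows and genes as columns, and that the two groups are plain lists of cell line IDs. The function names `compare_groups` and `bh_adjust` are hypothetical, and Welch's t-test with a Benjamini-Hochberg correction is just one reasonable choice here, not the empirical-Bayes method the portal uses.

```python
import numpy as np
import pandas as pd
from scipy import stats


def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values (q-values); NaNs are left as NaN."""
    p = np.asarray(pvals, dtype=float)
    q = np.full(p.shape, np.nan)
    ok = np.isfinite(p)
    n = int(ok.sum())
    if n == 0:
        return q
    order = np.argsort(p[ok])
    ranked = p[ok][order] * n / np.arange(1, n + 1)
    # Enforce monotonicity, working down from the largest p-value.
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.minimum(ranked, 1.0)
    q[ok] = adjusted
    return q


def compare_groups(gene_effect, group_a, group_b):
    """Per-gene Welch t-test of gene effect between two sets of cell lines.

    gene_effect: DataFrame, rows = cell lines, columns = genes (assumed layout).
    group_a, group_b: lists of cell line IDs found in gene_effect.index.
    """
    a = gene_effect.loc[gene_effect.index.intersection(group_a)]
    b = gene_effect.loc[gene_effect.index.intersection(group_b)]
    t, p = stats.ttest_ind(a.to_numpy(), b.to_numpy(), axis=0,
                           equal_var=False, nan_policy="omit")
    # Older SciPy versions return masked arrays when nan_policy="omit".
    t = np.ma.filled(t, np.nan).astype(float)
    p = np.ma.filled(p, np.nan).astype(float)
    out = pd.DataFrame(
        {
            # Negative values: gene is more essential in group A than in group B.
            "mean_diff": a.mean(axis=0) - b.mean(axis=0),
            "t_stat": t,
            "p_value": p,
        },
        index=gene_effect.columns,
    )
    out["q_value"] = bh_adjust(out["p_value"])
    return out.sort_values("p_value")
```

If the first column of the downloaded file is the cell line identifier, something like `pd.read_csv("CRISPR_gene_effect.csv", index_col=0)` should give a DataFrame in the assumed layout; please check this against the file you actually download.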

C) To estimate the average essentiality of each gene across cell lines you could compute the average gene effect per gene, or the fraction of cell lines ‘called’ dependent (using the dependency probability data we provide, in the files called “gene_dependency”). More info here
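The two summaries described in C) can be sketched as follows. This assumes both the gene effect and gene dependency matrices are pandas DataFrames with cell lines as rows and genes as columns. The function name `essentiality_summary` is hypothetical, and the 0.5 probability cutoff for ‘calling’ a dependency is an assumption you may want to change.

```python
import pandas as pd


def essentiality_summary(gene_effect, gene_dependency, prob_cutoff=0.5):
    """Per-gene average gene effect and fraction of cell lines called dependent.

    Both inputs: DataFrames with rows = cell lines, columns = genes (assumed layout).
    prob_cutoff: dependency probability above which a line counts as dependent
                 (0.5 is an illustrative assumption).
    """
    called = gene_dependency > prob_cutoff
    # Divide by the number of non-missing values so NaNs do not dilute the fraction.
    frac_dependent = called.sum(axis=0) / gene_dependency.notna().sum(axis=0)
    return pd.DataFrame(
        {
            "mean_gene_effect": gene_effect.mean(axis=0),
            "frac_dependent": frac_dependent,
        }
    )
```

Genes with a strongly negative mean effect and a high fraction of dependent lines are the ones most likely to be broadly essential rather than subgroup-specific.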

Hope that helps!

Dear @jmmcfarl ,

thank you very much for your quick response and very helpful comments! If it is not too much trouble, I would like to ask for some clarifications on your separate answers:

B) 1) Regarding the comparison of the cell lines and the empirical-Bayes moderated effect size estimates: as long as each group contains at least 5 cell lines, should the implemented statistical approach be robust?

  2. For further purposes, extending beyond the current project goal: if I would also like to perform an additional statistical comparison with a basic t-test or similar approach, should I go to the relevant page:

and download the file called
Achilles_gene_dependency.csv, where the first column, named DepMap_ID, is the cell line identifier? And then directly perform a statistical comparison on the relevant groups without any transformation?

However, what is the difference from the other file called
Achilles_gene_effect.csv, given that the latter also includes negative values? Based on the two articles which you kindly provided, plus two additional links:

am I right that the gene dependency probability can be used with a cutoff >= 0.5, whereas the gene effect is more like a fold change, where a more negative value denotes a stronger effect? And which of the two should I use? Based on the above volcano plot, I was also a bit confused about the following:

should I look for two-group comparison values less than -0.5, or rather less than -1, given that -1 mainly relates to common essentiality?

C) In parallel, based on your 2 suggestions:

in your experience, after using DepMap for the statistical comparisons between the 3 groups and identifying some targets in common with our analysis, would it be robust to:

use, for example, the file called
CRISPR_common_essentials.csv to remove any hits that are included in this list? If I have understood well, common essentials are effectively “positive controls”, that is, genes that are probably fundamental to both cancer and normal cells, whose inhibition could be considered toxic, and which are therefore found essential in most or a significant number of distinct cancer cell lines?

D) Finally, for the more general purpose of making calculations and statistical inferences outside the portal: if I would like to use the gene effect file and compute, for example, an average gene effect in specific cell lines and/or cancer types:

could I use a metric like the mean or median over the selected cell lines, and then apply a simple cutoff? For example, less than -0.5, or even lower than -1?

Thank you once more for your feedback, help and support, and apologies for the long message!



Hi Efstathios,

We can’t really comment on specific downstream analyses or research questions. I’ll try to provide a few quick thoughts.

  1. For analysis like this, I’d suggest using the files “_gene_effect” which give estimated viability effects following gene perturbation. Please see the references cited in the data downloads page for more detailed info on processing. The “_gene_dependency” files represent estimated probabilities of dependency (described more here). By default now we use the files “CRISPR_gene_effect” which includes both Broad and Sanger-generated CRISPR screen data. “Achilles_gene_effect” is just Broad-generated data.

Yes this CRISPR_common_essentials file could be used in this way. The criteria you want to use to define common-essential genes in your application might be different though.

I can’t really comment on the best way for defining common essential genes. Some simple thresholding like you describe could be reasonable I think.
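As a rough illustration of the filtering and thresholding discussed above, here is a minimal sketch. It assumes the gene effect matrix is a pandas DataFrame with cell lines as rows and genes as columns, and that the common-essential genes have been read into a list whose labels match the gene effect column names exactly. The function name `filter_hits` and the -0.5 default cutoff are illustrative assumptions, not a recommended standard.

```python
import pandas as pd


def filter_hits(gene_effect, group_lines, common_essentials, threshold=-0.5):
    """Genes with a strong mean effect in a group, excluding common essentials.

    gene_effect: DataFrame, rows = cell lines, columns = genes (assumed layout).
    group_lines: cell line IDs defining the subgroup of interest.
    common_essentials: gene labels to exclude; must match the column labels.
    threshold: keep genes whose mean gene effect is below this value
               (illustrative cutoff, adjust for your application).
    """
    group = gene_effect.loc[gene_effect.index.intersection(group_lines)]
    mean_effect = group.mean(axis=0)  # swap in .median(axis=0) if preferred
    is_common = mean_effect.index.isin(list(common_essentials))
    keep = (mean_effect < threshold) & ~is_common
    # Most negative (strongest dependency) first.
    return mean_effect[keep].sort_values()
```

The returned Series can then be intersected with your own prioritized gene list; whether the mean or the median is the better summary depends on how heterogeneous the selected cell lines are.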

Hope this helps

Dear @jmmcfarl,

thank you very much for your time and your updated response! Apologies for my previous email; I fully understand the scope of the forum. Without wishing to cause further disturbance, there are two quick but major points that still confuse me regarding the robust use and understanding of the genetic screens and data in the DepMap portal:

  1. Regarding common essential genes, strongly selective genes, and other terms used in the forum: following the thread “Some measure of # of selective genes”, I downloaded the file named “Gene Dependency Profile Summary” and noticed that some genes, such as KRAS, are listed as both strongly selective and common essential for some of the combo datasets; is this possible?

  2. As a final confirmation regarding general use of the DepMap portal, for a short description of any customized analysis using the gene effect scores: could using, for example, CRISPR_common_essentials as one of the lists to remove “positive essential genes”, and then applying a threshold of less than -1, be considered a robust initial screening strategy overall? The justification, citing the relevant publications, would then be that a gene effect of about -1 or lower most probably represents “positive essential genes”?

Thank you in advance :)

Kind Regards,