Identification of selectively dependent hits in subgroups of cell lines annotated to the same cancer type

Dear DepMap team,

I hope this message finds you well! I have a more specific question about the appropriate use of DepMap gene effect scores to identify preferentially essential (context-specific) genes across specific subgroups of cell lines:

Briefly, based on clinical factors such as microsatellite instability (MSI) and the prevalence of specific mutations, I have selected a set of ~60 cell lines annotated to a specific cancer type (using the latest CCLE mutations file, along with the provided MSI status for the matched cell lines). These cell lines are further stratified into 4 main groups based on the combination of the aforementioned information. My ultimate goal is to find the “top” most preferentially essential genes in each group, which could then be exploited, compared with some in-house data, and further validated.
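A minimal sketch of the grouping step, assuming a 2 × 2 design (MSI status × mutation status of a single gene of interest), a long-format mutation table with `DepMap_ID` and `Hugo_Symbol` columns, and MSI calls given as a Series indexed by cell line ID. The helper `assign_groups`, the example gene and the label strings are hypothetical; column names differ between releases, so adapt them to the files actually downloaded.

```python
import pandas as pd


def assign_groups(cell_lines, msi_status, mutations, gene):
    """Hypothetical helper: label each cell line as '<MSI status>/<gene>-mut|wt'.

    cell_lines: iterable of cell line IDs
    msi_status: Series indexed by cell line ID, e.g. "MSI" / "MSS" (assumed encoding)
    mutations:  long-format table with "DepMap_ID" and "Hugo_Symbol" columns (assumed)
    """
    # Cell lines carrying at least one listed mutation in the gene of interest
    mutated = set(mutations.loc[mutations["Hugo_Symbol"] == gene, "DepMap_ID"])
    labels = {
        cl: f"{msi_status[cl]}/{gene}-{'mut' if cl in mutated else 'wt'}"
        for cl in cell_lines
    }
    return pd.Series(labels, name="group")


# Toy example with made-up IDs; KRAS is only a placeholder gene of interest
msi = pd.Series({"ACH-1": "MSI", "ACH-2": "MSS", "ACH-3": "MSS", "ACH-4": "MSI"})
muts = pd.DataFrame({"DepMap_ID": ["ACH-1", "ACH-2", "ACH-2"],
                     "Hugo_Symbol": ["KRAS", "KRAS", "TP53"]})
groups = assign_groups(msi.index, msi, muts, "KRAS")
print(groups.value_counts())
```

In practice the mutation table would usually be filtered first (for example, to non-silent variants) so that only relevant mutations define the groups.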

When searching for a specific cell line (for example, MCF7: MCF7 DepMap Cell Line Summary), its page lists the top 10 preferentially essential genes. However, this ranking is computed against all cell lines, and only the top 10 are shown.

Thus, my questions/concerns are the following:

  1. Regarding the methodology:

A) First, could I take the latest CRISPR_gene_effect.csv file, subset it to the CCLE cell lines of interest, and then use, for example, the corresponding CRISPR_common_essentials.csv file to exclude the “pan-cancer common essentials”?
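A minimal pandas sketch of step A, assuming the gene effect matrix has cell line IDs as its index and gene labels as its columns, and that the common-essentials file lists genes in exactly the same label format as those columns (check this for the release in use). `subset_and_drop_common` is a hypothetical helper name.

```python
import pandas as pd


def subset_and_drop_common(gene_effect, cell_lines, common_essentials):
    """Keep the cell lines of interest (rows) and drop common-essential genes (columns).

    gene_effect: DataFrame, rows = cell line IDs, columns = gene labels (assumed layout)
    common_essentials: gene labels formatted exactly like the gene_effect columns
    """
    wanted = list(cell_lines)
    present = [cl for cl in wanted if cl in gene_effect.index]
    missing = sorted(set(wanted) - set(present))
    if missing:
        # Not every annotated cell line necessarily has CRISPR screening data
        print(f"{len(missing)} cell line(s) not in the gene effect matrix: {missing}")
    drop = set(common_essentials)
    keep = [g for g in gene_effect.columns if g not in drop]
    return gene_effect.loc[present, keep]


# With the real files (names from the question; column layout is an assumption):
# gene_effect = pd.read_csv("CRISPR_gene_effect.csv", index_col=0)
# common = pd.read_csv("CRISPR_common_essentials.csv").iloc[:, 0]
# subset = subset_and_drop_common(gene_effect, my_cell_lines, common)
```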

B) Then, similarly to what is described in the portal, could I subtract, for each remaining gene in each cell line, the mean effect score of that gene across all the remaining cell lines? As a final step, could I prioritize the resulting genes, at least within each cell line, and select an arbitrary cutoff?
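The leave-one-out comparison in step B could be sketched as below, under these assumptions: `effect` is the filtered matrix (rows = cell lines, columns = genes), each cell line is compared only against the other cell lines of the same subset (not all DepMap cell lines, unlike the portal's cell line pages), and `cutoff` is an arbitrary user-chosen threshold. Function names and toy values are hypothetical.

```python
import pandas as pd


def leave_one_out_selectivity(effect):
    """Effect of each gene in each cell line minus that gene's mean effect
    in the remaining cell lines of the same matrix (missing values ignored)."""
    col_sum = effect.sum(axis=0)        # per-gene sum over cell lines (skips NaN)
    col_n = effect.notna().sum(axis=0)  # per-gene number of non-missing values
    # Mean of the *other* cell lines: (sum - own value) / (n - 1)
    loo_mean = (-effect).add(col_sum, axis=1).div(col_n - 1, axis=1)
    return effect - loo_mean


def top_hits(selectivity, k=10, cutoff=None):
    """Per cell line, the k genes with the most negative selectivity,
    optionally keeping only scores <= cutoff (an arbitrary threshold)."""
    hits = {}
    for cell_line, row in selectivity.iterrows():
        row = row.dropna()
        if cutoff is not None:
            row = row[row <= cutoff]
        hits[cell_line] = list(row.nsmallest(k).index)
    return hits


# Toy matrix: gene A is equally depleted everywhere, B and C are line-specific
effect = pd.DataFrame(
    {"A": [-1.0, -1.0, -1.0], "B": [0.0, -0.9, 0.0], "C": [-0.2, -0.2, -0.8]},
    index=["ACH-1", "ACH-2", "ACH-3"],
)
sel = leave_one_out_selectivity(effect)
print(top_hits(sel, k=1))                # most selective gene per cell line
print(top_hits(sel, k=10, cutoff=-0.5))  # genes passing the arbitrary cutoff
```

In the toy matrix, A is the most depleted gene in ACH-2 but scores 0 because it is equally depleted in every line, while B is the top selective hit, so the top selective hits are not necessarily the most depleted genes. Note also that, without missing values, x − mean(others) = n/(n − 1) · (x − mean(all)), so within one cell line the ranking matches ranking by deviation from the subset-wide mean.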

  2. Could I assume that, for each cell line, the top hits can be considered the most selectively dependent genes, but not necessarily the most depleted ones?

  3. Is there an approach that I could also apply on a per-group basis, even though the number of cell lines differs between groups? The aim would be to aggregate the results and highlight differences/commonalities in essential genes between groups.
    For example, regarding step B, could I do “group comparisons”, i.e. compute the mean effect of each gene within each group and then, for each pairwise comparison, compute the difference between the group means? Here, too, my main focus would be on the most negative values.
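A sketch of the group-level comparison proposed above, assuming a `groups` Series that maps each cell line ID to one of the group labels (the labels and values here are hypothetical). Means are computed per group, so unequal group sizes do not bias the means themselves, but estimates from small groups are noisier; a per-gene test that accounts for variance and group size (for example Welch's t-test via `scipy.stats.ttest_ind(..., equal_var=False)`) could be layered on top, which goes beyond what the question proposes.

```python
from itertools import combinations

import pandas as pd


def group_means(effect, groups):
    """Mean gene effect per group (rows = groups, columns = genes); NaNs skipped."""
    labels = groups.reindex(effect.index)  # align labels to the matrix rows
    return effect.groupby(labels).mean()   # cell lines without a label are dropped


def pairwise_mean_differences(means):
    """mean(g1) - mean(g2) per gene for every pair of groups.
    More negative = the gene is, on average, more essential in g1 than in g2."""
    diffs = {f"{g1} vs {g2}": means.loc[g1] - means.loc[g2]
             for g1, g2 in combinations(means.index, 2)}
    return pd.DataFrame(diffs).T  # rows = group pairs, columns = genes


# Toy data: two groups of different size (values are made up)
effect = pd.DataFrame(
    {"A": [-1.0, -0.8, -0.9, -0.2, 0.0], "B": [-0.5] * 5},
    index=["ACH-1", "ACH-2", "ACH-3", "ACH-4", "ACH-5"],
)
groups = pd.Series({"ACH-1": "G1", "ACH-2": "G1", "ACH-3": "G1",
                    "ACH-4": "G2", "ACH-5": "G2"})
diffs = pairwise_mean_differences(group_means(effect, groups))
print(diffs.loc["G1 vs G2"].sort_values())  # most negative first
```

Each pair appears once, so genes that are more essential in the second group of a pair show up as the most positive values of that row; with 4 groups there are 6 such rows to inspect at both ends.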

Any idea or suggestion would be greatly appreciated :slight_smile:

Cheers,

Efstathios