Technical questions regarding the two-group comparison in the custom analysis section

Dear DepMap community,

I would like to ask some methodological and technical questions about the two-class comparison in the custom analysis section. Using pre-defined groups of cancer cell lines, I noticed the following “differences” in the results:

while using the CRISPR (DepMap 21Q4 Public, Chronos) dataset

and with the RNAi dataset:

  1. Do the differences in the results, in terms of the top hits and the cell lines profiled, stem from the different methodological pipelines used to score dependencies?

  2. In addition, what is the main difference between the above CRISPR dataset and CRISPR (DepMap 21Q4 Public+Score, Chronos)? Is it mainly the profiled cell lines and the batch-effect correction? And does the latter correspond to the CRISPR_gene_effect.csv file in the downloads section?

  3. Regarding the statistical test: since the two-group comparison uses empirical-Bayes moderated effect size estimates, could one use the raw p-values instead of the q-values for hypothesis generation, even with small effect sizes? Including all available genes might result in a stricter multiple-testing correction.

  4. Finally, when comparing Group 1 vs Group 2, would a positive effect estimate indicate that the gene is more essential in the second group?

Thank you in advance for your overall help and support :slight_smile:

Best,

Efstathios

Hi Efstathios,

Here are my brief responses to your last two questions. I hope they help, but please feel free to reach out for any clarification. For the first two, I defer to @Joshua_Dempster and @mburger.

  1. (Your question 3) I think that is possible, but from a statistical perspective I would avoid it unless you have a good biological/scientific reason to do so. I personally suggest filtering relevant genes primarily by effect size and then using q-values as a secondary criterion.

  2. (Your question 4) A positive effect size implies that group one has a higher score than group two. In other words, cell lines in the second group are more vulnerable to the perturbation (either gene knockout or knockdown).
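
To make the two points above concrete, here is a minimal Python sketch. It is not the portal's implementation: it uses a plain difference in means rather than the empirical-Bayes moderated estimate, a textbook Benjamini-Hochberg adjustment, and made-up scores, p-values, and thresholds (0.3 and 0.05 are only placeholders). All function names are hypothetical.

```python
from statistics import mean


def effect_size(group1_scores, group2_scores):
    """Plain difference in mean gene effect: group 1 minus group 2."""
    return mean(group1_scores) - mean(group2_scores)


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg q-values in the same order as p_values."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    q_values = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so q-values stay monotone.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * n / rank)
        q_values[i] = running_min
    return q_values


def select_hits(rows, min_abs_effect=0.3, max_q=0.05):
    """Filter on effect size first, then use the q-value as a second criterion."""
    return [r["gene"] for r in rows
            if abs(r["effect"]) >= min_abs_effect and r["q"] <= max_q]


# Toy gene-effect scores for one gene (more negative = stronger dependency).
group1 = [-0.1, -0.2, 0.0]
group2 = [-0.9, -1.1, -1.0]
effect = effect_size(group1, group2)  # positive, so group 2 is more dependent

# Toy results for four hypothetical genes.
genes = ["A", "B", "C", "D"]
effects = [0.9, 0.1, -0.5, 0.8]
p_values = [0.001, 0.02, 0.04, 0.30]
q_values = benjamini_hochberg(p_values)
rows = [{"gene": g, "effect": e, "q": q}
        for g, e, q in zip(genes, effects, q_values)]
hits = select_hits(rows)
print(effect, q_values, hits)
```

Because the adjustment scales each p-value by the number of tests, the q-values do depend on how many genes are included, which is the concern raised in question 3. Restricting the gene set is only defensible if the restriction is decided before looking at the results.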

Warmly,
Mustafa

Regarding question 2, I think you are correct, if I understand you correctly: Public+Score is just the integrated dataset and corresponds to CRISPR_gene_effect.csv.

To address your first question, I would say that differences in results between the RNAi and CRISPR datasets most likely reflect either the cell lines included (the RNAi dataset is a combination of several sub-genome libraries with many missing values) or the difference between gene suppression (RNAi) and gene knockout (CRISPR). However, you are correct in assuming that the computational data processing pipelines also differ (DEMETER2 for RNAi, Chronos for CRISPR).
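
One way to separate the cell-line effect from the other two is to repeat the comparison on only the cell lines that have a non-missing score for the gene in both datasets. A minimal sketch of that restriction follows, assuming the scores have already been loaded into nested dictionaries keyed by cell line and then gene. The layout, the function name, and the IDs are all hypothetical and do not reflect the format of the downloaded files.

```python
import math


def shared_scores(crispr, rnai, gene):
    """Return {cell_line: (crispr_score, rnai_score)} for lines scored in both."""
    shared = {}
    for line in sorted(crispr.keys() & rnai.keys()):
        c = crispr[line].get(gene)
        r = rnai[line].get(gene)
        # Skip cell lines where the gene is absent or missing in either dataset.
        if c is None or r is None or math.isnan(c) or math.isnan(r):
            continue
        shared[line] = (c, r)
    return shared


# Toy data: LINE_B has a missing RNAi value; LINE_C and LINE_D are unshared.
crispr = {
    "LINE_A": {"GENE1": -1.0},
    "LINE_B": {"GENE1": -0.2},
    "LINE_C": {"GENE1": -0.8},
}
rnai = {
    "LINE_A": {"GENE1": -0.6},
    "LINE_B": {"GENE1": float("nan")},
    "LINE_D": {"GENE1": -0.1},
}
paired = shared_scores(crispr, rnai, "GENE1")
print(paired)
```

If the top hits still differ on the shared cell lines, the remaining discrepancy is more likely due to suppression versus knockout or to the processing pipelines.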