meaning of efficacy column in CRISPRInferredGuideEfficacy.csv + a few other questions

Hi,

  1. The CRISPRInferredGuideEfficacy.csv file contains sgRNA sequences and their efficacies, which, if I understood correctly, are the probability (or at least some measure) of how well each sgRNA caused a loss of function in its target gene, calculated using sgRNA log2 fold change data from hundreds of cell lines. Did I understand correctly? If not, what is the meaning of the efficacy in that column?

  2. I want some measurement of the effect of each sgRNA on its target gene, or at least a measurement of the loss of fitness to the cell for each sgRNA. If this measurement exists only per cell line, what is the most appropriate data for me to use? If a measurement exists that accounts for all cell lines, is the data in CRISPRInferredGuideEfficacy the best fit for my needs? If not, what data is?

  3. Where can I find a list of all the cell-lines that were experimented on?

Thanks,

Shai

Hi Shai, the CRISPRInferredGuideEfficacy.csv file contains the guide efficacies learned from the log fold change data while training Chronos. Please see Dempster et al. 2021 for more details on how the Chronos model works.

If you are interested in the fitness effect for a gene in a specific cell line (or across all screened cell lines), you can use the CRISPRGeneEffect.csv file. This matrix contains the viability effect estimates produced by Chronos, where a more negative value indicates that a cell line depends more strongly on that gene for survival. The CRISPRGeneDependency.csv file reports the probability that a gene is essential for survival in a given cell line, based on pan-essential and nonessential control genes.
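As a rough illustration of reading one gene out of the gene effect matrix, here is a minimal Python sketch. It assumes the first column of the CSV holds cell line IDs and the remaining columns are genes; that layout is an assumption, so check the header of your download. The gene names, line IDs, and values below are toy data, and `gene_effect_by_cell_line` is a hypothetical helper, not part of any DepMap tooling.

```python
import csv
import io


def gene_effect_by_cell_line(csv_file, gene):
    """Return {cell_line_id: effect} for one gene column.

    Assumes the first column holds cell line IDs and every other column
    is a gene (an assumed layout). Empty cells are skipped.
    """
    reader = csv.reader(csv_file)
    header = next(reader)
    col = header.index(gene)  # raises ValueError if the gene is absent
    effects = {}
    for row in reader:
        if row[col] != "":
            effects[row[0]] = float(row[col])
    return effects


# Toy stand-in for the real file. For real data, something like:
#   with open("CRISPRGeneEffect.csv", newline="") as f:
#       effects = gene_effect_by_cell_line(f, "<gene column name>")
toy_csv = io.StringIO(
    ",GENE_A,GENE_B\n"
    "LINE_1,-1.2,0.1\n"
    "LINE_2,-0.8,-0.05\n"
    "LINE_3,,0.0\n"
)

effects = gene_effect_by_cell_line(toy_csv, "GENE_A")

# Per-cell-line values answer the "specific cell line" case; a simple
# average is one possible summary across all screened lines.
mean_effect = sum(effects.values()) / len(effects)

# The most negative value marks the line that depends most on the gene.
most_dependent = min(effects, key=effects.get)
```

The mean is only one way to summarize across cell lines; whether it suits your analysis depends on how variable the dependency is between lines.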

You can find the list of cell lines in the CRISPR data from the ModelID column of the ScreenSequenceMap.csv file, with additional cell line metadata matching those IDs in the Model.csv file.
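A minimal Python sketch of that lookup follows. It assumes Model.csv uses the same ModelID column name as ScreenSequenceMap.csv. The `SequenceID` and `CellLineName` columns and all values are toy placeholders for illustration, and `screened_models` is a hypothetical helper.

```python
import csv
import io


def screened_models(sequence_map_file, model_file, id_col="ModelID"):
    """Collect the unique screened cell line IDs and their metadata rows.

    Assumes both files share the ID column named by id_col.
    """
    screened = {
        row[id_col]
        for row in csv.DictReader(sequence_map_file)
        if row[id_col]
    }
    metadata = {
        row[id_col]: row
        for row in csv.DictReader(model_file)
        if row[id_col] in screened
    }
    return screened, metadata


# Toy stand-ins for ScreenSequenceMap.csv and Model.csv. For real data,
# open the downloaded files with open(path, newline="") instead.
toy_sequence_map = io.StringIO(
    "SequenceID,ModelID\n"
    "S1,M1\n"
    "S2,M1\n"
    "S3,M2\n"
)
toy_models = io.StringIO(
    "ModelID,CellLineName\n"
    "M1,toy line 1\n"
    "M2,toy line 2\n"
    "M3,toy line 3\n"
)

screened, metadata = screened_models(toy_sequence_map, toy_models)
```

A cell line can appear on several rows of the sequence map, which is why the IDs are collected into a set before matching against the metadata.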

Thank you for your response! I just want to be 100% sure I understand the data: are the efficacies in CRISPRInferredGuideEfficacy.csv the likelihoods of those sgRNAs knocking out their target gene, at the values that maximized the likelihood of the observed matrix of normalized read counts given all the constraints detailed in the paper? And are the cells from which the read counts were taken the ones detailed in ScreenSequenceMap.csv?

Thanks again,

Shai