Genes missing in download data

Hi,

I looked for Chronos data of SEPTIN3 and found all the information on the webpage. After downloading the gene dependency/effect data, I found that the gene is not included in the CSV table for 21Q4.

The website says it shows Chronos data from 21Q4, and I can see the data for the gene, so why is this data not included in the download table?

UPDATE: I saw the gene was included in the 20Qx data but not in any of the 21Qx data… maybe a problem of Achilles vs. Chronos data? But then why is it visible online?

Thanks,
Danny

I’m having some trouble reproducing the issue. I just tried downloading Achilles_gene_effect.csv from the 21Q4 release, and I see that gene in the header:

...SEPTIN2 (4735),SEPTIN3 (55964),SEPTIN4 (5414)...
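For anyone who wants to run the same check without opening the whole file, here is a minimal sketch that reads only the header row. It assumes the first header cell labels the cell-line ID column (called `DepMap_ID` here, which is an assumption) and that every other cell is a gene label in the `SYMBOL (EntrezID)` form shown above. The function names `gene_columns` and `has_gene` are made up for illustration.

```python
import csv
import io


def gene_columns(csv_file):
    """Return the gene labels from the header row of a gene effect CSV.

    Assumes the first header cell is the cell-line ID column and the
    rest are labels such as "SEPTIN3 (55964)".
    """
    header = next(csv.reader(csv_file))
    return header[1:]


def has_gene(csv_file, symbol):
    """True if any header label has `symbol` as its gene symbol part."""
    return any(label.split(" ")[0] == symbol for label in gene_columns(csv_file))


# In-memory stand-in for the real header, built from the excerpt above.
sample = io.StringIO("DepMap_ID,SEPTIN2 (4735),SEPTIN3 (55964),SEPTIN4 (5414)\n")
print(has_gene(sample, "SEPTIN3"))  # True
```

With the real download you would pass an open file instead, for example `open("Achilles_gene_effect.csv", newline="")`.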

When you say “the download table”, which file are you looking at? Or is it something you downloaded from somewhere other than the file download section of the portal?

Thanks,
Phil

On the gene page, at the top of the tile, you can see that the CRISPR data is coming from the “DepMap 21Q4 Public” data (processed with Chronos).

The file that corresponds to that is Achilles_gene_effect.csv in the data release. I have a hunch that you may have looked at CRISPR_gene_effect.csv. That file is actually a combination of the Achilles and Sanger’s Score data (also processed by Chronos). In that dataset, the gene is missing, probably because the Score project didn’t knock out the gene, and so it also got dropped from the combined dataset.
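To see which genes are affected, you can compare the headers of the two files. This is only a sketch: it assumes both files keep gene labels in the header row after a first ID column, and the helper names `read_gene_labels` and `only_in_first` are hypothetical. The two in-memory headers are illustrative stand-ins, not the real file contents.

```python
import csv
import io


def read_gene_labels(csv_file):
    """Return the set of gene labels in the header row (skipping the ID column)."""
    header = next(csv.reader(csv_file))
    return set(header[1:])


def only_in_first(first_file, second_file):
    """Sorted gene labels present in the first file's header but not the second's."""
    return sorted(read_gene_labels(first_file) - read_gene_labels(second_file))


# Illustrative stand-ins for Achilles_gene_effect.csv and CRISPR_gene_effect.csv.
achilles = io.StringIO("DepMap_ID,BRAF (673),SEPTIN3 (55964)\n")
combined = io.StringIO("DepMap_ID,BRAF (673)\n")
print(only_in_first(achilles, combined))  # ['SEPTIN3 (55964)']
```

Run against the real downloads, any gene listed this way would be one that appears in the Achilles file but was dropped from the combined file.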

If you look at the choices for datasets, you can see that different genes are represented in different datasets. For example, this gene was knocked out in the latest Achilles dataset (labeled as DepMap 21Q4) and the past DepMap 19Q1 dataset.

However, BRAF was knocked out in the DepMap dataset and in the Project Score dataset, so it is also present in the resulting combined dataset (“DepMap 21Q4 Public+Score”).

Thanks,
Phil

Dear Phil,

thanks for your help and efforts. I’m indeed confused. What exactly is the downloadable data file behind the “CRISPR (DepMap 21Q4 Public, Chronos)” data where I can see the gene in the web version? Maybe I looked at the wrong file. The webpage said it is CRISPR gene effect, which I also downloaded, but you told me it should be in Achilles_gene_effect.csv, which I find confusing. So, to get it right: CRISPR_gene_effect combines different projects, and if a gene is missing in one project it is removed? And what exactly does Achilles_gene_effect.csv include?

Thanks in advance and all the best,

Danny.

I agree, this can be very confusing if you’re not already familiar with the DepMap releases. One of the things we’ve been talking about internally is how to improve our downloads to make it clearer what each one represents, but that is a work in progress.

As a step in that direction, in the 22Q1 release, we’ve updated the descriptions of some files to explain the situation better. In the new release, the description for CRISPR_gene_effect is:

Gene Effect scores derived from CRISPR knockout screens published by Broad’s Achilles and Sanger’s SCORE projects. … Negative scores imply cell growth inhibition and/or death following gene knockout. Scores are normalized such that nonessential genes have a median score of 0 and independently identified common essentials have a median score of -1. Gene Effect scores were inferenced by Chronos. View the full Chronos publication here.
Integration of the Broad and Sanger datasets was performed as described in this publication, except that quantile normalization was not performed.

In contrast, the Achilles_gene_effect.csv file contains just the data that was generated at the Broad, processed by Chronos.

The data in both CRISPR_gene_effect and Achilles_gene_effect are visible in the portal, with the first dataset being preferred. However, if we’re showing a gene which is missing from CRISPR_gene_effect, we fall back to Achilles_gene_effect.
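To make the fallback concrete, here is a rough sketch of the rule as described above. It is not the portal’s actual code; `pick_source` and the example gene sets are made up for illustration.

```python
def pick_source(gene_label, combined_genes, achilles_genes):
    """Return the name of the file a gene's values would come from.

    Prefers the combined CRISPR_gene_effect data and falls back to
    Achilles_gene_effect when the gene is missing there. Returns None
    if the gene is in neither set.
    """
    if gene_label in combined_genes:
        return "CRISPR_gene_effect"
    if gene_label in achilles_genes:
        return "Achilles_gene_effect"
    return None


# Illustrative gene sets, not the real file contents.
combined_genes = {"BRAF (673)"}
achilles_genes = {"BRAF (673)", "SEPTIN3 (55964)"}

print(pick_source("BRAF (673)", combined_genes, achilles_genes))       # CRISPR_gene_effect
print(pick_source("SEPTIN3 (55964)", combined_genes, achilles_genes))  # Achilles_gene_effect
```

This is why a gene such as SEPTIN3 can be visible on its gene page while being absent from the CRISPR_gene_effect download.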

On the perturbation effect tab of the gene page, you can see all the datasets for which we have gene effect values. I’ve overlaid in red which datasets in the portal correspond to the two files in the release.

We try to use those labels consistently in the portal so that you can tell which data is shown. For example, on the gene page, you’ll see tiles that reference “CRISPR (DepMap 21Q4 Public+Score, Chronos)”.

I think the confusion stems from the fact that the dataset labels in the portal don’t have an obvious mapping to filenames in the release. However, there is a workaround: view a feature in Data Explorer and then look at the link under “See pages for”. This will point to the file that the data you’re seeing came from.

Dear Phil,

many thanks for that detailed explanation!

I think the confusion is exactly what you mentioned in the last paragraph: the labels in the portal do not obviously map to the file names in the release.

Now I can work better with the data!

Thanks again and all the best,

Danny.