I want to get the strongly selective genes, and the names of the cell lines dependent on those genes, from all genes and cell lines in DepMap. So I downloaded a gene dependency summary file from this link and extracted the strongly selective genes from it. However, the file doesn’t include metadata for the cell lines that depend on those genes.
My questions are
- How can I get the metadata for the cell lines of strongly selective genes using the DepMap download files (NOT the website)? For example, there are 228 cell lines dependent on EGFR, and I want to get the metadata for those cell lines.
- I found the gene dependency summary file through a search on this forum. Why isn’t that file on the download page (DepMap Data Downloads)?
I have the exact same questions.
I’m not entirely sure I understand the first question.
It sounds like you’re saying you’ve identified a group of cell lines and you’re looking for those cell lines’ metadata in the downloads section. If that’s the question, then the answer is that all metadata (i.e., disease, tissue, etc.) about cell lines is included in the “sample_info.csv” file in each release.
However, your question also includes “… for cell lines of strongly selective genes …”, which I’m having difficulty understanding. Different cell lines are sensitive to different strongly selective genes. If one were to take the list of strongly selective genes and then pick the cell lines that are sensitive to at least one of those genes, I’d guess you’d end up with most of the cell lines.
Is that what you’re looking for?
Regardless of how you pick your cell lines, all the metadata we have about the lines should be contained in the sample_info.csv files.
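As a rough illustration of that lookup, here is a minimal sketch using only the Python standard library. It assumes sample_info.csv has an ID column named DepMap_ID (check the header of the release you downloaded); the helper name metadata_for_lines and the tiny in-memory table with placeholder IDs are made up for the example.

```python
import csv
import io

def metadata_for_lines(sample_info_file, depmap_ids, id_column="DepMap_ID"):
    """Return the sample_info rows whose ID is in depmap_ids.

    Assumption: the ID column is called "DepMap_ID"; pass id_column
    if your release uses a different header.
    """
    wanted = set(depmap_ids)
    reader = csv.DictReader(sample_info_file)
    return [row for row in reader if row[id_column] in wanted]

# Made-up stand-in for sample_info.csv (placeholder IDs and values).
SAMPLE_INFO = """DepMap_ID,cell_line_name,lineage
ACH-900001,LINE_A,lung
ACH-900002,LINE_B,skin
ACH-900003,LINE_C,lung
"""

selected = metadata_for_lines(io.StringIO(SAMPLE_INFO), ["ACH-900001", "ACH-900003"])
for row in selected:
    print(row["DepMap_ID"], row["cell_line_name"], row["lineage"])

# With the real file you would open it instead of using the string above:
# with open("sample_info.csv", newline="") as f:
#     selected = metadata_for_lines(f, my_ids)
```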
Regarding question #2: The reason it is not part of the data release is purely logistical/historical.
We compute the “strongly selective” metric as part of loading data into the portal and intended to show it for any gene that users are interested in.
We later discovered that people wanted them all in bulk, so we made an interim solution by adding the URL you referenced. (This URL doesn’t actually download a generated file; it runs a query against the portal’s database and gets the values the portal is showing.)
The first question can be simplified as such:
778/1070 lines are dependent on a given gene. What are those lines?
With regard to the hidden API, is there any other information about it? Can we pull specific releases or data?
Ah, okay, that I can answer easily.
“778/1070” means that 778 of the 1070 lines in the dataset are “dependent” as defined on the portal.
Now, to get the probability of dependency for each cell line, you can download CRISPR_gene_dependency.csv from the 22Q1 release (because the tile says the dataset is “CRISPR (DepMap 22Q1 Public + Score)”).
Now, if you pull out the data for the gene you’re interested in, you can filter for the lines that have a “probability of dependency” value > 0.5 (i.e., the line’s gene effect value looks more like one from an essential gene than from a non-essential gene in that line). Those are the 778 lines the tile is talking about.
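A minimal sketch of that filter, using only the Python standard library. It assumes CRISPR_gene_dependency.csv has one row per cell line with the cell line ID in the first column, and gene columns labelled like "EGFR (1956)"; please verify both against your download. The helper name dependent_lines and the small in-memory table with placeholder IDs are made up for illustration.

```python
import csv
import io

def dependent_lines(dependency_file, gene, threshold=0.5):
    """Return IDs of lines whose probability of dependency on `gene` exceeds threshold.

    Assumptions: the first column holds the cell line ID and gene columns
    are labelled "SYMBOL (EntrezID)", e.g. "EGFR (1956)".
    """
    reader = csv.reader(dependency_file)
    header = next(reader)
    # Match on the gene symbol before the first space, skipping the ID column.
    matches = [i for i, name in enumerate(header)
               if i > 0 and name.split(" ")[0] == gene]
    if len(matches) != 1:
        raise KeyError(f"expected exactly one column for {gene!r}, found {len(matches)}")
    col = matches[0]
    ids = []
    for row in reader:
        value = row[col]
        if value == "" or value.upper() == "NA":
            continue  # skip missing values
        if float(value) > threshold:
            ids.append(row[0])
    return ids

# Made-up stand-in for CRISPR_gene_dependency.csv (placeholder IDs and values).
DEPENDENCY = """DepMap_ID,EGFR (1956),KRAS (3845)
ACH-900001,0.91,0.10
ACH-900002,0.12,0.99
ACH-900003,0.50,
ACH-900004,0.73,0.40
"""

egfr_lines = dependent_lines(io.StringIO(DEPENDENCY), "EGFR")
print(len(egfr_lines), "dependent lines:", egfr_lines)

# With the real file:
# with open("CRISPR_gene_dependency.csv", newline="") as f:
#     egfr_lines = dependent_lines(f, "EGFR")
```

The resulting IDs can then be matched against the ID column of sample_info.csv to pull the metadata for exactly those lines.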
Lastly, regarding the hidden API, it’s documented along with the other portal APIs at DepMap APIs but there’s really almost no information there.
You cannot get anything other than the current release and there are no additional options. We only computed the values that we needed for displaying on the portal.
Excellent, thanks for the info.
Bummer about the API. Are there any plans to expand/develop it further?