I have the same question. @aviad sent us here but folks in my lab can’t find the file. If in supplement of pub or on GEO can you send a link? thanks, Patrick
My guess is that @Aviad was trying to point you to the instructions I wrote on how to download correlates one at a time.
When you say you have the same question, do you mean you’re interested in getting more of the top co-dependencies, or you’re interested in a bulk download of co-dependencies like @Dietrich was asking about at the end of the thread?
We don’t currently expose the co-deps that we’ve computed as a downloadable file. Just the top 100 correlates for all the profiles we compute correlations for come out to ~15 GB, and they’re stored in an internal format designed to facilitate querying. I notice you are both asking specifically for co-dependencies, so perhaps that could be something we provide as a download.
Or would API access to export a set of genes be more useful?
This seems to be a recurring request, so I’m wondering what the mechanism should be for sharing this.
I’m one of the people in @paddisonp’s lab who is trying to download your pre-computed associations. Thanks for the reply. We are mainly interested in a bulk download of the correlations between CRISPR (Avana) Gene Effect scores (CERES) for many genes (if possible, even all genes that were in this CRISPR library) and gene expression data. The top 100 gene expression correlations for each gene would be fine, although if we are able to download more, that would be better.
On a related note, would it be possible to also include the p-values in this bulk download? I noticed that the p-value for a given association is shown when one clicks to expand the “Linear regression” section, but the p-values are currently not included in the downloads.
An API sounds like the only real option. Flat files that big will be pretty difficult to parse and extract information from without specialized tools. Additionally, for certain genes you may want to go much deeper than the top 100 genes/features. Could you make an API that exposes:
1. Identifying all the features which are significantly associated with a given feature, with the ability to change the filtering criteria:
   - P-value filter
   - q-value (corrected p-value) filter
   - feature types included (dependencies, RNA expression, mutation, etc.)
2. A way to use number 1 but for a bunch of genes, using a uniform set of filters, to generate a network.
The results would come out as a csv with:
geneDep1,feature1,stat(to give direction of relationship),p-value,q-value
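To make the request above concrete, here is a minimal sketch of the filtering step, assuming the associations and their p-values have already been computed somewhere. Nothing here is an existing DepMap API: the function names, the `feature_type` key, and the default thresholds are all hypothetical. The q-values use the standard Benjamini-Hochberg adjustment, which is one common choice of corrected p-value.

```python
import csv
import io


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg q-values in the same order as p_values."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so q-values stay monotonic.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q_values[i] = running_min
    return q_values


def filter_associations(rows, max_p=0.05, max_q=0.1, feature_types=None):
    """Keep rows passing the p-value, q-value and feature-type filters.

    rows: list of dicts with the (hypothetical) keys
    geneDep1, feature1, feature_type, stat, p_value.
    q-values are computed over all rows before any filtering.
    """
    q_values = benjamini_hochberg([row["p_value"] for row in rows])
    kept = []
    for row, q in zip(rows, q_values):
        if feature_types is not None and row["feature_type"] not in feature_types:
            continue
        if row["p_value"] <= max_p and q <= max_q:
            kept.append(dict(row, q_value=q))
    return kept


def to_csv(rows):
    """Serialize filtered rows using the column order proposed in the thread."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["geneDep1", "feature1", "stat", "p-value", "q-value"])
    for row in rows:
        writer.writerow([row["geneDep1"], row["feature1"], row["stat"],
                         row["p_value"], row["q_value"]])
    return buffer.getvalue()
```

Running the same `filter_associations` call with identical thresholds over the rows for many genes would give the uniform edge list needed for a network.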
I agree, that sounds like a good goal. What I’m now thinking about is how to get to a path to deliver something like that.
We plan our development roadmap quarterly, and while we can often squeeze in bug fixes and small changes when needed, adding this as described wouldn’t be small. As a result, this is in the queue to be scheduled, and does not yet have any ETA.
Things that I think would be easier given where we are today:
1. We already store the top 100 correlations between several datasets in the database. We could add a simple API which will fetch the top 100 correlates for a set of genes. However, you won’t get more than the top 100 and you won’t get a p-value, because it’s simply not stored in the DB.
2. I could share the Python code that the portal uses to compute large tables of correlations from our published files. This would hopefully allow you to compute any correlations you want, but it would require you to be comfortable with Python.
The original suggestion sounds like a capability that people would find useful, but it’ll require making a few changes to implement, and therefore won’t be something that we’ll be able to get to for a while.
Would one of the two “easy” options I listed be a worthwhile short-term substitute?
As run above it will correlate the same matrix against itself, but one could also use the same script to correlate expression against gene effects, or any other matrix which is in the format that we provide in the DepMap downloads.
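For readers who just want the general idea, here is a rough sketch of correlating one matrix against another and keeping the top correlates per gene. This is not the portal's script: the function name and arguments are made up for illustration, and it assumes both matrices have the same cell lines as rows in the same order, no missing values, and no constant columns.

```python
import numpy as np


def top_correlates(x, y, x_names, y_names, top_n=100):
    """Pearson-correlate every column of x with every column of y.

    x, y: 2-D arrays sharing the same rows (cell lines) in the same order.
    Returns {x_name: [(y_name, r), ...]} sorted by decreasing |r|.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.shape[0] != y.shape[0]:
        raise ValueError("x and y must have the same number of rows")
    # Standardize each column so all correlations come from one matrix product.
    xz = (x - x.mean(axis=0)) / x.std(axis=0)
    yz = (y - y.mean(axis=0)) / y.std(axis=0)
    corr = xz.T @ yz / x.shape[0]
    result = {}
    for i, name in enumerate(x_names):
        # Rank features by absolute correlation, strongest first.
        order = np.argsort(-np.abs(corr[i]))[:top_n]
        result[name] = [(y_names[j], float(corr[i, j])) for j in order]
    return result
```

Passing the same matrix as both `x` and `y` gives co-dependencies, in which case each gene's strongest hit is itself with r = 1 and would normally be dropped. The DepMap CSV downloads would need to be loaded and aligned on their shared cell lines first, with missing values handled before calling a function like this.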
I am interested in this first “easy” option… have you added an API for this function? I am interested in accessing all genome-wide correlations, but the top 100 correlations for all genes could be a good point to begin if that is the limit for accessible data at this point. Is this possible to access genome-wide?
This is an old thread, but I wanted to share that, as of the latest portal update, we now have a link to download the top 100 correlates from the co-dependency tile:
This means that you can replace the gene symbol with whichever gene you’d like to get the table of correlates for. (This is probably feasible for fetching a small number of genes. I’d discourage pulling all genes via this approach. It will be much faster to simply compute the full correlation matrix using the code I provided if you want all correlations.)
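As a small illustration of swapping the gene symbol, the sketch below builds the download URL for a handful of genes, following the BRAF URL pattern quoted elsewhere in this thread. The helper names are hypothetical, `CERES_Combined` is the only `dataset_name` value that appears in the thread, and the response is saved as raw bytes because its format is not stated here.

```python
import os
import time
import urllib.request
from urllib.parse import quote, urlencode

# URL pattern taken from the BRAF example in this thread.
BASE_URL = "https://depmap.org/portal/gene/{gene}/top_correlations"


def top_correlations_url(gene, dataset_name="CERES_Combined"):
    """Build the top-correlates download URL for one gene symbol."""
    query = urlencode({"dataset_name": dataset_name})
    return BASE_URL.format(gene=quote(gene, safe="")) + "?" + query


def download_top_correlations(genes, out_dir=".", dataset_name="CERES_Combined",
                              pause_seconds=1.0):
    """Fetch the table for a SMALL list of genes, pausing between requests."""
    for gene in genes:
        url = top_correlations_url(gene, dataset_name)
        with urllib.request.urlopen(url) as response:
            payload = response.read()
        # Save the raw response; the file format is not assumed here.
        path = os.path.join(out_dir, f"{gene}_{dataset_name}_top_correlations")
        with open(path, "wb") as handle:
            handle.write(payload)
        time.sleep(pause_seconds)
```

This is only meant for a few genes at a time; for genome-wide correlations, computing the matrix locally remains the recommended route.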
Which co-dependency table will the Achilles_gene_effect.csv input file give: CRISPR (DepMap 21Q2 Public+Score, CERES) or RNAi (Achilles+DRIVE+Marcotte, DEMETER2)? And what would be the input file to generate the other table?
Understanding: link https://depmap.org/portal/gene/BRAF/top_correlations?dataset_name=CERES_Combined would provide the CRISPR (DepMap 21Q2 Public+Score, CERES) table.
Query 1: What will the URL be if the requirement is to get the co-dependency table from RNAi (Achilles+DRIVE+Marcotte, DEMETER2) as well?
Query 2: Is there any possibility of retrieving a p-value or q-value for these correlation scores?
Thanks for your response @vidat.
My objective is to have gene co-dependency tables from both CRISPR (DepMap 21Q2 Public+Score, CERES) and RNAi (Achilles+DRIVE+Marcotte, DEMETER2) for the annotated genes. Hence I want to run the script and generate the co-dependency matrix.
Query: what should the input file for the correlation_from_csv.py script be to generate an output file that gives the results for CRISPR (DepMap 21Q2 Public+Score, CERES) and for RNAi (Achilles+DRIVE+Marcotte, DEMETER2)?
(If you copy the URL from the co-dependency tile in the RNAi section, you should see a similar URL there)
Now, this does not compute the correlations; rather, it’s just downloading them from tables we’ve precomputed, and we have not been storing p-values or q-values. So this download gives you the data that the portal has already precomputed, but does not do any additional work.
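If all you have is the correlation value from the precomputed table, one rough way to attach a p-value afterwards is the Fisher z-transformation, provided you know how many cell lines went into the correlation. This is a normal approximation offered as a sketch; it is not how the portal computes the p-value shown under “Linear regression”, and the function name is hypothetical.

```python
import math


def approx_pearson_p_value(r, n):
    """Approximate two-sided p-value for a Pearson r from n paired observations.

    Uses the Fisher z-transformation (a normal approximation that is
    reasonable for large n); it is not an exact t-test.
    """
    if n <= 3:
        raise ValueError("need more than 3 observations")
    if abs(r) >= 1.0:
        return 0.0
    # atanh(r) is approximately normal with standard error 1 / sqrt(n - 3).
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2))
```

The number of cell lines `n` has to come from the underlying matrices, since it is not part of the downloaded table as described in this thread.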
As far as correlation_from_csv.py goes, you would run it once on the DEMETER2 data (D2_combined_gene_dep_scores.csv) for RNAi:
Is there any progress with the API for downloading the precomputed Pearson correlations between dependency (CRISPR or RNAi) and other genomic features, such as expression, mutation, methylation…? Using the script to compute the correlations from scratch would not be a feasible choice for many of us who want to explore lots of genes…
Another question: is it possible to download the plots (plotly in HTML or a regular image format) through the API?