Download of co-dependencies

Continuing the discussion from Download of Top Co-dependencies Pearson correlation coefficients possible?:

I have the same question. @aviad sent us here, but folks in my lab can’t find the file. If it is in the supplement of a publication or on GEO, can you send a link? Thanks, Patrick

My guess is that @Aviad was trying to point you to the instructions I wrote on how to download correlates one at a time.

When you say you have the same question, do you mean you’re interested in getting more of the top co-dependencies, or you’re interested in a bulk download of co-dependencies like @Dietrich was asking about at the end of the thread?

We don’t currently expose the co-deps that we’ve computed as a downloadable file. Just the top 100 correlates for all the profiles we compute correlations for come out to ~15 GB, and they’re stored in an internal format designed to facilitate querying. I notice you are both asking specifically for co-dependencies, so perhaps that could be something we provide as a download.

Or would API access to export a set of genes be more useful?

This seems to be a recurring request, so I’m wondering what the mechanism for sharing this should be.

Hello,

I’m one of the people in @paddisonp 's Lab who is trying to download your pre-computed associations. Thanks for the reply. We are mainly interested in a bulk download of the correlations between CRISPR (Avana) Gene Effect scores (CERES) for many genes (if possible, even all genes that were in this CRISPR library) and gene expression data. The top 100 gene expression correlations for each gene would be fine, although if we are able to download more that would be better.

On a related note, would it be possible to also include the p-values in this bulk download? I noticed that the p-value for a given association is shown when one clicks to expand the “Linear regression” section, but the p-values are currently not included in the downloads.

Thanks so much!
Pia

Both would be ok. API access would be more flexible. But also the possibility of a bulk download as you described would be great.

An API sounds like the only real option. Flat files that big will be pretty difficult to parse and extract information from without specialized tools. Additionally, for certain genes you may want to go much deeper than the top 100 genes/features. Could you make an API that exposes:

  1. Identifying all the features which are significantly associated with a given feature. With the ability to change the filtering criteria:
    • P-value filter
    • q-value (corrected p-value) filter
    • feature types included (dependencies, RNA expression, mutation, etc.)
  2. A way to use number 1 but for a bunch of genes using a uniform set of filters to generate a network.

The results would come out as a csv with:
geneDep1,feature1,stat(to give direction of relationship),p-value,q-value
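To make the requested filters concrete, here is a minimal sketch of applying them client-side to a table in that shape. It is only an illustration: no such DepMap API exists in this thread, the `feature_type` column is a hypothetical addition needed for the feature-type filter, and the rows are made up.

```python
import csv
import io

def filter_associations(rows, max_p=0.05, max_q=0.1, feature_types=None):
    """Keep rows that pass the p-value, q-value and feature-type filters."""
    kept = []
    for row in rows:
        if float(row["p-value"]) > max_p:
            continue
        if float(row["q-value"]) > max_q:
            continue
        if feature_types is not None and row["feature_type"] not in feature_types:
            continue
        kept.append(row)
    return kept

# Made-up rows, only to show the shape of the data (not real DepMap results).
EXAMPLE_CSV = """geneDep1,feature1,feature_type,stat,p-value,q-value
GENE_A,GENE_B,dependency,0.62,1e-9,1e-6
GENE_A,GENE_C,expression,-0.31,0.002,0.04
GENE_A,GENE_D,expression,0.05,0.40,0.85
"""

rows = list(csv.DictReader(io.StringIO(EXAMPLE_CSV)))
significant = filter_associations(
    rows, max_p=0.01, max_q=0.05, feature_types={"dependency", "expression"}
)
for row in significant:
    print(row["geneDep1"], row["feature1"], row["stat"])
```

Running the same function over many query genes with one set of thresholds would give the edge list for the network described in item 2.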

Does this sound like a plan @pmontgom and @aviad?

I agree, that sounds like a good goal. What I’m now thinking about is how to find a path to delivering something like that.

We plan our development roadmap quarterly, and while we can often squeeze in bug fixes and small changes when needed, adding this as described wouldn’t be small. As a result, this is in the queue to be scheduled, and does not yet have any ETA.

Things that I think would be easier given where we are today:

  1. We have the top 100 correlations between several datasets stored in the database. We could add a simple API which would fetch the top 100 correlates for a set of genes. However, you won’t get more than the top 100, and you won’t get a p-value because it’s simply not stored in the DB.

  2. I could share the python code that the portal uses to compute large tables of correlations from our published files. This would hopefully allow one to compute any correlations you want, but it would require one to be comfortable with python.

The original suggestion sounds like a capability that people would find useful, but it’ll require making a few changes to implement, and therefore won’t be something that we’ll be able to get to for a while.

Would one of the two “easy” options I listed be a worthwhile short-term substitute?

Thanks,
Phil

Hi Phil,

Thanks so much for the options. Could you please share the python code? Chris Plaisier said he will be able to use that.

Thanks!
Pia

Sure, I posted some code at https://gist.github.com/pgm/ac2ac4c664ef81200ce49133cc4cee02

This code is modified from the portal’s codebase and will compute the top N correlates for a given gene effect matrix downloaded from the portal.

Running the following would compute the top 10 co-dependencies:

python scripts/correlation_from_csv.py Achilles_gene_effect.csv Achilles_gene_effect.csv --limit 10 out.csv

As run above it will correlate the same matrix against itself, but one could also use the same script to correlate expression against gene effects, or any other matrix which is in the format that we provide in the DepMap downloads.
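For anyone who wants to see the idea without reading the gist, here is a minimal pure-Python sketch of a top-N correlate computation. It is not the portal's code and will be far slower on real data. It assumes the input has one row per cell line and one column per gene, with the cell line ID in the first column. Ranking by absolute Pearson r and using only cell lines present in both columns are choices made for this sketch. The tiny matrix is made up.

```python
import csv
import io
import math

def read_matrix(handle):
    """Read a CSV with one row per cell line and one column per gene.

    Returns {gene: {cell_line: value}}, skipping empty or NA cells.
    """
    reader = csv.reader(handle)
    genes = next(reader)[1:]
    columns = {gene: {} for gene in genes}
    for record in reader:
        cell_line = record[0]
        for gene, cell in zip(genes, record[1:]):
            if cell not in ("", "NA", "NaN", "nan"):
                columns[gene][cell_line] = float(cell)
    return columns

def pearson(xs, ys):
    """Pearson correlation of two equal-length lists, or None if undefined."""
    n = len(xs)
    if n < 3:
        return None
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None
    return sxy / math.sqrt(sxx * syy)

def top_correlates(query, left, right, limit=10):
    """Columns of `right` with the largest |r| against column `query` of `left`."""
    results = []
    for gene, values in right.items():
        if left is right and gene == query:
            continue  # skip the trivial self-correlation
        shared = sorted(set(left[query]) & set(values))
        r = pearson([left[query][c] for c in shared], [values[c] for c in shared])
        if r is not None:
            results.append((gene, r))
    results.sort(key=lambda item: abs(item[1]), reverse=True)
    return results[:limit]

# Made-up matrix (note the missing value for GENE_C in LINE_3).
EXAMPLE = """DepMap_ID,GENE_A,GENE_B,GENE_C
LINE_1,-1.0,-0.9,0.2
LINE_2,-0.5,-0.6,0.1
LINE_3,0.0,-0.1,
LINE_4,0.1,0.2,-0.3
"""

matrix = read_matrix(io.StringIO(EXAMPLE))
for gene, r in top_correlates("GENE_A", matrix, matrix, limit=2):
    print(gene, round(r, 3))
```

Passing two different matrices as `left` and `right` corresponds to correlating, say, gene effects against expression.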

thanks,
Phil

Thank you so much, Phil!

Pia

Hello Phil @pmontgom ,

I am interested in this first “easy” option… have you added an API for this function? I am interested in accessing all genome-wide correlations, but the top 100 correlations for all genes could be a good point to begin if that is the limit for accessible data at this point. Is this possible to access genome-wide?

Thank you
Trevor

This is an old thread, but I wanted to share that in the latest portal update we now have a link to download the top 100 correlates from the co-dependency tile:

If you look at the link, it actually points to:
https://depmap.org/portal/gene/BRAF/top_correlations?dataset_name=CERES_Combined

This means that you can replace the gene symbol with whichever gene you’d like to get the table of correlates for. (This is probably only feasible for fetching a small number of genes. I’d discourage pulling all genes via this approach. It will be much faster to simply compute the full correlation matrix using the code I provided if you want all correlations.)
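As a rough sketch of what substituting the gene symbol could look like for a handful of genes: the URL pattern is taken from the link above, while the helper names, the pause between requests, and the assumption that the response is plain text are all hypothetical. The endpoint is not a documented API and may change.

```python
import time
import urllib.parse
import urllib.request

# URL pattern copied from the co-dependency tile link; not a documented API.
BASE = "https://depmap.org/portal/gene/{gene}/top_correlations"

def top_correlations_url(gene, dataset_name="CERES_Combined"):
    """Build the download URL for one gene symbol."""
    path = BASE.format(gene=urllib.parse.quote(gene, safe=""))
    return path + "?" + urllib.parse.urlencode({"dataset_name": dataset_name})

def download_top_correlations(genes, dataset_name="CERES_Combined", pause_seconds=1.0):
    """Fetch the table for each gene; returns {gene: raw response text}.

    Intended for a small number of genes only; the pause keeps the
    request rate low.
    """
    tables = {}
    for gene in genes:
        url = top_correlations_url(gene, dataset_name)
        with urllib.request.urlopen(url) as response:
            tables[gene] = response.read().decode("utf-8")
        time.sleep(pause_seconds)
    return tables

print(top_correlations_url("BRAF"))
```

Calling `download_top_correlations(["BRAF", "KRAS"])` would then return the raw text of each table, keyed by gene.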

Which co-dependency table does the Achilles_gene_effect.csv input file produce: CRISPR (DepMap 21Q2 Public+Score, CERES) or RNAi (Achilles+DRIVE+Marcotte, DEMETER2)? And what would be the input file to generate the other table?

I was looking for precisely this data with API access. I would appreciate knowing if there has been any progress in this regard.

The input file can be found on the downloads page.

Understanding: the link https://depmap.org/portal/gene/BRAF/top_correlations?dataset_name=CERES_Combined provides the CRISPR (DepMap 21Q2 Public+Score, CERES) table.
Query 1: What would the URL be if the requirement is to get the co-dependency table from RNAi (Achilles+DRIVE+Marcotte, DEMETER2) as well?
Query 2: Is there any possibility of retrieving p-values or q-values for these correlation scores?

Thanks for your response @vidat.
My objective is to have gene co-dependency tables from both CRISPR (DepMap 21Q2 Public+Score, CERES) and RNAi (Achilles+DRIVE+Marcotte, DEMETER2) for the annotated genes. Hence I want to run the script and generate the co-dependency matrix.
Query: What should the input file for the correlation_from_csv.py script be to generate the output for CRISPR (DepMap 21Q2 Public+Score, CERES) and for RNAi (Achilles+DRIVE+Marcotte, DEMETER2)?

To get the RNAi (Achilles+DRIVE+Marcotte, DEMETER2) codependencies you can use:

https://depmap.org/portal/gene/BRAF/top_correlations?dataset_name=RNAi_merged

(If you copy the URL from the co-dependency tile in the RNAi section, you should see a similar URL there)

Now, this does not compute the correlations; rather, it just downloads them from tables we’ve precomputed, and we have not been storing p-values or q-values. So this download gives you the data that the portal has already precomputed, but does not do any additional work.
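For readers who need a rough p-value anyway, one option is to approximate it from a Pearson r and the number of cell lines that went into it. The sketch below uses the Fisher z-transformation with a normal approximation. This is a standard textbook approach swapped in for illustration, not what the portal's “Linear regression” panel computes, and the sample size `n` is something you would have to supply yourself.

```python
import math

def approx_pearson_p_value(r, n):
    """Approximate two-sided p-value for H0: rho = 0.

    Uses the Fisher z-transformation: atanh(r) * sqrt(n - 3) is treated
    as a standard normal variable. Reasonable for moderately large n.
    """
    if n <= 3:
        raise ValueError("need more than 3 observations")
    if abs(r) >= 1.0:
        return 0.0
    z = math.atanh(r) * math.sqrt(n - 3)
    # Two-sided tail probability of a standard normal at |z|.
    return math.erfc(abs(z) / math.sqrt(2))

# Hypothetical values: r = 0.5 observed across 100 cell lines.
print(approx_pearson_p_value(0.5, 100))
```

P-values computed this way for many genes would still need a multiple-testing correction to obtain q-values.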

As far as correlation_from_csv.py goes, you would run it once on the DEMETER2 data ( D2_combined_gene_dep_scores.csv ) for RNAi:

python scripts/correlation_from_csv.py D2_combined_gene_dep_scores.csv D2_combined_gene_dep_scores.csv --limit 100 RNAi.csv 

and run it a second time using the data from the latest DepMap release for the CRISPR data ( CRISPR_gene_effect.csv ):

python scripts/correlation_from_csv.py CRISPR_gene_effect.csv CRISPR_gene_effect.csv --limit 100 CRISPR.csv 

Hi, there

Is there any progress with the API for downloading the precomputed Pearson correlations between dependency (CRISPR or RNAi) and other genomic features, such as expression, mutation, methylation…? Using the script to compute the correlations from scratch would not be a feasible choice for many of us who want to explore lots of genes…

Another question is: is it possible to download the plots (plotly in html or regular image format) through API?

Thank you!