Users have asked us if they can access raw sequencing data (bam/fastq files) of our cell lines.
A subset of our raw sequencing data is available online as part of this CCLE publication. The data can be accessed on the SRA website under the accession number PRJNA523380. Furthermore, most of these BAM files are available on the GDC legacy portal. You should be able to download them and convert them back to FASTQ if that is what you are looking for.
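For the BAM-to-FASTQ step, here is a minimal sketch in Python that wraps `samtools collate` piped into `samtools fastq`. It assumes samtools is installed and on your PATH and that the BAM holds paired-end reads. The function names, file names, and output naming scheme are hypothetical and only for illustration, so adjust them to your own setup.

```python
import subprocess


def bam_to_fastq_commands(bam_path, out_prefix, threads=4):
    """Build the two samtools commands (as argument lists) for a paired-end BAM.

    Output file names derived from out_prefix are an illustrative convention.
    """
    # collate groups reads by name so that mates end up next to each other;
    # -u writes uncompressed BAM, -O sends it to stdout for the pipe.
    collate = ["samtools", "collate", "-u", "-O", "-@", str(threads), bam_path]
    # fastq reads the collated stream from stdin ("-") and splits mates into
    # R1/R2; unpaired and singleton reads are discarded via /dev/null.
    fastq = [
        "samtools", "fastq", "-@", str(threads),
        "-1", f"{out_prefix}_R1.fastq.gz",
        "-2", f"{out_prefix}_R2.fastq.gz",
        "-0", "/dev/null",
        "-s", "/dev/null",
        "-n",
        "-",
    ]
    return collate, fastq


def run_bam_to_fastq(bam_path, out_prefix, threads=4):
    """Run collate | fastq and raise if either step fails."""
    collate_cmd, fastq_cmd = bam_to_fastq_commands(bam_path, out_prefix, threads)
    collate = subprocess.Popen(collate_cmd, stdout=subprocess.PIPE)
    fastq = subprocess.Popen(fastq_cmd, stdin=collate.stdout)
    # Close our copy of the pipe so collate sees a broken pipe if fastq exits early.
    collate.stdout.close()
    fastq_rc = fastq.wait()
    collate_rc = collate.wait()
    if collate_rc != 0 or fastq_rc != 0:
        raise RuntimeError(
            f"samtools failed (collate={collate_rc}, fastq={fastq_rc})"
        )
```

Calling `run_bam_to_fastq("sample.bam", "sample")` would then write `sample_R1.fastq.gz` and `sample_R2.fastq.gz`. For single-end data, the `-1`/`-2` options would need to be replaced with a single output, which this sketch does not cover.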
For the remaining cell lines, at the moment we are not able to share raw sequencing data due to regulations related to patient privacy and MTAs. We are currently working on establishing a protocol for sharing such data, and hope to have data available in appropriate repositories in the near future.
Sorry to resurrect an old thread - I was just wondering if there was an update on a protocol for sharing the raw sequencing data (or even bam files) for the remaining cell lines? Thank you!
Being able to share BAM files of new lines requires very complex legal agreements. This is because the data becomes identifiable (e.g. you could potentially trace relatives of the donor).
However, we have recently made available another way to access our data. It is at a very early stage and should not contain any additional lines, but the data might be a bit more up to date and more complete.
You would need to use Google Cloud and Terra, though:
Once we are able to share more raw sequencing data, we will make an announcement.
Hope it helps!
Thank you for the information, that is very helpful! Indeed this approach gave me around 150 more cell lines than I had in the earlier CCLE analysis.
Interestingly, I found just one cell line that I had in the previous (dbGaP-based, ~2019) release that wasn’t in the updated sample sheet: HCC1588 (ACH-001078). The DepMap page does not list RNA-seq for this cell line, but in our database we do have the raw reads from it.
Not a huge discrepancy, but just thought I’d flag it - let me know if a separate thread would be more helpful for that.