Read ids between SRA fastq and AWS open data bam

geneticist · February 15, 2026, 12:59pm

Hello,

I have a list of read IDs from the RNA-seq FASTQ file of a CCLE cell line deposited in SRA. I would like to extract the alignments for these reads from the BAM files I found on AWS Open Data. However, the read IDs in these BAM files are completely different, e.g. “C1EHHACXX130117”, whereas the SRA read IDs look like “SRR8616111.130117”. Were these alignments generated from the same original FASTQ files? If so, is there any way to find these reads in the BAM files? Thank you in advance!

Sergey

Devin_McCabe · February 17, 2026, 3:44pm

Hi, Sergey. Could you tell me which sample ID/file (in both data sets) you’re looking at?

geneticist · February 27, 2026, 12:43pm

To be specific, I’m looking at the NCIH2228 cell line. Its SRA run ID is SRR8616111, so the FASTQ is in the SRA Archive (NCBI), and its read IDs are regular SRA ones, e.g. SRR8616111.8853031. The RNA-seq alignment file for this cell line that I found on AWS Open Data is s3://depmap-omics-ccle/data/rna/bam/G28616.NCI-H2228.1.bam.

Devin_McCabe · March 3, 2026, 8:45pm

According to this wiki page, since at least 2018 SRA has discarded and replaced the read names as part of its data loading process when BAMs are submitted. Even if SRA provides FASTA/FASTQ for a particular accession number, I’m pretty confident that it was a BAM that we originally submitted to them as part of the CCLE study. So for any raw data under the SRP186687 study, we can’t rely on having consistent read IDs.
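To illustrate the point, here is a minimal Python sketch that sorts a list of read IDs into SRA-renamed ones and everything else. It assumes, for illustration only, that SRA-renamed reads follow the `<run accession>.<spot number>` pattern seen in this thread (e.g. `SRR8616111.8853031`), optionally with a mate suffix; the regex and the name `parse_sra_read_id` are hypothetical and not part of any SRA tool. Such an ID carries only the run accession and the number SRA assigned, so there is nothing in it to match against the original instrument read names in the BAM.

```python
import re
from typing import Optional, Tuple

# Assumed (hypothetical) layout of an SRA-renamed read ID:
# "<run accession>.<spot number>", optionally followed by a mate suffix
# such as ".1" or "/2". Run accessions are assumed to start with SRR/ERR/DRR.
SRA_READ_ID = re.compile(r"^(?P<run>[SED]RR\d+)\.(?P<spot>\d+)(?:[./][12])?$")


def parse_sra_read_id(read_id: str) -> Optional[Tuple[str, int]]:
    """Return (run accession, spot number) for an SRA-style ID, else None."""
    # Also accept a raw FASTQ header line: drop "@" and keep the first token.
    tokens = read_id.lstrip("@").split()
    if not tokens:
        return None
    match = SRA_READ_ID.match(tokens[0])
    if match is None:
        return None  # e.g. an original instrument read name from the BAM
    return match.group("run"), int(match.group("spot"))


if __name__ == "__main__":
    for rid in ["SRR8616111.8853031", "C1EHHACXX130117"]:
        print(rid, "->", parse_sra_read_id(rid))
```

This only tells you which IDs were renamed by SRA; it cannot map them back to the read names in the Broad BAMs.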

geneticist · March 4, 2026, 1:03pm

Thanks a lot, Devin, that makes things much clearer. Do you know whether the original FASTQ files used to generate the BAMs are still available anywhere?

Devin_McCabe · March 4, 2026, 3:07pm

The data actually originates from the Broad Institute Genomics Platform as BAM/CRAM files, and we keep them in those formats to save space. With few exceptions (a very small number of older RNA-seq BAMs), the BAM/CRAMs retain unmapped reads, so they should be as good as FASTQs and are easily converted to that format with samtools.
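As a sketch of that conversion, assuming the BAM holds paired-end reads (an assumption; `samtools flagstat` will tell you), the Python wrapper below builds and runs a pipeline closely following the example in the samtools fastq documentation: `samtools collate` groups mates by name, then `samtools fastq` writes one file per mate. The function names and the `<prefix>_R1.fastq.gz` output naming are hypothetical. Because `-n` leaves read names unchanged, the FASTQs carry the same IDs as the BAM, not the SRA `SRR8616111.N` IDs. CRAM input may additionally need the matching reference genome available to samtools.

```python
import shlex
import subprocess
import sys
from typing import List, Tuple


def bam_to_fastq_commands(bam_path: str, out_prefix: str) -> Tuple[List[str], List[str]]:
    """Build the two halves of `samtools collate | samtools fastq`."""
    # collate groups mates by read name; -u writes uncompressed BAM and
    # -O sends it to stdout so it can be piped into samtools fastq.
    collate = ["samtools", "collate", "-u", "-O", bam_path]
    # samtools fastq skips secondary/supplementary records by default, so
    # each read is written once; unmapped reads are kept.
    fastq = [
        "samtools", "fastq",
        "-1", f"{out_prefix}_R1.fastq.gz",  # READ1 mates (gzipped by extension)
        "-2", f"{out_prefix}_R2.fastq.gz",  # READ2 mates
        "-0", "/dev/null",  # discard reads flagged as neither or both mates
        "-s", "/dev/null",  # discard singletons whose mate is missing
        "-n",               # keep read names as-is (no /1, /2 suffix)
        "-",                # read the collated BAM from stdin
    ]
    return collate, fastq


def run_pipeline(bam_path: str, out_prefix: str) -> None:
    """Run collate | fastq, raising if either samtools step fails."""
    collate, fastq = bam_to_fastq_commands(bam_path, out_prefix)
    p1 = subprocess.Popen(collate, stdout=subprocess.PIPE)
    subprocess.run(fastq, stdin=p1.stdout, check=True)
    p1.stdout.close()
    if p1.wait() != 0:
        raise subprocess.CalledProcessError(p1.returncode, collate)


if __name__ == "__main__" and len(sys.argv) == 3:
    collate_cmd, fastq_cmd = bam_to_fastq_commands(sys.argv[1], sys.argv[2])
    print(shlex.join(collate_cmd), "|", shlex.join(fastq_cmd))
    run_pipeline(sys.argv[1], sys.argv[2])
```

For example, saving this as a hypothetical `bam_to_fastq.py` and running `python bam_to_fastq.py G28616.NCI-H2228.1.bam NCIH2228` on a downloaded copy of the BAM would write `NCIH2228_R1.fastq.gz` and `NCIH2228_R2.fastq.gz`.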
