RRBS Data access

I am trying to access the aligned RRBS data from the Next-generation characterization of the CCLE. In particular, I would like to run Bismark on aligned SAM/BAM files without rerunning alignment on fastq files. I have accessed the data from the SRA accession number PRJNA523380. Am I using the right files from SRA?

bismark_methylation_extractor fails to run; the output is below. Thanks!

I downloaded the data with sam-dump from the SRA Toolkit and converted the output to BAM with samtools, e.g.:
sam-dump SRR8633856

This produces a BAM file, SRR8633878.bam. My understanding from the manuscript is that this is the aligned output of Bismark 0.7.12.

Running bismark_methylation_extractor …/SRR8633878.bam then fails (output below).

Output from samtools flagstat:

31856223 + 6315459 in total (QC-passed reads + QC-failed reads)
31856223 + 6315459 primary
0 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
0 + 0 primary duplicates
31856223 + 6315459 mapped (100.00% : 100.00%)
31856223 + 6315459 primary mapped (100.00% : 100.00%)
0 + 0 paired in sequencing
0 + 0 read1
0 + 0 read2
0 + 0 properly paired (N/A : N/A)
0 + 0 with itself and mate mapped
0 + 0 singletons (N/A : N/A)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)

However, when I try to run bismark_methylation_extractor, I get the following output:

*** Bismark methylation extractor version v0.22.1 ***

Trying to determine the type of mapping from the SAM header line of file …/SRR8633878.bam
Treating file(s) as single-end data (as extracted from @PG line)

Summarising Bismark methylation extractor parameters:

Bismark single-end SAM format specified (default)
Number of cores to be used: 1
Output will be written to the current directory

Checking file >>…/SRR8633878.bam<< for signs of file truncation…

Writing result file containing methylation information for C in CpG context from the original top strand to CpG_OT_SRR8633878.txt

Now reading in Bismark result file …/SRR8633878.bam
skipping SAM header line: @HD VN:1.4 GO:none SO:coordinate
skipping SAM header line: @SQ SN:1 LN:249250621 UR:http://www.broadinstitute.org/ftp/pub/seq/references/Homo_sapiens_assembly19.fasta AS:GRCh37 M5:1b22b98cdeb4a9304cb5d48026a85128 SP:Homo Sapiens
skipping SAM header line: @SQ SN:2 LN:243199373 UR:http://www.broadinstitute.org/ftp/pub/seq/references/Homo_sapiens_assembly19.fasta AS:GRCh37 M5:a0d9851da00400dec1098a9255ac712e SP:Homo Sapiens
skipping SAM header line: @SQ SN:3 LN:198022430 UR:http://www.broadinstitute.org/ftp/pub/seq/references/Homo_sapiens_assembly19.fasta AS:GRCh37 M5:fdfd811849cc2fadebc929bb925902e5 SP:Homo Sapiens
skipping SAM header line: @SQ SN:4 LN:191154276 UR:http://www.broadinstitute.org/ftp/pub/seq/references/Homo_sapiens_assembly19.fasta AS:GRCh37 M5:23dccd106897542ad87d2765d28a19a1 SP:Homo Sapiens
skipping SAM header line: @SQ SN:5 LN:180915260 UR:http://www.broadinstitute.org/ftp/pub/seq/references/Homo_sapiens_assembly19.fasta AS:GRCh37 M5:0740173db9ffd264d728f32784845cd7 SP:Homo Sapiens
skipping SAM header line: @SQ SN:6 LN:171115067 UR:http://www.broadinstitute.org/ftp/pub/seq/references/Homo_sapiens_assembly19.fasta AS:GRCh37 M5:1d3a93a248d92a729ee764823acbbc6b SP:Homo Sapiens
skipping SAM header line: @SQ SN:7 LN:159138663 UR:http://www.broadinstitute.org/ftp/pub/seq/references/Homo_sapiens_assembly19.fasta AS:GRCh37 M5:618366e953d6aaad97dbe4777c29375e SP:Homo Sapiens

skipping SAM header line: @PG ID:maq.3 PN:maq VN:0.7.1-9 CL:maq map -D -s 0 -M c -e 100 D25LYACXX.5.3.out.aln.map Homo_sapiens_assembly19.restrict.MspI.28-340.bfa D25LYACXX.5.3.1.bfq
skipping SAM header line: @PG ID:samtools PN:samtools PP:maq.3 VN:1.14 CL:samtools view -bS -
skipping SAM header line: @PG ID:samtools.1 PN:samtools PP:samtools VN:1.14 CL:/cluster/tools/software/centos7/samtools/1.14/bin/samtools view -h …/SRR8633878.bam
skipping SAM header line: @CO aggregation_version=1
Use of uninitialized value $meth_call in split at /cluster/tools/software/centos7/bismark/0.22.1/bismark_methylation_extractor line 4306, line 98.
Use of uninitialized value $strand in string eq at /cluster/tools/software/centos7/bismark/0.22.1/bismark_methylation_extractor line 4376, line 98.
Use of uninitialized value $strand in string eq at /cluster/tools/software/centos7/bismark/0.22.1/bismark_methylation_extractor line 4826, line 98.
Use of uninitialized value $strand in string eq at /cluster/tools/software/centos7/bismark/0.22.1/bismark_methylation_extractor line 4874, line 98.
Use of uninitialized value $strand in concatenation (.) or string at /cluster/tools/software/centos7/bismark/0.22.1/bismark_methylation_extractor line 4924, line 98.
The strand information was neither + nor -:
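
One thing that may be worth checking, given the header lines above: the @PG entries name maq and samtools, not Bismark. bismark_methylation_extractor takes its methylation calls from the XM:Z tag (and strand information from the XR:Z and XG:Z tags) that Bismark writes into each alignment record, so a BAM without those tags would plausibly produce the uninitialized $meth_call and $strand warnings. Below is a minimal shell sketch of such a check, assuming samtools and awk are available. check_bismark_sam is a hypothetical helper name, not part of Bismark or samtools.

```shell
# Hypothetical helper: reads SAM text (header + records) on stdin and reports
# whether any @PG line mentions Bismark and how many records carry an XM:Z tag.
check_bismark_sam() {
    awk -F'\t' '
        /^@PG/ { if (tolower($0) ~ /bismark/) pg = 1; next }  # program lines
        /^@/   { next }                                        # other header lines
        {
            n++
            # optional tags start at column 12 in SAM
            for (i = 12; i <= NF; i++) if ($i ~ /^XM:Z:/) { xm++; break }
        }
        END {
            printf "bismark_pg=%s records=%d with_XM=%d\n", (pg ? "yes" : "no"), n, xm
        }
    '
}

# Possible usage on the first part of the file (path elided as in the post):
#   samtools view -h …/SRR8633878.bam | head -n 100000 | check_bismark_sam
```

If this reports bismark_pg=no and with_XM=0, the file was most likely not produced by Bismark, and the extractor cannot be run on it directly.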

Hi Ian,

The CCLE paper says that the BAM files under SRA accession PRJNA523380 are raw data, so I'm not sure they have been processed by Bismark. I recently regenerated the methylation data by converting the BAM files to FASTQ and rerunning Bismark on those FASTQ files. There may be a way to do it without converting the BAMs to FASTQ (which I know is what you want), but I don't know how to do that at the moment.
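
The conversion route described above could look roughly like the following. This is only a sketch under assumptions, not the exact commands used for the regeneration: it assumes single-end reads (consistent with the flagstat output in the question), a genome folder already prepared with bismark_genome_preparation, and default options throughout. plan_realign is a hypothetical helper that only prints the commands so they can be reviewed before running, and the paths in the usage comment are placeholders.

```shell
# Hypothetical helper: prints (does not run) the commands for one sample.
#   $1 = input BAM
#   $2 = Bismark genome folder (prepared with bismark_genome_preparation)
plan_realign() {
    bam=$1
    genome=$2
    sample=$(basename "$bam" .bam)
    # 1) write the reads back out as FASTQ (single-end assumed)
    echo "samtools fastq $bam > $sample.fastq"
    # 2) realign with Bismark so the output carries Bismark's methylation tags
    echo "bismark --genome $genome $sample.fastq"
}

# Possible usage with placeholder paths:
#   plan_realign /path/to/sample.bam /path/to/bismark_genome
```

bismark_methylation_extractor would then be run on the BAM that Bismark writes, rather than on the file downloaded from SRA.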

Best,
David