1

10x Genomics data are stored in three FASTQ files: besides the standard R1 and R2 read files, there is also an I1 file with some metadata. Sometimes, however, the data are shipped as a single BAM or CRAM file (e.g. the data from the Darwin Tree of Life). How can I convert such BAM or CRAM files to FASTQ?

-- edit --

Apparently, the I1 file is not a 10x-specific thing. It is an index file generated when demultiplexing any Illumina data. See this question for a nice explanation.

Kamil S Jaron

2 Answers

1

It can all be done with samtools. This is how the Darwin Tree of Life folks do the conversion:

samtools fastq -@4 -i \
  -1 ${sample}_S${tag}_L%03s_R1_001.fastq.gz \
  -2 ${sample}_S${tag}_L%03s_R2_001.fastq.gz \
  --i1 ${sample}_S${tag}_L%03s_I1_001.fastq.gz \
  --index-format i8 ${lane}.cram
Kamil S Jaron
  • I don't know if that solution will work with 10xGenomics files. You have to recreate R2 out of the contents of tags; there will be no reads with the read2 binary flag set. – swbarnes2 Jun 09 '21 at 19:41
  • @swbarnes2 I have just seen this thread: https://github.com/darwintreeoflife/darwintreeoflife.data/issues/2#event-4859159249 and tried to make it more googlable (because I do think there will be quite a few people downloading the DToL data). – Kamil S Jaron Jun 10 '21 at 09:39
  • I saw that thread too. I don't think it's right. samtools fastq -2 will work fine for bams from many applications, but not 10XGenomics. – swbarnes2 Jun 10 '21 at 16:45
  • @kamil, can you post a couple of lines of the BAM file showing how tags for R1 and R2 appear in 10x data? I can probably quite easily add support for this in https://genozip.com/sam2fq.html – Divon Jul 25 '21 at 01:21
1

You'll want to use the bamtofastq tool provided by 10x to preserve the indices properly: https://github.com/10XGenomics/bamtofastq
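
A minimal sketch of what an invocation might look like, assuming the usage documented in the tool's README (`bamtofastq [options] <bam> <output-path>`, with `--nthreads` setting the thread count). The `run_bamtofastq` wrapper, its `DRY_RUN` switch, and the file names are hypothetical and only there for illustration. Note that the tool takes a BAM as input, so this does not cover CRAM files directly; check `bamtofastq --help` for the options in your version.

```shell
#!/bin/sh
# Hypothetical helper around 10x Genomics' bamtofastq (not part of the tool itself).
# Arguments: <input.bam> <output-dir> [threads]; threads defaults to 4.
run_bamtofastq() {
    bam=$1
    outdir=$2
    threads=${3:-4}
    if [ "${DRY_RUN:-0}" = 1 ]; then
        # DRY_RUN=1 prints the command instead of running it,
        # so the call can be inspected before touching any data.
        echo "bamtofastq --nthreads=$threads $bam $outdir"
    else
        bamtofastq --nthreads="$threads" "$bam" "$outdir"
    fi
}
```

For example, `DRY_RUN=1; run_bamtofastq sample.bam sample_fastq 8` prints the command that would be run; unset `DRY_RUN` to actually execute it.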

chrisamiller