
I am working on ONT data. So far I have worked on 1D data, aligned with the Minimap2 aligner. I came across 1D² on the ONT website. How different is the data from the two techniques? What is the difference between 1D and 2D? And do we get two FASTQ files (one per strand) for each sample?

TIA

I would greatly appreciate any help regarding my question.

KVC_bioinfo

Basecalling 1D² data with Albacore generates two overlapping sets of files. Here's what Devon O'Rourke mentioned on the ONT community forums and on Twitter:

Following Albacore basecalls on a 1d2 library I get two sets of .fq files, summary stats, etc.: one for the 1D basecalling script, and one for the 1D2 script. I'd like to use the reads generated from these scripts in a genome assembly.

After a little bit of grep searching, it seems like there are overlapping reads among the 1D and 1D2 directories - namely, those 1d reads that generated 1d2 reads!

Link here, for those who have ONT community access.
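The overlap Devon found with grep can also be counted directly. Below is a minimal Python sketch (not part of the original posts) that assumes ONT read IDs are standard 36-character UUIDs and that each 1D² read name embeds the IDs of the two 1D reads it was built from; the function names `header_ids` and `shared_reads` are hypothetical.

```python
import re
from typing import Iterable, Set

# Assumption: ONT read IDs are 36-character UUIDs, and each 1D² read name
# contains the IDs of the two 1D reads it was built from.
UUID_RE = re.compile(
    r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}",
    re.IGNORECASE,
)


def header_ids(fastq_lines: Iterable[str]) -> Set[str]:
    """Collect every UUID in the read names of a FASTQ stream.

    Assumes unwrapped FASTQ, i.e. every record is exactly four lines.
    """
    ids: Set[str] = set()
    for i, line in enumerate(fastq_lines):
        if i % 4 == 0:  # header line of a record
            # Search only the read name (first word), not the metadata fields.
            parts = line[1:].split(maxsplit=1)
            name = parts[0] if parts else ""
            ids.update(u.lower() for u in UUID_RE.findall(name))
    return ids


def shared_reads(fastq_1d: Iterable[str], fastq_1d2: Iterable[str]) -> Set[str]:
    """Return the 1D read IDs that also appear inside 1D² read names."""
    return header_ids(fastq_1d) & header_ids(fastq_1d2)
```

For example, passing the opened `1d_reads.fq` and `1d2_reads.fq` files to `shared_reads` returns the shared IDs; the size of that set is the number of 1D reads that are duplicated in the 1D² output.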

I recommended generating a list of the 1D² read names, then excluding those reads from the 1D FASTQ files. I have a couple of scripts that can help with this:

$ cat 1d2_reads.fq | ~/scripts/fastx-length.pl | perl -lane '$F[1] =~ s/(.{36})/$1\n/; print $F[1];' > readNames.txt
$ cat 1d_reads.fq | ~/scripts/fastx-fetch.pl -v -i readNames.txt > filtered_1d_reads.fq
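If those Perl helper scripts are not available, the same exclusion can be sketched in plain Python. This is a hypothetical equivalent, not the author's scripts: it assumes unwrapped four-line FASTQ records, 36-character UUID read IDs, and 1D² read names that contain both parent 1D read IDs (the same assumption behind the 36-character split in the one-liner above). The names `ids_used_by_1d2` and `filter_1d` are illustrative.

```python
import re
from typing import Iterable, Iterator, List, Set

# Assumption: read IDs are 36-character UUIDs, and each 1D² read name embeds
# the IDs of the 1D reads it was built from.
UUID_RE = re.compile(
    r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}",
    re.IGNORECASE,
)


def records(fastq_lines: Iterable[str]) -> Iterator[List[str]]:
    """Group a FASTQ stream into 4-line records (an incomplete tail is dropped)."""
    record: List[str] = []
    for line in fastq_lines:
        record.append(line)
        if len(record) == 4:
            yield record
            record = []


def read_name(header: str) -> str:
    """First word of a FASTQ header, without the leading '@'."""
    parts = header[1:].split(maxsplit=1)
    return parts[0] if parts else ""


def ids_used_by_1d2(fastq_1d2: Iterable[str]) -> Set[str]:
    """Collect the 1D read IDs embedded in 1D² read names."""
    return {
        uuid.lower()
        for rec in records(fastq_1d2)
        for uuid in UUID_RE.findall(read_name(rec[0]))
    }


def filter_1d(fastq_1d: Iterable[str], exclude: Set[str]) -> Iterator[str]:
    """Yield the lines of 1D records whose read name is not in `exclude`."""
    for rec in records(fastq_1d):
        if read_name(rec[0]).lower() not in exclude:
            yield from rec


# Example usage, with the file names from the commands above:
# with open("1d2_reads.fq") as f2, open("1d_reads.fq") as f1, \
#         open("filtered_1d_reads.fq", "w") as out:
#     out.writelines(filter_1d(f1, ids_used_by_1d2(f2)))
```

Matching UUIDs with a regular expression, rather than cutting the name at a fixed offset, keeps the sketch working whether or not the two IDs in a 1D² name are joined by a separator character.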

More information on this can also be found in this StackExchange answer.

gringer