
I have this fastq data from GEO:

zcat SRR1658526.fastq.gz | head -n 20
@SRR1658526.1 HWI-ST398:296:C1MP4ACXX:1:1101:1093:2094 length=102
GATCTCTATTACTTTTTGAAGGATTNNNNNNNNNNAANTTTTGAATCANNNNNNNNNNNNNNNNNNNNNNNNNNNNNTNNNNNNNNNNNNNNNNNNNNNNNN
+SRR1658526.1 HWI-ST398:296:C1MP4ACXX:1:1101:1093:2094 length=102
@<@FFDFDHHFHHIIIIGHBGGGIG##########10#0:BGDDHHII######################################################
@SRR1658526.2 HWI-ST398:296:C1MP4ACXX:1:1101:1167:2107 length=102
ATAATATTGTAGATATAAATGTTATCTAATCTTATCTGATCAGCTTGCTNNATANNNNNNNNNNNNNNNNNNACNTATGNNNNNNNNNNNNNNNNNNNNNCC
+SRR1658526.2 HWI-ST398:296:C1MP4ACXX:1:1101:1167:2107 length=102
CCCFFFFFHHFHHJJIIIIJIJJHJHJJJJIJJIJJJJIJHIJJJJGII#####################################################

These are supposed to be paired-end sequences. Do the @ and + prefixes mark R1 and R2? What's the convention here?

0x90

2 Answers


Entries in a FASTQ file occupy 4 lines each, and R1 and R2 are typically stored in separate files. That SRA project is 2x51, so you seem to have run fastq-dump without the --split-3 option: each 102-base record is R1 and R2 merged together. Make your life easier and never use SRA; just download the individual files from ENA instead.
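If the file has already been dumped in merged form, the two mates can be recovered by cutting each record in half. The following is a minimal sketch, assuming every record is exactly 2x51 bases with R1 first; the function name `split_merged_record` and the `/1` and `/2` name suffixes are illustrative choices, not something fastq-dump produces. It only cuts the strings and does not change read orientation.

```python
from typing import Tuple

FastqLines = Tuple[str, str, str, str]


def split_merged_record(
    header: str, seq: str, qual: str, read_len: int = 51
) -> Tuple[FastqLines, FastqLines]:
    """Split one merged FASTQ record into R1 and R2 (hypothetical helper).

    Assumes the record is R1 followed directly by R2, each `read_len` long.
    """
    if len(seq) != len(qual):
        raise ValueError("sequence and quality lengths differ")
    if len(seq) != 2 * read_len:
        raise ValueError(f"expected {2 * read_len} bases, got {len(seq)}")

    # Keep only the identifier (first token after '@'), drop the description.
    name = header[1:].split()[0]

    r1 = (f"@{name}/1", seq[:read_len], "+", qual[:read_len])
    r2 = (f"@{name}/2", seq[read_len:], "+", qual[read_len:])
    return r1, r2
```

Re-running fastq-dump with --split-3, or downloading the already split files from ENA, avoids this step entirely.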

Devon Ryan

The data shown in the FASTQ file are paired-end data.

In FASTQ files, @ and + do not correspond to R1 and R2.

The Wikipedia page for the FASTQ format describes the following conventions:

A FASTQ file normally uses four lines per sequence.

Line 1 begins with a '@' character and is followed by a sequence identifier and an optional description (like a FASTA title line).
Line 2 is the raw sequence letters.
Line 3 begins with a '+' character and is optionally followed by the same sequence identifier (and any description) again.
Line 4 encodes the quality values for the sequence in Line 2, and must contain the same number of symbols as letters in the sequence.
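The four-line layout above can be sketched as a small reader that identifies lines by their position within a record rather than by their first character. This matters because '@' is also a valid quality symbol: the quality line of the first record in the question starts with '@'. This is a minimal sketch, assuming unwrapped four-line records; `FastqRecord` and `read_fastq` are hypothetical names, not part of any library.

```python
import io
from typing import Iterator, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    name: str      # header text without the leading '@'
    sequence: str
    quality: str


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from a four-line-per-entry FASTQ stream."""
    while True:
        header = handle.readline()
        if not header:          # end of file
            return
        if not header.strip():  # tolerate blank lines between records
            continue
        seq = handle.readline().rstrip("\n")
        plus = handle.readline()
        qual = handle.readline().rstrip("\n")

        # Lines 1 and 3 are recognised by position, then sanity-checked.
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed record near: {header.strip()!r}")
        if len(seq) != len(qual):
            raise ValueError("sequence and quality lengths differ")

        yield FastqRecord(header[1:].rstrip("\n"), seq, qual)


# Small in-memory demo; note the quality line beginning with '@'.
demo = io.StringIO("@r1 desc\nACGT\n+r1 desc\n@<@F\n")
records = list(read_fastq(demo))
```

For a compressed file, the same function can be given a handle from `gzip.open(path, "rt")`.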

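When using fastq-dump with the -I flag, a read suffix (".1" or ".2") is appended after the spot ID, giving names of the form accession.spot.read.

In this case the headers have no such suffix: the ".1" and ".2" in SRR1658526.1 and SRR1658526.2 are spot numbers, and R1 and R2 of each spot sit within the same 102-base record in the same file.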

Mack123456