I need to merge sequencing data from different sequencing runs but for the same ChIP-seq library (HiSeq 2000).
Are there any potential advantages or disadvantages to merging the files at the .fastq stage versus the .BAM stage (alignment with Bowtie/1.1.2)?
I don’t think it matters. Both are easy to merge (BAM via samtools merge, and (gzipped) FASTQ via cat), and neither method has specific disadvantages, unless your FASTQ files are sorted for some reason (but they generally shouldn’t be).
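As a minimal sketch of the FASTQ route, the snippet below builds two tiny gzipped FASTQ files and concatenates them with `cat`. The reads and file names are invented for illustration. It works because a concatenation of gzip streams is itself a valid gzip stream.

```shell
# Scratch directory with two tiny, made-up gzipped FASTQ files (one read each).
workdir=$(mktemp -d)
printf '@read1\nACGT\n+\nIIII\n' | gzip > "$workdir/run1.fastq.gz"
printf '@read2\nTTGA\n+\nIIII\n' | gzip > "$workdir/run2.fastq.gz"

# Merge without decompressing: plain concatenation of the gzip streams.
cat "$workdir/run1.fastq.gz" "$workdir/run2.fastq.gz" > "$workdir/merged.fastq.gz"

# Check that the merged file decompresses and holds both reads (4 lines per read).
merged_lines=$(gzip -dc "$workdir/merged.fastq.gz" | wc -l | tr -d ' ')
echo "reads in merged file: $((merged_lines / 4))"
```

The BAM route would be something like `samtools merge merged.bam run1.bam run2.bam`, again with hypothetical file names.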
One advantage of keeping the FASTQ files separate is that it makes it slightly easier to parallelise the mapping step: just run the mapper in parallel on the separate FASTQ files. Although bowtie has an option (-p) for this, throughput from that is slightly worse than running the mapping on split files.
For ChIP-seq it shouldn't really matter. But do be aware that by default, samtools merge retains read group information (the @RG lines in the header) from each input file. This could pose a problem for some downstream analyses (e.g. for the GATK HaplotypeCaller) if you want the merged data to be considered as all part of the same sample. You can change this behaviour using the -c option, which emits only one @RG header when several input files share the same read group ID.
Agree with the others that it doesn't really matter. One thing to note though: if you're deduplicating your BAM files (you probably should for ChIP-seq data), make sure that you do this after merging. :)
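To see why the order matters, here is a toy model rather than a real deduplication tool. Each text line stands in for one aligned read as "chrom start strand", the coordinates are invented, and `sort -u` plays the role of duplicate removal. Because both runs come from the same library, the same fragment can turn up in each run, and that only becomes visible once the runs are combined.

```shell
# Toy stand-ins for two runs of the same library; one fragment (chr1 100 +) is in both.
workdir=$(mktemp -d)
printf 'chr1 100 +\nchr1 250 -\n' > "$workdir/run1.txt"
printf 'chr1 100 +\nchr1 400 +\n' > "$workdir/run2.txt"

# Deduplicate each run separately, then merge: the shared fragment survives twice.
dedup_then_merge=$( { sort -u "$workdir/run1.txt"; sort -u "$workdir/run2.txt"; } | wc -l | tr -d ' ')

# Merge first, then deduplicate: the shared fragment is kept once.
merge_then_dedup=$(cat "$workdir/run1.txt" "$workdir/run2.txt" | sort -u | wc -l | tr -d ' ')

echo "dedup then merge: $dedup_then_merge reads"
echo "merge then dedup: $merge_then_dedup reads"
```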
@RG information for ChIP-Seq, I mean it is very unlikely that someone would want to do variant calling with ChIP-Seq. So in any case it would hardly matter. I would just not mention the @RG here since people might get confused. – ivivek_ngs Jun 06 '17 at 14:12
samtools merge is widely used. – Sarah Carl Jun 06 '17 at 20:19