I was given a list of target regions in BED and many exome alignments in BAM. I was asked to extract on-target alignments from these BAMs to save disk space. I know I can use bedtools to extract sub-BAMs. I am thinking to write a script to apply bedtools to all BAMs at my hand, but I speculate there may be some more convenient command lines to achieve my goal. How would you do? Thanks.
Asked
Active
Viewed 507 times
1 Answers
2
It sounds like bedtools intersect will work for you:
bedtools intersect -wa -a <alignment.bam> -b <region.bed> > <intersect_alignment.bam>
When a BAM file is used for the A file, the alignment is retained if overlaps exist, and exlcuded if an overlap cannot be found. If multiple overlaps exist, they are not reported, as we are only testing for one or more overlaps.
It's easy enough to wrap this up into a for loop for batch processing (line breaks can be removed if desired):
for alnBam in alignmentFiles*.bam;
do echo ${alnBam};
bedtools intersect -wa -a ${alnBam} -b <region.bed> > intersect_${alnBam};
done
There are a lot of other options which should cover most related uses. For more details, just run "bedtools intersect" (with no arguments).
gringer
- 14,012
- 5
- 23
- 79
${x}should be${alnBam}. – Matt Bashton Jun 04 '17 at 11:29