4

I am trying to do mapping with multiple sequences against a reference genome. I tried the following code: The outputfile is .fastq.sam. I need only .sam. How do I get it?

for i in test/*.fastq; 
do
 ./minimap2 -ax map-ont reference.fna $i > $i.sam;
 done
hgf
  • 43
  • 3

3 Answers3

5

The command is fine, it is just a suffix. If you want to trim the suffix then use ${i%.fastq}.sam for the output, or even better convert the output from minimap2 to bam directly as this saves space:

./minimap2 -ax map-ont reference.fna "${i}" | samtools view -o "${i%.fastq}".bam
ATpoint
  • 1,207
  • 2
  • 10
1

Honestly, I wouldn't do it with a one-liner like this. Make a proper bash script, so you can spend a couple of lines to get the naming just right. Then you can use the bash script over and over again. Easier to document too if the command line used is in a file, not just in terminal history.

Also, yes, piping to convert to bam is almost certainly what you want to do. But piping into a sort commmand is better. And I think samtools sort will output a bam by itself if you ask it to.

swbarnes2
  • 1,925
  • 7
  • 6
1

I would recommend

ls test/*.fastq|sed s,.fastq$,,|xargs -i echo ./minimap2 -ax map-ont reference.fna {}.fastq \> {}.sam|sh

Replace trailing "sh" with GNU parallel for concurrent jobs on the same node; replace with asub for submitting array jobs to Slurm/LSF/SGE.

Once you get used to this, you can do a variety of tasks quickly, like renaming files in batch, gzipping all files in parallel, format conversion in batch, aligning 100 files on a cluster, etc.

user172818
  • 6,515
  • 2
  • 13
  • 29