I have a folder with 150 pairs of FASTQ files from an Illumina sequencing run. I need to process them with MiXCR. How can I do this in Bash with a single command?
2 Answers
Here's how I'd approach this:
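for READ1 in /Documents/Data/Input/*R1*
do
    # Derive the R2 file name by replacing the first "R1" in the path with "R2".
    READ2=${READ1/R1/R2}
    # Output prefix passed after the two inputs; this naming scheme is an assumption.
    OUT=${READ1%%_R1*}
    mixcr analyze amplicon --species hs \
        --starting-material dna \
        --5-end v-primers \
        --3-end j-primers \
        --adapters no-adapters \
        --receptor-type trb \
        --only-productive "$READ1" "$READ2" "$OUT"
done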
Basically, loop over all the R1 FASTQ files in a directory and derive each R2 name from the matching R1 name.
This uses Bash string manipulation and assumes that substituting R2 for R1 gives the correct R2 filename. Note that ${READ1/R1/R2} replaces only the first occurrence of R1 in the whole path, so a directory name containing R1 would break it. It also assumes that all the FASTQ files are in a single directory.
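To see what the substitution does before running MiXCR on 150 pairs, it can help to try it on a single name. The sketch below is only an illustration: the file name is hypothetical (it follows the S1_S1_ pattern mentioned in the question, with the rest assumed), and SAMPLE is a name introduced here, not something from the answer.

```bash
# Hypothetical R1 path; the real file names may differ.
READ1="/Documents/Data/Input/S1_S1_L001_R1_001.fastq"

# Replace the first "R1" in the path with "R2" (Bash-only expansion).
READ2="${READ1/R1/R2}"

# Strip the directory, then everything from "_R1" onward, to get a sample name.
BASE="${READ1##*/}"
SAMPLE="${BASE%%_R1*}"

echo "$READ2"    # /Documents/Data/Input/S1_S1_L001_R2_001.fastq
echo "$SAMPLE"   # S1_S1_L001
```

Putting echo in front of the mixcr command inside the loop is a similar low-risk way to check every generated command line first.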
Scot
Thanks. This is essentially the approach that I took. My original problem was actually the result of trying to run a shell command that wasn't compatible with sh on Linux. When I used $ bash instead of $ sh to run my script, all the recommendations worked. I should have started by mentioning that I am working in Linux (Ubuntu 18.04 to be exact). – Lou_A Jul 01 '19 at 17:13
Ah - OK. Yes, you may want to use bash rather than sh if possible. Glad it's working for you. – Scot Jul 01 '19 at 18:25
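The ${READ1/R1/R2} form is a Bash extension and is not part of POSIX sh, which would explain the failure under sh. If a script has to run under plain sh, one possible sketch uses only the POSIX prefix and suffix removal expansions. The file name and the _R1_ delimiter are assumptions for illustration, and unlike the Bash form this splits on the last _R1_ in the path rather than the first R1.

```bash
# Hypothetical R1 path; the real file names may differ.
READ1="/Documents/Data/Input/S1_S1_L001_R1_001.fastq"

# Everything before the last "_R1_" (shortest matching suffix removed).
prefix="${READ1%_R1_*}"
# Everything after the last "_R1_" (longest matching prefix removed).
suffix="${READ1##*_R1_}"

# Reassemble the path with R2 in place of R1.
READ2="${prefix}_R2_${suffix}"
echo "$READ2"    # /Documents/Data/Input/S1_S1_L001_R2_001.fastq
```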
I would also suggest using GNU parallel, as described in our guides:
https://docs.milaboratories.com/mixcr/guides/generic-multiplex-bcr/#one-command-solution
Briefly:
ls /Documents/Data/Input/*_R1* | \
parallel -j 4 \
'mixcr analyze
…
{} \
{=s:R1:R2:=} \
{=s:.*/:/Documents/Data/result/:;s:_R.*::=}'
This takes every FASTQ pair in /Documents/Data/Input, runs up to four jobs at a time (-j 4), and writes the output files to the /Documents/Data/result folder.
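The {= ... =} parts are Perl substitutions that GNU parallel applies to each input line. As a rough illustration of what they produce, the sketch below applies the same s:...:...: expressions with sed to one hypothetical file name. sed is only a stand-in here (these particular patterns behave the same way in it), and the variable names are made up for the example.

```bash
# Hypothetical input line, as ls would print it; the real file names may differ.
in="/Documents/Data/Input/S1_S1_L001_R1_001.fastq"

# {=s:R1:R2:=} : the matching R2 file.
r2=$(printf '%s\n' "$in" | sed 's:R1:R2:')

# {=s:.*/:/Documents/Data/result/:;s:_R.*::=} : swap the directory for the
# result folder, then drop everything from the first "_R" onward.
out=$(printf '%s\n' "$in" | sed 's:.*/:/Documents/Data/result/:;s:_R.*::')

echo "$r2"     # /Documents/Data/Input/S1_S1_L001_R2_001.fastq
echo "$out"    # /Documents/Data/result/S1_S1_L001
```

With GNU parallel installed, adding --dry-run to the parallel call prints the generated commands instead of running them, which checks the same thing on the real file list.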
Stanislav Poslavsky
Mark Izraelson
The S1_S1_ component of the file name changes with each sample to S2_S2_, S3_S3_, and so on, all the way to 75. I'm having a little problem setting the $variable for the file name in the loop. – Lou_A Jun 28 '19 at 12:11