Mixcr batch processing

Question

I have a folder with 150 pairs of FASTQ files from an Illumina sequencing run. I need to process them with mixcr. How can I do this in Bash with a single command?

What is your command for processing a single pair of FASTQ files? — Scot, Jun 27 '19 at 01:54
Here's an example of the command line that I'm using to run Mixcr for one sample (a pair of FASTQ files): mixcr analyze amplicon -s hs --starting-material dna --5-end v-primers --3-end j-primers --adapters no-adapters --receptor-type trb --only-productive /Documents/Data/Input/S1_S1_L001_R1_001.fastq.gz /Documents/Data/Input/S1_S1_L001_R2_001.fastq.gz
The S1_S1_ component of the file name changes with each sample: S2_S2_, S3_S3_, and so on up to S75_S75_. I'm having a little trouble setting the $variable for the file name in the loop. — Lou_A, Jun 28 '19 at 12:11
One solution I thought of is to iterate over the two types of files (R1 and R2) with a for loop, but I can't figure out how to do this. I don't think a nested loop will do the trick. — Lou_A, Jun 28 '19 at 13:19
Create a text file with two columns, one for R1 and the other for R2. Loop through this file. Shell loops are not a good thing to begin with, and nested loops are almost never required. — Ram RS, Jun 28 '19 at 16:21
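A minimal sketch of the two-column approach, assuming every sample follows the S<n>_S<n>_L001_R1_001.fastq.gz / ..._R2_001.fastq.gz pattern described in the comments (n = 1 to 75). One function writes the sheet, another reads it back and runs a command once per pair. The function names, the samples.tsv file and the output-name rule (R1 file name up to _R1_) are illustrative choices, not part of MiXCR.

```shell
#!/usr/bin/env bash
# Sketch of the two-column "sample sheet" approach.
# Assumes the naming scheme from the question; make_sample_sheet,
# for_each_pair and samples.tsv are hypothetical names.

# Print one line per sample: "<R1 path><TAB><R2 path>".
make_sample_sheet() {
    local input_dir=$1 n_samples=$2 i
    for ((i = 1; i <= n_samples; i++)); do
        printf '%s\t%s\n' \
            "$input_dir/S${i}_S${i}_L001_R1_001.fastq.gz" \
            "$input_dir/S${i}_S${i}_L001_R2_001.fastq.gz"
    done
}

# Read the sheet from stdin and run the given command once per line,
# appending R1, R2 and an output name (the R1 file name up to "_R1_").
for_each_pair() {
    local r1 r2 name
    while IFS=$'\t' read -r r1 r2; do
        name=${r1##*/}          # drop the directory part
        name=${name%%_R1_*}     # drop everything from "_R1_" onwards
        # </dev/null so the command cannot consume the rest of the sheet
        "$@" "$r1" "$r2" "$name" < /dev/null
    done
}

# Usage (echo only prints the commands; remove it to actually run MiXCR,
# and replace ... with the options from the command in the question):
#   make_sample_sheet /Documents/Data/Input 75 > samples.tsv
#   for_each_pair echo mixcr analyze amplicon ... < samples.tsv
```

Keeping the pairs in a file also makes it easy to inspect or edit the list (for example, to drop a failed sample) before launching the whole batch.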

Scot · Accepted Answer · score 4 · 2019-06-29T01:07:55.447

Here's how I'd approach this:

for READ1 in /Documents/Data/Input/*R1*
do
    READ2=${READ1/R1/R2}                   # matching R2 file
    SAMPLE=$(basename "${READ1%%_R1*}")    # output name, e.g. S1_S1_L001
    mixcr analyze amplicon -s hs \
        --starting-material dna \
        --5-end v-primers \
        --3-end j-primers \
        --adapters no-adapters \
        --receptor-type trb \
        --only-productive "$READ1" "$READ2" "$SAMPLE"
done

Basically, loop over all the R1 FASTQ files in a directory, and calculate the R2 names as needed.

This uses Bash string manipulation (pattern substitution), and assumes that substituting R2 for R1 will correctly calculate the R2 filename from R1. It also assumes that all the FASTQ files are in a single directory.
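To see what the string manipulation does, here is a small sketch applied to a file name from the question. It anchors the substitution on _R1_ rather than R1, so an R1 elsewhere in the path (for example in a directory name) is left alone, and it adds a check that reports any R1 file without an R2 partner before the batch starts. pair_names and check_pairs are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Sketch: Bash parameter expansions that derive the R2 path and an
# output name from an R1 path. Helper names are hypothetical.

pair_names() {
    local read1=$1
    local read2=${read1/_R1_/_R2_}   # replace the first "_R1_" with "_R2_"
    local base=${read1##*/}          # drop the directory part
    local sample=${base%%_R1_*}      # drop everything from "_R1_" onwards
    printf '%s\n%s\n' "$read2" "$sample"
}

# Report every R1 file whose R2 partner is missing; non-zero status if any.
check_pairs() {
    local read1 missing=0
    for read1 in "$@"; do
        if [[ ! -e ${read1/_R1_/_R2_} ]]; then
            echo "missing R2 for: $read1" >&2
            missing=1
        fi
    done
    return "$missing"
}

# pair_names /Documents/Data/Input/S1_S1_L001_R1_001.fastq.gz prints:
#   /Documents/Data/Input/S1_S1_L001_R2_001.fastq.gz
#   S1_S1_L001
# check_pairs /Documents/Data/Input/*_R1_*.fastq.gz && echo "all pairs present"
```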

edited Jun 29 '19 at 01:07

answered Jun 28 '19 at 18:12

Scot

Thanks. This is essentially the approach that I took. My original problem was actually caused by running a script that uses bash-specific syntax with sh (on Ubuntu, sh is dash, not bash). When I ran the script with bash instead of sh, all the recommendations worked. I should have mentioned at the start that I am working in Linux (Ubuntu 18.04, to be exact). – Lou_A Jul 01 '19 at 17:13
Ah - OK. Yes, you may want to use bash rather than sh if possible. Glad it's working for you. – Scot Jul 01 '19 at 18:25
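For reference, a sketch of a script header that avoids the sh/bash mix-up. The shebang makes ./script.sh run under bash, and the guard stops early if the file is started with sh script.sh anyway (the shebang is then just a comment). On Ubuntu, /bin/sh is dash, which does not support substitutions such as ${READ1/R1/R2}. require_bash is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: guard for batch scripts that rely on bash-only syntax.
# The check itself uses only POSIX syntax, so dash can run it and fail cleanly.

require_bash() {
    if [ -z "${BASH_VERSION:-}" ]; then
        echo "This script needs bash; run it as: bash $0" >&2
        return 1
    fi
}

require_bash || exit 1
set -euo pipefail   # stop on the first failing command or unset variable

# ... the for loop from the accepted answer goes here ...
```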

score 2 · Answer 2 · edited Jan 26 '23 at 16:27

I would also suggest using GNU parallel, as described in our guides:

https://docs.milaboratories.com/mixcr/guides/generic-multiplex-bcr/#one-command-solution

Briefly:

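# {} = R1 file, {=s:R1:R2:=} = matching R2 file,
# last argument = output prefix inside /Documents/Data/result
ls /Documents/Data/Input/*_R1* | \
    parallel -j 4 \
    'mixcr analyze \
    … \
    {} \
    {=s:R1:R2:=} \
    {=s:.*/:/Documents/Data/result/:;s:_R.*::=}'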

This takes every FASTQ pair in /Documents/Data/Input, runs up to four MiXCR jobs at a time (-j 4), and writes the output files to the /Documents/Data/result folder.
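GNU parallel's --dry-run option prints the generated commands without running them. If parallel is not at hand, the sketch below reproduces the two replacement strings with sed (same substitutions, first match only, as in the Perl expressions without a g flag), so the R2 names and output prefixes can be checked before starting the batch. preview_names is a hypothetical helper and does not run MiXCR.

```shell
#!/usr/bin/env bash
# Sketch: show what {=s:R1:R2:=} and
# {=s:.*/:/Documents/Data/result/:;s:_R.*::=} expand to for each R1 file.
# Output per file: <R1><TAB><R2><TAB><output prefix>

preview_names() {
    local r1
    for r1 in "$@"; do
        printf '%s\t%s\t%s\n' "$r1" \
            "$(sed 's:R1:R2:' <<< "$r1")" \
            "$(sed -e 's:.*/:/Documents/Data/result/:' -e 's:_R.*::' <<< "$r1")"
    done
}

# Usage: preview_names /Documents/Data/Input/*_R1*
```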
