This is the typical kind of job for Snakemake.
Assuming you have one file per replicate, named for instance T9/Infected/Rep1/Rep1.fastq.gz, you can prepare a file called Snakefile with the following content:
timepoints = list(range(10))
conditions = ["Control", "Infected"]
replicates = [1, 2, 3]

rule all:
    input:
        expand(
            "T{time}/{cond}/Rep{rep}/Rep{rep}_fastqc.html",
            time=timepoints,
            cond=conditions,
            rep=replicates)

rule do_fastqc:
    input:
        fastq = "T{time}/{cond}/Rep{rep}/Rep{rep}.fastq.gz"
    output:
        html = "T{time}/{cond}/Rep{rep}/Rep{rep}_fastqc.html"
    shell:
        """
        fastqc {input.fastq}
        """
Put this file in the directory that contains the T* directories and run snakemake from there.
The top all rule declares which files you want. The do_fastqc rule explains how to make one fastqc report from one fastq.gz file.
With a bit more work, this can be used to submit jobs to a computing cluster. Snakemake has some tools for this.
If you don't know the exact names of the fastq files but they all follow the same pattern, you will need to use Python's glob module and do a little bit of programming to determine the possible values for rep, cond and time. The Snakefile can contain any Python code you want.
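As a rough sketch of that approach (the helper name parse_wildcards and the sample paths are made up for illustration; in a real Snakefile you would feed it the result of glob.glob("T*/*/Rep*/Rep*.fastq.gz") instead of a hard-coded list):

```python
import re

# Matches paths of the form T<time>/<cond>/Rep<rep>/Rep<rep>.fastq.gz;
# the backreference \3 enforces that the directory and file agree on <rep>.
FASTQ_RE = re.compile(r"T(\d+)/([^/]+)/Rep(\d+)/Rep\3\.fastq\.gz$")

def parse_wildcards(paths):
    """Collect the distinct time, cond and rep values seen in the paths."""
    times, conds, reps = set(), set(), set()
    for p in paths:
        m = FASTQ_RE.match(p)
        if m:
            times.add(int(m.group(1)))
            conds.add(m.group(2))
            reps.add(int(m.group(3)))
    return sorted(times), sorted(conds), sorted(reps)

# Hypothetical glob result, for illustration only:
example = [
    "T9/Infected/Rep1/Rep1.fastq.gz",
    "T9/Control/Rep2/Rep2.fastq.gz",
    "T0/Infected/Rep1/Rep1.fastq.gz",
]
print(parse_wildcards(example))
# → ([0, 9], ['Control', 'Infected'], [1, 2])
```

The three lists can then be assigned to timepoints, conditions and replicates at the top of the Snakefile. Snakemake also ships a glob_wildcards() helper that does roughly this in one call, if you prefer not to write the regex yourself.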
If there is no regular pattern in the file names, fix that issue first ;)
find . -name '*.fastq.gz' | awk '{printf("fastqc \"%s\"\n", $0)}' but that still fails in the (even more unlikely) case where a file name contains a newline. This should work for anything (but requires a version of find with -printf, like GNU find): find . -name '*.fastq.gz' -printf '"%p"\n' | parallel -j 25 --verbose. – terdon Oct 26 '18 at 17:18