Mix globbing and wildcards when specifying rule input

Question

Consider the following scenario.

├── sample-alice
│   ├── sequence_1.fastq
│   ├── sequence_2.fastq
│   ├── ...
│   └── sequence_n.fastq
└── sample-bob
    ├── sequence_1.fastq
    ├── sequence_2.fastq
    ├── ...
    └── sequence_m.fastq

I'm trying to write a single rule in my Snakefile that will preprocess each sample generically. The rule will execute a single command that takes all of the Fastq files associated with the sample simultaneously. The samples do not have the same number of Fastq files.

If there were only a single sample, I could do something like this.

from glob import glob
rule preprocess:
    input: glob('sample-alice/*.fastq')
    output: 'alice-clean.fastq'
    shell: 'mypreproccmd {input} > {output}'

On the other hand, if there was a single Fastq file for each sample, I could use a wildcard to write a generic rule like this.

rule preprocess:
    input: 'sample-{samp}/sequence_1.fastq'
    output: '{samp}-clean.fastq'
    shell: 'mypreproccmd {input} > {output}'

Is it possible to combine these two approaches to write a rule for handling multi-file samples generically?

I'm not sure if this exactly solves your question, but in one of my workflows I use the directory as input (as specified in the config.yaml). The shell command then uses {input}/*.fastq.gz. See some examples here: https://github.com/wdecoster/nano-snakemake/blob/master/rules/align.smk – Wouter De Coster, Mar 05 '19 at 20:29

Daniel Standage · Accepted Answer · 2023-12-07T20:21:35.277

5

A nice solution comes from a colleague of mine, as seen in this Snakemake workflow. The trick is to access the wildcards programmatically using an anonymous (lambda) function. In my example above, it would be implemented as follows.

from glob import glob

rule preprocess:
    # Snakemake calls the lambda with the matched wildcards, so the glob is evaluated per sample.
    input: lambda wildcards: glob('sample-{samp}/*.fastq'.format(samp=wildcards.samp))
    output: '{samp}-clean.fastq'
    shell: 'mypreproccmd {input} > {output}'
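
To see what the input function returns for each sample, here is a minimal plain-Python sketch that can be run outside Snakemake. The helper name fastq_inputs, its root parameter, and the use of SimpleNamespace as a stand-in for Snakemake's wildcards object are all assumptions made for illustration; they are not part of the original workflow. The sketch also sorts the result, since glob does not guarantee any particular order.

```python
import os
import tempfile
from glob import glob
from types import SimpleNamespace


def fastq_inputs(wildcards, root="."):
    """Return every Fastq file belonging to the sample named by wildcards.samp.

    `root` is a hypothetical extra parameter so the demo can run in a temp dir.
    """
    pattern = os.path.join(root, f"sample-{wildcards.samp}", "*.fastq")
    # glob() returns matches in arbitrary order; sort for a stable command line.
    return sorted(glob(pattern))


# Build a throwaway directory tree mirroring the layout in the question.
with tempfile.TemporaryDirectory() as root:
    for samp, n_files in (("alice", 3), ("bob", 2)):
        sample_dir = os.path.join(root, f"sample-{samp}")
        os.makedirs(sample_dir)
        for i in range(1, n_files + 1):
            open(os.path.join(sample_dir, f"sequence_{i}.fastq"), "w").close()

    # SimpleNamespace stands in for the wildcards object Snakemake would pass.
    alice = [os.path.basename(p)
             for p in fastq_inputs(SimpleNamespace(samp="alice"), root)]
    bob = [os.path.basename(p)
           for p in fastq_inputs(SimpleNamespace(samp="bob"), root)]

print(alice)
print(bob)
```

Each sample resolves to its own file list, however many files it has. In a Snakefile, a named function like this should be usable in place of the lambda (input: fastq_inputs), since Snakemake supplies the wildcards argument itself.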

edited Dec 07 '23 at 20:21

answered Mar 05 '19 at 20:38

Daniel Standage


Note that your Snakefile needs a from glob import glob for this to work – Michael Schubert Feb 10 '23 at 13:13
And it's a bit more succinct using f-strings: glob(f'sample-{wildcards.samp}/*.fastq') – Michael Schubert Feb 10 '23 at 13:33
