How to specify resources for cluster in snakemake

Question

I am trying to run snakemake version 3.5.4 on my cluster, where jobs are defined partly by a common template built according to the manual and partly through rule-specific parameters. But for some reason the cluster configuration is not expanding wildcards (while the command does). When I run

snakemake data/genome.fa.gz.bwt -p --jobs 10 --cluster-config cluster.json --cluster "bsub -J "{rule}.{wildcards}" -q {cluster.queue} -n {threads} -M {resources.M} -R {cluster.one_host} -o {cluster.output} -e {cluster.error}"

-J "{rule}.{wildcards}" -q {cluster.queue} -n {threads} -M {resources.M} -R {cluster.one_host} expands as expected. But -o {cluster.output} -e {cluster.error} is replaced by literally logs/cluster/{rule}.{wildcards}.out and logs/cluster/{rule}.{wildcards}.err, i.e. there is no expansion in the config file (the config file is attached below).

I am puzzled, since exactly these lines are in the documentation. What am I doing wrong? Why is snakemake not expanding wildcards in the config file?

Here is my cluster.json

{
    "__default__" :
    {
        "queue"    : "bgee",
        "one_host" : "\"span[hosts=1]\"",
        "name"      : "{rule}.{wildcards}",
        "output"    : "logs/cluster/{rule}.{wildcards}.out",
        "error"     : "logs/cluster/{rule}.{wildcards}.err"
    }
}

Here is relevant part of Snakefile :

rule index_reference :
        threads: 1
        resources: M=20000000
        input : "data/{reference}/genome.fa.gz"
        output : "data/{reference}/genome.fa.gz.bwt"
        shell : "scripts/index_genome.sh {input}"

score 5 · Accepted Answer · edited Nov 23 '17 at 09:32

This feature was added in version 3.6.0; see the release notes.
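
If you are unsure which side of that cutoff your installation is on, `snakemake --version` prints the installed version. A minimal sketch of the comparison, assuming plain dotted numeric version strings; the helper names are hypothetical and not part of snakemake:

```python
# Hypothetical helpers for comparing a snakemake version string against
# 3.6.0, the release named above. Assumes versions like "3.5.4" or "5.10.0".

def version_tuple(version: str) -> tuple:
    """Turn 'X.Y.Z' into (X, Y, Z) so versions compare numerically."""
    return tuple(int(part) for part in version.split(".")[:3])


def expands_cluster_config(version: str) -> bool:
    """True if the version is at least 3.6.0."""
    return version_tuple(version) >= (3, 6, 0)


print(expands_cluster_config("3.5.4"))  # version from the question
print(expands_cluster_config("4.3.1"))  # version used for the log below
```

Comparing tuples of integers rather than raw strings matters here, because as strings "5.10.0" would sort before "5.9.0".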

{wildcards} normally expands to something that looks vaguely like a dictionary, in other words something like key1=value1,key2=value2. Whether you really want to use that as a log file name is up to you.

Let's use a very contrived example to demonstrate this. Here is the Snakefile:

rule all:
    input: "input1.human.txt"

rule blah:
    output: "{sample}.{reference}.txt"
    shell: "touch {output}"

I'm using your cluster.json, so with that let's use a contrived cluster submission program:

snakemake --cluster-config cluster.json --cluster "echo 'rule.wildcards: {rule}.{wildcards} cluster: {cluster} -o {cluster.output} -e {cluster.error} -n {cluster.name}' >> log && " --jobs 1

If you run that and look in log, you can see that {rule} is expanded as expected and {wildcards} is expanded to sample=input1,reference=human. The output and error parts from cluster.json are expanded correctly for me. Below are the contents of log:

rule.wildcards: blah.sample=input1,reference=human cluster: one_host="span[hosts=1]",output=logs/cluster/blah.sample=input1,reference=human.out,name=blah.sample=input1,reference=human,queue=bgee,error=logs/cluster/blah.sample=input1,reference=human.err -o logs/cluster/blah.sample=input1,reference=human.out -e logs/cluster/blah.sample=input1,reference=human.err -n blah.sample=input1,reference=human
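
To make the shape of that expansion easier to see, here is a rough Python imitation of it. This is only a sketch of the observable behaviour, not snakemake's internal code, and both helper names are made up for illustration:

```python
# Illustration only: mimics the key=value rendering seen in the log.
# NOT snakemake's implementation; the helper names are hypothetical.

def render_wildcards(wildcards: dict) -> str:
    """Join wildcard names and values as 'name=value,name=value'."""
    return ",".join(f"{name}={value}" for name, value in wildcards.items())


def expand_cluster_entry(template: str, rule: str, wildcards: dict) -> str:
    """Fill the {rule} and {wildcards} placeholders of a cluster.json value."""
    return template.format(rule=rule, wildcards=render_wildcards(wildcards))


wildcards = {"sample": "input1", "reference": "human"}
log_path = expand_cluster_entry(
    "logs/cluster/{rule}.{wildcards}.out", "blah", wildcards
)
print(log_path)  # logs/cluster/blah.sample=input1,reference=human.out
```

The resulting file name contains both "=" and ",", which is worth keeping in mind before using {wildcards} in a path.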

I'm not a big fan of using {wildcards} in paths unless you're doing a parameter sweep, but to each their own. Note that the above results are from snakemake-4.3.1.

Kamil S Jaron · Answer 2 · 2017-11-23T11:57:16.147

The solutions are to either upgrade snakemake or take all wildcards out of the cluster.json file and write them directly into the snakemake execution command:

snakemake -p --jobs 10 --cluster-config cluster.json --cluster "bsub -J {rule} -q {cluster.queue} -n {threads} -M {resources.M} -R {cluster.one_host} -o logs/cluster/{rule}.{wildcards}.out -e logs/cluster/{rule}.{wildcards}.err"

However, instead of using a JSON config file, I ended up writing a simple bash wrapper, snakemake_cluster.sh, that remembers the execution command for me in a more readable format:

#!/usr/bin/env bash

# Forward all wrapper arguments to snakemake; quoting "$@" keeps arguments with spaces intact.
snakemake "$@" -p --jobs 10 --cluster "bsub \
-J {rule} \
-q bgee \
-n {threads} \
-M {resources.M} \
-R \"span[hosts=1]\" \
-o logs/cluster/{rule}.{wildcards}.out \
-e logs/cluster/{rule}.{wildcards}.err"

Edit: The line breaks inside the --cluster string have to be escaped with a backslash, and the whole bsub command has to stay inside the quotes; otherwise the options would be passed to snakemake rather than to bsub.

Edit 2: $@ means all arguments, so it can contain targets, flags, or a combination of both (targets have to be specified first). So snakemake_cluster.sh my_file builds my_file, snakemake_cluster.sh --quiet builds the default target quietly, and snakemake_cluster.sh my_file --quiet builds my_file quietly.
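
For anyone who would rather avoid shell quoting altogether, the same wrapper idea can be sketched in Python. This is a hypothetical counterpart to snakemake_cluster.sh, not something I ran on the cluster; the function and constant names are my own. It only builds and prints the command, so nothing is submitted:

```python
import shlex

# The bsub options from the bash wrapper, joined into one string so that
# snakemake receives them as a single --cluster argument.
CLUSTER_COMMAND = " ".join([
    "bsub",
    "-J {rule}",
    "-q bgee",
    "-n {threads}",
    "-M {resources.M}",
    '-R "span[hosts=1]"',
    "-o logs/cluster/{rule}.{wildcards}.out",
    "-e logs/cluster/{rule}.{wildcards}.err",
])


def build_command(user_args):
    """Return the argv list: snakemake, the user's arguments, then fixed options."""
    return ["snakemake", *user_args, "-p", "--jobs", "10",
            "--cluster", CLUSTER_COMMAND]


# Same case as "snakemake_cluster.sh my_file --quiet" above.
command = build_command(["my_file", "--quiet"])
print(shlex.join(command))  # shell-quoted form, for inspection only
```

To turn this into a real wrapper, one would pass sys.argv[1:] to build_command and hand the resulting list to subprocess.run. Because the command is a list, the bsub string stays one argument without any backslashes.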

score 1 · Answer 3 · answered Mar 14 '20 at 14:03

As of the most recent snakemake (v5.10.0) the recommended way of defining resources is through a profile.

It seems you are trying to submit a job on an LSF cluster system - there is a snakemake profile for LSF to save you having to do this work yourself.

It reduces your snakemake submission to something like

snakemake --profile "lsf"

Michael Hall

Luckily I don't have to deal with lsf anymore, but thanks anyway, I will take a look at this feature. – Kamil S Jaron Mar 15 '20 at 13:38
