
I am an avid user of Snakemake. Recently we have been refreshing our pipelines and I saw that a cluster.json file is no longer the recommended way to store the cluster configuration.

I used to start my pipelines like this:

snakemake --cluster-config cluster.json --cluster "qsub -W umask=002 -d cluster_outputs  -l nodes={cluster.nodes}:ppn={cluster.ppn} -N {rule}" --jobs 245 --use-conda --rerun-incomplete --keep-going --latency-wait 60

With the cluster.json looking like this:

{
    "__default__" :
    {
        "cluster"   : "qsub",
        "jobs"      : 70 ,
        "nodes"     : 1,
        "ppn"       : 1
    },
    "rsem_quantify" :
    {
        "ppn"       : 8
    },
    "star_map_human" :
    {
        "ppn"       : 4
    }
}

Now, I understand that some of this goes into the new profiles.yaml, like this:

cluster: qsub
jobs: 70

But how do I set the ppn values per rule? And how do I move the other configuration options into profiles.yaml (for example, the submission string "qsub -W umask=002 -d cluster_outputs -l nodes={cluster.nodes}:ppn={cluster.ppn} -N {rule}")?

Any examples are appreciated; https://github.com/snakemake-profiles has not really helped.

Freek

1 Answer

It looks like you can just bundle your cluster config in its own YAML file, and reference that from within a profile YAML. Here's a clumsy little example using a fake job submission script: https://github.com/ressy/example-snakemake-profiles


The current documentation makes it sound like cluster configuration support is going away and being replaced by profiles:

While still being possible, cluster configuration has been deprecated by the introduction of Profiles.

...but judging from those example profiles, that may be misleading. Here's an example template for a profile YAML that references a separate cluster-config YAML:

https://github.com/Snakemake-Profiles/generic/blob/master/%7B%7Bcookiecutter.profile_name%7D%7D/config.yaml

...
cluster-config: "$((INSTALDIR))/overwrite_cluster_param.yaml" #abs path
...
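Applied to the command line from the question, the profile side might look roughly like this. It is a sketch, assuming a pre-8 Snakemake release (where `--cluster` and `--cluster-config` are still built in) and a hypothetical profile directory `qsub_profile/` whose `config.yaml` is shown here. Each key is a long command-line option with the leading `--` dropped, and plain flags take `true`. The `/path/to/` prefix is a placeholder; an absolute path is used, as in the generic template above.

```yaml
# qsub_profile/config.yaml  (profile name and /path/to/ are hypothetical)
# Each key mirrors a long snakemake option without its leading "--".
cluster-config: "/path/to/qsub_profile/cluster.yaml"   # per-rule nodes/ppn live here
# Quoted so YAML does not parse the {...} placeholders as a mapping.
cluster: "qsub -W umask=002 -d cluster_outputs -l nodes={cluster.nodes}:ppn={cluster.ppn} -N {rule}"
jobs: 245
use-conda: true
rerun-incomplete: true
keep-going: true
latency-wait: 60
```

The pipeline would then be started with `snakemake --profile qsub_profile` (or the full path to that directory). Options passed explicitly on the command line alongside `--profile` should still take precedence over the profile's values.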

That cluster-config YAML has the same default and per-rule entries as your JSON version does:

https://github.com/Snakemake-Profiles/generic/blob/master/%7B%7Bcookiecutter.profile_name%7D%7D/overwrite_cluster_param.yaml

...
# specific parameters for certain rules, which need more time/memory

#run_assembler:
#  queue: bigmem
#  time: 100
#   threads and memory definen in config file
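For the rules in the question, the matching cluster-config file might look like the sketch below (same hypothetical `qsub_profile/` directory). Per-rule entries are merged over `__default__`, so `rsem_quantify` should still pick up `nodes: 1`. The `cluster` and `jobs` keys from the JSON `__default__` block are left out, on the assumption that the `--cluster` string never references `{cluster.cluster}` or `{cluster.jobs}`; the job limit is set by `jobs:` in the profile instead.

```yaml
# qsub_profile/cluster.yaml: same per-rule values as the question's cluster.json
__default__:
  nodes: 1
  ppn: 1
rsem_quantify:
  ppn: 8    # fills {cluster.ppn} for this rule only
star_map_human:
  ppn: 4
```

Since `--cluster-config` accepts either JSON or YAML, pointing `cluster-config:` at the existing cluster.json (as long as it is valid JSON) should presumably work as well.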

So I think you can just set up a pair of YAML files accordingly, including entries for your rsem_quantify and star_map_human rules. From my own test, the cluster argument templating ({cluster.nodes}, {cluster.ppn}, etc.) appears to work just as it did before when everything is supplied through a profile. (If this is correct, we should nag the maintainers to make the new documentation more explicit.)

Jesse