
I have read this question (Running Snakemake in one single conda env) but still have some doubts.

I am using the '--conda-prefix /some/dir' option and I have a rule in my snakemake file with the directive:

rule xxx:
  ...
  conda:    
    "envs/some.yaml"
  ...

The first call of that rule will create the conda environment according to the yaml file in /some/dir/envs with some random name (in my case 'bf55a79e'). If I run the workflow in a different directory, the same conda environment (/some/dir/envs/bf55a79e) gets activated directly instead of being built again. So this is what I expected.

My questions are:

  • is there some way of skipping the building and specifying that rule xxx should use an already existing conda environment somewhere?
  • does anyone know where the mapping between rule and directory gets stored?
guibar

1 Answer

  1. If the environment already exists it will simply be used. You cannot, however, tell it to use an environment with a normal name (e.g., "my_env").
  2. I assume you mean, "where the heck does the random-looking conda environment name come from?!?", which is a very good question. It's an MD5 hash of the --conda-prefix setting followed by the contents of the environment yaml file you specify. The actual code for doing this (I'm "stealing" it from snakePipes) is:
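import hashlib

# Hash the --conda-prefix string first, then the raw bytes of the environment yaml file.
md5hash = hashlib.md5()
md5hash.update("what you gave to --conda-prefix".encode())
with open("your environment yaml file", "rb") as f:
    md5hash.update(f.read())
h = md5hash.hexdigest()  # full-length hex digest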

h is then the directory name. There are actually two lengths of those: the shorter 8-character version that Snakemake uses by default, and the full-length version that the code above produces. Snakemake will actually use either of these. If you're wondering, no, I don't think any of this is documented anywhere; you have to dig through the Snakemake code to find it.
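
If you want to predict the directory name for a given prefix and yaml file, the snippet above can be wrapped into a small helper. This is only a sketch of the scheme as described here, not Snakemake's own API: the function names `conda_env_hash` and `env_dir_names` are made up for illustration, and it assumes the short name is simply the first 8 characters of the full digest. Newer Snakemake versions may compute the hash differently.

```python
import hashlib


def conda_env_hash(conda_prefix: str, env_yaml_path: str) -> str:
    """Return the full MD5 hex digest of the prefix string plus the yaml bytes.

    Hypothetical helper: mirrors the scheme described in this answer,
    not a function that Snakemake itself exposes.
    """
    md5hash = hashlib.md5()
    # The prefix is hashed exactly as typed, so "/some/dir" and "/some/dir/"
    # (or a symlinked/relative spelling) give different results.
    md5hash.update(conda_prefix.encode())
    with open(env_yaml_path, "rb") as f:
        md5hash.update(f.read())
    return md5hash.hexdigest()


def env_dir_names(conda_prefix: str, env_yaml_path: str) -> tuple:
    """Return (short, full) candidate directory names.

    Assumption: the short form is the first 8 characters of the digest.
    """
    full = conda_env_hash(conda_prefix, env_yaml_path)
    return full[:8], full
```

Calling `env_dir_names("/some/dir", "envs/some.yaml")` gives the two names to look for under the prefix directory. Because the prefix string itself goes into the hash, spelling the same directory differently changes both names.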

Devon Ryan
  • Ok, that answers the core of my question! So if I change the way I refer to the conda-prefix (via symlinks or relative/absolute paths), the environment will get rebuilt. Which is perfectly acceptable as long as you understand what is happening. – guibar Sep 11 '19 at 09:53
  • Exactly, I'll suggest to Johannes that he document this somewhere :) – Devon Ryan Sep 11 '19 at 11:37
  • And just to be absolutely clear, if you don't specify a conda-prefix, the hash uses the yaml file content only, right? – guibar Sep 11 '19 at 14:35
  • I actually don't know that off-hand, but I presume so. The conda environments end up under .snakemake in the output directory, so it's possible that it uses that path somewhere. – Devon Ryan Sep 11 '19 at 14:37
  • Upon the first workflow run I had the conda directives in all rules, so the environment was created (relative path to Snakefile is ../outSnakemake_ngs_bngs05b/.snakemake/conda/a87f68b4). To suppress the creation of another (identical) environment, I commented out all the conda directives and called snakemake -p -s Snakefile_v4_ngs_bngs05b --cluster "qsub -q researshq" -j 5 --use-conda --conda-prefix ../outSnakemake_ngs_bngs05b/.snakemake/conda/a87f68b4, but the jobs failed. Is this the expected behaviour, i.e., can I suppress the conda directives by specifying the env? – BCArg Apr 08 '20 at 09:57
  • If the --conda-prefix doesn't change then an identical env won't be made (it'd have the same hash anyway). – Devon Ryan Apr 08 '20 at 11:19