How to resolve the Snakemake error: "Target rules may not contain wildcards."

Question

I would like to do an easily reproducible analysis using publicly available data from NCBI, so I chose Snakemake.

I would like to write a single rule that can download any genome, given a species code name and a separate table of species and their NCBI IDs. So I wrote a script, scripts/download.sh, that takes a species code and downloads the genome to data/<species_code>/genome.fa.gz. The script internally reads the table tables/download_table.tsv, which lists species code names and their corresponding NCBI IDs.

So I tried writing a Snakefile like this:

species='Cbir Avag Fcan Lcla Dcor Dpac Pdav Psp62 Psp79 Minc1 Minc2 Mjav Mare Mflo Mhap Pant'

# inside shell commands, wildcard values are available as {wildcards.sp}
rule download:
    input:
        "tables/download_table.tsv"
    output:
        "data/{sp}/genome.fa.gz"
    shell:
        "scripts/download.sh {wildcards.sp}"

However, Snakemake returned an error message I do not really understand:

Target rules may not contain wildcards. Please specify concrete files or a rule without wildcards.

Is there a way to write a single rule for downloading all the genomes?

score 9 · Accepted Answer · edited Nov 05 '17 at 11:52


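Snakemake uses the first rule in the Snakefile as its default target. Your only rule, download, has the wildcard {sp} in its output, so Snakemake cannot tell which concrete files it should build, hence the error. The fix is a master rule, placed first, that requires all of your desired outputs as inputs; in your case it would be: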

rule all:
    input:
        expand("data/{sp}/genome.fa.gz", sp=species.split(' '))
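To see why this helps, here is a rough Python sketch of the distinction involved (an illustration, not Snakemake's actual implementation): the output pattern of download still contains the wildcard {sp}, so it cannot be a target by itself, whereas the list that rule all requests contains only concrete paths. The regex and the has_wildcards helper are assumptions made for this example.

```python
import re

# Simplified wildcard detector: matches a "{name}" placeholder in a path pattern.
# This is an illustration, not Snakemake's own parser.
WILDCARD = re.compile(r"\{[^{}]+\}")


def has_wildcards(pattern: str) -> bool:
    """Return True if the path pattern still contains an unfilled wildcard."""
    return WILDCARD.search(pattern) is not None


species = 'Cbir Avag Fcan Lcla Dcor Dpac Pdav Psp62 Psp79 Minc1 Minc2 Mjav Mare Mflo Mhap Pant'

# Output pattern of rule download: it names no concrete file, so it cannot be a target.
pattern = "data/{sp}/genome.fa.gz"

# What rule all requests: one concrete path per species code
# (the same list expand(pattern, sp=species.split(' ')) yields for one wildcard).
targets = [pattern.format(sp=sp) for sp in species.split(' ')]

print(has_wildcards(pattern))                  # True
print(any(has_wildcards(t) for t in targets))  # False
print(targets[0])                              # data/Cbir/genome.fa.gz
```

Because rule all comes first and lists only concrete files, Snakemake can work backwards from each file and match it against download's output pattern, filling in {sp} for each job.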

You'll also need a separate download link for each species. You could make a separate download_table.tsv for each species, but it would probably be easier to make a config file with this information and add a params directive to your rule. Something like:

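configfile: "config.yaml"  # assumed file with a 'locations' mapping: species code -> URL

rule download:
    params:
        # look up the URL for the species matched by the {sp} wildcard
        url=lambda wildcards: config["locations"][wildcards.sp]
    output:
        "data/{sp}/genome.fa.gz"
    shell:
        "scripts/download.sh {params.url}"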
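The lookup in params has to happen per job, using the value of the sp wildcard (for example via lambda wildcards: config["locations"][wildcards.sp]); writing config['locations']['sp'] would look up the literal key "sp" instead. The Python sketch below imitates that per-wildcard resolution outside Snakemake and also builds the mapping from the existing table rather than a separate config file. It assumes, hypothetically, that tables/download_table.tsv has two tab-separated columns (species code, then NCBI ID or URL) and no header; load_locations, get_url, and the SimpleNamespace standing in for Snakemake's wildcards object are illustrative only.

```python
import csv
import io
from types import SimpleNamespace


def load_locations(handle) -> dict:
    """Map species code -> NCBI ID or URL from a two-column TSV (assumed layout, no header)."""
    locations = {}
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 2 or row[0].startswith("#"):
            continue  # skip blank, short, or commented lines
        locations[row[0]] = row[1]
    return locations


# Stand-in for open("tables/download_table.tsv"); the values are placeholders, not real IDs.
sample = io.StringIO("Cbir\tID_FOR_CBIR\nAvag\tID_FOR_AVAG\n")
config = {"locations": load_locations(sample)}


def get_url(wildcards):
    # Same lookup as the params lambda: resolved per job from the {sp} wildcard.
    return config["locations"][wildcards.sp]


print(get_url(SimpleNamespace(sp="Cbir")))  # ID_FOR_CBIR
print("sp" in config["locations"])          # False: the literal key "sp" does not exist
```

In a real Snakefile the same loader could be called on open("tables/download_table.tsv") near the top, keeping the table as the single source of truth; whether download.sh expects an ID or a URL is up to the script.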

edited by Kamil S Jaron · answered Nov 03 '17 at 12:49 by heathobrien


I wouldn't erase it if I were you. A lot of people get confused by Snakemake error messages, so it's helpful to have some examples on here. – heathobrien Nov 03 '17 at 15:45

Then I will at least edit it dramatically. (I'll try to do it in a way that your answer still fits.) – Kamil S Jaron Nov 03 '17 at 15:47
@KamilSJaron I agree the question should be kept, but maybe change the title to reflect the error message; this has nothing to do with NCBI. – Chris_Rands Nov 03 '17 at 19:13
Also note that Snakemake has an NCBI remote provider: http://snakemake.readthedocs.io/en/stable/snakefiles/remote_files.html#genbank-ncbi-entrez – Johannes Köster Nov 06 '17 at 10:51
@JohannesKöster I was looking at it, but since I could not make it work without remote, I have not even tried with remote. And the problem I originally had with the non-remote solution is unrelated to downloading sequences. It's a good tip though; maybe worth an answer? – Kamil S Jaron Nov 10 '17 at 15:53

score 3 · Answer 2 · answered Nov 03 '17 at 18:01


Given the Makefile you provided (which someone later deleted from the question), you should probably just add this rule, and it should be the first rule in the file:

rule all:
    input:
        ["data/{}/genome.fa.gz".format(x) for x in species.split()]

This rule just specifies a list of expected output files and corresponds to the following lines from the original Makefile:

GENOMES=$(patsubst %,data/%/genome.fa.gz,$(SPECIES))
all: $(GENOMES)
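All three forms (Make's patsubst, the list comprehension in this answer, and Snakemake's expand()) build the same list of concrete file names. To check that, here is a small pure-Python stand-in for expand(). It is a simplified re-implementation written for illustration and covers only the default product-of-values behaviour; the real function has more options, such as the zip combinator. It confirms that for a single wildcard it matches the list comprehension, and it shows what happens with several wildcards.

```python
from itertools import product


def simple_expand(pattern: str, **wildcards) -> list:
    """Fill `pattern` with every combination of wildcard values.

    A simplified stand-in for the default (product) behaviour of Snakemake's
    expand(); a single string is treated as one value, as expand() does.
    """
    names = list(wildcards)
    value_lists = [
        [v] if isinstance(v, str) else list(v)
        for v in (wildcards[n] for n in names)
    ]
    return [
        pattern.format(**dict(zip(names, combo)))
        for combo in product(*value_lists)
    ]


species = 'Cbir Avag Fcan Lcla Dcor Dpac Pdav Psp62 Psp79 Minc1 Minc2 Mjav Mare Mflo Mhap Pant'

via_expand = simple_expand("data/{sp}/genome.fa.gz", sp=species.split())
via_comprehension = ["data/{}/genome.fa.gz".format(x) for x in species.split()]

print(via_expand == via_comprehension)  # True
print(simple_expand("{a}_{b}", a=["x", "y"], b=["1", "2"]))  # ['x_1', 'x_2', 'y_1', 'y_2']
```

With one wildcard the comprehension is just as short. The product behaviour is where expand() saves typing, e.g. when outputs vary by both species and assembly version.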

answered by Karel Břinda

I deleted the Make equivalent since it was not really related to the problem. – Kamil S Jaron Nov 05 '17 at 11:50
