4

I am trying to remove samples from a chromosome vcf file. I wrote a function that takes chromosome number and a list of samples to remove. When I try to run bcftools using subprocess module it only runs bcftools, as if I was running subprocess.run([bcftools], shell=True, env=my_env) instead of:

subprocess.run(['bcftools', 'view', in_vcf, '--samples', drop_samples, '-Oz', '>', out_vcf], shell=True, env=my_env)

The function:

def remove_individuals(chr_samples):
    my_env = os.environ.copy()
    my_env['PATH'] = f'/home/aydar/bin/bcftools_1.11/bin/:{my_env["PATH"]}'
    def remove_indiv(chr, indiv):
        in_vcf = f'data.filtered-updated-{chr}.vcf.gz'
        out_vcf = f'data.filtered-updated-{chr}-faulty-samp-rem.vcf.gz'
        drop_samples = "\'^"+",".join(indiv)+"\'"
        proc = subprocess.run(
            ['bcftools', 'view', in_vcf, 
             '--samples', drop_samples, '-Oz', '>', out_vcf],
            shell=True,
            env=my_env
        )
        print(proc.args)
    chr_samples.apply(lambda x: remove_indiv(x.index[0], list(x.values[0])))

function input:

13,[2500000075_2500000075,2000000171_2000000171,1800000179_1800000179,330134048_330134048]

When I run the same command in the terminal everything works.

The command I test with in terminal:

bcftools view data.filtered-updated-13.vcf.gz --samples '^2500000075_2500000075,2000000171_2000000171,1800000179_1800000179,330134048_330134048' -Oz > out_test_vcf.gz
Ram RS
  • 2,297
  • 10
  • 29
YKY
  • 171
  • 5
  • 1
    Thanks for your question and answer. Using popen is the standard approach here than using run. Anyway could you very kindly mark your answer as "accepted"? This help site stats and prevents the question from being re-introduced as unresolved later. – M__ Aug 27 '23 at 22:22

1 Answers1

3

Never mind, I figured it out. As it turned out I had to write to file like so and extra quotes ("\'") around samples were not needed. Also shell=True does not work well with lists, only strings:

def remove_individuals(chr_samples):
    my_env = os.environ.copy()
    my_env['PATH'] = f'/home/aydar/bin/bcftools_1.11/bin/:{my_env["PATH"]}'
    def remove_indiv(chr, indiv):
        in_vcf = f'data.filtered-updated-{chr}.vcf.gz'
        out_vcf = f'data.filtered-updated-{chr}-faulty-samp-rem.vcf.gz'
        drop_samples = "^"+",".join(indiv)
        out = subprocess.run(
            ['bcftools', 'view', in_vcf, 
             '--samples', drop_samples, '-Oz'],
            env=my_env,
            capture_output=True
        )
        with open(out_vcf, 'wb') as f:
            f.write(out.stdout)
    chr_samples.apply(lambda x: remove_indiv(x.index[0], list(x.values[0])))

Remember to use 'wb' if you like me and use Oz output option, because it returns bgzipped file.

Ram RS
  • 2,297
  • 10
  • 29
YKY
  • 171
  • 5
  • thanks for accepting your answer and please feel free to ask a question any time here. – M__ Sep 01 '23 at 12:59