
I am new to the subprocess module and was wondering how to handle this situation. I'm working with GCTA (https://yanglab.westlake.edu.cn/software/gcta/#MakingaGRM), using --make-grm-part. After making the x parts needed, I am supposed to concatenate all the parts into one file. See the example provided on the GCTA page:

gcta64 --bfile test --make-grm-part 3 1 --thread-num 5 --out test
gcta64 --bfile test --make-grm-part 3 2 --thread-num 5 --out test
gcta64 --bfile test --make-grm-part 3 3 --thread-num 5 --out test
# Merge all the parts together (Linux, Mac)
cat test.part_3_*.grm.id > test.grm.id
cat test.part_3_*.grm.bin > test.grm.bin
cat test.part_3_*.grm.N.bin > test.grm.N.bin

My issue arises when I incorporate this into a pipeline. I thought I had it working, but no files are actually being written to my working directory.

This is a simplified version of the code I currently have:

# one gcta command per part
gcta_cmd1 = []
for i in range(1, part + 1):
    gcta_cmd1.append("gcta --bfile {0} --autosome --maf 0.05 --make-grm-part {1} {2} --out {3}".format(geno_path, part, i, grm1))

# merge
grm1_id = "cat {}.part_{}_*.grm.id > {}.grm.id".format(grm1, part, grm1)
grm1_bin = "cat {}.part_{}_*.grm.bin > {}.grm.bin".format(grm1, part, grm1)
grm1_nbin = "cat {}.part_{}_*.grm.N.bin > {}.grm.N.bin".format(grm1, part, grm1)

cmds = [gcta_cmd1, grm1_id, grm1_bin, grm1_nbin]

for cmd in cmds:
    if isinstance(cmd, str):
        # merge commands (wildcard and redirect) are run through the shell
        shell_do(cmd, make_part=True)
    else:
        # the list of per-part gcta commands is run without a shell
        for sub in cmd:
            shell_do(sub)

and the shell_do function is

import subprocess
import sys

def shell_do(command, log=False, return_log=False, make_part=False):
    print(f'Executing: {" ".join(command.split())}', file=sys.stderr)

    if not make_part:
        # no shell: the command string is split into an argument list
        res = subprocess.run(command.split(), stdout=subprocess.PIPE)
    else:
        # shell=True: the whole string is passed to the shell
        res = subprocess.run(command, shell=True, stdout=subprocess.PIPE)

    if log:
        print(res.stdout.decode('utf-8'))
    if return_log:
        return res.stdout.decode('utf-8')

From my understanding, when using shell=True the command should be passed as a single string and not split up, as is done when using shell=False. When I run it, I get an error that says
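To make that difference concrete, here is a minimal sketch, assuming a POSIX system with `cat` available. The file names and contents are made-up stand-ins for GCTA part files, and `demo_wildcard` is a hypothetical helper, not part of the pipeline above. It runs the same merge string once as a split argument list and once through the shell.

```python
import subprocess
import tempfile
from pathlib import Path


def demo_wildcard(workdir):
    """Run the same `cat` merge without and with a shell; return both results."""
    workdir = Path(workdir)
    # Hypothetical stand-ins for the part files GCTA would write.
    for i in (1, 2, 3):
        (workdir / f"test.part_3_{i}.grm.id").write_text(f"fam{i} ind{i}\n")

    command = "cat test.part_3_*.grm.id > test.grm.id"

    # No shell: `cat` receives the literal arguments 'test.part_3_*.grm.id',
    # '>' and 'test.grm.id'. Nothing expands the wildcard or sets up the
    # redirect, so `cat` looks for files with those exact names.
    without_shell = subprocess.run(
        command.split(), cwd=workdir,
        stdout=subprocess.PIPE, stderr=subprocess.PIPE,
    )

    # shell=True: the whole string goes to the shell, which expands `*`
    # and handles `>` before `cat` runs.
    with_shell = subprocess.run(
        command, shell=True, cwd=workdir,
        stdout=subprocess.PIPE, stderr=subprocess.PIPE,
    )
    return without_shell, with_shell


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        no_shell, shell = demo_wildcard(tmp)
        print("without shell, return code:", no_shell.returncode)
        print("with shell, return code:", shell.returncode)
        print((Path(tmp) / "test.grm.id").read_text(), end="")
```

Capturing `stderr` and checking `returncode` (or passing `check=True`) makes a failed command visible; with only `stdout=subprocess.PIPE`, a failing step can go unnoticed until a later step looks for its output.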

FileNotFoundError: [Errno 2] No such file or directory: '/FILEPATH/UKBB_raw_data_no_cousins_plusX_callrate_sex_ancestry_EUR_related_unrelated_grm.grm.id'.

I'm thinking the issue is with running the shell command with the wildcard.
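One way to take the shell wildcard out of the picture would be to do the merge in Python itself. The sketch below is only an illustration under assumptions: `merge_parts` is a hypothetical helper, it assumes the part files follow the `<prefix>.part_<n>_<i>.<suffix>` naming from the GCTA example, and it sorts the matches lexically, which is meant to mirror the order a shell glob would produce.

```python
import glob
import shutil


def merge_parts(prefix, n_parts, suffix):
    """Concatenate <prefix>.part_<n_parts>_*.<suffix> into <prefix>.<suffix>.

    Hypothetical helper: returns the list of part files that were merged
    and raises FileNotFoundError if no part files are found.
    """
    # glob.escape protects any special characters in the path prefix itself.
    pattern = f"{glob.escape(prefix)}.part_{n_parts}_*.{suffix}"
    parts = sorted(glob.glob(pattern))
    if not parts:
        # Fail early and loudly if the gcta step produced no part files.
        raise FileNotFoundError(f"no files match {pattern}")

    # Binary mode works for both the text .grm.id and the binary .bin files.
    with open(f"{prefix}.{suffix}", "wb") as out:
        for part in parts:
            with open(part, "rb") as src:
                shutil.copyfileobj(src, out)
    return parts
```

With the names from the code above, this would be called as `merge_parts(grm1, part, "grm.id")`, and likewise for `"grm.bin"` and `"grm.N.bin"`. Because it raises when nothing matches, it would also show whether the part files were ever written.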

I appreciate any input! Thanks

avocx4