I am new to the subprocess module and am not sure how to handle this situation. I'm working with GCTA (https://yanglab.westlake.edu.cn/software/gcta/#MakingaGRM) and its --make-grm-part option. After making the required number of parts, I am supposed to concatenate all the parts into one file. See the example provided on the GCTA page:
gcta64 --bfile test --make-grm-part 3 1 --thread-num 5 --out test
gcta64 --bfile test --make-grm-part 3 2 --thread-num 5 --out test
gcta64 --bfile test --make-grm-part 3 3 --thread-num 5 --out test
# Merge all the parts together (Linux, Mac)
cat test.part_3_*.grm.id > test.grm.id
cat test.part_3_*.grm.bin > test.grm.bin
cat test.part_3_*.grm.N.bin > test.grm.N.bin
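As far as I can tell, the merge step is plain byte-wise concatenation, so it could also be done without a shell. Below is a minimal sketch of that idea. The helper name `merge_parts` is hypothetical (it is not part of GCTA), and the sketch assumes that sorting the matching file names gives the same order the shell would pass to `cat`.

```python
import glob
import shutil


def merge_parts(pattern, out_path):
    """Concatenate every file matching `pattern` into `out_path`.

    Roughly equivalent to `cat pattern > out_path` (hypothetical helper).
    """
    # Sort the matches so the order is deterministic (assumed to match
    # the order of the shell's own wildcard expansion).
    parts = sorted(glob.glob(pattern))
    if not parts:
        # Fail loudly instead of silently writing an empty output file.
        raise FileNotFoundError(f"no files match {pattern!r}")
    with open(out_path, "wb") as out:
        for part in parts:
            with open(part, "rb") as src:
                shutil.copyfileobj(src, out)
    return parts
```

With the file names from the example above, this would be called as `merge_parts("test.part_3_*.grm.id", "test.grm.id")`, and likewise for the `.grm.bin` and `.grm.N.bin` files.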
My issue arises when I incorporate this into a pipeline. I thought I had it working, but no files are actually being written to my working directory.
This is a simplified version of the code I currently have:
gcta_cmd1 = []
for i in range(1, part + 1):
    gcta_cmd1.append("gcta --bfile {0} --autosome --maf 0.05 --make-grm-part {1} {2} --out {3}".format(geno_path, part, i, grm1))
# merge
grm1_id = "cat {}.part_{}_*.grm.id > {}.grm.id".format(grm1, part, grm1)
grm1_bin = "cat {}.part_{}_*.grm.bin > {}.grm.bin".format(grm1, part, grm1)
grm1_nbin = "cat {}.part_{}_*.grm.N.bin > {}.grm.N.bin".format(grm1, part, grm1)
cmds = [gcta_cmd1, grm1_id, grm1_bin, grm1_nbin]
cmd_len = len(cmds)
for i in range(0, cmd_len):
    if isinstance(cmds[i], str):
        shell_do(cmds[i], make_part=True)
    else:
        for sub in cmds[i]:
            shell_do(sub)
The shell_do function is:
import subprocess
import sys

def shell_do(command, log=False, return_log=False, make_part=False):
    print(f'Executing: {" ".join(command.split())}', file=sys.stderr)
    if not make_part:
        # No shell: the command is split into an argument list
        res = subprocess.run(command.split(), stdout=subprocess.PIPE)
    else:
        # Shell: the whole string is passed to the shell as-is
        res = subprocess.run(command, shell=True, stdout=subprocess.PIPE)
    if log:
        print(res.stdout.decode('utf-8'))
    if return_log:
        return res.stdout.decode('utf-8')
From my understanding, with shell=True the command should be passed as a single string and not split into a list, as is done with shell=False. When I run it I get an error that says
FileNotFoundError: [Errno 2] No such file or directory: '/FILEPATH/UKBB_raw_data_no_cousins_plusX_callrate_sex_ancestry_EUR_related_unrelated_grm.grm.id'.
I'm thinking the issue is with running the shell command with the wildcard.
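To make that suspicion concrete, here is a minimal sketch of what I understand the difference to be. When a command is split into a list and run without a shell, `*` and `>` are passed along as literal arguments, and nothing expands or redirects them. The `run_checked` helper is hypothetical (it is not my actual `shell_do`). It only illustrates checking the exit code and capturing stderr, so that a failing step is reported instead of passing silently.

```python
import shlex
import subprocess

# Without a shell, the wildcard and the redirect are just ordinary tokens.
merge_cmd = "cat test.part_3_*.grm.id > test.grm.id"
print(shlex.split(merge_cmd))
# ['cat', 'test.part_3_*.grm.id', '>', 'test.grm.id']


def run_checked(command, use_shell=False):
    """Run `command` (a string) and raise if it exits with a non-zero status."""
    # With a shell, pass the string unchanged so globbing and ">" work;
    # without one, split it into an argument list.
    args = command if use_shell else shlex.split(command)
    res = subprocess.run(
        args,
        shell=use_shell,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        text=True,
    )
    if res.returncode != 0:
        raise RuntimeError(
            f"{command!r} failed with exit code {res.returncode}:\n{res.stderr}"
        )
    return res.stdout
```

If this understanding is right, the three merge commands need `use_shell=True` (or a pure-Python merge), while the gcta commands can run without a shell.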
I appreciate any input! Thanks