20

I can run this normally on the command line in Linux:

$ tar c my_dir | md5sum

But when I try to call it with Python I get an error:

>>> subprocess.Popen(['tar','-c','my_dir','|','md5sum'],shell=True)
<subprocess.Popen object at 0x26c0550>
>>> tar: You must specify one of the `-Acdtrux' or `--test-label'  options
Try `tar --help' or `tar --usage' for more information.
Greg
  • Why are you hashing a tar file? Are you looking for changes in file contents, or verifying an externally created tar file? – tMC Sep 06 '11 at 18:27
  • Perhaps see also https://stackoverflow.com/questions/24306205/file-not-found-error-when-launching-a-subprocess-containing-piped-commands – tripleee Feb 07 '21 at 09:22

5 Answers

20

You have to use subprocess.PIPE to connect the two processes. Also, to split each command into arguments, you should use shlex.split(), which prevents strange behaviour in some cases (quoted arguments, for example):

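from subprocess import Popen, PIPE
from shlex import split
# tar writes the archive to a pipe instead of the terminal
p1 = Popen(split("tar -c mydir"), stdout=PIPE)
# md5sum reads that pipe as its standard input
p2 = Popen(split("md5sum"), stdin=p1.stdout)
p1.stdout.close()  # lets tar receive SIGPIPE if md5sum exits early
p2.wait()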

But to make an archive and generate its checksum, you should use Python's built-in tarfile and hashlib modules instead of calling shell commands.
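As a rough sketch of the tarfile and hashlib route: the helper name md5_of_tar is made up for illustration, and the archive is built in memory, which assumes it fits in RAM. The digest will not necessarily match what `tar -c mydir | md5sum` prints, since tarfile and the tar command may write different header formats.

```python
import hashlib
import io
import tarfile


def md5_of_tar(path):
    """Archive `path` in memory and return the MD5 hex digest of the archive bytes.

    Hypothetical helper, not part of any library.
    """
    buf = io.BytesIO()
    # mode="w" writes an uncompressed tar stream into the buffer
    with tarfile.open(fileobj=buf, mode="w") as tar:
        tar.add(path)
    return hashlib.md5(buf.getvalue()).hexdigest()
```

Usage would look like `print(md5_of_tar("mydir"))`.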

mdeous
  • tarfile and hashlib would be preferable. But how do I hash a tarfile object? – Greg Sep 06 '11 at 17:55
  • @Greg Don't hash the tarfile object. Open the resulting file like any other file using `open()` and then hash its content. – mdeous Sep 06 '11 at 18:02
  • Makes sense. That works but I get a different hash value than from the original command. Is that to be expected? – Greg Sep 06 '11 at 18:17
  • @Greg, this should do the same exact thing as `tar -c mydir | md5sum`. Perhaps you could start a new question, including an interactive terminal session where you run this command, start Python, and run the Python commands, displaying the output. – Mike Graham Sep 06 '11 at 18:49
8

Ok, I'm not sure why but this seems to work:

subprocess.call("tar c my_dir | md5sum",shell=True)

Anyone know why the original code doesn't work?

Greg
  • The pipe | is a character the shell understands as connecting one command's output to another's input. It is neither an argument that tar understands nor a command. Unless you run the command line through a shell, you're trying to execute everything as arguments to the tar command. – tMC Sep 06 '11 at 17:51
  • That works because the entire command is passed to the *shell*, and the *shell* understands the `|`. Popen calls the process and passes in the arguments directly. For Popen this is controlled with `shell=` and passing a string (not a list), IIRC. –  Sep 06 '11 at 17:52
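To illustrate why the list form in the question fails: on POSIX, when shell=True is combined with a list, only the first item is run as the shell command, and the remaining items become the shell's own positional parameters. So tar was started with no options at all, which matches the error message. A small sketch, with echo and tr standing in for the real commands:

```python
import subprocess

# On POSIX this runs: /bin/sh -c 'echo "$0" "$1"' left right
# Only the first list item is the command; the rest are arguments to sh itself.
result = subprocess.run(
    ['echo "$0" "$1"', "left", "right"],
    shell=True,
    stdout=subprocess.PIPE,
    text=True,
)
list_form = result.stdout

# Passing a single string lets the shell parse the pipe itself.
result = subprocess.run(
    "echo hi | tr a-z A-Z",
    shell=True,
    stdout=subprocess.PIPE,
    text=True,
)
string_form = result.stdout
```

Here list_form shows the extra list items arriving as `$0` and `$1` rather than as part of the command, while string_form is the output of the whole pipeline.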
4

What you actually want is to run a shell subprocess with the shell command as a parameter:

>>> subprocess.Popen(['sh', '-c', 'echo hi | md5sum'], stdout=subprocess.PIPE).communicate()
('764efa883dda1e11db47671c4a3bbd9e  -\n', None)
Dag
  • Incidentally, `shell=True` [does something similar.](https://github.com/python/cpython/blob/4d2957c1b9a915f76da418e89bf9b5add141ca3e/Lib/subprocess.py#L1708) – pianoJames Sep 24 '21 at 19:20
1

I tried yours on Python 3.8.10:

import subprocess
proc1 = subprocess.run(['tar c my_dir'], stdout=subprocess.PIPE, shell=True)
proc2 = subprocess.run(['md5sum'], input=proc1.stdout, stdout=subprocess.PIPE, shell=True)
print(proc2.stdout.decode())

Key points (as outlined in my answer to a related question, https://stackoverflow.com/a/68323133/12361522):

  • use subprocess.run()
  • do not split the shell command from its parameters, i.e. pass ['tar c my_dir'] or ["tar c my_dir"]
  • set stdout=subprocess.PIPE for all processes
  • input=proc1.stdout chains the output of the previous process into the input of the next one
  • enable the shell with shell=True
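The same chaining should also work without the shell if each command is given as a list of arguments. A sketch, assuming a POSIX system with the commands on PATH; the helper name chain is invented here, and check=True is an addition so that a failing command raises instead of passing empty output along:

```python
import subprocess


def chain(first_cmd, second_cmd):
    """Run first_cmd, feed its stdout to second_cmd, return second_cmd's stdout (bytes).

    Hypothetical helper. The commands run one after the other, and the
    first command's whole output is held in memory in between.
    """
    first = subprocess.run(first_cmd, stdout=subprocess.PIPE, check=True)
    second = subprocess.run(
        second_cmd, input=first.stdout, stdout=subprocess.PIPE, check=True
    )
    return second.stdout
```

For the command in the question this would be called as `print(chain(["tar", "c", "my_dir"], ["md5sum"]).decode())`.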
1
>>> from subprocess import Popen,PIPE
>>> import hashlib
>>> proc = Popen(['tar','-c','/etc/hosts'], stdout=PIPE)
>>> stdout, stderr = proc.communicate()
>>> hashlib.md5(stdout).hexdigest()
'a13061c76e2c9366282412f455460889'
>>> 
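The session above reads the whole archive into memory before hashing it. For a large directory, one option is to feed the hash in chunks as tar produces them. A minimal sketch; the function name md5_of_command_output and the 64 KiB chunk size are arbitrary choices made for this example:

```python
import hashlib
import subprocess
from subprocess import PIPE, Popen


def md5_of_command_output(argv, chunk_size=64 * 1024):
    """Run argv and return the MD5 hex digest of its stdout, read chunk by chunk."""
    digest = hashlib.md5()
    # Leaving the with-block closes the pipe and waits for the process
    with Popen(argv, stdout=PIPE) as proc:
        for chunk in iter(lambda: proc.stdout.read(chunk_size), b""):
            digest.update(chunk)
    if proc.returncode != 0:
        raise subprocess.CalledProcessError(proc.returncode, argv)
    return digest.hexdigest()
```

It would be called as `md5_of_command_output(['tar', '-c', '/etc/hosts'])`.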
tMC