How can I create a .tar.gz file with compression in Python?
8 Answers
To build a .tar.gz (aka .tgz) for an entire directory tree:
import tarfile
import os.path

def make_tarfile(output_filename, source_dir):
    with tarfile.open(output_filename, "w:gz") as tar:
        # arcname keeps only the last path component as the top-level folder
        tar.add(source_dir, arcname=os.path.basename(source_dir))
This will create a gzipped tar archive containing a single top-level folder with the same name and contents as source_dir.
- (44) Just as a note to readers, if you leave out `arcname=os.path.basename(source_dir)` then it'll give you the entire path structure of `source_dir` in the tar file (in most situations, that's probably inconvenient). – Brōtsyorfuzthrāx Jan 30 '16 at 06:50
- (28) A second note; using `arcname=os.path.basename(source_dir)` still means that the archive contains a folder which contains the contents of `source_dir`. If you want the root of the archive to contain the contents themselves, and not contents within a folder, use `arcname=os.path.sep` instead. – Jonathan H Mar 22 '17 at 01:05
- (6) @Sheljohn unfortunately, this is not fully correct, because if one uses `os.path.sep`, then the archive will contain service "." or "/" folder which is not a problem usually, but sometimes it can be an issue if you later process this archive programmatically. It seems the only real clean way is to do `os.walk` and add files individually – The Godfather Feb 01 '19 at 10:10
- (6) To get rid of all the directory structure, just use `arcname='.'`. No need to use `os.walk`. – edthrn Aug 16 '19 at 20:59
- If I generate this tarfile on Linux, will this open successfully on other platforms say, Windows & Mac? – jrp Jul 06 '21 at 16:48
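The comments above disagree on how to put the contents of `source_dir` directly at the archive root. One possible approach, shown here as a minimal sketch rather than a definitive answer, is to add each direct child of the directory under its own name. The function name `make_flat_tarfile` is made up for this illustration, and the sketch assumes the output file is written outside `source_dir`.

```python
import os
import tarfile


def make_flat_tarfile(output_filename, source_dir):
    """Archive the contents of source_dir without a wrapping top-level folder."""
    with tarfile.open(output_filename, "w:gz") as tar:
        # Add each direct child under its own name. tar.add recurses into
        # subdirectories, so nested files keep their relative paths.
        for entry in sorted(os.listdir(source_dir)):
            tar.add(os.path.join(source_dir, entry), arcname=entry)
```

Because every member gets an explicit `arcname`, the archive should not contain a "." or "/" entry, which may matter if it is processed programmatically later.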
import tarfile

tar = tarfile.open("sample.tar.gz", "w:gz")
for name in ["file1", "file2", "file3"]:
    tar.add(name)
tar.close()
If you want to create a bzip2-compressed archive instead, change the file extension to ".tar.bz2" and the mode from "w:gz" to "w:bz2".
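Since the extension and the mode have to be changed together, one option is to derive the mode from the filename. The following is only a sketch: the `MODES` table and the names `write_mode_for` and `make_tarball` are hypothetical helpers invented for this example, not part of `tarfile`. The mode strings themselves ("w", "w:gz", "w:bz2", "w:xz") are standard `tarfile` modes.

```python
import tarfile

# Hypothetical mapping from file extension to tarfile write mode.
MODES = {
    ".tar.gz": "w:gz",
    ".tgz": "w:gz",
    ".tar.bz2": "w:bz2",
    ".tar.xz": "w:xz",
    ".tar": "w",
}


def write_mode_for(filename):
    """Return the tarfile write mode matching the filename's extension."""
    for suffix, mode in MODES.items():
        if filename.endswith(suffix):
            return mode
    raise ValueError(f"unrecognized archive extension: {filename}")


def make_tarball(output_filename, names):
    """Add each path in names to a new archive at output_filename."""
    with tarfile.open(output_filename, write_mode_for(output_filename)) as tar:
        for name in names:
            tar.add(name)
```

For example, `make_tarball("sample.tar.bz2", ["file1", "file2", "file3"])` would write a bzip2-compressed archive without the mode being spelled out at the call site.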
- (20) You should really use `with tarfile.open( ..` in Python, instead of calling `open` and `close` manually. This is also the case when opening regular files. – Jonathan H Mar 13 '17 at 17:15
- @CNBorn I just want to compress to sample.gz. `import tarfile tar = tarfile.open("sample.gz", "r:gz") for name in ["file1", "file2", "file3"]: tar.add(name) tar.close()` Is that OK? – thach.nv92 Mar 04 '21 at 02:59
You call tarfile.open with mode='w:gz', meaning "Open for gzip compressed writing."
You'll probably want to end the filename (the name argument to open) with .tar.gz, but that doesn't affect compression abilities.
BTW, you usually get better compression with a mode of 'w:bz2', just like tar can usually compress even better with bzip2 than it can compress with gzip.
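Whether bzip2 actually beats gzip depends on the data, so it can be worth measuring on a representative sample. The sketch below builds a one-file archive in memory with each mode and reports the resulting size. The helper name `compressed_size` and the sample text are made up for this illustration.

```python
import io
import tarfile


def compressed_size(data, mode):
    """Return the size in bytes of a one-file archive written with mode."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode=mode) as tar:
        info = tarfile.TarInfo(name="sample.txt")
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))
    return len(buffer.getvalue())


# Made-up, highly repetitive sample data; real files will behave differently.
sample = b"The quick brown fox jumps over the lazy dog.\n" * 5000
for mode in ("w:gz", "w:bz2"):
    print(mode, compressed_size(sample, mode))
```

Running this against your own files (read into `data`) gives a rough idea of the size trade-off before committing to one format.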
- (7) Just a quick note that the filename for bzip2-compressed tarballs should end with ".tar.bz2". – Ignacio Vazquez-Abrams Jan 09 '10 at 05:23
Previous answers advise using the tarfile module to create a .tar.gz file in Python. That is a good, Pythonic solution, but it has a serious drawback: archiving speed. This question mentions that tarfile is approximately two times slower than the tar utility on Linux. In my experience, this estimate is fairly accurate.
So for faster archiving, you can call the tar command through the subprocess module:
import subprocess
subprocess.call(['tar', '-czf', output_filename, file_to_archive])
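If the thing being archived is a directory given by a longer path, the command above stores that path inside the archive. A possible variant, sketched here under the assumption that a `tar` executable supporting `-C` is on the PATH, changes into the parent directory first so only the base name is stored. The function name `make_tarfile_with_tar` is invented for this example.

```python
import os
import subprocess


def make_tarfile_with_tar(output_filename, source_dir):
    """Archive source_dir with the system tar, keeping only its base name."""
    source_dir = os.path.abspath(source_dir)
    parent, name = os.path.split(source_dir)
    # -C switches to the parent directory first, so the archive stores
    # "name/..." rather than the full path to source_dir.
    subprocess.run(
        ["tar", "-czf", os.path.abspath(output_filename), "-C", parent, name],
        check=True,
    )
```

Passing `check=True` makes a failing tar raise `subprocess.CalledProcessError` instead of being silently ignored.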
In addition to @Aleksandr Tukallo's answer, you can also capture the output and the error message (if one occurs). Compressing a folder using tar is explained pretty well in the following answer.
import traceback
import subprocess

try:
    # Use z (gzip) or j (bzip2), not both: tar rejects conflicting compression options.
    cmd = ['tar', 'czf', output_filename, file_to_archive]
    output = subprocess.check_output(cmd).decode("utf-8").strip()
    print(output)
except Exception:
    print(f"E: {traceback.format_exc()}")
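The traceback above shows that tar failed but not what tar itself reported, because its diagnostics go to stderr. A minimal sketch of capturing that text with `subprocess.run` follows. The helper name `run_tar` and its `(ok, message)` return convention are assumptions made for this example.

```python
import subprocess


def run_tar(args):
    """Run tar with the given argument list; return (ok, message)."""
    try:
        result = subprocess.run(
            ["tar", *args], check=True, capture_output=True, text=True
        )
    except FileNotFoundError:
        # No tar executable on the PATH.
        return False, "tar executable not found"
    except subprocess.CalledProcessError as exc:
        # tar writes its diagnostics to stderr.
        return False, exc.stderr.strip()
    return True, result.stdout.strip()
```

A call such as `run_tar(["-czf", output_filename, file_to_archive])` then returns tar's own error text on failure, which is usually more useful than a Python traceback.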
If opening the tar.gz file shows the files nested inside their directory structure, solve it by using os.path.basename(file) as the name stored in the archive:
import os
import tarfile

with tarfile.open("save.tar.gz", "w:gz") as tar:
    for file in ["a.txt", "b.log", "c.png"]:
        # Store each file under its base name, without directory components.
        tar.add(file, arcname=os.path.basename(file))
This stores the files at the root of the tar.gz archive rather than inside a directory.
Minor correction to @THAVASI.T's answer which omits showing the import of the 'tarfile' library, and does not define the 'tar' object which is used in the third line.
import os
import tarfile

with tarfile.open("save.tar.gz", "w:gz") as tar:
    for file in ["a.txt", "b.log", "c.png"]:
        # Store each file under its base name, without directory components.
        tar.add(file, arcname=os.path.basename(file))
- You should consider expanding this answer to include detail about what was wrong with the other answer and explain why this snippet works. – Alex Reinking May 04 '21 at 05:57
Perfect answer
Best performance, and without the . and .. entries in the compressed file!
NOTICE (thanks MaxTruxa):
This answer is vulnerable to shell injections. Please read the security considerations from the docs. Never pass unescaped strings to subprocess.run, subprocess.call, etc. if shell=True. Use shlex.quote to escape (Unix shells only).
I'm using it locally, so it's good for my needs.
subprocess.call(f'tar -cvzf {output_filename} *', cwd=source_dir, shell=True)
The cwd argument changes the directory before compressing, which solves the issue with the dots.
shell=True allows wildcard usage (*).
This also works recursively for a directory.
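For cases where the filenames are not fully trusted, the wildcard can be expanded in Python so that no shell is involved at all. This is a sketch, not a drop-in replacement: the function name `tar_dir_contents` is made up, it assumes a `tar` executable that accepts `--` as an end-of-options marker, and unlike the shell's `*` it also includes dotfiles.

```python
import os
import subprocess


def tar_dir_contents(output_filename, source_dir):
    """Archive the entries of source_dir at the archive root, without a shell."""
    # Resolve the output path now, because tar runs with a different cwd.
    output_filename = os.path.abspath(output_filename)
    # Expand the directory listing in Python instead of relying on the
    # shell's "*". Unlike "*", os.listdir also returns dotfiles.
    entries = sorted(os.listdir(source_dir))
    if not entries:
        raise ValueError(f"nothing to archive in {source_dir!r}")
    # "--" marks the end of options, so names starting with "-" are
    # treated as files rather than flags.
    subprocess.run(
        ["tar", "-czf", output_filename, "--", *entries],
        cwd=source_dir,
        check=True,
    )
```

Because the arguments are passed as a list and `shell=True` is not used, characters such as spaces, quotes, or semicolons in filenames are handed to tar as-is rather than being interpreted by a shell.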
- (2) The "perfect answer" is vulnerable to shell injections. Please read the [security considerations](https://docs.python.org/3/library/subprocess.html#security-considerations) from the docs. Never pass unescaped strings to `subprocess.run`, `subprocess.call`, etc. if `shell=True`. Use `shlex.quote` to escape (Unix shells only). – Max Truxa Apr 11 '22 at 09:37