4

I created a Git repo that will only ever be stored locally, and I'm wondering whether I really need Git LFS for binaries. As far as I can see, the .gitattributes is properly configured, as in:

*.psd binary

And yes, the files land in .git/objects/..., but they are compressed and don't take much space. So to sum it up, what are the benefits of Git LFS in a local repository if I never push/pull from/to a remote repo?

Thanks!

Daniel Stephens

3 Answers

6

While Git commits are complete "snapshots" of the repository contents, Git doesn't actually store unchanged files again. Git stores the contents of files as "blob objects" identified by a checksum of the contents. A blob holds only the contents; the filename and permissions are stored in a tree object. If you commit a change to a file, Git stores the whole file again (compressed) as a new blob; any unchanged files will reuse existing blobs. And if you have two files with the same contents, they will share the same blob.
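The content addressing described above can be sketched in a few lines of Python. This is only an illustration, not Git's implementation: it assumes the default SHA-1 object format, where a blob ID is the SHA-1 of a `blob <size>\0` header followed by the contents. The helper name `git_blob_id` and the sample byte strings are made up for the example.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the ID Git assigns to a blob with these contents (SHA-1 object format)."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


# Two files with the same contents map to one blob, whatever their names are.
file_a = b"test content\n"
file_b = b"test content\n"
# Changing even one byte yields a different ID, so a whole new blob is stored.
file_a_edited = b"test content!\n"

print(git_blob_id(file_a))
print(git_blob_id(file_a) == git_blob_id(file_b))         # same contents, same blob
print(git_blob_id(file_a) == git_blob_id(file_a_edited))  # changed contents, new blob
```

The filename never enters the hash, which is why renaming or duplicating a file costs no extra blob storage.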

Without git-lfs, every time you commit a change to a binary file the entire file (compressed) must be stored again in a new blob. Since a Git repository is the complete history of the project, after a while this can potentially eat up a lot of disk space. If space is tight, that might be important to you.

Fortunately, you do not have to make this decision now. If the repository gets too large, you can always retroactively apply git-lfs later using the BFG Repo Cleaner.

Schwern
  • Thanks! Do you know why? If the file didn’t change, it should result in the same commit file sha, no? – Daniel Stephens Sep 13 '20 at 00:30
  • @DanielStephens No, if a file does not change it will result in the same blob ID and the contents will not be duplicated. [The commit ID is made up of a bunch of things](https://stackoverflow.com/a/29107504/14660). See https://git-scm.com/book/en/v2/Git-Internals-Git-Objects – Schwern Sep 13 '20 at 00:36
  • Thanks! Can you elaborate your first sentence then? Just so i understand the differences. I will accept this then as an answer – Daniel Stephens Sep 13 '20 at 01:47
  • @DanielStephens How's that? – Schwern Sep 13 '20 at 01:54
  • Thanks! That pretty much answers my question. One last-side question though I am still trying to understand. What would be different with Git LFS? If Git "alone" stores the object as a compressed version, what does Git LFS add in value to this? The test-file I see in my Git LFS test repo is not even compressed as far as I can see – Daniel Stephens Sep 13 '20 at 03:31
  • @DanielStephens Instead of storing the historic content of large files on disk, it stores them in cloud storage and fetches them on demand. Since you're making the commits locally, I think it will cache them. You'll probably have to manually purge the lfs cache. – Schwern Sep 13 '20 at 14:41
3

It depends on your workflow and the facilities you have access to.

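Git stores versions of files as blobs. When the repository is packed (for example by git gc), these blobs are delta compressed, so only the differences between similar versions are stored. For files that change a little at a time, the repository size therefore increases only marginally.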

The situation is different if the versioned file is a binary file, or any file where a single change restructures the whole content. In that case, Git effectively stores a full copy of each version, and the repository grows rapidly.
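A rough sketch of that growth, before any packing takes place: each committed version is written as its own zlib-compressed object, so two nearly identical versions of an incompressible file cost about twice the file size. The helpers `pseudo_random_bytes` and `loose_object_size` are hypothetical names for this illustration, and the data is a stand-in for a real binary file. Delta compression in packfiles is deliberately not modelled here.

```python
import hashlib
import zlib


def pseudo_random_bytes(n_blocks: int) -> bytes:
    """Build deterministic, practically incompressible data from a SHA-256 chain."""
    block = b"seed"
    chunks = []
    for _ in range(n_blocks):
        block = hashlib.sha256(block).digest()
        chunks.append(block)
    return b"".join(chunks)


def loose_object_size(content: bytes) -> int:
    """Approximate size of an unpacked blob: zlib over the blob header plus contents."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return len(zlib.compress(header + content))


# 128 KiB stand-in for a binary file, and a second version with one byte flipped.
version_1 = pseudo_random_bytes(4096)
version_2 = version_1[:-1] + bytes([version_1[-1] ^ 0xFF])

total = loose_object_size(version_1) + loose_object_size(version_2)
print("file size:", len(version_1))
print("stored for two versions:", total)
```

Whether a later pack can shrink this depends on the file format, which is what the measurements in this answer explore.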

Comparison between Git and Git-LFS blob sizes

Git does a good job in diff compressing even big files. I've found that the compression of large files can be excellent (total change in size):

| type | change | file | as git blob | after git gc | as git-lfs blob |
|---|---|---|---|---|---|
| Vectorworks (.vwx) | added geometry | 28,8 MB | +26,5 MB | +1,8 MB | +26,5 MB |
| GeoPackage (.gpkg) | added geometry | 16,9 MB | +3,7 MB | +3,5 MB | +16,9 MB |
| Affinity Photo (.afphoto) | toggled layers | 85,8 MB | +85,6 MB | +0,8 MB | +85,6 MB |
| FormZ (.fmz) | added geometry | 66,3 MB | +66,3 MB | +66,3 MB | +66,3 MB |
| Photoshop (.psd) | toggled layers | 25,8 MB | +15,8 MB | +15,4 MB | +25,8 MB |
| Movie (mp4) | trimmed | 13,1 MB | +13,2 MB | +0 MB | +13,1 MB |
| | delete a file | -13,1 MB | +0 MB | +0 MB | +0 MB |

If you don't have a remote to push to, it is better to not use Git-LFS, because Git-LFS versioned files seem to add no additional compression at all (see above).

Also one important lesson learnt here is that Git's diff compression method doesn't work with real binary files like .fmz. These would be the best candidates for putting under Git-LFS versioning.

For other file types that seem to be non-textual but have a text-like structure (.vwx or .afphoto), the diff method performs well. In a single-user scenario, where overall repository size matters more than commit speed, I wouldn't put these under Git-LFS versioning, because the Git blob size is significantly smaller than the LFS blob, which saves space both locally and on the remote.

Benefits of Git-LFS

Git-LFS provides a solution to this problem by storing older versions of large binary files outside of the repository (on the remote) and replacing them with pointer files. If an older version is needed, the client pulls it from the remote. Therefore, a designer who pulls the latest state from the remote only downloads the latest state and the pointer files.
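To make the pointer idea concrete, here is a minimal sketch of what such a pointer file contains, following the published Git LFS pointer format (a version line, a SHA-256 object ID, and the size in bytes). The function name `lfs_pointer` and the zero-filled stand-in file are assumptions for illustration only; in practice the Git LFS client generates pointers for you.

```python
import hashlib


def lfs_pointer(content: bytes) -> str:
    """Build the text of a Git LFS pointer file for the given contents."""
    oid = hashlib.sha256(content).hexdigest()
    return (
        "version https://git-lfs.github.com/spec/v1\n"
        f"oid sha256:{oid}\n"
        f"size {len(content)}\n"
    )


# 5 MiB stand-in for a large binary such as a .psd file.
large_file = b"\x00" * (5 * 1024 * 1024)
pointer = lfs_pointer(large_file)

print(pointer)
print(len(pointer.encode("utf-8")), "bytes of pointer for", len(large_file), "bytes of file")
```

Only this small text file is committed to the Git history; the actual contents live in the LFS store, which without a remote is still your local disk.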

Consequently, Git-LFS is only useful if you have access to a remote hosted on an LFS-enabled server. If there is no server to push the blobs to, LFS-tracked blobs stay in the local repo, so the advantage of reducing local storage consumption is lost.

Usually, the remote is an LFS-enabled Git provider, which can be too expensive for some projects. However, there are also ways to host a Git-LFS remote locally.

How to integrate Git-LFS in a local repository

Natively, Git-LFS transfers data over HTTPS only. Therefore, you need a separate Git-LFS server for storing the large files. However, there is no official server implementation for local hosting. There are some unofficial ways to do it, such as Git-LFS Folderstore.

Git-LFS Folderstore provides a way to manage a Git-LFS remote locally. It works on a local machine and on a network drive. If you are on Mac OS X, you can set it up by copying the lfs-folderstore executable to /usr/local/bin and then running:

# Creating a remote repository on a volume (attached drive or NAS)
cd path/to/remote
mkdir origin

# Create a bare Git repository inside origin
cd origin
git init --bare

# Add remote to local repository
cd path/to/local/repository
git remote add origin <path/to/remote/origin>

# Enable Git-LFS in local repository
git lfs install

# Track filetype psd
git lfs track "*.psd"

# Configure lfs of the local repository
git config --add lfs.customtransfer.lfs-folder.path lfs-folderstore
git config --add lfs.standalonetransferagent lfs-folder
git config --add lfs.customtransfer.lfs-folder.args "/Volumes/path/to/remote/origin"

# Stage the LFS tracking rules (.gitattributes) and commit changes
git add .gitattributes
git commit -am "commit message"

# Push media to remote
git push origin master

Wrap the remote path in quotes if it contains spaces.

How to cleanup the local repository

You can reduce the size of your Git repository by running the Git garbage collector, git gc. It won't touch the Git-LFS blobs, though.

Git-LFS will only remove blobs from the local store .git/lfs/objects/ if they have been pushed to a remote AND the commit containing them is older than what Git-LFS considers recent (3 days). Here are the commands if you want to clean up manually:

# remove lfs duplicates
# https://github.com/git-lfs/git-lfs/blob/main/docs/man/git-lfs-dedup.1.ronn
git lfs dedup

# clean old local lfs files (>3 days of commit)
# https://github.com/git-lfs/git-lfs/blob/main/docs/man/git-lfs-prune.1.ronn
git lfs prune
2

To add to the excellent answer already provided by @Schwern and address OP's comment.

Here is a link to the documentation of Git LFS from Atlassian, one of the companies behind this extension.

The idea is that the binaries are downloaded from the "remote" repository lazily, namely during the checkout process rather than during cloning or fetching.

Technically git lfs stores "lazily" evaluated pointers to the binaries.

This makes a lot of sense because Git has a "commitment" to give you access to the state of the code base after each and every commit, so the following situation is possible:

  1. commit A: added large binary file a.bin (let's say a.bin is in version 1)
  2. push the changes
  3. commit B: made changes in the binary file a.bin (a.bin is in version 2 now)
  4. push the changes
  5. Now check out commit A by its SHA-1: Git has to provide you a.bin in version 1.

This is true even if you've decided to remove a.bin and commit that removal: it must still be possible to access the file-system state as of "commit A". So at least locally, there is no point in storing version 1 if you don't explicitly need it.

One more note, to directly address the question and clarify: yes, you have to enable Git LFS support locally, but you also have to enable Git LFS support on your remote repo (I did that with Bitbucket once; I'm sure its competitors support it as well).

Mark Bramnik
  • Thank you! That is an excellent addition! But to me it seems Git LFS is only essential and beneficial if remote repos are involved, is that correct? If the repo never leaves the machine, it's not as beneficial, correct? – Daniel Stephens Sep 13 '20 at 05:19
  • Well, yes, I guess so, although I've never seen that someone uses git lfs only locally... – Mark Bramnik Sep 13 '20 at 05:26