
Assume a project where nothing has been added or committed for a long time. Running "git add ." takes too much time, and I would like to estimate which files or directories are the most expensive in the current state. I have a good .gitignore file that works well enough, but I still sometimes end up with too many files, or files too large, to add and commit to Git.

I often have directories that are 300 GB to 2 TB in size. Even though I exclude them with directory/* and directory/ patterns in .gitignore, the add is still slow.
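One quick sanity check (a sketch; the path below is a placeholder) is to confirm that those patterns actually match, since a single unmatched multi-terabyte tree would explain the slowness:

    # -v shows which .gitignore rule (file and line) matched the path:
    git check-ignore -v directory/some-file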

How can I estimate which directories or files are too expensive to add and commit?
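For reference, one rough way to measure this (a sketch using Git plumbing plus GNU coreutils; the head count of 20 is arbitrary) is to list everything "git add ." would have to process, sized and sorted:

    # Untracked, non-ignored files, largest first;
    # --exclude-standard honors .gitignore:
    git ls-files --others --exclude-standard -z \
      | xargs -0 du -h \
      | sort -rh \
      | head -n 20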

Léo Léopold Hertz 준영

2 Answers


Git slowness generally comes from large binary files. This isn't because they're binary as such; it's because binary files tend to be large and are more expensive to compress and diff.

Based on your edit indicating the file sizes, I suspect this is your problem.
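To verify that, you can rank the blobs already stored in the repository by size (a sketch assembled from standard Git plumbing; the head count is arbitrary):

    # List every blob in history with its size and path, largest first:
    git rev-list --objects --all \
      | git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' \
      | awk '/^blob/ {print $3, $4}' \
      | sort -rn \
      | head -n 20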

The answers to the linked question offer a few solutions: removing the large files from source control, manually running git gc, etc.
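A minimal version of those two remedies might look like this (a sketch; "big-assets/" is a placeholder directory name):

    # Stop tracking a large directory while keeping it on disk,
    # then ignore it going forward:
    git rm -r --cached big-assets/
    echo 'big-assets/' >> .gitignore
    git commit -m "Stop tracking big-assets"

    # Repack and prune loose objects by hand:
    git gc --prune=now

Note that "git rm --cached" only stops tracking from the next commit on; blobs already in history keep their space until the history is rewritten (for example with a tool such as git filter-repo) and garbage-collected.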

Aaron Brager

"git add" needs to internally run "diff-files" equivalent,

With Git 2.20 (Q4 2018), that codepath learned the same optimization "diff-files" already had: running lstat(2) in parallel to find which paths have been updated in the working tree.

See commit d1664e7 (02 Nov 2018) by Ben Peart (benpeart).
(Merged by Junio C Hamano -- gitster -- in commit 9235a6c, 13 Nov 2018)

add: speed up cmd_add() by utilizing read_cache_preload()

During an "add", a call is made to run_diff_files() which calls check_removed() for each index-entry.
The preload_index() code distributes some of the costs across multiple threads.

Because the files checked are restricted to pathspec, adding individual files makes no measurable impact but on a Windows repo with ~200K files, 'git add .' drops from 6.3 seconds to 3.3 seconds for a 47% savings.
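To check whether your setup benefits from this (a sketch; core.preloadIndex has defaulted to true since Git 2.1, so the last line is usually a no-op):

    # The cmd_add() optimization above requires Git >= 2.20:
    git version
    # Force-enable the parallel lstat(2) index preload:
    git config core.preloadIndex true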

VonC