
While trying to mirror a repo to a remote server, the server rejects tree object 4e8f805dd45088219b5662bd3d434eb4c5428ec0. This is not a top-level tree, by the way, but a subdirectory.

How can I find out which commit(s) indirectly reference that tree object so I can avoid pushing the refs that link to those commits in order to get all the rest of my repo to push properly?

Andrew Arnott
  • I've considered deleting the tree object then running `git fsck` hoping it would remove all references to it as part of recovery. But I don't know how to delete an object from a packfile either. – Andrew Arnott Dec 11 '16 at 15:58
  • How about finessing the problem: Use "git bisect" to find the commit that introduced the bad tree reference, and then you can git ls-tree that commit to find the bad tree. – Raymond Chen Dec 11 '16 at 18:03
  • @RaymondChen That might not work. Besides taking so long (bisect is awesome, but not so much on a tree this large) it may fail because the tree itself may fail to checkout on the relevant commit. Also, I need a "good" and a "bad" sample commit for bisect to get started, and I don't know which commit is bad. – Andrew Arnott Dec 12 '16 at 00:16

2 Answers


As you noted, you just need to find the commit(s) with the desired tree. If it could be a top level tree you would need one extra test, but since it's not, you don't.

You want:

  • for some set of commits (all those reachable from a given branch name, for instance)
  • if that commit has, as a sub-tree, the target tree hash: print the commit ID

which is trivial with two Git "plumbing" commands plus grep:

#! /bin/sh
#
#  set these:
searchfor=4e8f805dd45088219b5662bd3d434eb4c5428ec0

startpoints="master"  # branch names or HEAD or whatever
# you can use rev-list limiters too, e.g., origin/master..master

# $startpoints is deliberately unquoted so it can hold several arguments
git rev-list $startpoints |
    while read -r commithash; do
        # -F: match the hash as a fixed string; the matching line is printed
        if git ls-tree -d -r --full-tree "$commithash" | grep -F -- "$searchfor"; then
            echo " -- found at $commithash"
        fi
    done

To check top-level trees as well, you would also run git cat-file -p $commithash and see whether its tree line carries the hash.
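
The top-level check can be folded into the same test. Here is a minimal sketch, assuming a hypothetical helper name commit_has_tree (not part of Git) and a full-length object ID. Instead of parsing git cat-file -p output, it reads the commit's root tree ID with git rev-parse and the ^{tree} suffix, which should give the same value:

```shell
#!/bin/sh
# Hypothetical helper (the name is an assumption, not a Git command):
# succeeds if commit $1 uses tree $2 as its root tree or as any
# subdirectory. $2 must be a full object ID.
commit_has_tree() {
    commit=$1
    target=$2
    # Root tree of the commit, i.e. the "tree" line of git cat-file -p.
    if [ "$(git rev-parse --verify -q "$commit^{tree}")" = "$target" ]; then
        return 0
    fi
    # Subdirectory trees anywhere below the root.
    git ls-tree -d -r --full-tree "$commit" | grep -q -F -- "$target"
}
```

Inside the loop above, the condition would then become: if commit_has_tree "$commithash" "$searchfor"; then echo " -- found at $commithash"; fi. Note that this variant prints only the commit, not the matching ls-tree line.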

Note that this same code will find blobs (assuming you take out the -d option from git ls-tree). However, no tree can have the ID of a blob, or vice versa. The grep will print the matching line so you'll see, e.g.:

040000 tree a3a6276bba360af74985afa8d79cfb4dfc33e337    perl/Git/SVN/Memoize
 -- found at 3ab228137f980ff72dbdf5064a877d07bec76df9

To clean this up for general use, you might want to use git cat-file -t on the search-for blob-or-tree to get its type.
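
One way that clean-up might look is sketched below. The helper name ls_tree_opts_for is hypothetical; it only maps the type reported by git cat-file -t to the git ls-tree options used in the script above (with -d for trees, without it for blobs), and it assumes the object is readable in the local repository:

```shell
#!/bin/sh
# Hypothetical helper (name is an assumption): print the git ls-tree
# options suited to the type of object $1, or fail for other types.
ls_tree_opts_for() {
    case "$(git cat-file -t "$1" 2>/dev/null)" in
        tree) echo "-d -r --full-tree" ;;  # only directory entries can match
        blob) echo "-r --full-tree" ;;     # file entries are needed
        *)    return 1 ;;                  # commit, tag, or unknown object
    esac
}
```

In the script this could be used as opts=$(ls_tree_opts_for "$searchfor") || exit 1, followed by git ls-tree $opts "$commithash" in the loop, leaving $opts unquoted so it splits into separate options.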

torek
  • Thanks! This should work, albeit since I don't know which branch/tag has the bad tree, I'd have to run the whole thing in a loop over each one of my several thousand branches (it's a big repo with lots of users). So I'll have to craft some way to get a list of every single commit in the repo across branches and eliminate duplicates first. But this is a great start. – Andrew Arnott Dec 12 '16 at 00:33
  • It looks like `git rev-list` returns a set of commits for any number of refs. And it takes `--stdin` as a parameter. So I can do `git branch -r | git rev-list --stdin` and otherwise keep using your script. :) Except `git branch` adds whitespace in front of each branch name, which `git rev-list` doesn't like, so I wrote to a file, cleaned it up, then piped it into your script. Now it's very busy searching. – Andrew Arnott Dec 12 '16 at 00:42
  • I was even able to change `origin/master` to `^origin/master` to greatly cut down the number of commits since I know that the tree in question isn't anywhere on the master branch. – Andrew Arnott Dec 12 '16 at 00:43
  • `git rev-list` takes the *same* arguments as `git log`. In fact, they're basically the same command! They are built from one source file that just changes the default settings when run as `git log` vs `git rev-list`. Rev-list is intended for use in scripts, though, while log is intended for use by humans. In any case `A..B` "means" `B ^A` so `origin/master..master` and `master ^origin/master` are exactly the same thing here. In this case you can use `git rev-list --branches ^origin/master` (or maybe `--branches --tags`). – torek Dec 12 '16 at 02:50
  • So, it worked! I did find that the script subtly would only find trees that are subdirectories of the current one (as opposed to starting at the root). I had fixed that, but then it took so long to complete, and I had a pretty good idea of which directory the tree represented so I took advantage of that as an optimization. I got several commits now to work with. :) – Andrew Arnott Dec 12 '16 at 03:57
  • Oh, right, `git ls-tree` likes to start from your current subdirectory (not sure why) and we need `--full-tree` to prevent that. – torek Dec 12 '16 at 07:39

A variation on torek's great answer, in case you want to speed things up with GNU Parallel:

#!/bin/bash
searchfor="$1"
startpoints="${2-HEAD}"

git rev-list "$startpoints" |
    parallel "if git ls-tree -d -r --full-tree '{}' | grep '$searchfor'; then echo ' -- found at {}'; fi"
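
If GNU Parallel is not installed, a similar effect can probably be had with xargs. The sketch below is an assumption-laden alternative, not the method above: the function name search_parallel and the job count of 4 are made up for illustration, and -P is an extension supported by GNU and BSD xargs rather than a POSIX feature. Output from concurrent jobs may interleave.

```shell
#!/bin/sh
# Hypothetical helper: search_parallel <object-id> <rev-list args...>
# Assumption: xargs supports -P (GNU and BSD do; POSIX does not require it).
search_parallel() {
    searchfor=$1
    shift
    git rev-list "$@" |
        # One commit per sh invocation: $0 is the hash to search for,
        # $1 is the commit ID supplied by xargs.
        xargs -P 4 -n 1 sh -c '
            if git ls-tree -d -r --full-tree "$1" | grep -F -- "$0"; then
                echo " -- found at $1"
            fi' "$searchfor"
}
```

For example, search_parallel 4e8f805dd45088219b5662bd3d434eb4c5428ec0 --branches ^origin/master would search every local branch while skipping commits already on origin/master.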
shawkinaw