The git hash-object command somehow detects whether the content of a blob is text or binary.

There is also the question of the git configuration context (https://help.github.com/articles/dealing-with-line-endings/). If you configure git to treat certain types of files as binary content, git will act differently. Without knowing that context, you may generate the wrong hash. Right?

I think that the most secure way is to call git hash-object some_file in the context of your project; then you can be 100% sure that it will give the correct result.

Am I right, or am I missing something?

The code below reproduces the situation.

import org.apache.commons.codec.digest.DigestUtils
import org.apache.commons.lang3.ArrayUtils

class Test3 {
    public static void main(String[] args) {
        // LF content: hash computed by hand, then by git hash-object
        def bytesU = "this \n is a text".bytes
        def fileU = File.createTempFile("someFileU", ".tmp")
        fileU << bytesU
        println DigestUtils.sha1Hex(ArrayUtils.addAll("blob ${bytesU.length}\0".bytes, bytesU))
        println "git hash-object ${fileU.absolutePath}".execute().text

        // CRLF content: the same two hashes
        def bytesW = "this \r\n is a text".bytes
        def fileW = File.createTempFile("someFileW", ".tmp")
        fileW << bytesW  // write the CRLF bytes to the second file
        println DigestUtils.sha1Hex(ArrayUtils.addAll("blob ${bytesW.length}\0".bytes, bytesW))
        // git hash-object applies the configured end-of-line conversion unless --no-filters is given
        println "git hash-object ${fileW.absolutePath}".execute().text

        // Hash of an empty blob
        println DigestUtils.sha1Hex(ArrayUtils.addAll("blob 0\0".bytes, [] as byte[]))
        // The header declares 7 bytes, but "foobar" is only 6 bytes long
        println DigestUtils.sha1Hex(ArrayUtils.addAll("blob 7\0foobar".bytes, [] as byte[]))
    }
}

Below is the output of the program. The third line, the hash computed by hand over the CRLF content, differs from the result of git hash-object on the fourth line because of the line endings.

  1. 792e2834867278884eeb8b5ff5f1954e1aa68660
  2. 792e2834867278884eeb8b5ff5f1954e1aa68660
  3. 7005d7429c4d219c73900f1a02e7980004614ac3
  4. 792e2834867278884eeb8b5ff5f1954e1aa68660
  5. e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
  6. 3a9f0b1970d7ed8d742dc3b9b36736eb03150766
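For readers without commons-codec, the hand-computed hashes above can be sketched with the JDK alone. This is only an illustration of the "blob <length>\0" + content scheme used in the question's code. The class name GitBlobHash and the helper crlfToLf are hypothetical, and crlfToLf imitates just a plain CRLF to LF replacement, not git's full conversion logic (which also depends on attributes and on text/binary detection).

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class GitBlobHash {
    // SHA-1 over the header "blob <length>\0" followed by the raw content bytes.
    static String blobSha1(byte[] content) {
        try {
            MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
            sha1.update(("blob " + content.length + "\0").getBytes(StandardCharsets.US_ASCII));
            sha1.update(content);
            StringBuilder hex = new StringBuilder();
            for (byte b : sha1.digest()) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 is not available", e);
        }
    }

    // Hypothetical helper (an assumption, not git's algorithm): replaces every CRLF with LF.
    // ISO-8859-1 maps bytes to chars one to one, so all other bytes pass through unchanged.
    static byte[] crlfToLf(byte[] content) {
        String text = new String(content, StandardCharsets.ISO_8859_1);
        return text.replace("\r\n", "\n").getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        byte[] lf = "this \n is a text".getBytes(StandardCharsets.US_ASCII);
        byte[] crlf = "this \r\n is a text".getBytes(StandardCharsets.US_ASCII);
        System.out.println(blobSha1(lf));              // hash of the LF content
        System.out.println(blobSha1(crlf));            // hash of the CRLF content, a different value
        System.out.println(blobSha1(crlfToLf(crlf)));  // same as the LF hash after conversion
        System.out.println(blobSha1(new byte[0]));     // hash of the empty blob
    }
}
```

The last line prints e69de29bb2d1d6434b8b29ae775ad8c2e48c5391, the same empty-blob hash as line 5 of the output. The first and third printed lines are equal, which is the effect a line-ending conversion has before hashing.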

There is an older post on this, which is locked for me, so I've decided to create a separate question. Please merge this into "How to assign a Git SHA1's to a file without Git?"

Andrzej

1 Answer

I think that the most secure way is to call git hash-object some_file in the context of your project; then you can be 100% sure that it will give the correct result.

That's right. Besides newline conventions, you can provide custom filters (the clean/smudge filters configured through gitattributes) to canonicalize or localize content, e.g. to substitute in and strip out repo-specific settings and whatnot, and git hash-object hunts those down and runs them too.
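As a rough sketch of what "calling git hash-object in the context of your project" could look like from JVM code: the snippet below runs the command with the repository as working directory, so git picks up that repository's configuration and attributes. The class name GitHashObjectRunner and its methods are hypothetical names for illustration. The only git options used are --no-filters (hash the bytes as they are) and the -- separator, and it assumes git is on the PATH.

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class GitHashObjectRunner {
    // Builds the command line. Without --no-filters, git applies the
    // end-of-line conversion and any filters selected for the file.
    static List<String> command(String file, boolean applyFilters) {
        List<String> cmd = new ArrayList<>();
        cmd.add("git");
        cmd.add("hash-object");
        if (!applyFilters) {
            cmd.add("--no-filters");
        }
        cmd.add("--");
        cmd.add(file);
        return cmd;
    }

    // Runs git inside repoDir so the project's own settings are in effect.
    static String hash(File repoDir, String file, boolean applyFilters)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command(file, applyFilters))
                .directory(repoDir)
                .redirectError(ProcessBuilder.Redirect.INHERIT)
                .start();
        String out = new String(process.getInputStream().readAllBytes(),
                StandardCharsets.US_ASCII).trim();
        if (process.waitFor() != 0) {
            throw new IOException("git hash-object failed for " + file);
        }
        return out;
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 2) {
            System.out.println("usage: GitHashObjectRunner <repo-dir> <file>");
            return;
        }
        File repoDir = new File(args[0]);
        System.out.println("with filters:    " + hash(repoDir, args[1], true));
        System.out.println("without filters: " + hash(repoDir, args[1], false));
    }
}
```

If the two printed hashes differ for a file, some conversion or filter is active for it in that repository, and a hash computed by hand over the raw bytes would match only the second one.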

jthill