46

I have 2 files with a list of numbers (telephone numbers).

I'm looking for a way to list the numbers in the second file that are not present in the first file.

I've tried various methods:

comm (getting some weird sorting errors)
fgrep -v -x -f second-file.txt first-file.txt (unsure of the result; there should be more lines)
pb2q
mvrasmussen
  • Have you checked this answer: http://stackoverflow.com/a/1617326/15165 ? BTW: before doing anything make sure you have got all the trailing lines and extra blank spaces removed. This could be the reason you have not found all of them... – bcelary Jun 19 '12 at 11:28
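The cleanup suggested in the comment above could be sketched roughly as follows. The sample numbers, the temporary directory, and the `clean` helper are hypothetical, made up only for illustration; the idea is just to strip trailing whitespace (including carriage returns from Windows line endings) and drop blank lines before comparing.

```shell
# Hypothetical sample data: the first file has CRLF endings, a trailing space and a blank line.
workdir=$(mktemp -d)
printf '5551001\r\n5551002 \r\n\r\n' > "$workdir/first-file.txt"
printf '5551002\n5551003\n' > "$workdir/second-file.txt"

# Hypothetical helper: strip trailing whitespace (including \r) and drop empty lines.
clean() {
    sed 's/[[:space:]]*$//' "$1" | grep -v '^$'
}

clean "$workdir/first-file.txt"  > "$workdir/first-clean.txt"
clean "$workdir/second-file.txt" > "$workdir/second-clean.txt"

# Lines of the second file that are absent from the first.
missing=$(grep -Fxv -f "$workdir/first-clean.txt" "$workdir/second-clean.txt")
echo "$missing"
```

Without the cleanup step, `5551002` would also be reported, because `5551002 ` followed by a carriage return is not the same line as `5551002`.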

4 Answers

86
grep -Fxv -f first-file.txt second-file.txt

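This prints every line of second-file.txt that does not match any line of first-file.txt: -F treats the patterns as fixed strings, -x requires a whole-line match, -v inverts the match, and -f reads the patterns from a file. It might be slow if the files are large.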

Also, once you sort the files (with plain sort rather than sort -n, since comm expects lexicographic order), comm should work as well. What error does it give? Try this:

comm -23 second-file-sorted.txt first-file-sorted.txt
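
As a small sanity check, here is a sketch that runs both approaches on made-up data and compares the results. The file contents and the temporary directory are assumptions for illustration only; `LC_ALL=C` is set so that sort and comm agree on the collation order.

```shell
# Hypothetical sample data.
workdir=$(mktemp -d)
printf '100\n200\n300\n' > "$workdir/first-file.txt"
printf '300\n400\n100\n500\n' > "$workdir/second-file.txt"

# grep approach: no sorting needed, output keeps the order of second-file.txt.
grep_result=$(grep -Fxv -f "$workdir/first-file.txt" "$workdir/second-file.txt")

# comm approach: both inputs must be sorted in the same (lexicographic) order.
LC_ALL=C sort "$workdir/first-file.txt"  > "$workdir/first-file-sorted.txt"
LC_ALL=C sort "$workdir/second-file.txt" > "$workdir/second-file-sorted.txt"
comm_result=$(LC_ALL=C comm -23 "$workdir/second-file-sorted.txt" "$workdir/first-file-sorted.txt")

echo "grep: $grep_result"
echo "comm: $comm_result"
```

With this data both variables hold 400 and 500. In general the grep output follows the original order of the second file, while the comm output is sorted.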
Hari Menon
29

You need to use comm:

comm -13 first.txt second.txt

will do the job.

P.S. The order of the first and second file on the command line matters.

You may also need to sort the files first:

comm -13 <(sort first.txt) <(sort second.txt)

Even if the files are numeric, use plain sort rather than sort -n here: comm expects its input in lexicographic order.
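
To illustrate why the argument order matters, here is a minimal sketch on made-up data (the file contents and temporary directory are hypothetical). `-1` suppresses lines only in the first argument, `-2` lines only in the second, and `-3` lines common to both, so `-13` leaves the lines unique to whichever file is given second.

```shell
# Hypothetical sample data, sorted in the C locale so sort and comm agree.
workdir=$(mktemp -d)
printf '100\n200\n300\n' | LC_ALL=C sort > "$workdir/first.txt"
printf '100\n300\n400\n' | LC_ALL=C sort > "$workdir/second.txt"

# Lines that appear only in second.txt.
only_in_second=$(LC_ALL=C comm -13 "$workdir/first.txt" "$workdir/second.txt")

# Same flags with the arguments swapped: now lines that appear only in first.txt.
swapped=$(LC_ALL=C comm -13 "$workdir/second.txt" "$workdir/first.txt")

echo "only in second: $only_in_second"
echo "arguments swapped: $swapped"
```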

rush
  • That results in "comm: file 2 is not in sorted order" and "comm: file 1 is not in sorted order", and a list with exactly the same number of lines as file2. – mvrasmussen Jun 19 '12 at 11:31
  • So you can try sorting them first. I've just added a variant with `comm` + `sort`. – rush Jun 19 '12 at 11:44
  • Keep in mind that sorting the files numerically may not work, as comm expects them to be sorted lexicographically. – chepner Jun 19 '12 at 12:34
12

This should work:

comm -13 <(sort file1) <(sort file2)

It seems the output of sort -n (numeric) cannot be used with comm, which expects its input in the lexicographic order that plain sort produces. Example:

f1.txt

1
2
21
50

f2.txt

1
3
21
50

21 is present in both files, so it should appear in the third column.

#WRONG
$ comm <(sort -n f1.txt) <(sort -n f2.txt)   
                1
2
21
        3
        21
                50

#OK
$ comm <(sort f1.txt) <(sort f2.txt)
                1
2
                21
        3
                50
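
One way to check in advance whether a file is in the order comm expects is `sort -c`, which only verifies the order and exits with a non-zero status if the input is not sorted. A rough sketch using the f2.txt contents from above (the temporary directory and the derived file names are hypothetical):

```shell
workdir=$(mktemp -d)
printf '1\n3\n21\n50\n' > "$workdir/f2.txt"

# Two differently sorted copies of the same data.
sort -n "$workdir/f2.txt" > "$workdir/f2.numeric.txt"
LC_ALL=C sort "$workdir/f2.txt" > "$workdir/f2.lexical.txt"

# sort -c checks the order without producing sorted output.
if LC_ALL=C sort -c "$workdir/f2.numeric.txt" 2>/dev/null; then numeric_ok=yes; else numeric_ok=no; fi
if LC_ALL=C sort -c "$workdir/f2.lexical.txt" 2>/dev/null; then lexical_ok=yes; else lexical_ok=no; fi

echo "numeric copy usable with comm: $numeric_ok"
echo "lexical copy usable with comm: $lexical_ok"
```

The numerically sorted copy fails the check because "3" sorts after "21" when compared character by character.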
Nahuel Fouilleul
1
cat f1.txt f2.txt | sort |uniq > file3
David Arenburg
tom
  • Unfortunately, this provides the unique list of all lines in both files, and the requester is seeking only different lines from file 2. – ingyhere May 01 '15 at 00:50
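
Following up on the comment above, the sort/uniq idea can be adapted to return only the lines unique to the second file. This is a different technique from the one in the answer, sketched here under the assumption of small made-up files: the first file is listed twice so each of its lines occurs at least twice, the second file is de-duplicated first, and `uniq -u` then keeps only lines that occur exactly once.

```shell
# Hypothetical sample data; f2.txt deliberately contains a duplicate line.
workdir=$(mktemp -d)
printf '1\n2\n21\n50\n' > "$workdir/f1.txt"
printf '1\n3\n21\n50\n3\n' > "$workdir/f2.txt"

# De-duplicate f2, merge it with two copies of f1, keep lines seen exactly once.
only_in_f2=$(sort -u "$workdir/f2.txt" | sort - "$workdir/f1.txt" "$workdir/f1.txt" | uniq -u)
echo "$only_in_f2"
```

Note that the output is sorted and de-duplicated, so the original order of the second file is lost.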