188

I'm trying to write a simple script that will list the contents found in two lists. To simplify, let's use ls as an example. Imagine "one" and "two" are directories.

one=`ls one`
two=`ls two`
intersection $one $two

I'm still quite green in Bash, so feel free to correct how I am doing this. I just need some command that will print out all files in "one" and "two". They must exist in both. You might call this the "intersection" between "one" and "two".

Peter Mortensen
User1

5 Answers

323
comm -12  <(ls 1) <(ls 2)
ghostdog74
  • 47
    Can't believe I had no knowledge of `comm` until today. This just made my whole week :) – Darragh Enright Aug 19 '14 at 17:49
  • 29
    `comm` requires the inputs to be sorted. In this case, `ls` automatically sorts its output, but other uses may need to do this: `comm -12 – Alexander Bird Jan 15 '15 at 21:11
  • 14
    DO NOT USE ls' output for anything. ls is a tool for interactively looking at directory metadata. Any attempts at parsing ls' output with code are broken. Globs are much simpler AND correct: `for file in *.txt`. Read http://mywiki.wooledge.org/ParsingLs – Rany Albeg Wein Jan 25 '16 at 03:49
  • 2
    I just used this in an effort to find usages of a `public` method `error()` provided by a trait, in combination with `git grep`, and it was awesome! I ran `$ comm -12 error(" -- "*.php") – localheinz Apr 07 '17 at 15:45
  • 3
    This is hilarious. I was trying to do some crazy stuff with awk. – Rolf May 08 '17 at 23:36
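
Following the comment about not parsing ls output, here is a minimal sketch of the same intersection done with a glob and an existence test. The helper name intersect_dirs is hypothetical (it is not part of any answer here), and the sketch assumes that both arguments are directories.

```shell
# Hypothetical helper: print the names that exist in both directories.
# The body runs in a subshell, ( ... ), so its variables do not leak.
intersect_dirs() (
    dir_a=$1
    dir_b=$2
    for path in "$dir_a"/*; do
        # If dir_a is empty, the pattern stays literal; skip it.
        [ -e "$path" ] || continue
        # Strip the directory part to get the bare entry name.
        name=${path##*/}
        # Print the name only if an entry with that name is also in dir_b.
        if [ -e "$dir_b/$name" ]; then
            printf '%s\n' "$name"
        fi
    done
)
```

Calling `intersect_dirs one two` prints one name per line. Names containing spaces are handled because nothing is word-split, and, like a plain `ls`, the `*` glob skips dotfiles.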
81

Solution with comm

comm is great, but it does need sorted input. Fortunately, here we use ls, which sorts its output by default. From the ls man page:

Sort entries alphabetically if none of -cftuSUX nor --sort.

comm -12  <(ls one) <(ls two)

Alternative with sort

Intersection of two lists:

sort <(ls one) <(ls two) | uniq -d

Symmetric difference of two lists:

sort <(ls one) <(ls two) | uniq -u
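
One caveat with the uniq approach: it only works if neither list contains repeated lines of its own. ls never repeats a name within one directory, but other commands might. A minimal sketch of a more defensive variant follows, assuming the two lists are stored in files; the helper name intersect_lines is hypothetical.

```shell
# Hypothetical helper: intersection of two line-oriented files,
# robust to repeated lines inside either file.
intersect_lines() {
    # sort -u collapses duplicates within each input, so any line that
    # still appears twice after merging must come from both inputs.
    { sort -u "$1"; sort -u "$2"; } | sort | uniq -d
}
```

With a first file containing x, x, y and a second containing y, z, a plain `sort a b | uniq -d` would report both x and y, while `intersect_lines a b` reports only y.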

Bonus

Play with it ;)

cd $(mktemp -d) && mkdir {one,two} && touch {one,two}/file_{1,2}{0..9} && touch two/file_3{0..9}
Peter Mortensen
Jean-Christophe Meillaud
32

Use the comm command:

ls one | sort > /tmp/one_list
ls two | sort > /tmp/two_list
comm -12 /tmp/one_list /tmp/two_list

"sort" is not really needed, but I always include it before using "comm" just in case.

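The temp-file approach can be wrapped up as a small function. This is only a sketch under a few assumptions: the name comm_intersection is hypothetical, mktemp is available, and no file name contains a newline. It uses private temp files instead of fixed names in /tmp, and pins LC_ALL=C so that sort and comm agree on the ordering.

```shell
# Hypothetical wrapper around the temp-file approach.
# The body runs in a subshell, ( ... ), so its variables do not leak.
comm_intersection() (
    # Private temp files instead of fixed names in /tmp.
    list_a=$(mktemp) || exit 1
    list_b=$(mktemp) || { rm -f "$list_a"; exit 1; }

    # LC_ALL=C pins sort and comm to the same byte-wise collation order.
    ls "$1" | LC_ALL=C sort > "$list_a"
    ls "$2" | LC_ALL=C sort > "$list_b"
    LC_ALL=C comm -12 "$list_a" "$list_b"
    status=$?

    # Clean up, then report comm's exit status.
    rm -f "$list_a" "$list_b"
    exit "$status"
)
```

Usage would be `comm_intersection one two`, which prints the names present in both directories, one per line.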
Peter Mortensen
DVK
3

A less efficient (than comm) alternative:

cat <(ls 1 | sort -u) <(ls 2 | sort -u) | sort | uniq -d
Benubird
  • 2
    If you are using Debian's /bin/dash or some other non-Bash shell in your scripts, you can chain commands' output using parentheses: `(ls 1; ls 2) | sort -u | uniq -d`. – nitrogen Oct 08 '14 at 20:19
  • 2
    @MikaëlMayer You should flag the name of the person you are replying to, otherwise it is assumed you mean me. – Benubird Feb 23 '15 at 08:34
  • 1
    @nitrogen MikaëlMayer is correct - chaining `sort -u | uniq -d` does nothing, because the sort has removed the duplicates before uniq starts to look for them. I think you have not understood what my command is doing. – Benubird Feb 23 '15 at 08:36
  • 1
    @Benubird I was not able to get your command `cat – nitrogen Feb 24 '15 at 09:21
  • @nitrogen The reason I'm using cat is that I want this to be a generalizable solution, so that you can replace `ls` with something else, e.g. `find`. Your solution does not allow this, because if one of the commands returns two identical lines, it picks them up as a duplicate. Mine works even if the user wants to do `ls 1/*` and compare all files across subdirectories. Otherwise, yes, it works as well. It's possible mine is bash-specific. – Benubird Feb 24 '15 at 09:50
  • If anyone is interested you can try my version of "comm" which I called "common". It does not need sorting and supports "-123" switches just like "comm". https://github.com/toni-rmc/common – toni rmc May 29 '17 at 15:31
2

Join is another good option, depending on the input and desired output:

join -j1 -a1 <(ls 1) <(ls 2)
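
As a rough illustration of what the -a option changes, here is a small sketch using two hypothetical, already sorted name lists in temp files (join, like comm, expects sorted input). The variable names are made up for the example.

```shell
# Hypothetical sorted input files standing in for the two listings.
names_a=$(mktemp)
names_b=$(mktemp)
printf '%s\n' a.txt b.txt c.txt > "$names_a"
printf '%s\n' b.txt c.txt d.txt > "$names_b"

# Without -a, join prints only lines whose first field is in both files.
both=$(join "$names_a" "$names_b")

# With -a 1, unmatched lines from the first file are printed as well.
all_of_first=$(join -a 1 "$names_a" "$names_b")

printf 'plain join:\n%s\n' "$both"
printf 'with -a 1:\n%s\n' "$all_of_first"
rm -f "$names_a" "$names_b"
```

The plain join prints b.txt and c.txt, while the -a 1 variant also prints a.txt. Note that join splits lines into whitespace-separated fields and matches on the first one, so names containing spaces are compared by their first word only unless a different separator is chosen with -t.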
frogstarr78
  • 3
    An explanation would be in order. E.g., why is it a good option? How is it different from `comm`? Why and when should it be used over `comm`? What is it supposed to do? Why options `-j1` and `-a1`? - why are they needed and what is their significance/meaning? Please respond by [editing (changing) your answer](https://stackoverflow.com/posts/22977016/edit), not here in comments (***without*** "Edit:", "Update:", or similar - the answer should appear as if it was written today). – Peter Mortensen Nov 02 '21 at 01:40