25

Is there any efficient way to get intersection of two slices in Go?

I want to avoid nested for loop like solution
slice1 := []string{"foo", "bar","hello"}
slice2 := []string{"foo", "bar"}

intersection(slice1, slice2)
=> ["foo", "bar"]
order of string does not matter
Flimzy
  • 68,325
  • 15
  • 126
  • 165
bandi shankar
  • 517
  • 2
  • 6
  • 8

6 Answers6

37

How do I get the intersection between two arrays as a new array?

  • Simple Intersection: Compare each element in A to each in B (O(n^2))
  • Hash Intersection: Put them into a hash table (O(n))
  • Sorted Intersection: Sort A and do an optimized intersection (O(n*log(n)))

All of which are implemented here

https://github.com/juliangruber/go-intersect

KeksArmee
  • 1,209
  • 14
  • 21
  • 1
    do we have any in build library for it in go ? – bandi shankar Jul 06 '17 at 18:27
  • No. There is no such build in package in go for this. – Mayank Patel Jul 06 '17 at 18:31
  • 1
    Remember that although an $O(n)$ solution is better than an $O(n^2)$ solution for big $n$, if $n$ is small it's probably better to go with the $O(n^2)$ solution than with the $O(n)$ solution since the latter might have more overhead. – md2perpe Jul 06 '17 at 18:55
  • 2
    @md2perpe you can't make a generalization like that based on what "might" have more overhead. It's probably better to go with whatever is most efficient, and best to actually test with each and see what works meets requirements for a specific situation. – Adrian Jul 06 '17 at 20:03
  • 5
    @Adrian, tell Rob Pike (https://en.wikipedia.org/wiki/Rob_Pike) that. See his Rule 3 here: http://users.ece.utexas.edu/~adnan/pike.html – md2perpe Jul 06 '17 at 20:15
  • I know who RP is and that his word, like everyone else's, is not gospel. Just because an algorithm is faster, does not mean it's "fancy". And "Rule 2" is exactly what I was saying - test the available options and choose what works best in that situation based on empirical observation. – Adrian Jul 06 '17 at 20:20
  • @bandishankar Just run `go get "github.com/juliangruber/go-intersect"`. You can then use the same path to access its code. Don't forget to give credit to the guy :) – KeksArmee Jul 07 '17 at 15:09
4

It's a best method for intersection two slice. Time complexity is too low.

Time Complexity : O(m+n)

m = length of first slice.

n = length of second slice.

func intersection(s1, s2 []string) (inter []string) {
    hash := make(map[string]bool)
    for _, e := range s1 {
        hash[e] = true
    }
    for _, e := range s2 {
        // If elements present in the hashmap then append intersection list.
        if hash[e] {
            inter = append(inter, e)
        }
    }
    //Remove dups from slice.
    inter = removeDups(inter)
    return
}

//Remove dups from slice.
func removeDups(elements []string)(nodups []string) {
    encountered := make(map[string]bool)
    for _, element := range elements {
        if !encountered[element] {
            nodups = append(nodups, element)
            encountered[element] = true
        }
    }
    return
}
  • 1
    you don't need `removeDups` if you use the boolean value of the hash map as an indication of whether you've already added the same element to the resulting arary. – Dan Markhasin Dec 05 '20 at 08:51
2

if there exists no blank in your []string, maybe you need this simple code:

func filter(src []string) (res []string) {
    for _, s := range src {
        newStr := strings.Join(res, " ")
        if !strings.Contains(newStr, s) {
            res = append(res, s)
        }
    }
    return
}

func intersections(section1, section2 []string) (intersection []string) {
    str1 := strings.Join(filter(section1), " ")
    for _, s := range filter(section2) {
        if strings.Contains(str1, s) {
            intersection = append(intersection, s)
        }
    }
    return
}
Nevercare
  • 81
  • 3
1

https://github.com/viant/toolbox/blob/master/collections.go

Another O(m+n) Time Complexity solution that uses a hashmap. It has two differences compared to the other solutions discussed here.

  • Passing the target slice as a parameter instead of new slice returned
  • Faster to use for commonly used types like string/int instead of reflection for all
0

simple, generic and mutiple slices ! (Go 1.18)

Time Complexity : may be linear

package main
import (
  "fmt"
  "golang.org/x/exp/constraints"
)
    
func interSection[T constraints.Ordered](pS ...[]T) (result []T) {
  hash := make(map[T]*int) // value, counter
  for _, slice := range pS {
    for _, value := range slice {
      if counter := hash[value]; counter != nil {
        *counter++
      } else {
        i := 1
        hash[value] = &i
      }
    }
  }
  result = make([]T, 0)
  for value, counter := range hash {
    if *counter >= len(pS) {
      result = append(result, value)
    }
  }
  return
}
    
func main() {
  slice1 := []string{"foo", "bar", "hello"}
  slice2 := []string{"foo", "bar"}
  fmt.Println(interSection(slice1, slice2))
  // [foo bar]

  ints1 := []int{1, 2, 3, 9}
  ints2 := []int{10, 4, 2, 4, 8, 9}
  ints3 := []int{2, 4, 8, 1}
  fmt.Println(interSection(ints1, ints2, ints3))
  // [2 4]
}

playground : https://go.dev/play/p/AFpnUvWHV50

kkleejoe
  • 1
  • 1
-6

Yes there are a few different ways to go about it.. Here's an example that can be optimized.

package main

import "fmt"

func intersection(a []string, b []string) (inter []string) {
    // interacting on the smallest list first can potentailly be faster...but not by much, worse case is the same
    low, high := a, b
    if len(a) > len(b) {
        low = b
        high = a
    }

    done := false
    for i, l := range low {
        for j, h := range high {
            // get future index values
            f1 := i + 1
            f2 := j + 1
            if l == h {
                inter = append(inter, h)
                if f1 < len(low) && f2 < len(high) {
                    // if the future values aren't the same then that's the end of the intersection
                    if low[f1] != high[f2] {
                        done = true
                    }
                }
                // we don't want to interate on the entire list everytime, so remove the parts we already looped on will make it faster each pass
                high = high[:j+copy(high[j:], high[j+1:])]
                break
            }
        }
        // nothing in the future so we are done
        if done {
            break
        }
    }
    return
}

func main() {
    slice1 := []string{"foo", "bar", "hello", "bar"}
    slice2 := []string{"foo", "bar"}
    fmt.Printf("%+v\n", intersection(slice1, slice2))
}

Now the intersection method defined above will only operate on slices of strings, like your example.. You can in theory create a definition that looks like this func intersection(a []interface, b []interface) (inter []interface), however you would be relying on reflection and type casting so that you can compare, which will add latency and make your code harder to read. It's probably easier to maintain and read to write a separate function for each type you care about.

func intersectionString(a []string, b []string) (inter []string),

func intersectionInt(a []int, b []int) (inter []int),

func intersectionFloat64(a []Float64, b []Float64) (inter []Float64), ..ect

You can then create your own package and reuse once you settle how you want to implement it.

package intersection

func String(a []string, b []string) (inter []string)

func Int(a []int, b []int) (inter []int)

func Float64(a []Float64, b []Float64) (inter []Float64)
reticentroot
  • 3,436
  • 2
  • 21
  • 38