3

The premise is simple: I have two integers, a and b and I want to find i s.t. a + i and b + i are both in a given list. The list rs is very large (10e9 items). I have the following code:

def getlist(a,b):
    a1 = set([i - a for i in rs if i>a])
    b1 = set([i-b for i in rs if i>b]) 

    tomp = list(a1.intersection(b1))
    return tomp

The issue at hand is that a1 and b1 are pre-computed first which creates a memory problem. Can I optimize my code somehow? General comments about the method are also welcome.

Example input:

rs = [4,9,16]
a = 3
b = 8

Expected output:

getlist(3,8) = [1]
S. L.
  • 620
  • 6
  • 18
  • Does this gives the expected output? – Dani Mesejo Feb 06 '19 at 14:08
  • yes, as long as len(rs) <=10e6 – S. L. Feb 06 '19 at 14:10
  • 1
    I guess you don't need to create two sets, you could try: `tomp = [i - b for i in rs if i - b in a1]` – Dani Mesejo Feb 06 '19 at 14:12
  • How would you do it without 2 sets? – S. L. Feb 06 '19 at 14:14
  • 1
    How about `a1 = set([i - a for i in rs if i>a])` `b1 = set([i-b for i in a1 if i>b])` – Ravi Teja Feb 06 '19 at 14:17
  • In the ends I believe this boils down to finding all pairs that add up to a given sum, there very efficient ways of finding that. Including one with `O(n)` complexity. See this: https://stackoverflow.com/questions/4720271/find-a-pair-of-elements-from-an-array-whose-sum-equals-a-given-number – Dani Mesejo Feb 06 '19 at 14:23
  • @Daniel It's not exactly that, the sum is not a given one but can rather be one in the large array. But performance wise I already see an improvement with the optimized snippet. – S. L. Feb 06 '19 at 14:30
  • Could you add some sample input (small) together with the expected output? – Dani Mesejo Feb 06 '19 at 14:41

2 Answers2

4

You can optimize the memory usage by skipping the creation of the second set (and intermediate lists):

def getlist(a, b):
    a1 = {i - a for i in rs if i > a}
    return [i - b for i in rs if i > b and i - b in a1]

The time and space complexity of this solution is O(n).

Eugene Yarmash
  • 131,677
  • 37
  • 301
  • 358
2

If rs is already a set, this would be faster:

def getlist(a, b):
    return [i - a for i in rs if i > a and b + (i - a) in rs]

If it is not, then you have to make the set first (otherwise the above algorithm would be very slow) and the performance is essentially the same as before:

def getlist(a, b):
    rs_set = set(rs)
    return [i - a for i in rs_set if i > a and b + (i - a) in rs_set]

However, if you are going to use the same function many times for different a and b values but the same rs, you can convert rs to a set once and reuse it every time.

jdehesa
  • 55,673
  • 6
  • 74
  • 107
  • This is in fact at least 2.5x faster than any other answer, even accounting for the conversion to set(). – S. L. Feb 06 '19 at 15:31