I'm looking for an efficient way to find several strings in a list of string lists and return their indices. Here is the data:

inp = [ 'ans1', 'ans2', 'ans3' ]
output = [ [ 'aaa', 'ans1', 'bbb', 'ccc', 'ans2', 'ddd' ],
           [ 'bbb', 'aaa', 'ans2', 'ddd', 'ans1', 'aaa' ],
           [ 'ddd', 'ccc', 'ans2', 'ans1', 'aaa', 'bbb' ] ]

# expected result
# result = [ [ 1, 4, 3 ], [ 4, 2, 2 ], [ -1, -1, -1 ] ]

The result holds, for each string in inp, its index in each sublist of output. For example, ans1 is at index 1 in the first sublist, index 4 in the second, and index 3 in the third; ans2 is at index 4, 2, and 2, respectively. ans3 does not appear in any sublist, so its index is reported as -1.

I'm looking for an efficient way to do this computation (possibly in parallel?) that avoids the classic nested for loops, which can clearly do the job.
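For reference, the nested-loop baseline mentioned here might look like the sketch below. The original attempt was not posted, so its exact shape is an assumption; the data is the same as above.

```python
inp = ['ans1', 'ans2', 'ans3']
output = [['aaa', 'ans1', 'bbb', 'ccc', 'ans2', 'ddd'],
          ['bbb', 'aaa', 'ans2', 'ddd', 'ans1', 'aaa'],
          ['ddd', 'ccc', 'ans2', 'ans1', 'aaa', 'bbb']]

# Assumed baseline: scan every sublist once per query string.
result = []
for s in inp:
    row = []
    for sub in output:
        idx = -1  # sentinel for "not found"
        for i, v in enumerate(sub):
            if v == s:
                idx = i  # keep the first match and stop scanning
                break
        row.append(idx)
    result.append(row)

print(result)  # [[1, 4, 3], [4, 2, 2], [-1, -1, -1]]
```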

Some considerations:

  • output has shape equal to [ len( inp ), L ], where L is the size of the dictionary. In this case L = 5.
L4plac3
  • Welcome to Stack Overflow! Please take the [tour], read [what's on-topic here](/help/on-topic), [ask], and the [question checklist](//meta.stackoverflow.com/q/260648/843953), and provide a [mre]. "Implement this feature for me" is off-topic for this site. You have to _make an honest attempt_, and then ask a _specific question_ about your algorithm or technique. – Pranav Hosangadi Jul 07 '21 at 15:44
  • I'm sorry. I tried the usual nested for loops, but I'm after performance, and that is what I asked about, since I honestly don't know where to start. – L4plac3 Jul 07 '21 at 15:46
  • https://stackoverflow.com/questions/9786102/how-do-i-parallelize-a-simple-python-loop – Pranav Hosangadi Jul 07 '21 at 15:49
  • @PranavHosangadi Thanks! I'll give it a chance – L4plac3 Jul 07 '21 at 15:54

2 Answers

You can use a nested list comprehension:

result = [[o.index(s) if s in o else -1 for o in output] for s in inp]
print(result) # [[1, 4, 3], [4, 2, 2], [-1, -1, -1]]

Update:

Also, storing -1 as the index for strings that are not present in the output list is probably not the best idea. -1 is a valid index in Python, so it may lead to errors later if you use the indexes stored in result.
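One way to avoid the ambiguity is to use None as the sentinel instead. The following is only a sketch under that assumption; the helper name find_index is hypothetical and not part of the question.

```python
inp = ['ans1', 'ans2', 'ans3']
output = [['aaa', 'ans1', 'bbb', 'ccc', 'ans2', 'ddd'],
          ['bbb', 'aaa', 'ans2', 'ddd', 'ans1', 'aaa'],
          ['ddd', 'ccc', 'ans2', 'ans1', 'aaa', 'bbb']]

def find_index(seq, item):
    """Return the first index of item in seq, or None if it is absent.

    Hypothetical helper: list.index raises ValueError on a miss,
    so catching it also avoids scanning the list twice.
    """
    try:
        return seq.index(item)
    except ValueError:
        return None

result = [[find_index(o, s) for o in output] for s in inp]
print(result)  # [[1, 4, 3], [4, 2, 2], [None, None, None]]
```

Using None means an accidental output[0][idx] raises a TypeError for a missing string instead of silently returning the last element.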

alexnik42

You can build a dictionary index for each sublist first to speed up the search:

inp = ["ans1", "ans2", "ans3"]
output = [
    ["aaa", "ans1", "bbb", "ccc", "ans2", "ddd"],
    ["bbb", "aaa", "ans2", "ddd", "ans1", "aaa"],
    ["ddd", "ccc", "ans2", "ans1", "aaa", "bbb"],
]

tmp = [{v: i for i, v in enumerate(subl)} for subl in output]

result = [[d.get(i, -1) for d in tmp] for i in inp]
print(result)

Prints:

[[1, 4, 3], [4, 2, 2], [-1, -1, -1]]
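Note that when a sublist contains duplicates (such as "aaa" in the second sublist), the dictionary comprehension above keeps the last position, whereas list.index returns the first. If the first occurrence is what you need, one possible variant, offered as a sketch rather than the only way to do it, is to build each dictionary from the reversed enumeration:

```python
inp = ["ans1", "ans2", "ans3"]
output = [
    ["aaa", "ans1", "bbb", "ccc", "ans2", "ddd"],
    ["bbb", "aaa", "ans2", "ddd", "ans1", "aaa"],
    ["ddd", "ccc", "ans2", "ans1", "aaa", "bbb"],
]

# Walk each sublist from the end so that earlier positions overwrite
# later ones, leaving the first occurrence of every string in the dict.
tmp = [{v: i for i, v in reversed(list(enumerate(subl)))} for subl in output]

result = [[d.get(s, -1) for d in tmp] for s in inp]
print(result)         # [[1, 4, 3], [4, 2, 2], [-1, -1, -1]]
print(tmp[1]["aaa"])  # 1 (first occurrence, not 5)
```

Building the dictionaries costs one pass over each sublist; after that, every lookup is a constant-time d.get call on average, which pays off when inp is long.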
Andrej Kesely