Alternate optimized ways to check for presence of an element in a list?

Question

I have an simple brute force AI code for the 8 puzzle problem. One of the steps involves checking if the current state has already been visited, and all the visited states are stored in a list called visited = [].

I noticed my program was taking too long to find the solution, and started using wall time checks to see what functions were taking the longest. To my surprise, the built in sorting algorithm was using almost 35 seconds, and had to sort through a maximum of 21000 elements at a time.

On the other hand, my visited list reached a maximum length of 42000. (I'm appending Class objects to this visited list as explained below)

Every time a new state is generated, it is compared with all the states in the visited list to check if it already exists there using the not_in method.

def not_in(self, visited):

        for node in visited:
            if (self.data == node.data):
                return 0
        return 1

This checks if the currently passed in Node has its Node.data equal to any other Node in the visited list.

The caller function looks like this -

if Node.not_in(temp_right, visited):
    visited.append(temp_right)

where self refers to a class object of the following type-

class Node:
    def __init__(self, data, parent):
        self.data = data
        self.parent = parent

where self.data is a list, and self.parent refers to another class object Node, like a linked list.

This not_in function has to perform over 1366241896 comparisons to reach the solution from my current starting state.

Is there a way to optimize this function? Because that's a lot of comparisons that I'd like to avoid. The whole program exclusively uses lists to store data, if that helps my case.

If you want to efficiently check for containment, use a `set` instead of a `list`. (Assuming your `data` is hashable.) — MisterMiyagi, Apr 14 '22 at 14:30
Does this answer your question? [Fastest way to check if a value exists in a list](https://stackoverflow.com/questions/7571635/fastest-way-to-check-if-a-value-exists-in-a-list) — MisterMiyagi, Apr 14 '22 at 14:31

score 0 · Answer 1 · answered Apr 14 '22 at 16:11

"my visited list reached a maximum length of 42000". But in theory the possible states are 9!, or 362880.

One possibility would be to use a dict to create "buckets" where to store the lists of states that pertain to a given bucket. Let's assume you store the state as a 9-char string, so the solved state would be something like 12345678_. Then you may use e.g. the first 3 chars to identify the bucket - you would have 504 buckets (the permutations of 9 values in 3 slots), each with a maximum list lenght of 6!, or 720.

The not_in method would change accordingly. I made two non-object versions of your method, with the list and the dict/bucket approach respectively, and compared the perfomances on a random sample of 10000 states:

from collections import defaultdict
from itertools import permutations
from random import sample
from timeit import timeit

##functions
def not_in_l(data, visited_l):
    for node in visited_l:
        if (data == node):
            return 0
    return 1

def not_in_d(data, visited_d):
    for node in visited_d[data[:3]]:
        if (data == node):
            return 0
    return 1

##setup sample data
visited_l = sample(list(permutations('12345678_')),10000)
visited_d = defaultdict(list)
for x in visited_l:
    visited_d[x[:3]].append(x)

##test that functions return same value
for i in sample(range(10000),100):
    assert not_in_l(visited_l[i], visited_l) == not_in_d(visited_l[i], visited_d)

print('list function')
print(timeit('for i in range(0,10000,999):not_in_l(visited_l[i],visited_l)', globals=globals(), number=1000))
print('dict function')
print(timeit('for i in range(0,10000,999):not_in_d(visited_l[i],visited_d)', globals=globals(), number=1000))

The results on my PC are

list function
3.9646465
dict function
0.02327440000000003

Can you please write it in a less "pythonic" way? I'm still learning python and can't understand what --for x in visited_l: visited_d[x[:3]].append(x)-- means. I know about default dictionaries, but some things I don't understand yet. — Birinder Singh, Apr 14 '22 at 21:25
Ok, it's just a way to setup the data for the functions. In the previous step we built a 10000 items list of states (where the state is a `tuple` of 1-char `string`s). So now we re-use that list, and for each element of it we create or update a `key:value` pair in our `dict`, where the `value` is a `list` of states, and the key are the first 3 chars of the state, thus effectively creating the "buckets" we need — gimix, Apr 15 '22 at 06:58
In other words, when the loop meets the first state pertaining to a bucket, a new entry gets created with the bucket code as the key, and a `list` with only that state as the value. When it finds a state for an already existing bucket, this new state is just `append`ed to the `list` — gimix, Apr 15 '22 at 07:03

Alternate optimized ways to check for presence of an element in a list?

1 Answers1