What is the fastest way in Python to check whether a word, or several words separated by spaces, occurs in a string?

e.g. (berlin|los angeles|paris)

I have a DataFrame with about 5 million entries and want to check whether any of about 8 thousand words occur in it. If a word is found, it should be returned. Alternatively, a true/false result would be acceptable if that improves performance.

The approach via re.findall takes several hours. Does anyone know a good solution?

yllwpr
  • You can build a map from the unique words in the text and then search it by key; each lookup costs only O(1) on average. I wonder whether your memory can handle it, though. – Rafał May 13 '22 at 20:28
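
One way to read the comment's hash-lookup idea, sketched here in the opposite direction so that memory stays small: put the roughly 8 thousand search terms into a set and test each row's word n-grams against it. This is only a sketch under assumptions, not a benchmarked solution. The helper names `build_term_index` and `find_terms` are hypothetical, matching is assumed to be case-insensitive and on whole words, and terms containing punctuation (for example "st. louis") would need a different tokenizer.

```python
import re

# Assumption: a "word" is a run of letters, digits, or underscores.
_WORD = re.compile(r"\w+")


def build_term_index(terms):
    """Return (set of normalized terms, max number of words in any term)."""
    # Lowercase and collapse whitespace so "Los  Angeles" == "los angeles".
    normalized = {" ".join(t.lower().split()) for t in terms}
    normalized.discard("")
    max_words = max((len(t.split()) for t in normalized), default=0)
    return normalized, max_words


def find_terms(text, term_set, max_words):
    """Return every term found in text, in order of appearance."""
    tokens = _WORD.findall(text.lower())
    found = []
    for i in range(len(tokens)):
        # Try n-grams of 1..max_words tokens starting at position i.
        for n in range(1, max_words + 1):
            if i + n > len(tokens):
                break
            candidate = " ".join(tokens[i:i + n])
            if candidate in term_set:  # average O(1) set lookup
                found.append(candidate)
    return found


term_set, max_words = build_term_index(["berlin", "los angeles", "paris"])
print(find_terms("I flew from Los Angeles to Berlin.", term_set, max_words))
```

The work per row then depends on the row length and the longest term, not on the number of search terms. On a DataFrame this could be applied with something like `df["text"].apply(lambda s: find_terms(s, term_set, max_words))`, where the column name `text` is hypothetical; replacing the return value with an early `return True` on the first hit would give the true/false variant.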

0 Answers