Check if multiple strings exist in another string

Question

How can I check if any of the strings in an array exists in another string?

Like:

a = ['a', 'b', 'c']
str = "a123"
if a in str:
  print "some of the strings found in str"
else:
  print "no strings found in str"

That code doesn't work; it's just meant to show what I want to achieve.

I'm surprised there aren't (yet) any answers comparing to a compiled regex in terms of perf, especially compared to size of the string and number of "needles" to search for. — Pat, Apr 22 '15 at 23:21
@Pat I am not surprised. The question is not about performance. Today most programmers care more for getting it done and readability. The performance question is valid, but a different question. — guettli, Jul 13 '16 at 06:42
Using str as a variable is confusing and may result in unexpected behavior as it is a reserved word; see [link](https://docs.python.org/3/library/stdtypes.html#text-sequence-type-str). — Nomen Nescio, Feb 16 '18 at 21:16
regex `[abc]` also works perfectly well and will be faster if there are more than a couple of candidates to test. But if the strings are arbitrary and you don't know them in advance to construct a regex, you will have to use the `any(x in str for x in a)` approach. — smci, Jan 08 '20 at 13:15
@CleverGuy You're right, though it's not a reserved word, otherwise you wouldn't be able to assign to it. It's a builtin. — wjandrea, May 18 '20 at 00:08

score 1076 · Accepted Answer · edited Apr 29 '20 at 00:34

You can use `any`:

a_string = "A string is more than its parts!"
matches = ["more", "wholesome", "milk"]

if any(x in a_string for x in matches):
    print("at least one of the strings was found")

Similarly, to check whether all the strings from the list are found, use `all` instead of `any`.
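
A runnable sketch of the `any`/`all` approach with an optional case-insensitive mode. The helper names `contains_any`/`contains_all` and the `ignore_case` parameter are hypothetical; `str.casefold()` is the standard-library method for caseless comparison.

```python
def contains_any(text, needles, ignore_case=False):
    """Return True if at least one needle is a substring of text."""
    if ignore_case:
        text = text.casefold()
        needles = [n.casefold() for n in needles]
    # any() stops at the first True, so later needles are not checked
    return any(n in text for n in needles)


def contains_all(text, needles, ignore_case=False):
    """Return True only if every needle is a substring of text."""
    if ignore_case:
        text = text.casefold()
        needles = [n.casefold() for n in needles]
    # all() stops at the first False
    return all(n in text for n in needles)


a_string = "A string is more than its parts!"
print(contains_any(a_string, ["more", "wholesome", "milk"]))  # True
print(contains_all(a_string, ["more", "wholesome", "milk"]))  # False
print(contains_any(a_string, ["MORE"]))                       # False
print(contains_any(a_string, ["MORE"], ignore_case=True))     # True
```

Note that `any([])` is False and `all([])` is True, so an empty list of needles never "matches" with the first and always "matches" with the second.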

edited by rjurney · answered Aug 02 '10 at 16:15 by Mark Byers

any() takes an iterable. I am not sure which version of Python you are using but in 2.6 you will need to put [] around your argument to any(). any([x in str for x in a]) so that the comprehension returns an iterable. But maybe later versions of Python already do this. – emispowder Mar 27 '13 at 01:06
@Mark Byers: Sorry for the late comment, but is there a way to print the string that was found? How would you do this? Thank you. – Shankar Kumar Aug 01 '13 at 01:26
Not sure I understand: if a is the list, and str is the thing to match against, what is the x? Python newbie ftw. :) – red Nov 13 '13 at 14:01
@red: you can read `for x in a` like "for each element in list". Since `a` is a list of strings, and `x` is an element of that list, `x` is a string (one of 'a', 'b', 'c' in the original example). – User Jan 27 '14 at 20:50
@emispowder It works fine for me as-is in Python 2.6.9. – MPlanchard Jul 10 '15 at 18:25
@emispowder: [Generator expressions](https://www.python.org/dev/peps/pep-0289/) were introduced in 2.4. – zondo Apr 22 '17 at 03:07
What is x representing in this answer? – CodeGuru Jun 11 '18 at 00:17
One caveat of using 'any' is that it verifies if each string of 'a' is in 'str'. You might want to return True as soon as one string of 'a' is in 'str'. This would be faster. – huseyin39 Feb 06 '20 at 22:03
@CodeGuru `x` is just a temporary variable when iterating through `a`. You could use a different name like `[number*2 for number in [7, 13]]`, but `[x*2 for x in [7, 13]]` seems to be a kind of convention. – Qaswed Feb 26 '20 at 14:15
Basically [x in str for x in a] returns a list of True/False depending on whether the string is found. The any() function iterates through that list of True/False and returns True if there is one True in the list. @CodeGuru – chia yongkang Apr 28 '20 at 03:07
I cleaned up the code to use `a_string` instead of Python reserved word `str` and `matches` instead of `a` because the reserved word was needlessly confusing (I know the poster had it, but still) and nested list comprehensions are impenetrable with single letter variables. – rjurney Apr 29 '20 at 00:35
Thank you for solving my life issues sir @Mark Byers. – NoahVerner Sep 08 '21 at 18:19
I can't figure out a way to match a capitalized word. a_string = "A string is MORE than its parts!" matches = ["MORE", "wholesome", "milk"] This does not match 'MORE'. What's the issue here? – Vipul Priyadarshi Apr 02 '22 at 07:46

score 94 · Answer 2 · answered May 23 '16 at 22:10

any() is by far the best approach if all you want is True or False, but if you want to know specifically which strings match, there are a few options.

If you want the first match (with False as a default):

match = next((x for x in a if x in str), False)

If you want to get all matches (including duplicates):

matches = [x for x in a if x in str]

If you want to get all non-duplicate matches (disregarding order):

matches = {x for x in a if x in str}

If you want to get all non-duplicate matches in the right order:

matches = []
for x in a:
    if x in str and x not in matches:
        matches.append(x)
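
An alternative to the last loop that also keeps first-seen order: `dict.fromkeys` keeps only the first occurrence of each key, and dicts preserve insertion order (guaranteed since Python 3.7). This avoids the linear `x not in matches` scan on every iteration. The names `needles` and `haystack` are illustrative.

```python
needles = ['b', 'a', 'b', 'x', 'c', 'a']
haystack = "abc123"

# Filter to the needles that occur, then drop repeats while keeping order
matches = list(dict.fromkeys(x for x in needles if x in haystack))
print(matches)  # ['b', 'a', 'c']
```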

answered by zondo

please add example for the last match too – Oleg Kokorin Apr 02 '18 at 21:46
@OlegKokorin: It creates a list of matching strings in the same order it finds them, but it keeps only the first one if two are the same. – zondo Apr 04 '18 at 00:35
Using an `OrderedDict` is probably more performant than a list. See [this answer on "Removing duplicates in lists"](https://stackoverflow.com/a/7961390/4518341) – wjandrea May 18 '20 at 00:11
Can you provide an example? – Herwini Nov 16 '20 at 14:18

score 57 · Answer 3 · answered Aug 02 '10 at 19:04

You should be careful if the strings in a or str get longer. The straightforward solutions can take O(S*A) time in the worst case, where S is the length of str and A is the sum of the lengths of all strings in a. For a faster solution, look at the Aho-Corasick algorithm for string matching, which runs in linear time, O(S+A) (plus the number of matches, if you report them all).
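
A minimal pure-Python sketch of the Aho-Corasick idea, for illustration only: the function names (`build_automaton`, `find_matches`, `automaton_contains_any`) are hypothetical, and it assumes the needles are non-empty. The automaton is built once from all needles; the text is then scanned a single time, following failure links instead of restarting the search for each needle. For real workloads a C-backed library such as pyahocorasick is likely to be much faster than this version.

```python
from collections import deque


def build_automaton(needles):
    """Build an Aho-Corasick automaton (trie + failure links) for non-empty needles."""
    goto = [{}]   # goto[state][char] -> child state in the trie
    fail = [0]    # failure link: longest proper suffix that is also a trie path
    out = [[]]    # needles recognised when the automaton is in this state

    # Phase 1: insert every needle into the trie
    for needle in needles:
        state = 0
        for ch in needle:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(needle)

    # Phase 2: breadth-first pass to fill in failure links
    queue = deque(goto[0].values())  # depth-1 states keep fail = 0
    while queue:
        state = queue.popleft()
        for ch, child in goto[state].items():
            queue.append(child)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[child] = goto[f].get(ch, 0)
            # inherit needles ending at the failure state (e.g. "he" inside "she")
            out[child] = out[child] + out[fail[child]]
    return goto, fail, out


def find_matches(text, automaton):
    """Return the set of needles occurring anywhere in text, in one pass."""
    goto, fail, out = automaton
    found = set()
    state = 0
    for ch in text:
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        found.update(out[state])
    return found


def automaton_contains_any(text, automaton):
    """Like find_matches, but stop at the first position where a needle ends."""
    goto, fail, out = automaton
    state = 0
    for ch in text:
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        if out[state]:
            return True
    return False


automaton = build_automaton(["he", "she", "his", "hers"])
print(sorted(find_matches("ushers", automaton)))  # ['he', 'hers', 'she']
```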

answered by jbernadas

can Aho-Corasick also find substrings instead of prefixes? – RetroCode Sep 26 '16 at 19:58
Some python Aho-Corasick libraries are [here](https://pypi.python.org/pypi/pyahocorasick/) and [here](https://github.com/JanFan/py-aho-corasick) – vorpal Sep 27 '17 at 10:54

score 33 · Answer 4 · answered May 23 '16 at 21:45

Just to add some diversity with regex:

import re

if any(re.findall(r'a|b|c', str, re.IGNORECASE)):
    print('possible matches thanks to regex')
else:
    print('no matches')

or, if your list is long, build the pattern from it: any(re.findall(r'|'.join(a), str, re.IGNORECASE))
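
A sketch that combines the two fixes raised in the comments on this answer: escape each needle with `re.escape` and compile the alternation once. `search()` stops at the first match, which is all a yes/no check needs. The helper name `compile_needles` is hypothetical, and it assumes the list of needles is non-empty (an empty alternation matches every string).

```python
import re


def compile_needles(needles, flags=0):
    """Compile one alternation that matches any of the needles literally."""
    # re.escape makes characters such as "(", "*" or "." lose their regex meaning
    return re.compile("|".join(map(re.escape, needles)), flags)


pattern = compile_needles(['a', 'b', 'c'], re.IGNORECASE)
haystack = "A123"
# search() returns the first match (or None) instead of collecting all of them
if pattern.search(haystack):
    print('possible matches thanks to regex')
else:
    print('no matches')
```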

answered by Shankar ARUL

This works for the given use case of the question. If you search for `(` or `*` this fails, since quoting for the regex syntax needs to be done. – guettli Jul 12 '16 at 10:13
You can escape it if necessary with `'|'.join(map(re.escape, strings_to_match))`. You should probably `re.compile('|'.join(...))` as well. – Artyer Nov 04 '17 at 21:50
And what's the time complexity? – DachuanZhao Apr 30 '21 at 01:51

score 16 · Answer 5 · answered Mar 19 '19 at 15:26

A surprisingly fast approach is to use set:

a = ['a', 'b', 'c']
str = "a123"
if set(a) & set(str):
    print("some of the strings found in str")
else:
    print("no strings found in str")

This works only if a contains single-character strings; for multi-character values, use `any` as shown above. In the single-character case it's simpler to specify a as a string: a = 'abc'.
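
A small illustration of why this is limited to single characters: `set(str)` breaks the string into individual characters, so a multi-character needle can never appear in the intersection. The helper name `shared_chars` is hypothetical.

```python
def shared_chars(needles, haystack):
    """Characters present in both; only meaningful for single-character needles."""
    return set(needles) & set(haystack)


print(shared_chars('abc', "a123"))        # {'a'}
# set("ab123") is {'a', 'b', '1', '2', '3'}, so the needle "ab" is never in it
print(shared_chars(['ab'], "ab123"))      # set()
print(any(x in "ab123" for x in ['ab']))  # True
```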

score 14 · Answer 6 · edited May 23 '16 at 22:03

You need to iterate on the elements of a.

a = ['a', 'b', 'c']
str = "a123"
found_a_string = False
for item in a:
    if item in str:
        found_a_string = True
        break  # no need to check the remaining items

if found_a_string:
    print("found a match")
else:
    print("no match found")

edited by zondo · answered Aug 02 '10 at 16:15 by Seamus Campbell

Yes, I knew how to do that, but compared to Mark's answer, that's horrible code. – jahmax Aug 02 '10 at 16:24
Only if you understand Mark's code. The problem you were having is that you weren't examining the elements of your array. There are a lot of terse, pythonic ways to accomplish what you want that would hide the essence of what was wrong with your code. – Seamus Campbell Aug 02 '10 at 16:38
It may be 'horrible code' but it's [exactly what any() does](http://docs.python.org/2/library/functions.html#any). Also, this gives you the actual string that matched, whereas any() just tells you there is a match. – alldayremix Apr 01 '13 at 15:21

Domi W · Answer 7 · 2017-07-20T20:48:07.803

jbernadas already mentioned the Aho-Corasick-Algorithm in order to reduce complexity.

Here is one way to use it in Python:

Download aho_corasick.py from here
Put it in the same directory as your main Python file and name it aho_corasick.py

Try the algorithm with the following code:

from aho_corasick import aho_corasick  # aho_corasick(string, keywords)

string = "some text containing keyword1"  # example input: the text to search
print(aho_corasick(string, ["keyword1", "keyword2"]))

Note that the search is case-sensitive.

score 5 · Answer 8 · answered Jan 12 '21 at 04:27

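A compact way to find which strings from one list occur in another list of strings (as exact elements, not substrings) is to use set.intersection. For large lists this is typically much faster than a list comprehension, because set membership tests take constant time on average.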

>>> astring = ['abc','def','ghi','jkl','mno']
>>> bstring = ['def', 'jkl']
>>> a_set = set(astring)  # convert list to set
>>> b_set = set(bstring)
>>> matches = a_set.intersection(b_set)
>>> matches
{'def', 'jkl'}
>>> list(matches) # if you want a list instead of a set
['def', 'jkl']
>>>

score 3 · Answer 9 · answered Aug 02 '10 at 16:16

a = ['a', 'b', 'c']
str = "a123"

a_match = [True for match in a if match in str]

if True in a_match:
    print("some of the strings found in str")
else:
    print("no strings found in str")

answered by mluebke

score 2 · Answer 10 · answered Jun 25 '18 at 13:51

Just some more info on how to get all the list elements that are present in the string:

a = ['a', 'b', 'c']
str = "a123"
list(filter(lambda x: x in str, a))  # ['a']

answered by Nilesh Birari

score 2 · Answer 11 · answered Sep 09 '20 at 15:16

Yet another solution with sets, using set intersection (`&`), for a one-liner:

subset = {"some", "words"}
text = "some words to be searched here"
if len(subset & set(text.split())) == len(subset):
    print("All values present in text")

if subset & set(text.split()):
    print("At least one value present in text")
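
The same two checks can be sketched with set comparison methods, which avoid building an intersection just to compare its length. Like the original, this works on whitespace-separated words, so a word with punctuation attached (such as "here!") will not equal "here".

```python
subset = {"some", "words"}
text = "some words to be searched here"
words = set(text.split())  # split() only breaks on whitespace

if subset <= words:               # same as subset.issubset(words)
    print("All values present in text")

if not subset.isdisjoint(words):  # True if at least one element is shared
    print("At least one value present in text")
```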

balki · Answer 12 · 2020-11-09T15:29:21.570

The third-party regex module, recommended in the Python docs, supports this with named lists:

import regex

words = {'he', 'or', 'low'}
p = regex.compile(r"\L<name>", name=words)
m = p.findall('helloworld')
print(m)

output:

['he', 'low', 'or']

Some details on implementation: link

edited Nov 09 '20 at 15:29 · answered Nov 09 '20 at 15:21 by balki

I can't find any documentation on \L. Can you point me to it? – Danilo Souza Morães Nov 29 '21 at 19:39
@DaniloSouzaMorães https://github.com/mrabarnett/mrab-regex#named-lists-hg-issue-11 – balki Dec 07 '21 at 00:47

score 1 · Answer 13 · answered Nov 30 '16 at 05:17

It depends on the context. If you want to check for a single character (such as a, e or w), `in` is enough:

original_word = "hackerearth"
if 'h' in original_word:
    print("YES")

If you want to check whether any of the characters of original_word appear in your input, use any:

original_word = ['h', 'a', 'c', 'k', 'e', 'r', 'e', 'a', 'r', 't', 'h']
yourinput = input().lower()
if any(your_required in yourinput for your_required in original_word):
    print("yes")

If you want all of them to appear in your input, use all:

if all(requested_word in yourinput for requested_word in original_word):
    print("yes")

What would be yourinput? I can recognise two things: the sentence where I'm looking for something. The array of words I'm looking for. But you describe three variables and I can't get what the third one is. — mayid, May 19 '19 at 22:43

score 1 · Answer 14 · edited Nov 27 '17 at 00:27

strlist = ['SUCCESS', 'Done', 'SUCCESSFUL']
res = False
with open('test.txt', 'r') as flog:  # the file is closed automatically
    for line in flog:
        for fstr in strlist:
            if fstr in line:
                print('found')
                res = True

if res:
    print('res true')
else:
    print('res false')

edited by Stephen Rauch · answered Nov 26 '17 at 23:58 by LeftSpace

score 1 · Answer 15 · answered Jan 25 '18 at 13:48

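I would use a function like this; it returns True as soon as one substring is found and skips the rest (the same short-circuiting that any() does):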

def check_string(string, substring_list):
    for substring in substring_list:
        if substring in string:
            return True
    return False

answered by Ivan Mikhailov

score 0 · Answer 16 · answered Jun 15 '18 at 21:17

data = "firstName and favoriteFood"
mandatory_fields = ['firstName', 'lastName', 'age']

# for each: report every missing field individually
for field in mandatory_fields:
    if field not in data:
        print("Error, missing req field {0}".format(field))

# still fine, multiple if statements
if ('firstName' not in data or
        'lastName' not in data or
        'age' not in data):
    print("Error, missing a req field")

# not very readable, list comprehension
missing_fields = [x for x in mandatory_fields if x not in data]
if missing_fields:
    print("Error, missing fields {0}".format(", ".join(missing_fields)))

Spirit of the Void · Answer 17 · 2022-03-28T15:44:17.410

If you want exact matches of words then consider word tokenizing the target string. I use the recommended word_tokenize from nltk:

from nltk.tokenize import word_tokenize

Here is the tokenized string from the accepted answer:

a_string = "A string is more than its parts!"
tokens = word_tokenize(a_string)
tokens
Out[46]: ['A', 'string', 'is', 'more', 'than', 'its', 'parts', '!']

The accepted answer gets modified as follows:

matches_1 = ["more", "wholesome", "milk"]
[x in tokens for x in matches_1]
Out[42]: [True, False, False]

As in the accepted answer, the word "more" is still matched. If "mo" becomes a match string, however, the accepted answer still finds a match. That is a behavior I did not want.

matches_2 = ["mo", "wholesome", "milk"]
[x in a_string for x in matches_2]
Out[43]: [True, False, False]

Using word tokenization, "mo" is no longer matched:

[x in tokens for x in matches_2]
Out[44]: [False, False, False]

That is the additional behavior that I wanted. This answer also responds to the duplicate question here.
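
If installing nltk is not an option, a similar whole-word check can be sketched with the standard library's `re` module and `\b` word boundaries. This is a swapped-in technique, not what the answer above uses: `\b` follows regex word-character rules rather than nltk's tokenizer, so results can differ for punctuation, hyphens and contractions. The helper name `whole_word_matches` is hypothetical.

```python
import re


def whole_word_matches(text, words):
    """Return the set of words that occur in text as whole words (case-sensitive)."""
    if not words:
        return set()  # an empty alternation would match the empty string
    # \b is a word boundary, so "mo" cannot match inside "more"
    pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, words)) + r")\b")
    return set(pattern.findall(text))


a_string = "A string is more than its parts!"
print(whole_word_matches(a_string, ["more", "wholesome", "milk"]))  # {'more'}
print(whole_word_matches(a_string, ["mo", "wholesome", "milk"]))    # set()
```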
