3590

I'm looking for a string.contains or string.indexof method in Python.

I want to do:

if not somestring.contains("blah"):
   continue
Peter Mortensen
Blankman

10 Answers

7787

Use the in operator:

if "blah" not in somestring: 
    continue
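
As a minimal sketch of how this looks inside a loop (the `lines` data and the `kept` names are made up for illustration), the check reads naturally with `continue`. Lowercasing the haystack is one common way to make the test case-insensitive; that variant is an addition here, not something the question asked for.

```python
# Hypothetical input data, only for illustration.
lines = ["first blah line", "nothing here", "BLAH in caps"]

kept = []
for somestring in lines:
    # Skip entries that do not contain the substring (case-sensitive).
    if "blah" not in somestring:
        continue
    kept.append(somestring)

# Case-insensitive variant: normalise the haystack before testing.
kept_any_case = [s for s in lines if "blah" in s.lower()]
```

Note that the left operand must be a string and the right operand must not be None, or the test raises a TypeError.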
Mateen Ulhaq
Michael Mrozek
  • 366
    Under the hood, Python will use `__contains__(self, item)`, `__iter__(self)`, and `__getitem__(self, key)` in that order to determine whether an item lies in a given container. Implement at least one of those methods to make `in` available to your custom type. – BallpointBen Aug 17 '18 at 07:02
  • 61
    Just make sure that somestring won't be None. Otherwise you get a `TypeError: argument of type 'NoneType' is not iterable` – Big Pumpkin Oct 10 '18 at 22:44
  • 14
    For strings, does the Python `in` operator use the Rabin-Karp algorithm? – Sam Chats Dec 18 '18 at 20:23
  • 2
    This is inconsistent and ugly in code like `".so." in filename or filename.endswith(".blah")`. – Kaz Feb 12 '19 at 20:24
  • ^ I meant `filename.endswith(".so")`. – Kaz Feb 14 '19 at 17:54
  • 10
    @SamChats see https://stackoverflow.com/questions/18139660/python-string-in-operator-implementation-algorithm-and-time-complexity for the implementation details (in CPython; afaik the language specification does not mandate any particular algorithm here). – Christoph Burschka Feb 28 '19 at 15:34
  • 4
    @Kaz It should be ugly, since you're thinking at the wrong abstraction level. On the other hand, `'.so' in filepath.suffixes` is quite beautiful and explicitly saying what you really want to do. – Veky Jul 20 '19 at 20:28
  • @Veky File suffixes are not an abstraction level; they're just a hack. – Kaz Jul 20 '19 at 21:24
  • 2
    @Kaz Possibly, but can you explain why do you need to check such a weird condition on the filename? To me, it only makes sense if you want to check whether the name has a certain suffix - but maybe I'm not imaginative enough. :-) – Veky Jul 22 '19 at 02:33
  • 4
    This Python overload of `in` for strings looks a tad inconsistent and ugly to me (although undoubtedly practical) as I'm used to interpreting "in" as "is an element of" and that breaks here -- compare `"blah" in mystring` with `"blah" in list(mystring)` ... – Fred vdP Aug 30 '19 at 10:29
  • @FredvdP it was so a long time ago. Early versions of Python supported `in` only for single-character strings; you had to use .find for substrings. But of course, this was much more practical. – Veky Sep 17 '19 at 08:50
  • err: "you can use the "in" operator", followed by an example with the "not in" operator. Maybe this should be changed. – Jean-François Fabre Sep 22 '19 at 17:10
  • @FredvdP Strings are a bit different than other containers anyway, since there is no type for single characters, in python. I don't think it's unreasonable that the `in` operator effectively searches the collection of all substrings, rather than the collection of all characters, since there are no characters. – Stef Oct 29 '21 at 19:38
  • but is there an inverse to in? I.e. "if a in b" is nice, but what if I want b before a for syntax reasons, i.e. "if b contains a"... the reason I want this: is that I have 50 asserts for b, b is always first in the order, hate to have 1/50 asserts where it isn't. – gunslingor Dec 10 '21 at 10:16
  • Side note: Instead of `continue` to continue to the next iteration you can `break` a loop, which means going to the next code line outside the loop and so stop the looping. – Timo Mar 03 '22 at 08:04
845

If it's just a substring search you can use string.find("substring").

You do have to be a little careful with find, index, and in though, as they are substring searches, not whole-word searches. For example, this:

s = "This be a string"
if s.find("is") == -1:
    print("No 'is' here!")
else:
    print("Found 'is' in the string.")

prints Found 'is' in the string. because "is" occurs inside "This". Similarly, if "is" in s: would evaluate to True. This may or may not be what you want.
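
If a whole-word match is what you actually want, one option is a regular expression with word boundaries. The following is only a sketch under that assumption; the variable names are illustrative.

```python
import re

s = "This be a string"

# Plain substring test: matches the "is" inside "This".
substring_hit = "is" in s

# Whole-word test: \b marks a word boundary, so the "is" in "This" does not count.
word_hit = re.search(r"\bis\b", s, flags=re.IGNORECASE) is not None

# The same pattern does match when "is" stands alone as a word.
word_hit_2 = re.search(r"\bis\b", "So it is.", flags=re.IGNORECASE) is not None
```

Here `substring_hit` is True while `word_hit` is False, which is the distinction described above.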

eldarerathis
  • 94
    +1 for highlighting the gotchas involved in substring searches. the obvious solution is `if ' is ' in s:` which will return `False` as is (probably) expected. – aaronasterling Aug 09 '10 at 03:22
  • 126
    @aaronasterling Obvious it may be, but not entirely correct. What if you have punctuation or it's at the start or end? What about capitalisation? Better would be a case insensitive regex search for `\bis\b` (word boundaries). – Bob Nov 08 '12 at 00:07
  • 2
    Why would this not be what the OP wants – uh_big_mike_boi Feb 18 '22 at 03:55
437

Does Python have a string contains substring method?

99% of use cases will be covered using the keyword, in, which returns True or False:

'substring' in any_string

For the use case of getting the index, use str.find (which returns -1 on failure, and has optional positional arguments):

start = 0
stop = len(any_string)
any_string.find('substring', start, stop)

or str.index (like find but raises ValueError on failure):

start = 100 
end = 1000
any_string.index('substring', start, end)

Explanation

Use the in comparison operator because

  1. the language intends its usage, and
  2. other Python programmers will expect you to use it.

>>> 'foo' in '**foo**'
True

The opposite (complement), which the original question asked for, is not in:

>>> 'foo' not in '**foo**' # returns False
False

This is semantically the same as not 'foo' in '**foo**' but it's much more readable and explicitly provided for in the language as a readability improvement.

Avoid using __contains__

The __contains__ method implements the behavior for in. This example,

str.__contains__('**foo**', 'foo')

returns True. You could also call this function from the instance of the superstring:

'**foo**'.__contains__('foo')

But don't. Methods that start with underscores are considered semantically non-public. The only reason to use this is when implementing or extending the in and not in functionality (e.g. if subclassing str):

class NoisyString(str):
    def __contains__(self, other):
        print(f'testing if "{other}" in "{self}"')
        return super().__contains__(other)

ns = NoisyString('a string with a substring inside')

and now:

>>> 'substring' in ns
testing if "substring" in "a string with a substring inside"
True

Don't use find and index to test for "contains"

Don't use the following string methods to test for "contains":

>>> '**foo**'.index('foo')
2
>>> '**foo**'.find('foo')
2

>>> '**oo**'.find('foo')
-1
>>> '**oo**'.index('foo')

Traceback (most recent call last):
  File "<pyshell#40>", line 1, in <module>
    '**oo**'.index('foo')
ValueError: substring not found

Other languages may have no methods to directly test for substrings, and so you would have to use these types of methods, but with Python, it is much more efficient to use the in comparison operator.

Also, these are not drop-in replacements for in. You may have to handle the exception or -1 cases, and if they return 0 (because they found the substring at the beginning) the boolean interpretation is False instead of True.

If you really mean not any_string.startswith(substring) then say it.
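
To make the truthiness pitfall concrete, here is a small sketch with made-up values showing how a naive boolean test on `find` goes wrong in both directions:

```python
any_string = "foobar"
substring = "foo"

# find() returns the match position, here 0, which is falsy.
position = any_string.find(substring)
wrong = bool(position)               # False, although the substring is present
explicit = position != -1            # True: the correct way to use find() as a test
preferred = substring in any_string  # True, and it says what is meant

# A miss returns -1, which is truthy, so the naive test is wrong there too.
miss_is_truthy = bool(any_string.find("xyz"))
```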

Performance comparisons

We can compare various ways of accomplishing the same goal.

import timeit

def in_(s, other):
    return other in s

def contains(s, other):
    return s.__contains__(other)

def find(s, other):
    return s.find(other) != -1

def index(s, other):
    try:
        s.index(other)
    except ValueError:
        return False
    else:
        return True



perf_dict = {
    'in:True': min(timeit.repeat(lambda: in_('superstring', 'str'))),
    'in:False': min(timeit.repeat(lambda: in_('superstring', 'not'))),
    '__contains__:True': min(timeit.repeat(lambda: contains('superstring', 'str'))),
    '__contains__:False': min(timeit.repeat(lambda: contains('superstring', 'not'))),
    'find:True': min(timeit.repeat(lambda: find('superstring', 'str'))),
    'find:False': min(timeit.repeat(lambda: find('superstring', 'not'))),
    'index:True': min(timeit.repeat(lambda: index('superstring', 'str'))),
    'index:False': min(timeit.repeat(lambda: index('superstring', 'not'))),
}

And now we see that using in is much faster than the others. Less time to do an equivalent operation is better:

>>> perf_dict
{'in:True': 0.16450627865128808,
 'in:False': 0.1609668098178645,
 '__contains__:True': 0.24355481654697542,
 '__contains__:False': 0.24382793854783813,
 'find:True': 0.3067379407923454,
 'find:False': 0.29860888058124146,
 'index:True': 0.29647137792585454,
 'index:False': 0.5502287584545229}

How can in be faster than __contains__ if in uses __contains__?

This is a fine follow-on question.

Let's disassemble functions with the methods of interest:

>>> from dis import dis
>>> dis(lambda: 'a' in 'b')
  1           0 LOAD_CONST               1 ('a')
              2 LOAD_CONST               2 ('b')
              4 COMPARE_OP               6 (in)
              6 RETURN_VALUE
>>> dis(lambda: 'b'.__contains__('a'))
  1           0 LOAD_CONST               1 ('b')
              2 LOAD_METHOD              0 (__contains__)
              4 LOAD_CONST               2 ('a')
              6 CALL_METHOD              1
              8 RETURN_VALUE

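So we see that the .__contains__ method has to be looked up separately and then called from the Python virtual machine, which should adequately explain the difference. (Opcode names vary between CPython versions; for example, CPython 3.9 and later emit CONTAINS_OP for in instead of COMPARE_OP.)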

Russia Must Remove Putin
  • 10
    Why should one avoid `str.index` and `str.find`? How else would you suggest someone find the index of a substring instead of just whether it exists or not? (or did you mean avoid using them in place of contains - so don't use `s.find(ss) != -1` instead of `ss in s`?) – coderforlife Jun 10 '15 at 03:35
  • 4
    Precisely so, although the intent behind the use of those methods may be better addressed by elegant use of the `re` module. I have not yet found a use for str.index or str.find myself in any code I have written yet. – Russia Must Remove Putin Jun 10 '15 at 03:39
  • Please extend your answer to advise against using `str.count` as well (`string.count(something) != 0`). *shudder* – cs95 Jun 05 '19 at 03:05
  • How does the [`operator` module version](https://docs.python.org/3/library/operator.html#operator.contains) perform? – jpmc26 Aug 18 '19 at 19:30
  • @jpmc26 it's the same as `in_` above - but with a stackframe around it, so it's slower than that: https://github.com/python/cpython/blob/3.7/Lib/operator.py#L153 – Russia Must Remove Putin Aug 18 '19 at 23:34
  • 2
    This is an excellent answer to a universal need in Python. Thanks for providing some detailed explanations ! – Rich Lysakowski PhD Aug 29 '20 at 14:12
  • How can `in` be faster than `__contains__` if `in` uses `__contains__`? – burningfennec May 30 '21 at 07:59
  • 1
    @burningfennec I addressed your follow-on question at the end of the answer above. – Russia Must Remove Putin May 30 '21 at 22:35
190

if needle in haystack: is the normal use, as @Michael says. It relies on the in operator, which is more readable and faster than a method call.

If you truly need a method instead of an operator (e.g. to do some weird key= for a very peculiar sort...?), that would be 'haystack'.__contains__. But since your example is for use in an if, I guess you don't really mean what you say;-). It's not good form (nor readable, nor efficient) to use special methods directly -- they're meant to be used, instead, through the operators and builtins that delegate to them.
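
For the rare case where a callable really is needed, a sketch of what that could look like (the word list is hypothetical, and the lambda form is usually the more readable choice):

```python
haystack = "haystack"
words = ["zeta", "hay", "stack", "foo"]  # hypothetical data

# The bound method is a one-argument callable: contains(w) == (w in haystack).
contains = haystack.__contains__

# As a sort key: False sorts before True, and sorted() is stable.
ordered = sorted(words, key=contains)

# The same key written with the operator, which most readers will prefer.
ordered_lambda = sorted(words, key=lambda w: w in haystack)
```

Both calls put the words that are not substrings of `haystack` first and keep the original order within each group.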

Cristian Ciupitu
Alex Martelli
123

in Python strings and lists

Here are a few useful examples that speak for themselves concerning the in operator:

>>> "foo" in "foobar"
True
>>> "foo" in "Foobar"
False
>>> "foo" in "Foobar".lower()
True
>>> "foo".capitalize() in "Foobar"
True
>>> "foo" in ["bar", "foo", "foobar"]
True
>>> "foo" in ["fo", "o", "foobar"]
False
>>> ["foo" in a for a in ["fo", "o", "foobar"]]
[False, False, True]

Caveat. Lists are iterables, and the in operator acts on iterables, not just strings. On a list it tests for an equal element, not for a substring.

If you want to compare strings in a more fuzzy way to measure how "alike" they are, consider using the Levenshtein package.

Here's an answer that shows how it works.
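
The Levenshtein package is third-party. As a rough stand-in from the standard library, `difflib.SequenceMatcher` gives a similarity ratio; note this is a different measure from Levenshtein distance, and the `similarity` helper below is a made-up name for illustration.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a ratio in [0, 1]; 1.0 means the strings are identical."""
    return SequenceMatcher(None, a, b).ratio()


exact = similarity("foobar", "foobar")  # identical strings
close = similarity("foobar", "foobaz")  # one character differs
far = similarity("foobar", "xyz")       # nothing in common
```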

firelynx
53

If you are happy with "blah" in somestring but want it to be a function/method call, you can probably do this

import operator

if not operator.contains(somestring, "blah"):
    continue

All operators in Python can be more or less found in the operator module including in.
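
One place where the function form can be handy is building a predicate. A small sketch, with made-up data, that also shows the argument order:

```python
import operator
from functools import partial

somestring = "bob and john"          # hypothetical data
names = ["bob", "john", "mike"]

# operator.contains(a, b) is the function form of `b in a`:
# the container comes first, the item second.
direct = operator.contains(somestring, "bob")

# Fixing the container with partial() gives a one-argument predicate.
in_somestring = partial(operator.contains, somestring)
found = list(filter(in_somestring, names))
```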

Jeffrey04
47

So apparently there is nothing similar for vector-wise comparison. An obvious Python way to do so would be:

names = ['bob', 'john', 'mike']
any(st in 'bob and john' for st in names)   # True

any(st in 'mary and jane' for st in names)  # False
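
The same pattern extends to related questions, such as which names matched or whether all of them did. A brief sketch reusing the names from above (`sentence`, `found`, and `all_present` are illustrative names):

```python
names = ['bob', 'john', 'mike']
sentence = 'bob and john'

# Which of the names occur in the sentence?
found = [st for st in names if st in sentence]

# Do all of them occur?
all_present = all(st in sentence for st in names)
```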
Ufos
  • 1
    That's because there is a bajillion ways of creating a Product from atomic variables. You can stuff them in a tuple, a list (which are forms of Cartesian Products and come with an implied order), or they can be named properties of a class (no a priori order) or dictionary values, or they can be files in a directory, or whatever. Whenever you can uniquely identify (iter or getitem) something in a 'container' or 'context', you can see that 'container' as a sort of vector and define binary ops on it. https://en.wikipedia.org/wiki/Monoidal_category#Free_strict_monoidal_category – Niriel Aug 10 '15 at 09:50
  • 1
    Worth noting that `in` should not be used with lists because it does a linear scan of the elements and is slow by comparison. Use a set instead, especially if membership tests are to be done repeatedly. – cs95 Jun 05 '19 at 03:06
32

You can use str.count().

It returns the number of times a substring appears in a string, as an integer.

For example:

string = "Hello world"  # example value
string.count("bah")    # 0
string.count("Hello")  # 1
Peter Mortensen
Brandon Bailey
  • 13
    counting a string is costly when you just want to _check_ if it's there... – Jean-François Fabre May 16 '19 at 05:53
  • thats why I provided multiple methods – Brandon Bailey May 16 '19 at 09:24
  • 3
    methods that exist in the original post from 2010 so I ended up editing them out, with consensus from community (see meta post https://meta.stackoverflow.com/questions/385063/popular-question-answers-cleanup) – Jean-François Fabre May 16 '19 at 11:38
  • Well there's only a finite number of ways to achieve what OP asked. Are you planning on developing the Python language and introducing a NEW method for sub-string querying? – Brandon Bailey May 16 '19 at 11:46
  • 19
    no. My point is "why answering the exact same thing as others did 9 years ago" ? – Jean-François Fabre May 16 '19 at 11:48
  • 1
    I've posted 3 valid methods to achieve the goal OP and other viewers intend(ed) to achieve. Given the nature of the information provided, it's unreasonable to expect all answers to be 100% unique. The accepted answer didn't contain the 2 other methods I provided and so I expanded the valid answers list to further help anyone coming across this issue. Why are you breathing when there are already people that breathe? – Brandon Bailey May 16 '19 at 11:54
  • 11
    because I'm moderating the site... I've asked the question on meta https://meta.stackoverflow.com/questions/385063/popular-question-answers-cleanup – Jean-François Fabre May 16 '19 at 11:55
  • 2
    then If you have the authority to remove it then remove it, else do what you must and move on. IMO this answer adds value, which is reflected by up-votes from users. – Brandon Bailey May 16 '19 at 12:06
  • 1
    I agree, I had an in depth answer that proposed 3 possible solutions. but this was changed by Jean-Francois Fabre to be what it currently is. Not sure why he would change it so. – Brandon Bailey Jun 05 '19 at 09:34
  • 7
    Shifting right is almost certainly not what you want to do here. – rsandwick3 Mar 28 '20 at 03:53
26

Here is your answer:

if "insert_char_or_string_here" in "insert_string_to_search_here":
    pass  # do stuff

To check that it is not contained:

if not "insert_char_or_string_here" in "insert_string_to_search_here":
    pass  # do stuff

or, equivalently:

if "insert_char_or_string_here" not in "insert_string_to_search_here":
    pass  # do stuff
rassa45
11

You can use regular expressions to get the occurrences:

>>> import re
>>> print(re.findall(r'( |t)', to_search_in)) # searches for t or space
['t', ' ', 't', ' ', ' ']
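
The snippet above leaves `to_search_in` undefined. A self-contained sketch with a made-up input string, collecting occurrences and also using `re.search` as a yes/no test:

```python
import re

to_search_in = "this is a test"  # hypothetical input

# All occurrences of a space or a "t", in order of appearance.
occurrences = re.findall(r"[ t]", to_search_in)

# re.search returns a match object or None, so it doubles as a containment test.
has_digit = re.search(r"\d", to_search_in) is not None
```

For a fixed substring, `in` remains simpler; a regex is mainly worth it when the thing being searched for is a pattern.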
Jean-François Fabre
Muskovets