5

I wanted to count the number of times that a string like 'aa' appears in 'aaa' (or 'aaaa').

The most obvious code gives the wrong (or at least, not the intuitive) answer:

'aaa'.count('aa')
1 # should be 2
'aaaa'.count('aa')
2 # should be 3

Does anyone have a simple way to fix this?

nivk
  • 665
  • 1
  • 6
  • 20

4 Answers4

10

From str.count() documentation:

Return the number of non-overlapping occurrences of substring sub in the range [start, end]. Optional arguments start and end are interpreted as in slice notation.

So, no. You are getting the expected result.

If you want to count number of overlapping matches, use regex:

>>> import re
>>> 
>>> len(re.findall(r'(a)(?=\1)', 'aaa'))
2

This finds all the occurrence of a, which is followed by a. The 2nd a wouldn't be captured, as we've used look-ahead, which is zero-width assertion.

Rohit Jain
  • 203,151
  • 43
  • 392
  • 509
6
haystack = "aaaa"
needle   = "aa"

matches  = sum(haystack[i:i+len(needle)] == needle 
               for i in xrange(len(haystack)-len(needle)+1))

# for Python 3 use range instead of xrange
kindall
  • 168,929
  • 32
  • 262
  • 294
1

The solution is not taking overlap into consideration.

Try this:

big_string = "aaaa"
substring = "aaa"
count = 0 

for char in range(len(big_string)):
    count += big_string[char: char + len(subtring)] == substring

print count
Lucas Ribeiro
  • 5,864
  • 2
  • 23
  • 28
0

You have to be careful, because you seem to looking for non-overlapping substrings. To fix this I'd do:

len([s.start() for s in re.finditer('(?=aa)', 'aaa')])

And if you don't care about the position where the substring starts you can do:

len([_ for s in re.finditer('(?=aa)', 'aaa')])

Although someone smarter than myself might be able to show that there are performances differences :)

mlnyc
  • 2,566
  • 2
  • 22
  • 29