4

I am trying to compare two strings in Python and noticed that when a dash/hyphen is present in the string, identical strings no longer compare as equal. For example:

>>> teststring = 'newstring'
>>> teststring is 'newstring'
True

Then, if I add a dash:

>>> teststring = 'new-string'
>>> teststring is 'new-string'
False

Why is that the case, and what would be the best way to compare strings with dashes?

Cœur
samuelschaefer

2 Answers

4

You should never use `is` to test for equality anyway. `is` tests for identity. Use `==`.

Frankly, I don't know why `'newstring' is 'newstring'` evaluates to True. I'm sure it varies with your Python implementation, as it looks like a memory-saving cache that reuses short strings.

However:

teststring = 'newstring'
teststring == 'newstring' # True

nextstring = 'new-string'
nextstring == 'new-string' # True

Basically, all `is` does is compare the ids of the two objects to check that they're identical.

id('new-string') # 48441808
id('new-string') # 48435352
# These change
id('newstring') # 48441728
id('newstring') # 48441728
# These don't, and I don't know why.
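
To make the distinction concrete, here is a minimal sketch, assuming CPython 3. The names `literal` and `built` are illustrative and not from the question. The second string is assembled at runtime so that the two names are not simply bound to one shared constant. The identity result is printed rather than relied on, since it is an implementation detail.

```python
# Illustrative names; not from the original question.
literal = "new-string"
built = "-".join(["new", "string"])  # assembled at runtime

# == compares contents, so this holds on any Python implementation.
print(literal == built)  # True

# `is` compares object identity; the outcome depends on the implementation,
# so it is only printed here, never relied on.
print(literal is built)
print(id(literal), id(built))
```

Recent CPython versions also emit a SyntaxWarning when `is` is used directly with a string literal, which is another hint to use `==`.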
Adam Smith
  • 2
    See [About the changing id of a Python immutable string](http://stackoverflow.com/a/24245514) for why `is` works *sometimes*. – Martijn Pieters Jun 17 '14 at 18:03
  • 3
    From my answer there: *[T]he Python compiler will also intern any Python string stored as a constant, provided it is a valid identifier. The Python code object factory function PyCode_New will intern any string object that contains only letters, digits or an underscore*. – Martijn Pieters Jun 17 '14 at 18:05
  • Here's a deeper dive into what gets interned by default: http://guilload.com/python-string-interning/ – Ray Jun 06 '15 at 20:18
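
The interning behavior described in the comments above can be explored with the standard library's `sys.intern`. This is a rough sketch, not a description of CPython's internals. `str.isidentifier()` is used only as an approximation of the "letters, digits or an underscore" rule quoted above, since it also rejects a leading digit and accepts non-ASCII letters. The variable names are illustrative.

```python
import sys

# Two equal strings built separately at runtime.
a = "-".join(["new", "string"])
b = "-".join(["new", "string"])

print(a == b)  # True
print(a is b)  # implementation-dependent, so no result is claimed

# sys.intern returns the canonical copy from the interpreter's intern table,
# so interning two equal strings yields the same object.
ia = sys.intern(a)
ib = sys.intern(b)
print(ia is ib)  # True

# Rough proxy for whether a constant looks like an identifier.
print("newstring".isidentifier())   # True
print("new-string".isidentifier())  # False
```

Explicit interning makes identity checks predictable, but `==` remains the right operator for comparing string values.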
0

You should not use `is` for string comparison. `is` checks whether both operands are the same object. Use the equality operator `==` here: it compares the values of the objects rather than their ids.

In this case, it looks like Python is applying an optimization to string objects, hence the behavior.

>>> teststring = 'newstring'
>>> id(teststring)
4329009776
>>> id('newstring')
4329009776
>>> teststring = 'new-string'
>>> id(teststring)
4329009840
>>> id('new-string')
4329009776
>>> teststring == 'new-string'
True
>>> teststring is 'new-string'
False
ronakg
  • 1
    See [About the changing id of a Python immutable string](http://stackoverflow.com/a/24245514) about when Python interns strings (and identity tests work). – Martijn Pieters Jun 17 '14 at 18:04
  • Makes sense. So this is similar to what Python does with small integer objects (-5 to 256 in CPython), which are kept in memory at all times. Python never creates new objects for these ints; it just adjusts their ref counts as needed. – ronakg Jun 17 '14 at 18:07
  • 1
    Indeed. It is an implementation detail, however, not something your code should rely on. – Martijn Pieters Jun 17 '14 at 18:08