Removing all non-numeric characters from string in Python

Question

How do we remove all non-numeric characters from a string in Python?

Possible duplicate: http://stackoverflow.com/questions/947776/strip-all-non-numeric-characters-except-for-from-a-string-in-python — ChristopheD, Aug 08 '09 at 17:15

score 360 · Accepted Answer · answered Aug 08 '09 at 17:25

>>> import re
>>> re.sub("[^0-9]", "", "sdkjh987978asd098as0980a98sd")
'987978098098098'

Ned Batchelder


that could be re.sub(r"\D", "", "sdkjh987978asd098as0980a98sd") – newacct Aug 08 '09 at 19:07
and that could be: from re import sub – James Koss May 06 '19 at 21:34
How do I apply sub to a string? @JamesKoss – Lev Slinsen Mar 01 '21 at 12:52
how can I preserve negative numbers? – Ranjan Kumar Apr 20 '22 at 10:01
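
For the two follow-up questions above (applying sub to a string, and keeping a minus sign), here is a minimal sketch. The helper names digits_only and signed_digits are hypothetical, not from the answer, and the sign handling assumes that only a single leading minus should survive.

```python
import re

# Matches anything that is not an ASCII digit. In Python 3, r"\D" on a str
# treats non-ASCII decimal digits (e.g. Arabic-Indic) as digits unless
# re.ASCII is passed, so the explicit character class is used here.
NON_DIGIT = re.compile(r"[^0-9]")


def digits_only(text):
    """Return text with every character except 0-9 removed."""
    return NON_DIGIT.sub("", text)


def signed_digits(text):
    """Like digits_only, but keep one leading minus sign (assumed rule)."""
    sign = "-" if text.lstrip().startswith("-") else ""
    return sign + digits_only(text)


print(digits_only("sdkjh987978asd098as0980a98sd"))  # 987978098098098
print(signed_digits("-12abc3"))                     # -123
```

A minus sign that appears anywhere other than the start is dropped like any other character, so signed_digits("ab-12") gives '12'.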

score 121 · Answer 2 · answered Aug 08 '09 at 17:16

Not sure if this is the most efficient way, but:

>>> ''.join(c for c in "abc123def456" if c.isdigit())
'123456'

The ''.join part combines all the resulting characters with nothing in between. The rest of it is a generator expression, where (as you can probably guess) we keep only the characters for which isdigit is true.
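
One caveat worth checking: in Python 3, str.isdigit accepts more than 0-9. A short sketch of the difference, assuming ASCII-only output is wanted (the function name ascii_digits_only is made up for illustration):

```python
def ascii_digits_only(text):
    """Keep only the characters 0-9, ignoring other Unicode digits."""
    return "".join(c for c in text if c in "0123456789")


sample = "abc123\u00b2def"  # contains SUPERSCRIPT TWO

# str.isdigit() is True for the superscript, so it survives the filter.
print("".join(c for c in sample if c.isdigit()))  # 123²
print(ascii_digits_only(sample))                   # 123
```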

Mark Rushakoff


That does the opposite. I think you mean "not c.isdigit()" – Ryan R. Rosario Aug 08 '09 at 17:19
Remove all non-numeric == keep only numeric. – Mark Rushakoff Aug 08 '09 at 17:21
I like that this approach doesn't require pulling in re, for this simple function. – triunenature May 25 '15 at 03:09
Note that unlike implementations using str.translate, this solution works in both python 2.7 and 3.4. Thank you! – Alex Jan 20 '16 at 14:47
I prefer this alternative. Using a regex seems overkill to me. – alfredocambera Nov 23 '16 at 15:30
Important to note that this will error if the string you're trying to clean happens to already be an int. You'll get `TypeError: 'int' object is not iterable` – Preston Badeer Dec 02 '20 at 19:30
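
The TypeError mentioned in the last comment can be avoided by converting the input first. A minimal sketch, assuming it is acceptable to pass any non-string input through str() (the name clean_digits is hypothetical):

```python
def clean_digits(value):
    """Return the digits of value; non-string input is converted with str()."""
    return "".join(c for c in str(value) if c.isdigit())


print(clean_digits("abc123def456"))  # 123456
print(clean_digits(1234))            # 1234
```

Note that the sign and decimal point of a number are stripped along with everything else, so clean_digits(-12.5) gives '125'.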

tzot · Answer 3 · 2020-07-11T11:29:33.330

22

This should work for both strings and unicode objects in Python2, and both strings and bytes in Python3:

# python <3.0
def only_numerics(seq):
    return filter(type(seq).isdigit, seq)

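# python ≥3.0
def only_numerics(seq):
    seq_type = type(seq)
    # Slice instead of iterating: iterating bytes yields ints, which isdigit rejects.
    items = (seq[i:i + 1] for i in range(len(seq)))
    return seq_type().join(filter(seq_type.isdigit, items))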

edited Jul 11 '20 at 11:29

answered Sep 07 '09 at 03:01

tzot


score 16 · Answer 4 · answered Nov 09 '18 at 15:49

@Ned Batchelder and @newacct provided the right answer, but ...

In case your string contains thousands separators (,) and a decimal point (.), and you want to keep the decimal point:

>>> import re
>>> re.sub(r"[^\d.]", "", "$1,999,888.77")
'1999888.77'
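
If the cleaned string is then converted to a number, a sketch along these lines may help. It assumes '.' is the decimal separator and ',' a thousands separator. parse_amount is a hypothetical helper, and Decimal is used only to avoid float rounding.

```python
import re
from decimal import Decimal, InvalidOperation


def parse_amount(text):
    """Strip everything except digits and '.', then parse as Decimal.

    Returns None when the remainder is not a valid number (e.g. "1.2.3").
    """
    cleaned = re.sub(r"[^\d.]", "", text)
    try:
        return Decimal(cleaned)
    except InvalidOperation:
        return None


print(parse_amount("$1,999,888.77"))  # 1999888.77
print(parse_amount("v1.2.3"))         # None
```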

kennyut


score 10 · Answer 5 · answered Sep 07 '12 at 10:37

Just to add another option to the mix, there are several useful constants within the string module. While more useful in other cases, they can be used here.

>>> from string import digits
>>> ''.join(c for c in "abc123def456" if c in digits)
'123456'

There are several constants in the module, including:

ascii_letters (abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ)
hexdigits (0123456789abcdefABCDEF)

If you are using these constants heavily, it can be worthwhile to convert them to a frozenset. That enables O(1) membership tests, rather than O(n), where n is the length of the original constant string.

>>> digits = frozenset(digits)
>>> ''.join(c for c in "abc123def456" if c in digits)
'123456'
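
Whether the frozenset actually pays off for a ten-character constant is easy to measure. A rough sketch using timeit follows. The sample input and repeat count are arbitrary choices, and results will vary by interpreter and machine.

```python
import timeit
from string import digits

DIGIT_SET = frozenset(digits)
SAMPLE = "abc123def456" * 1000  # arbitrary test input


def keep_with_str(text):
    return "".join(c for c in text if c in digits)


def keep_with_set(text):
    return "".join(c for c in text if c in DIGIT_SET)


# Both give the same result; only the lookup structure differs.
assert keep_with_str(SAMPLE) == keep_with_set(SAMPLE)

for func in (keep_with_str, keep_with_set):
    seconds = timeit.timeit(lambda: func(SAMPLE), number=100)
    print(f"{func.__name__}: {seconds:.3f}s")
```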

''.join(c for c in "abc123def456" if c.isdigit()) works in my python 3.4 — Eino Mäkitalo, Feb 14 '16 at 00:11

Alberto Ibarra · Answer 6 · 2020-02-12T23:58:07.757

7

Many right answers, but in case you want the result directly as a float, without using regex:

>>> x = '$123.45M'
>>> float(''.join(c for c in x if c.isdigit() or c == '.'))
123.45

You can change the point for a comma depending on your needs.

Use this instead if you know your number is an integer:

>>> x = '$1123'
>>> int(''.join(c for c in x if c.isdigit()))
1123

edited Feb 12 '20 at 23:58

answered Feb 12 '20 at 23:46

Alberto Ibarra


score 5 · Answer 7 · answered Aug 08 '09 at 17:35

Fastest approach, if you need to perform more than just one or two such removal operations (or even just one, but on a very long string!-), is to rely on the translate method of strings, even though it does need some prep:

>>> import string
>>> allchars = ''.join(chr(i) for i in xrange(256))
>>> identity = string.maketrans('', '')
>>> nondigits = allchars.translate(identity, string.digits)
>>> s = 'abc123def456'
>>> s.translate(identity, nondigits)
'123456'
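
The session above relies on Python 2 names (xrange, string.maketrans). A rough Python 3 counterpart for byte strings, assuming ASCII digits are the only bytes to keep (NONDIGIT_BYTES and digits_only_bytes are illustrative names):

```python
# Every byte value except the ASCII digits 0-9.
NONDIGIT_BYTES = bytes(b for b in range(256) if b not in b"0123456789")


def digits_only_bytes(data):
    """Delete all non-digit bytes; a table of None means no remapping."""
    return data.translate(None, NONDIGIT_BYTES)


print(digits_only_bytes(b"abc123def456"))  # b'123456'
```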

The translate method works differently, and is maybe a tad simpler to use, on Unicode strings than on byte strings, btw:

>>> unondig = dict.fromkeys(xrange(65536))
>>> for x in string.digits: del unondig[ord(x)]
... 
>>> s = u'abc123def456'
>>> s.translate(unondig)
u'123456'

You might want to use a mapping class rather than an actual dict, especially if your Unicode string may potentially contain characters with very high ord values (that would make the dict excessively large;-). For example:

>>> class keeponly(object):
...   def __init__(self, keep): 
...     self.keep = set(ord(c) for c in keep)
...   def __getitem__(self, key):
...     if key in self.keep:
...       return key
...     return None
... 
>>> s.translate(keeponly(string.digits))
u'123456'
>>>
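
The same mapping-class idea carries over to Python 3, where str is Unicode by default and str.translate looks each code point up via __getitem__. A sketch (the class name KeepOnly is adapted from keeponly above, and this is one possible port rather than the author's own):

```python
import string


class KeepOnly:
    """Mapping for str.translate that deletes everything not in keep."""

    def __init__(self, keep):
        self.keep = {ord(c) for c in keep}

    def __getitem__(self, code_point):
        # Returning the code point keeps the character; None deletes it.
        return code_point if code_point in self.keep else None


print("abc123def456".translate(KeepOnly(string.digits)))  # 123456
```

Because nothing is precomputed per code point, the object stays small no matter how high the ord values in the input are.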

(1) Don't hard-code magic numbers; s/65536/sys.maxunicode/ (2) The dict is unconditionally "excessively large" because the input "may potentially" contain `(sys.maxunicode - number_of_non_numeric_chars)` entries. (3) consider whether string.digits may not be sufficient leading to a need to crack open the unicodedata module (4) consider re.sub(r'(?u)\D+', u'', text) for simplicity and potential speed. — John Machin, Aug 08 '09 at 23:31
