38

Given the following data frame:

import pandas as pd
import numpy as np
df = pd.DataFrame({'A': ['1a', np.nan, '10a', '100b', '0b']})
df

    A
0   1a
1   NaN
2   10a
3   100b
4   0b

I'd like to extract the numbers from each cell (where they exist). The desired result is:

    A
0   1
1   NaN
2   10
3   100
4   0

I know it can be done with str.extract, but I'm not sure how.

Jon Clements
Dance Party

3 Answers

84

Give it a regex capture group:

df.A.str.extract(r'(\d+)', expand=False)

Gives you:

0      1
1    NaN
2     10
3    100
4      0
Name: A, dtype: object
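
A self-contained sketch of the same idea, for anyone who wants to run it end to end. It assumes a reasonably recent pandas, where str.extract returns a one-column DataFrame unless expand=False is passed; the names digits and numbers are only illustrative. The extracted values are strings, so the last step uses pd.to_numeric in case actual numbers are wanted.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': ['1a', np.nan, '10a', '100b', '0b']})

# expand=False returns a Series for a single capture group;
# the default (expand=True) returns a one-column DataFrame instead.
digits = df['A'].str.extract(r'(\d+)', expand=False)

# The extracted values are still strings. pd.to_numeric converts them
# to numbers and keeps the missing entry as a missing value.
numbers = pd.to_numeric(digits)

print(digits)
print(numbers)
```

Because of the missing value, the numeric result will generally not be a plain integer column.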
Jon Clements
5

To answer @Steven G's question in the comments, this should work. The ^ anchor restricts the match to digits at the start of the string:

df.A.str.extract(r'(^\d*)', expand=False)
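
The comment being answered is not shown here, so the following is only a rough illustration of how the anchored pattern behaves next to the unanchored one. The sample Series s is made up for this sketch and is not from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: digits first, digits last, and a missing value.
s = pd.Series(['12ab', 'ab34', np.nan])

# Unanchored: the first run of digits anywhere in the string.
anywhere = s.str.extract(r'(\d+)', expand=False)

# Anchored with ^ and using *: only digits at the very start. Since \d*
# can match zero characters, 'ab34' yields an empty string, not NaN.
leading = s.str.extract(r'(^\d*)', expand=False)

# Anchored with +: at least one leading digit is required,
# so 'ab34' does not match at all and becomes a missing value.
leading_strict = s.str.extract(r'(^\d+)', expand=False)
```

If empty strings are unwanted for rows without leading digits, the + variant may be the better fit.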
Taming
5

You can replace the column with the result using the "assign" method:

df = df.assign(A=lambda x: x['A'].str.extract(r'(\d+)', expand=False))
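
As a small variation, here is a sketch that writes the result to a separate column instead of overwriting A. It shows that assign returns a new DataFrame and leaves the original alone. The column name A_num and the variable out are hypothetical, chosen only for this example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': ['1a', np.nan, '10a', '100b', '0b']})

# assign returns a new DataFrame; df itself is unchanged unless reassigned.
# expand=False makes str.extract return a Series, which fits a single column.
out = df.assign(A_num=lambda x: x['A'].str.extract(r'(\d+)', expand=False))

print(out)
```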
Mehdi Golzadeh