
I would like to extract the number from each value in a column and replace the original column with those numbers. Thanks for any help.

code:

import pandas as pd

d = {'col1': [1, 2, 3, 4], 'col2': ['3 A', '4', '7 B', '9F']}
df = pd.DataFrame(data=d)
print(df)

output:

   col1 col2
0     1  3 A
1     2    4
2     3  7 B
3     4   9F

desired output:

   col1 col2
0     1    3
1     2    4
2     3    7
3     4    9

2 Answers


Use str.extract:

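# raw string so that \d is passed to the regex engine unchanged;
# expand=False returns a Series rather than a one-column DataFrame
df['col2'] = df['col2'].str.extract(r'(\d+)', expand=False)
# use r'^(\d+)' to match only a number at the start of the string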

Or, to get a numeric dtype instead of strings, combine it with pandas.to_numeric:

df['col2'] = pd.to_numeric(df['col2'].str.extract(r'(\d+)', expand=False),
                           errors='coerce')

output:

   col1 col2
0     1    3
1     2    4
2     3    7
3     4    9
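
To make the difference between the two patterns concrete, here is a minimal sketch on a small made-up Series (the values 'B 7' and 'none' are illustrative assumptions, not part of the question's data). It compares the unanchored pattern with the anchored one and shows what pandas.to_numeric does with rows that have no match.

```python
import pandas as pd

# Hypothetical sample: one value starts with a letter, one has no digits at all.
s = pd.Series(['3 A', '4', 'B 7', 'none'])

# Unanchored: takes the first run of digits anywhere in the string.
first_digits = s.str.extract(r'(\d+)', expand=False)

# Anchored: only matches when the string starts with digits,
# so 'B 7' yields a missing value here.
leading_digits = s.str.extract(r'^(\d+)', expand=False)

# Convert the extracted strings to numbers; rows without a match stay NaN.
as_numbers = pd.to_numeric(first_digits, errors='coerce')

print(first_digits.tolist())
print(leading_digits.tolist())
print(as_numbers.tolist())
```

Because missing values are represented as NaN, the numeric result is a float column whenever at least one row has no match.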
mozway

import re

import pandas as pd

d = {'col1': [1, 2, 3, 4], 'col2': ['3 A', '4', '7 B', '9F']}
df = pd.DataFrame(data=d)

# Keep the first run of digits found in each string.
# Note: re.findall(...)[0] raises IndexError if a value contains no digits.
df['col2'] = [re.findall(r"\d+", value)[0] for value in df['col2']]

print(df)
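
The loop above works because re.findall returns a list of every run of digits in a string, and indexing with [0] keeps the first one. That indexing fails on a value with no digits, so a slightly more defensive variant might look like the sketch below. The helper name first_number and the 'no digits' sample value are assumptions made for illustration; they are not part of the original answer.

```python
import re

import pandas as pd


def first_number(text):
    """Return the first run of digits in text, or None if there is none."""
    match = re.search(r"\d+", str(text))
    return match.group(0) if match else None


# Hypothetical data including one value without any digits.
df = pd.DataFrame({'col1': [1, 2, 3], 'col2': ['3 A', '9F', 'no digits']})

# Apply the helper to every value; rows without a match become missing values.
df['col2'] = df['col2'].map(first_number)

print(df)
```

re.search stops at the first match, so it avoids building the full list that re.findall creates when only the first number is needed.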
Mayank Porwal
  • Please include an explanation with your answer to help readers understand how this works, and solves the problem. You can click the edit button at the bottom of your post to add an explanation. – Freddy Mcloughlan Jun 02 '22 at 23:13