
I have a column of strings, each containing letters and a number. The number sometimes has a decimal point and sometimes does not. I want to convert the number to float. Example dataframe:

import pandas as pd
df = pd.DataFrame({'colA': ['q7.8', 'g5.3', '4.5r', 'john7']})

The updated column should contain: 7.8, 5.3, 4.5, 7.0.

There are no systematic rules for the number of letters or their location.

Thanks

Henry Ecker
gtomer

1 Answer


Assuming only one number is in each cell, we can use str.extract and then astype to convert to float:

df['colA'] = df['colA'].str.extract(r'(\d+(?:\.\d+)?)').astype('float')
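For anyone who wants to run this end to end, here is a minimal self-contained sketch using the dataframe from the question. The `expand=False` argument is an addition here, not part of the line above; it asks `str.extract` to return a Series instead of a one-column DataFrame, which is a matter of taste when there is a single capturing group.

```python
import pandas as pd

df = pd.DataFrame({'colA': ['q7.8', 'g5.3', '4.5r', 'john7']})

# One capturing group: one or more digits, optionally followed by
# a dot and more digits. The inner group is non-capturing (?:...).
# expand=False (an assumption of this sketch) returns a Series.
df['colA'] = (
    df['colA']
    .str.extract(r'(\d+(?:\.\d+)?)', expand=False)
    .astype('float')
)

print(df['colA'].tolist())  # [7.8, 5.3, 4.5, 7.0]
```

Only the first match in each string is extracted, so a cell with two numbers would keep the first one.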

Many more regexes are available at "How to extract a floating number from a string" if additional considerations, such as exponent notation or a leading positive/negative sign, need to be handled. For example:

df['colA'] = df['colA'].str.extract(
    r'([-+]?(?:\d+(?:\.\d*)?|\.\d+)(?:[eE][-+]?\d+)?)'
).astype('float')

*Note: with this approach, the pattern passed to extract needs exactly one capturing group, which is why the inner groups are written as non-capturing (?:...).

df:

   colA
0   7.8
1   5.3
2   4.5
3   7.0
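As a rough illustration of what the more general pattern handles, the sketch below applies it to a few hypothetical strings (these sample values are made up for the demonstration and are not from the question). The names `demo`, `pattern` and `result` are likewise only illustrative.

```python
import pandas as pd

# Hypothetical inputs chosen to exercise a sign, an exponent,
# a leading dot, and a string with no number at all.
demo = pd.Series(['a-3.5', 'x1e3y', 'z.5', 'none'])

pattern = r'([-+]?(?:\d+(?:\.\d*)?|\.\d+)(?:[eE][-+]?\d+)?)'

# Rows without a match become NaN, which astype('float') accepts.
result = demo.str.extract(pattern, expand=False).astype('float')

print(result.tolist())  # [-3.5, 1000.0, 0.5, nan]
```

Note that a hyphen directly before the digits is read as a minus sign, so this pattern may not be what you want if hyphens act as separators in your data.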
Henry Ecker