9

I have a data source where all the values are given as strings. When I create a Pandas dataframe from this data, all the columns are naturally of type object. I then want Pandas to automatically convert any columns that look like numbers into numeric types (e.g. int64, float64).

Pandas supposedly provides a function to do this automatic type inference: pandas.DataFrame.infer_objects(). It's also mentioned in this StackOverflow post. The documentation says:

Attempts soft conversion of object-dtyped columns, leaving non-object and unconvertible columns unchanged. The inference rules are the same as during normal Series/DataFrame construction.

However, the function is not working for me. In the reproducible example below, I have two string columns (value1 and value2) that unambiguously look like int and float values, respectively, but infer_objects() does not convert them from string to the appropriate numeric types.

import pandas as pd

# Create example dataframe.
data = [ ['Alice', '100', '1.1'], ['Bob', '200', '2.1'], ['Carl', '300', '3.1']]
df = pd.DataFrame(data, columns=['name', 'value1', 'value2'])

print(df)

#     name value1 value2
# 0  Alice    100    1.1
# 1    Bob    200    2.1
# 2   Carl    300    3.1

print(df.info())

# Data columns (total 3 columns):
#  #   Column  Non-Null Count  Dtype 
# ---  ------  --------------  ----- 
#  0   name    3 non-null      object
#  1   value1  3 non-null      object
#  2   value2  3 non-null      object
# dtypes: object(3)

df = df.infer_objects() # Should convert value1 and value2 columns to numerics.

print(df.info())

# Data columns (total 3 columns):
#  #   Column  Non-Null Count  Dtype 
# ---  ------  --------------  ----- 
#  0   name    3 non-null      object
#  1   value1  3 non-null      object
#  2   value2  3 non-null      object
# dtypes: object(3)

Any help would be appreciated.

stackoverflowuser2010

2 Answers

3

Further to @wwnde's answer, here is the same solution written slightly differently:

df["value1"] = pd.to_numeric(df["value1"])
df["value2"] = pd.to_numeric(df["value2"])

EDIT: This is an interesting question, and I'm surprised that pandas doesn't convert obvious string floats and integers like the ones you show.

However, this short loop walks through the dataframe and converts every column that can be parsed as numeric.

data = [["Alice", "100", "1.1"], ["Bob", "200", "2.1"], ["Carl", "300", "3.1"]]
df = pd.DataFrame(data, columns=["name", "value1", "value2"])

df.info()  # info() prints its report itself and returns None
print()

RangeIndex: 3 entries, 0 to 2
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   name    3 non-null      object
 1   value1  3 non-null      object
 2   value2  3 non-null      object
dtypes: object(3)

for c in df.columns:
    try:
        df[c] = pd.to_numeric(df[c])
    except (ValueError, TypeError):
        # The column holds values that cannot be parsed as numbers; leave it unchanged.
        pass

df.info()  # info() prints its report itself and returns None

RangeIndex: 3 entries, 0 to 2
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   name    3 non-null      object 
 1   value1  3 non-null      int64  
 2   value2  3 non-null      float64
dtypes: float64(1), int64(1), object(1)
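
If you would rather not modify the dataframe in place, the same loop can be wrapped in a small helper. This is only a sketch: the function name `to_numeric_where_possible` is made up for illustration, and it assumes a column should be converted only when every value in it parses as a number.

```python
import pandas as pd


def to_numeric_where_possible(frame):
    """Return a copy of frame with every fully numeric-looking column converted."""
    result = frame.copy()
    for col in result.columns:
        try:
            result[col] = pd.to_numeric(result[col])
        except (ValueError, TypeError):
            # At least one value could not be parsed; keep the column as it was.
            pass
    return result


data = [["Alice", "100", "1.1"], ["Bob", "200", "2.1"], ["Carl", "300", "3.1"]]
df = pd.DataFrame(data, columns=["name", "value1", "value2"])

converted = to_numeric_where_possible(df)
print(converted.dtypes)
```

The original `df` keeps its string columns, so you can compare the two frames before deciding which one to keep.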
run-out
  • My question's point is that I would like Pandas to infer the types for me. I wouldn't know that `value1` and `value2` were numbers. – stackoverflowuser2010 May 09 '20 at 22:42
  • @stackoverflowuser2010 `infer_objects` wouldn't work the way you want. If you check the documentation [here](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.infer_objects.html), they aren't taking the first row which has a string... – Hozayfa El Rifai May 09 '20 at 22:47
  • Interesting question. Also I came across this interesting [article](https://rushter.com/blog/pandas-data-type-inference/) that might help understanding. – run-out May 09 '20 at 23:08
  • The documentation for `infer_objects()` says: `The inference rules are the same as during normal Series/DataFrame construction.` Whenever I run `pd.read_csv()` to build a new dataframe, that function correctly infers the data types. – stackoverflowuser2010 May 09 '20 at 23:15
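
A small sketch of the difference discussed in these comments. As far as I understand it, `infer_objects()` looks at the Python objects already stored in an object column and does not parse strings, whereas `pd.read_csv()` parses the text itself. The variable names and the inline CSV text are made up for illustration.

```python
import io

import pandas as pd

# Object column that holds real Python ints: infer_objects() can soft-convert it.
as_objects = pd.DataFrame({"value1": pd.Series([100, 200, 300], dtype="object")})
print(as_objects.infer_objects().dtypes)

# Column that holds strings: infer_objects() does not parse them into numbers.
as_strings = pd.DataFrame({"value1": ["100", "200", "300"]})
print(as_strings.infer_objects().dtypes)

# read_csv() parses the raw text, so numeric-looking fields come out numeric.
csv_text = "name,value1,value2\nAlice,100,1.1\nBob,200,2.1\n"
from_csv = pd.read_csv(io.StringIO(csv_text))
print(from_csv.dtypes)
```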
1

`df_new = df.convert_dtypes()` may help. See https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.convert_dtypes.html
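
A quick way to check what `convert_dtypes()` does with the question's data is sketched below. As far as I can tell, it picks the best extension dtype for the values that are already stored, so columns of strings stay strings rather than becoming numbers; treat that as my reading of the behaviour and run it on your own pandas version.

```python
import pandas as pd

data = [["Alice", "100", "1.1"], ["Bob", "200", "2.1"], ["Carl", "300", "3.1"]]
df = pd.DataFrame(data, columns=["name", "value1", "value2"])

# convert_dtypes() returns a new dataframe; the original is left untouched.
df_new = df.convert_dtypes()
print(df_new.dtypes)

# Check whether the numeric-looking string columns were actually parsed.
for col in ["value1", "value2"]:
    print(col, pd.api.types.is_numeric_dtype(df_new[col]))
```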

cpearce95