30

I have a pandas data frame with different data types. I want to convert more than one column in the data frame to string type. I have done this individually for each column, but I want to know if there is a more efficient way.

So at present I am doing something like this:

repair['SCENARIO']=repair['SCENARIO'].astype(str)

repair['SERVICE_TYPE']= repair['SERVICE_TYPE'].astype(str)

I want a function that would help me pass multiple columns and convert them to strings.

jpp
Sayonti

3 Answers

60

To convert multiple columns to string, pass a list of column names to the same command you are already using:

df[['one', 'two', 'three']] = df[['one', 'two', 'three']].astype(str)
# add as many column names as you like.

That means that one way to convert all columns is to construct the list of columns like this:

all_columns = list(df) # Creates list of all column headers
df[all_columns] = df[all_columns].astype(str)

Note that the latter can also be done directly with `df = df.astype(str)` (see comments).
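Since the question asks for a function that takes several columns, here is a minimal sketch of one. The name `columns_to_str` and the sample data are made up for illustration; it relies on `DataFrame.astype` accepting a dict that maps column names to dtypes, and it returns a new frame rather than modifying the input.

```python
import pandas as pd


def columns_to_str(df, columns):
    """Return a copy of df with the listed columns cast to str.

    Hypothetical helper, not part of pandas.
    """
    # astype with a {column: dtype} mapping converts only those columns
    # and leaves every other column untouched.
    return df.astype({col: str for col in columns})


# Sample data invented for this example.
repair = pd.DataFrame({
    "SCENARIO": [1, 2, 3],
    "SERVICE_TYPE": [10.5, 20.0, 30.25],
    "COST": [100, 200, 300],
})

converted = columns_to_str(repair, ["SCENARIO", "SERVICE_TYPE"])
```

Changing which columns get converted is then just a matter of passing a different list.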

sudonym
  • 8
    For all columns, how about `df = df.astype(str)` ? – jpp Jun 13 '18 at 23:20
  • Yes, also works, absolutely - I just posted this solution to stick with the concept of lists – sudonym Jun 13 '18 at 23:21
  • 1
    Thanks sudonym... I was actually looking for something like a function that would take columns in a data frame and convert them to string. I should be able to change the column names as required though the first solution works perfectly and I did implement it. – Sayonti Jun 14 '18 at 00:08
  • Is there any performance difference between the two? I tried `df = df.astype(str)` on a frame of shape (50000, 23000) and it crashed (in interactive mode). Thank you – Long Aug 02 '19 at 02:39
  • Wondering why this doesn't work if the list of columns has a single element... – Gian Arauz Dec 14 '21 at 10:55
8

I know this is an old question, but I was looking for a way to turn all columns with an object dtype to strings as a workaround for a bug I discovered in rpy2. I'm working with large dataframes, so didn't want to list each column explicitly. This seemed to work well for me so I thought I'd share in case it helps someone else.

stringcols = df.select_dtypes(include='object').columns
df[stringcols] = df[stringcols].fillna('').astype(str)

The "fillna('')" prevents NaN entries from getting converted to the string 'nan' by replacing with an empty string instead.
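As a small worked example of the same two lines, here is a sketch on a toy frame. The column names and values are invented, and the text column is built with an explicit `object` dtype so that `select_dtypes(include='object')` picks it up regardless of how a given pandas version infers string columns.

```python
import numpy as np
import pandas as pd

# Toy frame: one object column containing a missing value, one integer column.
df = pd.DataFrame({
    "name": pd.Series(["a", np.nan, "c"], dtype=object),
    "count": [1, 2, 3],
})

# Select only the object-dtype columns, as in the answer above.
stringcols = df.select_dtypes(include="object").columns

# Replace missing values with an empty string first, then cast to str.
df[stringcols] = df[stringcols].fillna("").astype(str)
```

After this, the missing entry in `name` is an empty string and the integer column is left as it was.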

Joe
0

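You can also use a list comprehension. On its own the comprehension returns a list of Series, not a DataFrame, so concatenate the result back together:

df = pd.concat([df[col_name].astype(str) for col_name in df.columns], axis=1)

You can also add a condition to choose which columns should be converted. Here the comprehension builds the list of column names, so the other columns are kept unchanged:

to_convert = [col_name for col_name in df.columns if 'to_str' in col_name]
df[to_convert] = df[to_convert].astype(str)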
melihozbek
Amir F