90

I have a pandas DataFrame "df". It has multiple columns, one of which I need to truncate to a substring. Let's say the column name is "col". I can run a "for" loop like the one below to substring the column:

for i in range(0,len(df)):
  df.iloc[i].col = df.iloc[i].col[:9]

But I wanted to know whether there is a way to do this directly, without a "for" loop, for example with an attribute. I have a huge amount of data, and looping like this will take a very long time to process.

thenakulchawla

3 Answers

191

Use the str accessor with square brackets:

df['col'] = df['col'].str[:9]

Or str.slice:

df['col'] = df['col'].str.slice(0, 9)
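
As a minimal sketch of the two forms above, the snippet below runs them on made-up sample data (the values are hypothetical; only the column name "col" comes from the question). Strings shorter than 9 characters are left as they are, and missing values stay missing.

```python
import pandas as pd

# Hypothetical sample data; only the column name "col" comes from the question.
df = pd.DataFrame({"col": ["abcdefghijklmnop", "short", None]})

# Compute the .str.slice form first so both forms see the same input.
sliced = df["col"].str.slice(0, 9)

# Vectorized slice: keeps at most the first 9 characters of each string.
df["col"] = df["col"].str[:9]

first_nine = df["col"].tolist()
same_result = df["col"].equals(sliced)
```

Both forms work on the whole column at once, so no explicit Python loop over the rows is needed.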
ayhan
  • 4
    This gives me the dreaded `SettingWithCopyWarning:` – demongolem Jun 17 '20 at 11:27
  • Great solution! But I'm curious which one is faster on a large dataset, and how both compare to `df['col'] = [x[:9] for x in df['col']]` – Peter Chen Oct 06 '20 at 22:02
  • You can use the suggested solution with `pd.options.mode.chained_assignment = None # default='warn' ` to get rid of the warning. Alternatively, you can look at these topics and lose a few minutes of your life: [link](https://stackoverflow.com/questions/42379818/correct-way-to-set-new-column-in-pandas-dataframe-to-avoid-settingwithcopywarnin) – Charles Mar 22 '21 at 10:04
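
One common cause of `SettingWithCopyWarning` is that the DataFrame being modified was itself produced by filtering another DataFrame. Assuming that is the situation here (the comments do not say), a sketch of an alternative to silencing the warning is to take an explicit copy before assigning. The column "keep" and the sample values are hypothetical.

```python
import pandas as pd

# Hypothetical data: "keep" is an illustrative filter column.
df = pd.DataFrame({
    "keep": [True, False, True],
    "col": ["abcdefghijkl", "mnopqrstuvwx", "yz"],
})

# An explicit copy makes `subset` an independent DataFrame,
# so assigning to it cannot be mistaken for modifying `df`.
subset = df[df["keep"]].copy()
subset["col"] = subset["col"].str[:9]
```

With the copy, the original `df` is left unchanged and only `subset` holds the truncated strings.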
7

In case the column isn't a string, use astype to convert it:

df['col'] = df['col'].astype(str).str[:9]
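
A small sketch of this on a numeric column, using hypothetical values. One caveat: depending on the pandas version, `astype(str)` may turn missing values into the literal string "nan", so it may be worth handling those separately.

```python
import pandas as pd

# Hypothetical integer column; the values are made up for illustration.
df = pd.DataFrame({"col": [1234567890123, 42]})

# Convert to strings first, then keep at most the first 9 characters.
df["col"] = df["col"].astype(str).str[:9]

result = df["col"].tolist()
```

After this, the column holds strings rather than numbers.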
rachwa
Elton da Mata
-3

I needed to convert a single column of strings of the form nn.n% to float, which meant removing the % from the element in each row. The attend data frame has two columns.

attend.iloc[:,1:2]=attend.iloc[:,1:2].applymap(lambda x: float(x[:-1]))

It's an extension of the original answer. In my case it takes a data frame and applies a function to each value in a specific column. The function removes the last character and converts the remaining string to float.
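
For comparison, the same conversion can be sketched with the str accessor instead of a per-element lambda. This is an alternative, not the approach used above. The column names "name" and "rate" and the values are hypothetical, since the answer only says that attend has two columns.

```python
import pandas as pd

# Hypothetical two-column frame; the second column holds "nn.n%" strings.
attend = pd.DataFrame({"name": ["a", "b"], "rate": ["95.5%", "80.0%"]})

# Strip the trailing "%" from every value, then convert the column to float.
attend["rate"] = attend["rate"].str.rstrip("%").astype(float)

rates = attend["rate"].tolist()
```

Unlike slicing with `[:-1]`, `str.rstrip("%")` only removes the character when it is actually a trailing percent sign.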

Radiumcola