I have a large DataFrame (df), and every element in the last column shows up like this:

1055.0000.0

so the last two characters are always ".0". What's the most efficient way to remove them? The last column's name is always different, so I'm not sure how to approach this. I tried looping over the DataFrame, but it uses too much memory and the code crashes. Is there a way to do something like

df[ last column ] = df[ last column - last 2 characters]

or do I need to make a new df and then append it in?

3 Answers

3

Vectorized operations are almost always faster than looping over rows. The .str accessor lets pandas apply string operations, including slicing, to a whole column at once:

df["last_col"].str[:-2]

You can time both ways of selecting the column with the %%timeit magic command in a Jupyter notebook:

%%timeit
df.iloc[:, -1].str[:-2]
>>> 352 µs ± 4.68 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%%timeit
df["last_col"].str[:-2]
>>> 242 µs ± 4.76 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
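
Since the question says the last column's name keeps changing, here is a minimal sketch of writing the sliced values back without hard-coding the name. The sample frame and its column names ("id", "value") are made up for illustration, and it assumes the column already holds strings.

```python
import pandas as pd

# Hypothetical sample data; the real column names are unknown.
df = pd.DataFrame({"id": [1, 2], "value": ["1055.0000.0", "20.5000.0"]})

last_col = df.columns[-1]              # name of the last column, whatever it is
df[last_col] = df[last_col].str[:-2]   # drop the last two characters of every entry

print(df[last_col].tolist())           # ['1055.0000', '20.5000']
```

Looking the name up through df.columns[-1] keeps the assignment a plain column assignment, so no row-by-row loop is needed.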
haneulkim
0

Try with the str accessor:

df.iloc[:, -1] = df.iloc[:, -1].astype(str).str[:-2]
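
If a numeric column is wanted afterwards, note that casting with astype(int) raises a ValueError on a string such as "1055.0000". A rough sketch of one alternative, assuming float values are acceptable, is to strip the suffix first and then call pd.to_numeric. The sample data and column names here are hypothetical.

```python
import pandas as pd

# Hypothetical sample data; the real column names are unknown.
df = pd.DataFrame({"id": [1, 2], "value": ["1055.0000.0", "20.5000.0"]})

last_col = df.columns[-1]
stripped = df[last_col].astype(str).str[:-2]   # "1055.0000.0" -> "1055.0000"
df[last_col] = pd.to_numeric(stripped)         # parse the remaining text as floats

print(df[last_col].tolist())                   # [1055.0, 20.5]
```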
U12-Forward
0

You could also use rsplit:

s = '105.0000.0'
s.rsplit('.0', 1)[0]

output:

105.0000
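
To apply the same idea to a whole column, pandas exposes rsplit through the .str accessor. The sketch below uses made-up values and also shows one caveat: rsplit cuts at the last ".0" anywhere in the string, so a value that does not end in ".0" can be truncated. Series.str.removesuffix (assumed available, pandas 1.4 or later) only touches a trailing ".0".

```python
import pandas as pd

# Hypothetical values: the second one does not end in ".0".
s = pd.Series(["105.0000.0", "1.05"])

# Split once from the right on ".0" and keep the left-hand part.
via_rsplit = s.str.rsplit(".0", n=1).str[0]

# Remove ".0" only when it is the very end of the string.
via_removesuffix = s.str.removesuffix(".0")

print(via_rsplit.tolist())        # ['105.0000', '1']
print(via_removesuffix.tolist())  # ['105.0000', '1.05']
```

If every value is guaranteed to end in ".0", as in the question, both give the same result.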
BlackMath