Fastest way to propagate value through a pandas dataframe

Asked Aug 08 '21 at 15:56

Active Aug 08 '21 at 16:05


I am collecting some data, which I am manipulating (via csv) as a (large-ish) dataframe.

Some of the information for the dataframe is only available to me at a certain time. To get around this, I write the raw data as such:

id    col2   col3  time_sensitive_data
id1  data2  data3        0            (when most data is available)
id1      0    0     time_sens_data

Then, when I analyse the data, I need to propagate this `time_sensitive_data` value to every row that shares the same 'id'. At the moment I do this:

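# Slow approach: one boolean-mask pass over the whole frame per unique id
ids = data['id'].unique()
for group_id in ids:  # renamed from `id` to avoid shadowing the builtin
    mask = data['id'] == group_id
    # Rows written before the value was known hold 0, so max() picks the real value
    current_time_sensitive_info = data.loc[mask, 'time_sensitive_data'].max()
    data.loc[mask, 'time_sensitive_data'] = current_time_sensitive_info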

This solution works, but is painfully slow (10 mins+). Is there a faster way to achieve this result?

edited Aug 08 '21 at 16:05

tripleee


asked Aug 08 '21 at 15:56

James_yf


Difficult to say without easily reproducible data (https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples), but a `groupby` with `transform` would probably be faster. – coffeinjunky Aug 08 '21 at 15:59
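
For illustration, here is a minimal sketch of what the `groupby` plus `transform` suggestion could look like. The toy frame is made up to mirror the layout in the question (numeric values, with 0 as the placeholder for "not yet known"), so the column contents are assumptions rather than the asker's real data. It has not been timed against the original loop.

```python
import pandas as pd

# Hypothetical sample data shaped like the question's CSV:
# each id has one row with the regular columns and one row
# that carries only the time-sensitive value.
data = pd.DataFrame({
    "id": ["id1", "id1", "id2", "id2"],
    "col2": [5, 0, 7, 0],
    "col3": [6, 0, 8, 0],
    "time_sensitive_data": [0, 42, 0, 17],
})

# Compute the per-id maximum once and broadcast it back onto every row
# of that id. transform returns a Series aligned with the original index,
# so it can be assigned straight to the column.
data["time_sensitive_data"] = (
    data.groupby("id")["time_sensitive_data"].transform("max")
)

print(data)
```

This replaces the per-id boolean mask (one full scan of the frame for every unique id) with a single grouped pass, which is why it would be expected to scale better when there are many ids. Like the loop, it relies on the real value being larger than the placeholder.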
