I am collecting some data, which I load from CSV and manipulate as a (large-ish) pandas DataFrame.
Some of the information for the DataFrame only becomes available to me later. To work around this, I write the raw data as follows:
```
id   col2   col3   time_sensitive_data
id1  data2  data3  0                     (when most data is available)
id1  0      0      time_sens_data
```
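To make the layout concrete, here is a small sample frame in the same shape. The values are hypothetical, and it assumes `time_sensitive_data` is numeric, with `0` as the placeholder in the row written before the real value is known.

```python
import pandas as pd

# Hypothetical sample mirroring the layout above: the first row per id is
# written when most data is available (time_sensitive_data = 0); the second
# row is written later and carries only the time-sensitive value.
data = pd.DataFrame(
    {
        "id": ["id1", "id1", "id2", "id2"],
        "col2": ["data2", 0, "data4", 0],
        "col3": ["data3", 0, "data5", 0],
        "time_sensitive_data": [0, 17.5, 0, 42.0],
    }
)
print(data)
```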
Then, when I analyse the data, I need to propagate `time_sensitive_data` to every row that shares the same `id`. At the moment I do this:
```python
# `data` is the DataFrame loaded from the CSV
ids = data['id'].unique()
for id_ in ids:  # `id_` avoids shadowing the built-in id()
    current = data.loc[data['id'] == id_]
    current_time_sensitive_info = current['time_sensitive_data'].max()
    data.loc[data['id'] == id_, 'time_sensitive_data'] = current_time_sensitive_info
```
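The loop builds a boolean mask over the whole frame twice per unique id, so its cost grows roughly with rows × unique ids. A vectorized alternative is a single `groupby` with `transform("max")`, which computes the per-id maximum and broadcasts it back to every row of that id. This is a minimal sketch on hypothetical sample data, assuming `time_sensitive_data` is numeric and the `0` placeholder is never larger than the real value.

```python
import pandas as pd

# Hypothetical sample data in the layout described above.
# Assumption: time_sensitive_data is numeric and 0 is only a placeholder.
data = pd.DataFrame(
    {
        "id": ["id1", "id1", "id2", "id2", "id3"],
        "col2": ["data2", 0, "data4", 0, "data6"],
        "col3": ["data3", 0, "data5", 0, "data7"],
        "time_sensitive_data": [0, 17.5, 0, 42.0, 0],
    }
)

# One grouped pass: max per id, returned as a Series aligned with the
# original index, so it can be assigned straight back to the column.
data["time_sensitive_data"] = (
    data.groupby("id")["time_sensitive_data"].transform("max")
)
print(data)
```

An equivalent formulation is `data["id"].map(data.groupby("id")["time_sensitive_data"].max())`, which computes the per-id maxima once and looks each row's value up by `id`.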
This solution works, but it is painfully slow (10+ minutes). Is there a faster way to achieve the same result?