I have a time-series data frame (df) of municipalities, some of which have split off from others over time. For the post-2004 years, after they separated, I would like to add each split-off town's values to those of the town it split from, so that I can then delete the split-off towns and create a real panel dataset without missing values for the earlier years.
These are two lists of town names as they are recorded in the df:
past = c("town_1","town_2","town_3") # larger towns recorded throughout the dataset
present = c("town_1_split","town_2_split","town_3_split") # new municipalities that split off over time
This is the list of columns across which I want to sum the pairs above:
col_list = c("col1","col2","col3")
I tried to solve the problem with a for loop over seq_along(past):
for (i in seq_along(past)) {
  df[df$town == past[i] & df$year > 2004, col_list] =
    sum(df[df$town == past[i] & df$year > 2004, col_list],
        df[df$town == present[i] & df$year > 2004, col_list], na.rm = T)
}
However, I receive the following error message, even though I use na.rm = T above:
Error: Can't use NA as row index in a tibble for assignment.
Backtrace:
1. base::`[<-`(...)
2. tibble:::`[<-.tbl_df`(...)
3. tibble:::tbl_subassign(x, i, j, value, i_arg, j_arg, substitute(value))
4. tibble:::tbl_expand_to_nrow(x, i)
Any help would be appreciated!
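
One possible direction, offered as a sketch rather than a definitive fix: the error most likely comes from the row index and not from sum(). na.rm only affects the values being summed, while a test such as df$town == past[i] & df$year > 2004 yields NA wherever town or year is missing, and a tibble refuses NA in a row index on assignment. Wrapping the test in which() drops those NAs. Separately, sum() collapses everything into one number, so the pairs need to be added row by row, matched on year. The toy data below is hypothetical (made-up values, only two of the columns), and the sketch assumes at most one row per town and year.

```r
# Hypothetical toy data; names follow the question, values are invented.
df <- data.frame(
  town = c("town_1", "town_1", "town_1_split", "town_1", "town_1_split", NA),
  year = c(2004, 2005, 2005, 2006, 2006, 2005),
  col1 = c(10, 6, 4, 7, NA, 1),
  col2 = c(20, 12, 8, 13, 9, 1),
  stringsAsFactors = FALSE
)
past     <- c("town_1")
present  <- c("town_1_split")
col_list <- c("col1", "col2")

for (i in seq_along(past)) {
  # which() converts the logical test to row positions and drops NAs
  old_rows <- which(df$town == past[i] & df$year > 2004)
  new_rows <- which(df$town == present[i] & df$year > 2004)

  # Pair each split-off row with the parent row of the same year
  m    <- match(df$year[new_rows], df$year[old_rows])
  keep <- !is.na(m)
  target <- old_rows[m[keep]]
  source <- new_rows[keep]

  for (col in col_list) {
    # Row-wise sum of parent and split-off values; NA counts as 0 here
    df[[col]][target] <- rowSums(
      cbind(df[[col]][target], df[[col]][source]),
      na.rm = TRUE
    )
  }
}

# Drop the split-off towns; rows with a missing town name are kept
df <- df[!(df$town %in% present), ]
```

Note that rowSums(..., na.rm = TRUE) returns 0 when both values of a pair are missing, which may or may not be what you want for a panel.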