I have a time-series data frame (df) of municipalities, some of which have split off from one another over time. For the years after 2004, when the splits had taken place, I would like to add each split-off town's values to those of the town it separated from. I can then delete the split-off towns and end up with a proper panel dataset that has no missing values for the earlier years.

These are the two vectors of town names, as they are recorded in df:

past = c("town_1","town_2","town_3") # larger towns recorded throughout the dataset
present = c("town_1_split","town_2_split","town_3_split") # new municipalities that split off over time

These are the columns across which I want to sum the pairs above:

col_list = c("col1","col2","col3")

I tried to solve the problem with a for loop over seq_along(past):

for(i in seq_along(past)){
  df[df$town == past[i] & df$year> 2004, col_list] = 
    sum(df[df$town == past[i] & df$year> 2004, col_list], 
        df[df$town == present[i] & df$year> 2004,col_list], na.rm = T)}

However, I receive the following error message, even though I use na.rm = T above.

Error: Can't use NA as row index in a tibble for assignment.
Backtrace:
 1. base::`[<-`(...)
 2. tibble:::`[<-.tbl_df`(...)
 3. tibble:::tbl_subassign(x, i, j, value, i_arg, j_arg, substitute(value))
 4. tibble:::tbl_expand_to_nrow(x, i)

Any help would be appreciated!

Márton
    Please provide a [minimum reproducible dataset](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) for your coding. – Adam Quek May 30 '22 at 11:35
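
One possible reading of the error, offered as a sketch rather than a confirmed diagnosis: na.rm = T only affects sum(), not the row index. If town or year contains NA, the logical test df$town == past[i] & df$year > 2004 can itself evaluate to NA, and a tibble refuses NA in a row index during assignment. Wrapping the test in which() drops those NA positions. Separately, sum() collapses everything it is given into a single number, so the loop would write one grand total into every cell instead of a per-year, per-column sum. The toy data below is hypothetical (the real df was not posted), and the sketch assumes at most one split-off row per town and year.

```r
# Hypothetical toy data; the NA in `town` mimics the suspected cause of the error.
df <- data.frame(
  town = c("town_1", "town_1", "town_1_split", "town_1_split", NA),
  year = c(2004, 2005, 2004, 2005, 2005),
  col1 = c(10, 20, NA, 5, 1),
  col2 = c(1, 2, NA, NA, 1)
)
past     <- c("town_1")
present  <- c("town_1_split")
col_list <- c("col1", "col2")

for (i in seq_along(past)) {
  # which() returns only the positions that are TRUE, so NA never reaches the index
  old_rows <- which(df$town == past[i] & df$year > 2004)
  for (r in old_rows) {
    # matching row of the split-off town in the same year
    new_row <- which(df$town == present[i] & df$year == df$year[r])
    if (length(new_row) == 1) {
      for (col in col_list) {
        # add the pair cell by cell; NA is treated as 0 because of na.rm = TRUE
        df[[col]][r] <- sum(df[[col]][r], df[[col]][new_row], na.rm = TRUE)
      }
    }
  }
}

# drop the split-off towns once their values have been added to the parent town
df <- df[!(df$town %in% present), ]
```

Note that sum(..., na.rm = TRUE) returns 0 when both values of a pair are NA; if such cells should stay NA, that case would need separate handling.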
