I have a data.frame with a key column, fignum, and a data column, codefile. Values of fignum may be duplicated.

Where duplicates occur, I want to combine the codefile values into a single row, separated by ", ". Here's my input:

> cf
   fignum                       codefile
8     4.6           04_6-cholera-water.R
9     P.3 04_P3a-cholera-neighborhoods.R
10    P.3       04_P3b-SnowMap-density.R
11    5.5    05_5-playfair-east-indies.R

> duplicated(cf[,"fignum"])
[1] FALSE FALSE  TRUE FALSE

The desired output combines the two "P.3" codefile values into one observation, to look like this:

> cf-wanted
   fignum                                                  codefile
8     4.6                                      04_6-cholera-water.R
9     P.3  04_P3a-cholera-neighborhoods.R, 04_P3b-SnowMap-density.R
10    5.5                               05_5-playfair-east-indies.R
user101089

1 Answer

We could group by fignum with group_by() and then collapse the codefile values within each group using summarise():

library(dplyr)
cf %>% 
  group_by(fignum) %>% 
  summarise(codefile = paste0(codefile, collapse = ', '), .groups = 'drop')
  fignum codefile
  <chr>  <chr>                                                   
1 4.6    04_6-cholera-water.R                                    
2 5.5    05_5-playfair-east-indies.R                             
3 P.3    04_P3a-cholera-neighborhoods.R, 04_P3b-SnowMap-density.R
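Note that the result above is sorted by fignum, whereas the desired output in the question keeps the rows in order of first appearance (4.6, P.3, 5.5). If that order matters, one possible base R alternative is sketched below. It rebuilds cf by hand from the question's printout (so the original row names are not reproduced), and the name res is only illustrative.

```r
# Rebuild the example data from the question (row names omitted)
cf <- data.frame(
  fignum = c("4.6", "P.3", "P.3", "5.5"),
  codefile = c("04_6-cholera-water.R",
               "04_P3a-cholera-neighborhoods.R",
               "04_P3b-SnowMap-density.R",
               "05_5-playfair-east-indies.R"),
  stringsAsFactors = FALSE
)

# Collapse codefile within each fignum; collapse = ", " is passed on to paste()
res <- aggregate(codefile ~ fignum, data = cf, FUN = paste, collapse = ", ")

# aggregate() returns groups in sorted order; restore first-appearance order
res <- res[order(match(res$fignum, unique(cf$fignum))), ]
rownames(res) <- NULL
res
```

The match() against unique(cf$fignum) gives each group the position at which its key first occurs in cf, so ordering by it puts P.3 back between 4.6 and 5.5.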
TarJae