
When concatenating strings with dplyr (group_by() followed by paste(..., collapse = ...) inside summarize() or mutate()), NA values become the string "NA". How can I avoid that?

See my example below:

library(dplyr)

ID <- c(1, 1, 2, 3)
string <- c(' asfdas ', 'sdf', NA, 'NA')
df <- data.frame(ID, string)

Both

df_conca <- df %>%
  group_by(ID) %>%
  summarize(string = paste(string, collapse = "; ")) %>%
  distinct_all()

and

df_conca <- df %>%
  group_by(ID) %>%
  dplyr::mutate(string = paste(string, collapse = "; ")) %>%
  distinct_all()

result in:

     ID string               
1     1 " asfdas ; sdf"
2     2 "NA"           
3     3 "NA" 

but I would like genuine NA values to stay missing (NA) rather than become the string "NA":

     ID string             
1     1 " asfdas ; sdf"
2     2 NA           
3     3 "NA" 

Ideally, I would like to stay within the dplyr workflow.
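To see where the "NA" text comes from, here is the same behavior on plain vectors using base R only (the variable names are just for illustration): paste() formats a missing value as the two-character text "NA", so after collapsing, the result is an ordinary string and no longer a missing value.

```r
# Base R only: paste() turns a missing value into the literal text "NA"
x_mixed <- c(" asfdas ", "sdf", NA)
x_missing <- NA_character_

mixed_out <- paste(x_mixed, collapse = "; ")
missing_out <- paste(x_missing, collapse = "; ")

print(mixed_out)         # [1] " asfdas ; sdf; NA"
print(missing_out)       # [1] "NA"
print(is.na(missing_out)) # [1] FALSE: it is now a string, not a missing value
```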

MsGISRocker

1 Answer


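You can use str_c() from the stringr package instead of paste(). Unlike paste(), str_c() propagates missing values, so a group containing NA gives NA instead of the string "NA".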

library(dplyr)
library(stringr)

df %>%
  group_by(ID) %>%
  summarize(string = str_c(string, collapse = "; "))

#     ID string         
#  <dbl> <chr>          
#1     1 " asfdas ; sdf"
#2     2  NA            
#3     3 "NA"           
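If you prefer not to add the stringr dependency, a minimal sketch of the same idea with base paste() inside summarize() is below. The helper paste_na() is hypothetical (not part of dplyr or base R); it assumes you want str_c()-like behavior, i.e. a real NA whenever any value in the group is missing.

```r
library(dplyr)

ID <- c(1, 1, 2, 3)
string <- c(" asfdas ", "sdf", NA, "NA")
df <- data.frame(ID, string, stringsAsFactors = FALSE)

# Hypothetical helper: collapses like paste(collapse = ), but returns a real
# NA when any input is missing, mirroring how stringr::str_c() treats NA
paste_na <- function(x, collapse = "; ") {
  if (anyNA(x)) NA_character_ else paste(x, collapse = collapse)
}

df_conca <- df %>%
  group_by(ID) %>%
  summarize(string = paste_na(string))

print(df_conca)
```

Note that with both str_c() and this helper, a group that mixes missing and non-missing strings collapses to NA as a whole. If you would rather keep the non-missing pieces, one option is to paste x[!is.na(x)] and return NA_character_ only when all(is.na(x)).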
Ronak Shah
  • Some relevant parts of `?str_c`: "whenever a missing value is combined with another string the result will always be missing"; "Missing inputs give missing outputs". (perhaps worth adding to the post?). Cheers – Henrik Sep 23 '21 at 15:03
  • @Henrik: Absolutely! I extended the question to also cover this possibility. See [link](https://stackoverflow.com/questions/69303052/concatenating-strings-rows-using-dplyr-group-by-with-mutate-or-summarize)! Looking for solutions. – MsGISRocker Sep 23 '21 at 15:41
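The documented behavior quoted from ?str_c can be checked directly on small vectors (the variable names here are only for illustration): a missing input makes the result missing, both for element-wise joining and with collapse, while the literal text "NA" is treated as an ordinary string.

```r
library(stringr)

# A missing value combined with another string gives a missing result
pair_out <- str_c("a", NA)

# With collapse, one missing element makes the whole collapsed result missing
collapsed_out <- str_c(c(" asfdas ", NA), collapse = "; ")

# The literal text "NA" is an ordinary string and is kept as-is
literal_out <- str_c("NA", collapse = "; ")

print(c(pair_out, collapsed_out, literal_out))
```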