When concatenating strings using dplyr, group_by() and mutate() or summarize () with paste() and collapse, NA values are coerced to a string "NA".
When using str_c() instead of paste(), strings that are concatenated with NA are removed (?str_c: whenever a missing value is combined with another string the result will always be missing). When having such combinations of NA & non-NA values, how can I remove the NA instead of the non-NA in the concatenation?
See my example below:
library(dplyr)
library(stringr)
ID <- c(1,1,2,2,3,4)
string <- c(' asfdas ', 'sdf', NA,'sadf', 'NA', NA)
df <- data.frame(ID, string)
# ID string
# 1 1 asfdas
# 2 1 sdf
# 3 2 <NA> # ID 2 has both NA and non-NA values
# 4 2 sadf #
# 5 3 NA
# 6 4 <NA>
Both,
df%>%
group_by(ID)%>%
summarize(string = paste(string, collapse = "; "))%>%
distinct_all()
and
df_conca <-df%>%
group_by(ID)%>%
dplyr::mutate(string = paste(string, collapse = "; "))%>%
distinct_all()
result in
ID string
1 1 " asfdas ; sdf"
2 2 "NA; sadf"
3 3 "NA"
4 4 "NA" # NA coerced to "NA"
I.e. NA becomes "NA":
while
df %>%
group_by(ID)%>%
summarize(string = str_c(string, collapse = "; "))
results in:
ID string
1 1 " asfdas ; sdf"
2 2 NA
3 3 "NA"
4 4 NA
I.e. "sadf" is removed according to the str_c rule: NA combined with string, results in NA.
However, I would like to keep the true NA values (in e.g. 'ID' 4) and the strings only (in e.g. 'ID' 2), as such:
ID string
1 1 " asfdas ; sdf"
2 2 "sadf"
3 3 "NA"
4 4 NA
Ideally, I would like to stay within the dplyr workflow.
This question is an extension of Concatenating strings / rows using dplyr, group_by & collapse or summarize, but maintain NA values