I have a data frame with a character column storing strings separated by commas, such as:
  row_name my_col
  <chr>    <chr>
1 1        n, g, l, t, f, v
2 2        f, n, e
3 3        w
I want to transform it into a data frame in which:
- there is one column per distinct string value (in this example, one column per letter)
- each column is named after its letter
- the values are either TRUE or FALSE, indicating whether the letter in the (new) column name was in the original string for that row, such as:
## desired output
# A tibble: 3 x 9
  row_name n     g     l     t     f     v     e     w
     <dbl> <lgl> <lgl> <lgl> <lgl> <lgl> <lgl> <lgl> <lgl>
1        1 TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  FALSE FALSE
2        2 TRUE  FALSE FALSE FALSE TRUE  FALSE TRUE  FALSE
3        3 FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE
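To make the target concrete, the desired output above can also be written out by hand, which gives something to compare candidate solutions against. This is only a sketch: the name expected is introduced here for illustration, and row_name is numeric to match the printout above (in the input data it is a character column).

```r
library(tibble)

# Hand-written copy of the desired output shown above.
# `expected` is a hypothetical name used only for illustration.
expected <- tribble(
  ~row_name, ~n,    ~g,    ~l,    ~t,    ~f,    ~v,    ~e,    ~w,
  1,         TRUE,  TRUE,  TRUE,  TRUE,  TRUE,  TRUE,  FALSE, FALSE,
  2,         TRUE,  FALSE, FALSE, FALSE, TRUE,  FALSE, TRUE,  FALSE,
  3,         FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE
)
```

A candidate result could then be checked against it with something like all.equal(), after converting row_name to the same type in both data frames.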
Reproducible code
library(dplyr, warn.conflicts = FALSE)
sample_letters <- function() {
  sample(letters, size = sample(1:6, 1))
}
set.seed(2021)
row_1 <- tibble(my_col = toString(sample_letters()))
row_2 <- tibble(my_col = toString(sample_letters()))
row_3 <- tibble(my_col = toString(sample_letters()))
my_df <- bind_rows(row_1, row_2, row_3, .id = "row_name")
my_df
#> # A tibble: 3 x 2
#>   row_name my_col
#>   <chr>    <chr>
#> 1 1        n, g, l, t, f, v
#> 2 2        f, n, e
#> 3 3        w
Created on 2021-09-09 by the reprex package (v2.0.0)
I thought this task would be simple, but my attempt quickly turned into a messy script, and I feel there should be a much simpler solution. I'd appreciate any suggestions.
Thanks!