I have a data frame with one column that contains strings of different lengths. For each row, I need to split the long string on the ', ' separator into individual string elements. Then, for each possible individual string, I need to create a new column that contains a 1 if that string is present in the row and a 0 otherwise.

I've done it using the loops below. However, is there a more elegant way of doing it, e.g., using an existing data wrangling package? Thanks a lot! Here is my code:

# Create an example data frame with one column of strings:
df <- data.frame(a = c("one, two, three",
                       "one, three",
                       "two, three, four, five",
                       "one, four, five",
                       "two"), stringsAsFactors = FALSE)
df
str(df$a)

# Split column 'a' into individual strings:
library(stringr)
split_list <- str_split(df$a, ", ")
split_list  # the result is a list of strings

# Grab unique values of all strings:
unique_strings <- sort(unique(unlist(split_list)))
unique_strings

# For each string in unique_strings, create a column filled with zeros:
df[unique_strings] <- 0
df

# Replace a zero with a 1 in a column if that row contains that string:
for (row in seq_len(nrow(df))) {       # loop through rows (safe even for 0 rows)
  for (string in split_list[[row]]) {  # loop through this row's split strings
    df[row, string] <- 1               # mark the matching column with a 1
  }
}
df
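A possible loop-free version of the same steps, sketched in base R under the assumption that no extra packages are wanted: it swaps `stringr::str_split()` for base `strsplit()` (with `fixed = TRUE`, since ", " is a literal separator) and builds the whole 0/1 matrix at once with nested `vapply()` calls and `%in%`. The name `indicator` is purely illustrative, and this is one approach among several rather than the definitive one.

```r
# Base R only: build 0/1 indicator columns without assigning cell by cell.
df <- data.frame(a = c("one, two, three",
                       "one, three",
                       "two, three, four, five",
                       "one, four, five",
                       "two"), stringsAsFactors = FALSE)

# Split on the literal ", " separator (base R counterpart of stringr::str_split)
split_list <- strsplit(df$a, ", ", fixed = TRUE)
unique_strings <- sort(unique(unlist(split_list)))

# For every unique string, test membership in each row's pieces.
# vapply returns a matrix: one row per data-frame row, one column per string,
# with columns named after unique_strings (USE.NAMES on a character vector).
indicator <- vapply(
  unique_strings,
  function(s) vapply(split_list, function(pieces) as.integer(s %in% pieces), integer(1)),
  integer(length(split_list))
)

# Attach the indicator columns to the original data frame
result <- cbind(df, indicator)
print(result)
```

Compared with the nested loop, this avoids writing into `df` one cell at a time and yields integer (rather than double) 0/1 columns. If a tidyverse workflow is preferred, a common route is `tidyr::separate_rows()` followed by `tidyr::pivot_wider()`; that variant is not shown here.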
user3245256