I have a dataset with a column containing the symbol '|' (it comes from the interaction of 2 variables in a model), and I want to split the column on this character.

The function separate works well with standard characters. Do you know how I can specify the character '|'?

library(tidyverse)
df <- data.frame(Interaction = c('var1|var2'))

# as expected
df %>% separate(Interaction, c('var1', 'var2'), sep = '1')
#   var1  var2
# 1  var |var2

# not as expected
df %>% separate(Interaction, c('var1', 'var2'), sep = '|')
#   var1 var2
# 1         v
demarsylvain

2 Answers

We can either escape the | with \\, because | is a regex metacharacter meaning OR and sep is interpreted as a regular expression by default.

If we look at the ?separate documentation,

separate(data, col, into, sep = "[^[:alnum:]]+", remove = TRUE, convert = FALSE, extra = "warn", fill = "warn", ...)

and it is described as

sep - If character, is interpreted as a regular expression. The default value is a regular expression that matches any sequence of non-alphanumeric values.

df %>% 
  separate(Interaction, c('var1', 'var2'), sep = '\\|')

or place it in square brackets

df %>% 
   separate(Interaction, c('var1', 'var2'), sep = '[|]')
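
To see the same regex behaviour outside of separate, here is a minimal sketch using base R's strsplit, which also treats its split argument as a regular expression by default. The variable x is just an illustrative string, not part of the original data; fixed = TRUE is a strsplit argument and is not being claimed as an option of separate.

```r
x <- "var1|var2"

# Unescaped: "|" is regex alternation between two empty patterns,
# so this does not split cleanly at the bar.
strsplit(x, "|")[[1]]

# Escaped with a double backslash: the bar is matched literally.
escaped <- strsplit(x, "\\|")[[1]]

# Inside a character class the bar loses its special meaning.
bracketed <- strsplit(x, "[|]")[[1]]

# fixed = TRUE turns off regex interpretation altogether.
literal <- strsplit(x, "|", fixed = TRUE)[[1]]

print(escaped)
print(bracketed)
print(literal)
```

The last three calls each return the two pieces "var1" and "var2".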
akrun

The vertical bar is a special character in regular expressions, which is why it does not behave as expected. Escape it:

df %>% separate(Interaction, c('var1', 'var2'), sep = '\\|')

That should solve the problem.
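
As a quick check that the escaped separator behaves the same on more than one row, here is a small sketch. The second value, 'age|sex', is made up for illustration and is not from the question.

```r
library(tidyr)

# Two interaction terms; the second one is a hypothetical example value.
df2 <- data.frame(Interaction = c("var1|var2", "age|sex"))

# Escaping the bar makes separate() split on the literal "|".
out <- separate(df2, Interaction, c("var1", "var2"), sep = "\\|")
print(out)
```

Each row ends up with the text before the bar in the first new column and the text after it in the second.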

Emma