I have a data frame as follows:

  1     Tertiary seen.
  2     No tertiary seen.
  3     No anything seen.
  4     Tertiary everywhere.

I want to add a column that is filled only when Tertiary is seen, but not when the regex No.*\. also matches:

  1     Tertiary seen.          Tertiary
  2     No tertiary seen.       NA
  3     No anything seen.       NA
  4     Tertiary everywhere.    Tertiary

I know I can use | in str_extract, but & doesn't seem to be accepted. This is what I tried:

Mydata$newcol<-str_extract(Mydata$Text,"[Tt]ertiary&!No.*[Tt]ertiary\\.")
Amanda
Sebastian Zeki

2 Answers

You can try a negative lookbehind for that, something like

Mydata$newcol[grepl("(?<!No )[Tt]ertiary", Mydata$Text, perl = TRUE)] <- "Tertiary"
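For reference, `(?!...)` is a negative lookahead and `(?<!...)` is a negative lookbehind, and the two behave differently when placed directly before the word. The following is a minimal base-R sketch on the question's four sample strings. The variable names `ahead`, `behind` and `newcol` are only illustrative, and the lookbehind only rules out a literal "No " immediately before the word, so it is not a general negation check.

```r
# Sample text from the question
Text <- c("Tertiary seen.",
          "No tertiary seen.",
          "No anything seen.",
          "Tertiary everywhere.")

# Negative lookahead before the word: it checks that the text starting at
# this position is not "No ", which is always true where "tertiary" begins,
# so it excludes nothing.
ahead <- grepl("(?!No )[Tt]ertiary", Text, perl = TRUE)
# TRUE TRUE FALSE TRUE

# Negative lookbehind: it checks that the three characters just before
# the word are not "No ".
behind <- grepl("(?<!No )[Tt]ertiary", Text, perl = TRUE)
# TRUE FALSE FALSE TRUE

# Fill the new column only where the lookbehind version matched
newcol <- ifelse(behind, "Tertiary", NA_character_)
# "Tertiary" NA NA "Tertiary"
```

Both patterns need `perl = TRUE`, because lookaround is not supported by R's default regex engine.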
David Arenburg

An "AND" pattern can be expressed as "NOT (NOT A OR NOT B)" (De Morgan's law). See also "Regular Expressions: Is there an AND operator?" on Stack Overflow.

library(dplyr)
library(stringr)

Mydata <- tibble(  # data_frame() is deprecated in favour of tibble()
  Text = c("Tertiary seen.",
           "No tertiary seen.",
           "No anything seen.",
           "Tertiary everywhere.")
  )

Mydata %>% 
  mutate(
    newcol = str_extract(Text, "^(^[Tt]ertiary|^No.*[Tt]ertiary\\.)")
  )
# A tibble: 4 × 2
#   Text                 newcol
#   <chr>                <chr>
# 1 Tertiary seen.       Tertiary
# 2 No tertiary seen.    <NA>
# 3 No anything seen.    <NA>
# 4 Tertiary everywhere. Tertiary
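An alternative that avoids packing both conditions into one regex is to test them separately and combine the results. This is only a sketch on the question's sample data. The helper name `negated` is made up for illustration, and the negation pattern is the `No.*\.` regex from the question, which would also fire on any word containing "No" followed later by a period.

```r
library(stringr)

Mydata <- data.frame(
  Text = c("Tertiary seen.",
           "No tertiary seen.",
           "No anything seen.",
           "Tertiary everywhere."),
  stringsAsFactors = FALSE
)

# Condition to exclude: the row matches the negation pattern "No.*\."
negated <- str_detect(Mydata$Text, "No.*\\.")

# Condition to keep: extract "Tertiary"/"tertiary", but only for rows that
# are not negated; negated rows are set to NA.
Mydata$newcol <- ifelse(negated,
                        NA_character_,
                        str_extract(Mydata$Text, "[Tt]ertiary"))

Mydata$newcol
# "Tertiary" NA NA "Tertiary"
```

Keeping the two tests apart makes each regex easy to read and change on its own, at the cost of one extra pass over the column.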
Keiku