3

I have a list of records with IDs, some of which are usernames and some of which are email addresses. I'd like to know how many are email addresses. I was thinking an easy way to do this would be to count how many of the rows contain the @ symbol, but I can't get a function to work for this. Any help is appreciated!

Sample dataset:

x <- c("1234@aol.com", "johnnyApple", "tomb@gmail.com")
Steven Beaupré
T D
  • See also http://stackoverflow.com/questions/19341554/regular-expression-in-base-r-regex-to-identify-email-address – Sam Firke May 04 '15 at 14:19

3 Answers

6

Both answers so far are entirely correct, but if you're looking for an email address, a method that's less likely to have false positives is:

x <- c("1234@aol.com", "johnnyApple", "tomb@gmail.com")  
sum(regexpr(".*@.*\\..*",x) != -1)
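
As a rough sketch of the false-positive point: the extra "@handle" entry and the anchored pattern below are assumptions added for illustration, not a complete email validator. A bare "@" test counts anything containing the symbol, while a pattern requiring text before the "@" and a dot in the domain is stricter.

```r
# "@handle" is a made-up entry to show a false positive
x <- c("1234@aol.com", "johnnyApple", "tomb@gmail.com", "@handle")

# Any string containing "@" is counted, including "@handle"
n_at <- sum(grepl("@", x, fixed = TRUE))

# Stricter: something@something.something, with no spaces or extra "@"
pattern <- "^[^@[:space:]]+@[^@[:space:]]+\\.[^@[:space:]]+$"
n_email <- sum(grepl(pattern, x))

n_at     # 3
n_email  # 2
```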
Eric Brooks
  • You could even go further and require ".com", ".edu" etc, although then you risk false negatives. – Eric Brooks May 04 '15 at 14:06
  • Good thinking... Though more like `sum(regexpr(".*@.*\\..*",x) != -1)` probably, to match the OP's desired output. A similar approach could be `sum(sub(".*(@).*\\..*", "\\1", x) == "@")` – David Arenburg May 04 '15 at 14:07
2

Try:

x <- c("1234@aol.com", "johnnyApple", "tomb@gmail.com")
sum(grepl("@", x))
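
A brief sketch of why this works: grepl() returns one logical value per element, and sum() counts each TRUE as 1. The name is_email is only illustrative, and fixed = TRUE is an optional addition that treats "@" as a literal string rather than a regular expression.

```r
x <- c("1234@aol.com", "johnnyApple", "tomb@gmail.com")

is_email <- grepl("@", x, fixed = TRUE)  # one TRUE/FALSE per element
is_email       # TRUE FALSE TRUE
sum(is_email)  # 2
x[is_email]    # the matching entries themselves
```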
Steven Beaupré
1

Assuming your data is in a data frame df with the IDs in column V1, you can try:

length(grep(pattern="@", df$V1))
[1] 2
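
For a self-contained version, here is a minimal sketch that builds such a data frame from the question's sample values. The column name V1 follows the assumption above, and idx is a hypothetical helper name.

```r
# Sample data from the question, stored in a column named V1 (assumed)
df <- data.frame(V1 = c("1234@aol.com", "johnnyApple", "tomb@gmail.com"),
                 stringsAsFactors = FALSE)

idx <- grep("@", df$V1, fixed = TRUE)  # row positions that contain "@"
idx          # 1 3
length(idx)  # 2
```

Unlike grepl(), grep() returns the positions of the matches, so length() is used to count them instead of sum().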
Mamoun Benghezal