
I want to match the email addresses in the following text:

uma@cs.stanford.edu - match
uma at cs.Stanford.edu - match
http://infolab.stanford.edu/~widom/yearoff.h
we
genale.stanford.edu
n <A href="mailto:cheriton@cs.stanford.edu - match
hola   @  kirti.edu - match

Now I want to capture only two parts of each email address, for example (uma) and (cs.stanford) from uma@cs.stanford.edu.

My current pattern is:

(\w+)[(\s+at\s+)|(\s*@\s*)]+(\w+|\w+\.\w+).edu

But it also matches the string infolab.stanford.edu, which I don't want. Can anybody suggest a modification?

Surjya Narayana Padhi
  • What do you want matched out of the `mailto:` line? Which dialect of regex are you using — what's the host language? The answers will differ between JavaScript, Python, C++, Ruby, C, Perl, Java, various dialects of SQL and PHP, to name but a few of the many possibilities. And for C, there are multiple possible regex packages, such as PCRE, or POSIX, or HS, or ... – Jonathan Leffler Oct 25 '15 at 05:35
  • Note that the square brackets form a funny character class in your regex. You use round brackets (parentheses) to enclose alternatives, not square brackets. – Jonathan Leffler Oct 25 '15 at 05:36
  • @JonathanLeffler: I used parentheses, but then it captures the 'at' or @, which I don't need. Is there any way to group without capturing? – Surjya Narayana Padhi Oct 25 '15 at 05:51
  • Since you've not identified the dialect of regex you're using, I don't know. It matters; PCRE (Perl compatible regular expressions) have ways of suppressing captures, but many other regex packages don't. I'm far from convinced you need the parentheses around `(\s+at\s+)` or `(\s*@\s*)`, so that capturing should be immaterial. Note that the real regex for matching email addresses is about a mile long. See [Using a regular expression to validate an email address](https://stackoverflow.com/questions/201323)! Note the third answer. – Jonathan Leffler Oct 25 '15 at 05:53

1 Answer


As long as you understand that this regex doesn't verify the correctness of your email address, but merely acts as a quick first line of defense against malformed addresses, an easy fix to your regex is as follows:

([\w.]+)(?:\s+at\s+|\s*@\s*)(\w+|\w+\.\w+).edu
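As a quick check, the pattern can be run against the sample lines from the question. This is only a sketch: the question never names its regex dialect, so Python's `re` module is an assumption here, and `PATTERN`, `SAMPLES` and `extract` are hypothetical names introduced for illustration.

```python
import re

# Pattern copied verbatim from the answer above.
PATTERN = re.compile(r"([\w.]+)(?:\s+at\s+|\s*@\s*)(\w+|\w+\.\w+).edu")

# Sample lines from the question (the "- match" annotations are dropped).
SAMPLES = [
    "uma@cs.stanford.edu",
    "uma at cs.Stanford.edu",
    "http://infolab.stanford.edu/~widom/yearoff.h",
    "genale.stanford.edu",
    'n <A href="mailto:cheriton@cs.stanford.edu',
    "hola   @  kirti.edu",
]


def extract(line):
    """Return (user, domain) for the first address found in line, or None."""
    m = PATTERN.search(line)
    return m.groups() if m else None


for line in SAMPLES:
    print(repr(line), "->", extract(line))
```

The `(?:...)` group keeps the separator out of the captures, so only the user name and the domain come back. One thing the run does not show: the `.` before `edu` is unescaped, so it matches any character; writing `\.edu` would be stricter.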

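In particular, your regex missed addresses whose usernames contain `.` (which, for example, my main email address does). Its middle part was also broken: square brackets create a character class, so `[(\s+at\s+)|(\s*@\s*)]+` matches one or more of the characters `(`, `)`, `|`, `+`, `*`, `@`, `a`, `t` or whitespace instead of choosing between the two alternatives. That is why infolab.stanford.edu matched: the `a` in `infolab` satisfied the class. You can see the results here: http://refiddle.com/2js1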
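To see why the original pattern accepts infolab.stanford.edu, it may help to inspect what it captures. The sketch below again assumes Python's `re` module; `ORIGINAL`, `SEPARATOR_CLASS` and `GROUPED` are hypothetical names, and `GROUPED` is simply the question's pattern with the square brackets swapped for a non-capturing group.

```python
import re

# The pattern from the question, unchanged.
ORIGINAL = re.compile(r"(\w+)[(\s+at\s+)|(\s*@\s*)]+(\w+|\w+\.\w+).edu")

# The bracketed middle part on its own: a character class, not an alternation.
SEPARATOR_CLASS = re.compile(r"[(\s+at\s+)|(\s*@\s*)]+")

# Same pattern with the class replaced by a non-capturing group.
GROUPED = re.compile(r"(\w+)(?:\s+at\s+|\s*@\s*)(\w+|\w+\.\w+).edu")

text = "infolab.stanford.edu"

m = ORIGINAL.search(text)
print(m.group(0))  # the whole unwanted match
print(m.groups())  # the "a" of "infolab" is consumed as the separator

# A lone "a", or stray punctuation, satisfies the class.
print(bool(SEPARATOR_CLASS.fullmatch("a")))
print(bool(SEPARATOR_CLASS.fullmatch(")|(")))

# With a real alternation there is nothing to match in this string.
print(GROUPED.search(text))
```

The first capture comes back as `infol` and the second as `b.stanford`, which shows the single letter `a` standing in for the separator.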

Blindy