69

I have a large list of emails I am running through. A lot of the emails have typos. I am trying to build a regex that will check for valid emails.

This is what I have for the regex:

def is_a_valid_email?(email)
  (email =~ /^(([A-Za-z0-9]*\.+*_+)|([A-Za-z0-9]+\-+)|([A-Za-z0-9]+\+)|([A-Za-z0-9]+\+))*[A-Za-z0-9]+@{1}((\w+\-+)|(\w+\.))*\w{1,63}\.[a-zA-Z]{2,4}$/i)
end

It passes if an email has underscores and only one period. I have a lot of emails that have more than one period in the name itself. How do I check for that in regex?

hello.me_1@email.com # <~~ valid
foo.bar#gmail.co.uk # <~~~ not valid
f.o.o.b.a.r@gmail.com # <~~~valid 
f...bar@gmail.com # <~~ not valid 
get_at_m.e@gmail  #<~~ valid

Can someone help me rewrite my regex?

T0ny lombardi
  • Possible duplicate of http://stackoverflow.com/questions/201323/using-a-regular-expression-to-validate-an-email-address?rq=1 – CAustin Apr 10 '14 at 16:43
  • Refer [here](http://stackoverflow.com/questions/2049502/what-characters-are-allowed-in-email-address) for creating your RegEx. – tenub Apr 10 '14 at 16:58

13 Answers

122

This has been built into the standard library since at least Ruby 2.2.1:

URI::MailTo::EMAIL_REGEXP
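
A minimal usage sketch, assuming Ruby 2.4 or later for `Regexp#match?`. The helper name `valid_email?` is hypothetical and not part of the standard library:

```ruby
require "uri"

# Hypothetical helper: true when the string matches the standard library's
# email pattern. Regexp#match? returns a plain boolean.
def valid_email?(email)
  URI::MailTo::EMAIL_REGEXP.match?(email)
end

puts valid_email?("hello.me_1@email.com")  # a sample the question treats as valid
puts valid_email?("foo.bar#gmail.co.uk")   # contains no "@"
```

The exact pattern behind the constant may differ between Ruby versions, so edge cases such as consecutive dots are worth checking on the version you deploy.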
Joshua Hunter
  • 14
    `'aa@aaa' =~ URI::MailTo::EMAIL_REGEXP` Doesn't work with this case. – Benjamin Jun 19 '18 at 12:16
  • 2
    If there is a period, but nothing after it, the regex will return nil. If there is something after the period or no period, it will pass. "This requirement is a willful violation of RFC 5322, which defines a syntax for e-mail addresses that is simultaneously too strict (before the "@" character), too vague (after the "@" character), and too lax (allowing comments, whitespace characters, and quoted strings in manners unfamiliar to most users) to be of practical use here." html.spec.whatwg.org/multipage/input.html#valid-e-mail-address – Joshua Hunter Sep 11 '18 at 17:40
  • [a-zA-Z0-9.!\#$%&'*+\/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)+ (I removed trailing ? and replaced it with +, and then it works as expected) – Darpan May 28 '19 at 15:09
  • Thank you @Benjamin, I needed to know how to implement Josh Hunter's answer and didn't know the right way of comparing the email. – CWarrington Aug 05 '21 at 22:00
  • @JoshuaHunter Yes, the code is correct. However, since people input too many errors, it seems insufficient to check as that code in the real world. – Benjamin Aug 13 '21 at 04:58
120

TL;DR:

Credit goes to @joshuahunter (below; upvote his answer). Included here so that people see it.

URI::MailTo::EMAIL_REGEXP

Old TL;DR

VALID_EMAIL_REGEX = /\A[\w+\-.]+@[a-z\d\-]+(\.[a-z\d\-]+)*\.[a-z]+\z/i

Original answer

You seem to be complicating things a lot. I would simply use:

VALID_EMAIL_REGEX = /\A[\w+\-.]+@[a-z\d\-]+(\.[a-z]+)*\.[a-z]+\z/i

which is taken from Michael Hartl's Rails book.

Since this doesn't meet your dot requirement, it can simply be amended like so:

VALID_EMAIL_REGEX = /\A([\w+\-]\.?)+@[a-z\d\-]+(\.[a-z]+)*\.[a-z]+\z/i
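
As a quick check, here is a sketch that runs the amended pattern against four of the samples from the question. Only the `samples` array and the output format are additions:

```ruby
# Pattern copied from the answer: each local-part character may be followed
# by at most one dot, so runs like "..." are rejected.
VALID_EMAIL_REGEX = /\A([\w+\-]\.?)+@[a-z\d\-]+(\.[a-z]+)*\.[a-z]+\z/i

samples = [
  "hello.me_1@email.com",
  "foo.bar#gmail.co.uk",
  "f.o.o.b.a.r@gmail.com",
  "f...bar@gmail.com"
]

samples.each do |email|
  verdict = VALID_EMAIL_REGEX.match?(email) ? "valid" : "not valid"
  puts "#{email} => #{verdict}"
end
```

The first and third samples pass, while the one without an `@` and the one with consecutive dots are rejected.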

As mentioned by CAustin, there are many other solutions.

EDIT:

It was pointed out by @installero that the original fails for subdomains with hyphens in them. This version will work (not sure why the character class was missing digits and hyphens in the first place):

VALID_EMAIL_REGEX = /\A[\w+\-.]+@[a-z\d\-]+(\.[a-z\d\-]+)*\.[a-z]+\z/i
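
To make the difference concrete, this sketch compares the original and edited patterns on the two addresses mentioned in the comments. The constant names `ORIGINAL_REGEX` and `EDITED_REGEX` are chosen here for illustration only:

```ruby
# Original pattern: labels after the first may contain letters only.
ORIGINAL_REGEX = /\A[\w+\-.]+@[a-z\d\-]+(\.[a-z]+)*\.[a-z]+\z/i
# Edited pattern: every label may contain letters, digits, and hyphens.
EDITED_REGEX   = /\A[\w+\-.]+@[a-z\d\-]+(\.[a-z\d\-]+)*\.[a-z]+\z/i

["some@email.with-subdomain.com", "hello@sub.domain9.com"].each do |email|
  puts "#{email}: original=#{ORIGINAL_REGEX.match?(email)} edited=#{EDITED_REGEX.match?(email)}"
end
```

Both addresses fail the original and pass the edited version. The edited pattern still accepts a domain that starts with a hyphen, such as `e@-dom.com`.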
Mike H-R
  • How can I add this validation for `email_field`? Currently, it is only checking for presence of `@`. I want that it verifies presence of `.` as well. – sshah Jan 19 '16 at 13:40
  • @sshah what do you mean by `email_field`? This regex checks that the email is `something_valid@somewhere.tld` (see the `\.` parts in the second part of the regex). – Mike H-R Jan 21 '16 at 18:28
  • @MikeH-R hmmm, that regex (Michael Hartl's) returns valid for only `@`. Is that a valid email? – Mohamad Jan 29 '16 at 00:34
  • @Mohamad the regex really shouldn't be matching just `@` by itself (though it could be argued as by John Carney below that that would make a more accurate match for an email). all of the groups with `+` require one or more matches. E.g. `[\w+\-.]+` at the start will match `a` or `aaaa` or `a+b.` but not the empty string. See [here for a demonstration](https://regex101.com/r/jJ4gL5/1) – Mike H-R Jan 29 '16 at 00:51
  • 'some@email.with-subdomain.com'.match(VALID_EMAIL_REGEX) => nil – installero Aug 14 '16 at 15:41
  • @installero , right you are. See edit for version that should work. Interestingly, it would have also failed for `"hello@sub.domain9.com"`. – Mike H-R Aug 15 '16 at 10:53
  • The regex validates this, and I'm pretty sure it shouldn't: pre..post@gmail.com This page says it shouldn't, anyway: https://en.wikipedia.org/wiki/Email_address#Syntax Huh: https://davidcel.is/posts/stop-validating-email-addresses-with-regex/ – globewalldesk Jan 27 '17 at 02:35
  • @globewalldesk yes, you've got a good point, see the answer by John Carney below. :) – Mike H-R Jan 27 '17 at 10:54
  • How about `.dot.@mail.com`? it passed. – Dapeng114 Jun 25 '18 at 04:21
  • Your regex does not support `foo+bar@example.org` or `foo@domain` which both are valid emails. And I'm not even talking about `"foo@bar.com"@example.org` – zmo Oct 16 '19 at 21:45
  • @zmo it does match the first thing you posted and doesn't match the second (and definitely doesn't match the third!). So a regular expression can't match the _actual_ email RFC (see the answer by @JohnCarney below). This is a simple regex to match most common domains (explicitly not matching the second example you posted). This is the same behaviour as the `URI::MailTo::EMAIL_REGEXP` below, which I'll update this answer with (this didn't exist when I wrote this answer.) – Mike H-R Oct 17 '19 at 11:47
  • Actually @johnCarney's point is also my point ☺ All one could check with a regex in a mail field is presence of at least one `@`, and illegal patterns (such as `@@` or `..` in the RHS). You could also enforce a `.` in the RHS excluding local networks mails (or people with mail on a tld which is bad practice). – zmo Oct 18 '19 at 13:19
  • this also matches addresses with domain name **starting** with a hyphen: `e@-dom.com`. not so hot. – gl03 Nov 19 '20 at 14:37
27

Here's a great article by David Celis explaining why every single regular expression you can find for validating email addresses is wrong, including the ones above posted by Mike.

From the article:

The local string (the part of the email address that comes before the @) can contain the following characters:

    `! $ & * - = ` ^ | ~ # % ' + / ? _ { }` 

But guess what? You can use pretty much any character you want if you escape it by surrounding it in quotes. For example, "Look at all these spaces!"@example.com is a valid email address. Nice.

If you need to do a basic check, the best regular expression is simply /@/.

John Carney
  • 4
    at a guess ... even though virtually anything can be in an email provided its quoted properly, in reality 99.99% of emails follow a reasonably standard format, and many systems will barf when handed an address they don't recognise as valid (even if it is). If you've got such a component then making sure the email address is *reasonable* as well as valid is important - particularly if it's part of a legacy system or something that can't be changed/updated. – Dave Smylie Sep 28 '16 at 23:06
  • 2
    That's fair, but if you have a space or dollar in your email address, I don't care if you can't use my system. And I suspect you knew what you were doing when you did it. – Grant Birchmeier Jul 31 '19 at 18:24
21

This one is shorter and safer:

/\A[^@\s]+@[^@\s]+\z/

This regex is used in the Devise gem, but it still accepts invalid values such as these:

  ".....@a....",
  "david.gilbertson@SOME+THING-ODD!!.com",
  "a.b@example,com",
  "a.b@example,co.de"

I prefer to use the regexp from the Ruby standard library, URI::MailTo::EMAIL_REGEXP.
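
A small sketch comparing the two patterns on the values listed above. The short pattern is written out as a literal so the example does not need the devise gem, and the constant name `SIMPLE_REGEX` is chosen here for illustration:

```ruby
require "uri"

# The short pattern quoted above, as a plain literal.
SIMPLE_REGEX = /\A[^@\s]+@[^@\s]+\z/

odd_values = [
  ".....@a....",
  "david.gilbertson@SOME+THING-ODD!!.com",
  "a.b@example,com",
  "a.b@example,co.de"
]

odd_values.each do |value|
  simple = SIMPLE_REGEX.match?(value)
  stdlib = URI::MailTo::EMAIL_REGEXP.match?(value)
  puts "#{value}: simple=#{simple} stdlib=#{stdlib}"
end
```

Every value passes the short pattern and is rejected by the standard library constant, because none of them has a domain made only of dot-separated alphanumeric labels.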

There is also a gem for email validation:

Email Validator

ilgam
  • 7
    Thanks for pointing me to `URI::MailTo::EMAIL_REGEXP`! Feels like the best approach since that may be better maintained than dumping a custom regexp somewhere in a codebase. – Carsten May 24 '17 at 08:01
  • `/\A[^@\s]+@[^@\s]+\z/.match?("r< – 7urkm3n Nov 05 '20 at 16:06
15

Nowadays Ruby provides an email validation regexp in its standard library. You can find it in the URI::MailTo module as URI::MailTo::EMAIL_REGEXP. In Ruby 2.4.1 it evaluates to

/\A[a-zA-Z0-9.!\#$%&'*+\/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*\z/

But I'd just use the constant itself.

kaikuchn
  • This was written 3 years ago and, if I recall, I was still using Ruby 1.9. Possibly that's the reason why I didn't know about it? Thank you for your one-liner though. – T0ny lombardi Oct 16 '17 at 12:27
  • Yeah, but three years later people still answer with their custom regular expressions. Anyway, I did not intend to attack you nor anyone else. I've changed the tone of my answer accordingly. – kaikuchn Jan 31 '18 at 16:35
  • THANK YOU. found this only after testing regexps for 1hr. – gl03 Nov 19 '20 at 14:42
5

I guess the example from the book can be improved to match emails with a hyphen in the subdomain:

VALID_EMAIL_REGEX = /\A[\w+\-.]+@[a-z\d\-]+(\.[a-z\d\-]+)*\.[a-z]+\z/i

For example:

> 'some@email.with-subdomain.com' =~ VALID_EMAIL_REGEX
=> 0
installero
2

Yours is complicated indeed.

VALID_EMAIL_REGEX = /\A[\w+\-.]+@[a-z\d\-.]+\.[a-z]+\z/i

The above code should suffice.

Explanation of each piece of the expression above for clarification:

Start of regex:

/

Match the start of a string:

\A

At least one word character, plus, hyphen, or dot:

[\w+\-.]+

A literal "at sign":

@

At least one letter, digit, hyphen, or dot:

[a-z\d\-.]+

A literal dot:

\.

At least one letter:

[a-z]+

Match the end of a string:

\z

End of regex:

/

Case insensitive:

i

Putting it back together again:

/\A[\w+\-.]+@[a-z\d\-.]+\.[a-z]+\z/i

Check out Rubular to conveniently test your expressions as you write them.
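
As an alternative way to keep the explanation next to the pattern, here is a sketch of the same expression written with Ruby's `x` (extended) flag, which ignores whitespace and allows comments inside the literal. The constant name `COMMENTED_EMAIL_REGEX` is chosen here for illustration:

```ruby
# Same expression as above, spelled out with the x flag so that each piece
# can carry its own comment.
COMMENTED_EMAIL_REGEX = /
  \A             # start of the string
  [\w+\-.]+      # local part: word characters, plus, hyphen, or dot
  @              # a literal at sign
  [a-z\d\-.]+    # domain: letters, digits, hyphens, or dots
  \.             # a literal dot
  [a-z]+         # top-level domain: at least one letter
  \z             # end of the string
/xi

puts COMMENTED_EMAIL_REGEX.match?("hello.me_1@email.com")
puts COMMENTED_EMAIL_REGEX.match?("foo.bar#gmail.co.uk")
```

Note that this pattern does not restrict consecutive dots, so `f...bar@gmail.com` from the question also passes.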

bdbasinger
  • Nice one, but non-latin characters should be prohibited. For example, protonmail doesn't let me create this email: "helloworld\u20131234@protonmail.com" (helloworld–1234@protonmail.com). But your regexp validates this as a good email. – S.Goswami Apr 21 '22 at 08:19
2

Use

/\A[a-zA-Z0-9.!\#$%&'*+\/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)+\z/

Explanation below.

Whilst Joshua Hunter's answer is great, URI::MailTo::EMAIL_REGEXP has a significant flaw in my opinion.

It matches fred@example, which causes Net::SMTPSyntaxError: 501 5.1.3 Bad recipient address syntax errors.

URI::MailTo::EMAIL_REGEXP evaluates to

/\A[a-zA-Z0-9.!\#$%&'*+\/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*\z/

Changing the last star to a plus makes it better.

Note: this is pointed out in Darpan's comment to Joshua Hunter's answer, but I think it deserves its own answer to make it more visible.
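
A short sketch of the effect, using the pattern from the top of this answer. The constant name `EMAIL_WITH_TLD_REGEX` is chosen here for illustration:

```ruby
# Pattern from this answer: the standard library expression with the final
# "*" changed to "+", so the domain needs at least one dot-separated label.
EMAIL_WITH_TLD_REGEX = /\A[a-zA-Z0-9.!\#$%&'*+\/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)+\z/

puts EMAIL_WITH_TLD_REGEX.match?("fred@example")      # false: no dot in the domain
puts EMAIL_WITH_TLD_REGEX.match?("fred@example.com")  # true
```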

diabolist
0

If you're using Devise, you can also use their included regex via:

Devise.email_regexp

which returns:

/\A[^@\s]+@[^@\s]+\z/
jeffdill2
0

The accepted answer suggests using URI::MailTo::EMAIL_REGEXP.

However, this regexp considers 1234@1234 as a valid e-mail address, which is something you probably don't want in a real life app (for instance, AWS SES will throw an exception if you try to send an e-mail to an address like this).

As Darpan points out in the comments, you can simply replace the final quantifier in that regexp (the trailing *) with +, and it will work as expected. The resulting regex is:

/\A[a-zA-Z0-9.!\#$%&'*+\/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)+\z/

Since the original URI::MailTo regexp, whilst technically valid according to the spec, is imho useless for our needs, we "fix" it in the Devise initializer.

# in config/initializers/devise.rb, put this at the beginning of the file
URI::MailTo.send(:remove_const, :EMAIL_REGEXP)
URI::MailTo.const_set(:EMAIL_REGEXP, /\A[a-zA-Z0-9.!\#$%&'*+\/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)+\z/)

# And then find `config.email_regexp` (it will already be there in the file) and change it to:
config.email_regexp = URI::MailTo::EMAIL_REGEXP

If you're wondering why this monkeypatch isn't put in a separate initializer file: you'd have to name the initializer file 00_xxx.rb to make it load before the Devise initializer. This goes against the Rails docs, which actually suggest you use a single initializer for cases like this:

If an initializer has code that relies on code in another initializer, you can combine them into a single initializer instead. This makes the dependencies more explicit, and can help surface new concepts within your application. Rails also supports numbering of initializer file names, but this can lead to file name churn.

sandre89
0

Ruby Multiple Emails validation with regex in the controller

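emails = "testcontroller@gmail.com,testregex@gmail.com" # etc...
email_regex = /\A([^@\s]+)@((?:[-a-z0-9]+\.)+[a-z]{2,})\Z/i
# Check each comma-separated address on its own; the regex matches one address.
if emails.split(",").all? { |email| email.strip =~ email_regex }
  # Send invitations and create users here.
else
  flash[:error] = "Invalid emails"
end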
Mad Physicist
-1

This works well for me:

if email.match?('[a-z0-9]+[_a-z0-9\.-]*[a-z0-9]+@[a-z0-9-]+(\.[a-z0-9-]+)*(\.[a-z]{2,4})')
      puts 'matches!'
else
      puts 'it doesn\'t match!'
end
Michele Riva
-1

Try this:

/[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}/i

It selects only the email string from inputs like these:

"Robert Donhan" <bob@email.com>sadfadf
Robert Donhan <bob@email.com>
"Robert Donhan" abc.bob@email.comasdfadf
Robert Donhan bob@email.comadfd
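
A sketch of how the extraction could look, assuming the pattern is meant to use plain character classes with `[A-Z]` as the final class. The constant name `EMAIL_IN_TEXT_REGEX` is chosen here for illustration:

```ruby
# Assumed intent of the pattern: plain character classes, with [A-Z] as the
# final class. It is unanchored, so it finds an address inside longer text.
EMAIL_IN_TEXT_REGEX = /[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}/i

inputs = [
  '"Robert Donhan" <bob@email.com>sadfadf',
  'Robert Donhan <bob@email.com>'
]

inputs.each do |text|
  # String#[] with a regex returns the first match, or nil when none exists.
  puts text[EMAIL_IN_TEXT_REGEX]
end
```

Both inputs yield `bob@email.com`. When letters follow the address directly, as in `abc.bob@email.comasdfadf`, the `{2,4}` quantifier takes up to four of them and the match becomes `abc.bob@email.coma`, so a delimiter after the address matters.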
Kiry Meas
  • 1
    The main concern with this regex would be inability to parse long top-level domains such as `.amsterdam` and many many more recently added ones. – SidOfc Jul 20 '19 at 11:41