3

I am trying to validate email addresses in my google bigquery table. I am using following query to do that but bigquery is throwing an error. I am not sure what is wrong in my code. I found the regex for validation on below story:- What characters are allowed in an email address?

select email
FROM my table
WHERE REGEXP_CONTAINS(email, regex)

here regex is the

Perl's RFC2822 regex

Mikhail Berlyant
  • 146,425
  • 8
  • 116
  • 187
  • 1
    Can you add the error message? it may due to the REGEXP expression or how your specify it. Notice `regex` should be quoted and with a character (`r`) before the quotes. Check [this example](https://cloud.google.com/bigquery/docs/reference/standard-sql/functions-and-operators#regexp_contains) about REGEXP in Standard SQL – enle lin Jan 25 '19 at 14:58

2 Answers2

3

Try this query

SELECT email
FROM table
where email REGEXP '[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,6}'
Gordon Linoff
  • 1,198,228
  • 53
  • 572
  • 709
Mad Dog Tannen
  • 6,976
  • 5
  • 30
  • 53
  • 1
    `` is really short for TLDs, see TLD list: https://www.iana.org/domains/root/db ; valid/invalid addresses: https://en.wikipedia.org/wiki/Email_address#Examples ; regex for RFC822 email address: http://www.ex-parrot.com/~pdw/Mail-RFC822-Address.html – Toto Jan 25 '19 at 15:12
0

I think this is cleaner...

SELECT email 
FROM table 
where REGEXP_CONTAINS(email, r"@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+")