I have a backend that does not support emoji characters in all of its fields, so I want to block them directly in the frontend application. I'm working on the registration form and want to restrict which characters are allowed in the email field. I know that RFC 5322 permits many unusual forms in email addresses, including special characters. Even emoji can be put there (link).

I'm using a whitelist to implement this block.

Which characters should I whitelist to support common email addresses without whitelisting every character that email addresses can legally contain?

Eradash
  • 111

1 Answer

The vast majority (probably over 99%) of email addresses still follow the pre-2012 rules described on Wikipedia:

The local-part of the email address may use any of these ASCII characters:

  • uppercase and lowercase Latin letters A to Z and a to z;
  • digits 0 to 9;
  • printable characters other than letters and digits: !#$%&'*+-/=?^_`{|}~;
  • dot ., provided that it is not the first or last character unless quoted, and provided also that it does not appear consecutively unless quoted (e.g. John..Doe@example.com is not allowed but "John..Doe"@example.com is allowed);
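
The rules quoted above can be sketched as a small check on the part before the @. This is only an illustration under stated assumptions: the names `LOCAL_PART_ATOM`, `UNQUOTED_LOCAL_PART` and `isValidUnquotedLocalPart` are made up for this example, and the quoted form ("John..Doe"@example.com) is deliberately not handled.

```typescript
// One or more of the allowed ASCII characters, excluding the dot.
// (Hypothetical helper names; not taken from any library.)
const LOCAL_PART_ATOM = "[A-Za-z0-9!#$%&'*+\\-/=?^_`{|}~]+";

// Atoms separated by single dots: this rules out a leading dot,
// a trailing dot, and two consecutive dots in one pattern.
const UNQUOTED_LOCAL_PART = new RegExp(
  "^" + LOCAL_PART_ATOM + "(?:\\." + LOCAL_PART_ATOM + ")*$"
);

// Returns true when the unquoted local-part follows the rules listed above.
function isValidUnquotedLocalPart(localPart: string): boolean {
  return UNQUOTED_LOCAL_PART.test(localPart);
}
```

With this sketch, `isValidUnquotedLocalPart("John.Doe")` is accepted while `isValidUnquotedLocalPart("John..Doe")` is rejected, and any non-ASCII character (emoji included) fails the check.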

So for starters, you can whitelist all ASCII characters; if you're feeling a little fancier, you can use a regular expression, e.g. one of those listed here.

The HTML input element with type email, which is often used in web forms, uses the following regex for validation:

^[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$
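
If you'd rather run the same check yourself (for instance, to show a custom error message), a minimal sketch could look like the following. The names `HTML_EMAIL_REGEX` and `matchesHtmlEmailRegex` are invented for this example; the pattern itself is the one quoted above, with the slash escaped so it can be written as a regex literal.

```typescript
// The pattern quoted above, written as a regex literal.
const HTML_EMAIL_REGEX =
  /^[a-zA-Z0-9.!#$%&'*+\/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$/;

// Returns true when the whole input matches the pattern.
function matchesHtmlEmailRegex(value: string): boolean {
  return HTML_EMAIL_REGEX.test(value);
}
```

Note that this pattern is a bit more permissive than the Wikipedia rules: it does not reject leading, trailing or consecutive dots in the local-part, so `John..Doe@example.com` passes. It only contains ASCII characters, so any address with an emoji in it is rejected.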

Yes, this does mean that letters with diacritical marks, like the ö in @JörgMittag's name, cannot be used: see this Regex101 example. Whether it's worth expanding the regex/character validation to non-ASCII characters depends on your user base. I see you're from Canada; there might be French-speaking users with accented letters in their names and email addresses. If those really need to be supported, check which characters your backend supports (e.g. all ISO-8859-1 characters) and just whitelist those.
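
As a rough sketch of such an extended whitelist, the check below accepts the ASCII characters from the regex above, the @ sign, and the accented Latin letters in the range U+00C0 to U+00FF (skipping × and ÷). That range is purely an assumption for illustration; replace it with whatever your backend really accepts. The names `WHITELISTED_CHARS` and `usesOnlyWhitelistedChars` are made up for this example.

```typescript
// ASCII email characters, "@", and (as an assumption) the accented
// Latin-1 letters U+00C0..U+00FF without the signs U+00D7 and U+00F7.
const WHITELISTED_CHARS =
  /^[A-Za-z0-9.!#$%&'*+\/=?^_`{|}~@\-\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u00FF]*$/;

// Returns true when every character of the input is on the whitelist.
// This checks characters only, not the structure of the address.
function usesOnlyWhitelistedChars(value: string): boolean {
  return WHITELISTED_CHARS.test(value);
}
```

This lets an address like jörg@example.com through while still blocking emoji. Keep in mind that an accented letter typed as a base letter plus a combining mark is not covered by this range, so the input may need to be normalized to its precomposed form first.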

Glorfindel
  • 3,137
  • FYI, whitelisting ASCII would make it impossible for me to enter my email address. – Jörg W Mittag Feb 11 '19 at 18:43
  • This is good, but please suggest that + is allowed. It's becoming more common as people realise that they can use it to make unique email addresses for different services and track what services end up spamming them. – Baldrickk Feb 12 '19 at 13:51
  • + is already in both the Wikipedia quote and the listed regex ... – Glorfindel Feb 12 '19 at 14:03