Monday, June 24, 2013

Email Address Syntax Validation

Over half a decade ago, I read this wonderful article about email address syntax validation. The essence is that most email validation rules are too strict. Here are some email addresses that are validly formatted but contain characters that validators frequently reject:

  • test+test@gmail.com
  • test_test@gmail.com
  • test-test@gmail.com
  • test!test@gmail.com
  • test#test@gmail.com
  • test$test@gmail.com
  • test%test@gmail.com
  • test*test@gmail.com
For more fun, send an email to your Gmail address with a plus sign and some extra text added to it, like this:
youraddress+anythingelse@gmail.com
Gmail will both send and receive that email because the address is valid.

Recently I tried to register an account on Starbucks.com and was told that my email address was invalid. I went to their "report a problem" page and submitted a ticket. Ironically, their issue reporting page did not have a problem with my email address. An account was created for me and the details were emailed to my "invalid" email address.

Today I was booking a hotel reservation on www.ihg.com and guess what? Their validation is too strict and marked my email address as invalid. Sigh.

So how should you validate email address syntax?

The HTML5 email input type accepts all the addresses listed above.
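To see what that check looks like outside the browser, here is a minimal sketch that runs the pattern the HTML Standard gives for a valid email address against the list above. The pattern is reproduced from memory, so compare it with the spec before relying on it; the names `HTML5_EMAIL`, `samples`, and `passesHtml5Check` are made up for this example.

```javascript
// Pattern the HTML Standard describes for <input type="email">.
// Assumption: reproduced from memory, so verify against the spec before use.
const HTML5_EMAIL = /^[a-zA-Z0-9.!#$%&'*+\/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$/;

// The addresses from the list above.
const samples = [
  "test+test@gmail.com",
  "test_test@gmail.com",
  "test-test@gmail.com",
  "test!test@gmail.com",
  "test#test@gmail.com",
  "test$test@gmail.com",
  "test%test@gmail.com",
  "test*test@gmail.com",
];

// Hypothetical helper: true when the address matches the pattern.
function passesHtml5Check(address) {
  return HTML5_EMAIL.test(address);
}

// Print each sample address next to its result.
for (const address of samples) {
  console.log(address, passesHtml5Check(address));
}
```

In a real page you would normally let the browser do this for you by using the email input type; the sketch is only meant to show that a fairly short pattern can accept the characters that stricter validators reject.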

If you are using jQuery, then use the jQuery validation plugin; don't write your own. It accepts all the addresses listed above.

If you need to do your own regular expression validation, you can use the expression below, which gives the same results as the jQuery validation plugin. It can also be used for server-side validation, so that validation is consistent between the server and the client:
/^((([a-z]|\d|[!#\$%&'\*\+\-\/=\?\^_`{\|}~]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])+(\.([a-z]|\d|[!#\$%&'\*\+\-\/=\?\^_`{\|}~]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])+)*)|((\x22)((((\x20|\x09)*(\x0d\x0a))?(\x20|\x09)+)?(([\x01-\x08\x0b\x0c\x0e-\x1f\x7f]|\x21|[\x23-\x5b]|[\x5d-\x7e]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(\\([\x01-\x09\x0b\x0c\x0d-\x7f]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF]))))*(((\x20|\x09)*(\x0d\x0a))?(\x20|\x09)+)?(\x22)))@((([a-z]|\d|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(([a-z]|\d|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])*([a-z]|\d|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])))\.)+(([a-z]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(([a-z]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])*([a-z]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])))\.?$/i
If you want a more complete regular expression, read this article:
http://www.regular-expressions.info/email.html
The author explains how each of the expressions he shows can return bad results. The best one is probably the one that disallows quotes and brackets.
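
As a rough illustration of what "disallows quotes and brackets" means in practice, here is a sketch of a pattern in that spirit. It is my approximation written from that description, not a copy of the expression in the article, and the names `NO_QUOTES_OR_BRACKETS` and `isPlainEmail` are made up for this example.

```javascript
// Assumption: an approximation of an RFC-style pattern that omits the
// double-quoted local part and the square-bracketed domain forms.
// Local part: dot-separated runs of letters, digits, and the usual symbols.
// Domain: dot-separated labels that start and end with a letter or digit.
const NO_QUOTES_OR_BRACKETS = /^[a-z0-9!#$%&'*+\/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+\/=?^_`{|}~-]+)*@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?$/i;

// Hypothetical helper: true when the address matches the pattern.
function isPlainEmail(address) {
  return NO_QUOTES_OR_BRACKETS.test(address);
}

console.log(isPlainEmail("test+test@gmail.com"));      // plus sign in the local part
console.log(isPlainEmail("\"quoted\"@example.com"));   // quoted local part
console.log(isPlainEmail("user@[192.168.0.1]"));       // bracketed address literal
```

The first address passes and the other two are rejected, which is the trade-off: the pattern stays readable and still allows characters like the plus sign, at the cost of refusing the rarely used quoted and bracketed forms.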