[ACCEPTED]-Is regular expression recognition of an email address hard?-textmatching
For the formal e-mail spec, yes, it is technically impossible via regex due to the recursion of things like comments (especially if you don't reduce comments to whitespace first), and the various different formats (an e-mail address isn't always someone@somewhere.tld). You can get close (with some massive and incomprehensible regex patterns), but a far better way of checking an e-mail address is to do the very familiar handshake:
- they tell you their e-mail
- you e-mail them a confirmation link with a GUID
When they click on the link, you know that:
- the e-mail is correct
- it exists
- they own it
Far better than blindly accepting an e-mail address.
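The handshake above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the in-memory `pending` and `confirmed` stores, the function names, and the `https://example.com/confirm` URL are all hypothetical, and actually sending the mail is left out.

```python
import secrets

# Hypothetical in-memory stores; a real site would use a database.
pending = {}        # token -> address awaiting confirmation
confirmed = set()   # addresses whose owner clicked the link

def start_confirmation(address, base_url="https://example.com/confirm"):
    """Create an unguessable token and return the link to e-mail to the user."""
    token = secrets.token_urlsafe(32)
    pending[token] = address
    return f"{base_url}?token={token}"

def confirm(token):
    """Return the confirmed address for a known token, or None otherwise."""
    address = pending.pop(token, None)  # each token works only once
    if address is not None:
        confirmed.add(address)
    return address
```

Only when `confirm` returns an address do you know the mailbox exists and that the person who signed up can read it.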
There are a number of Perl modules (for example) that do this. Don't try to write your own regexp to do it. Look at
Mail::VRFY
- does syntax and network checks (does an SMTP server somewhere accept this address)
https://metacpan.org/pod/Mail::VRFY
RFC::RFC822::Address
- a recursive descent email address parser.
https://metacpan.org/pod/RFC::RFC822::Address
Mail::RFC822::Address
- regexp-based address validation, worth looking at just for the insane regexp
http://ex-parrot.com/~pdw/Mail-RFC822-Address.html
Similar tools exist for other languages. Insane regexp below...
(?:(?:\r\n)?[ \t])*(?:(?:(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t]
)+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:
\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(
?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[
\t]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\0
31]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\
](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+
(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:
(?:\r\n)?[ \t])*))*|(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z
|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)
?[ \t])*)*\<(?:(?:\r\n)?[ \t])*(?:@(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\
r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[
\t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)
?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t]
)*))*(?:,@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[
\t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*
)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t]
)+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*)
*:(?:(?:\r\n)?[ \t])*)?(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+
|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r
\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:
\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t
]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031
]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](
?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?
:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?
:\r\n)?[ \t])*))*\>(?:(?:\r\n)?[ \t])*)|(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?
:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?
[ \t]))*"(?:(?:\r\n)?[ \t])*)*:(?:(?:\r\n)?[ \t])*(?:(?:(?:[^()<>@,;:\\".\[\]
\000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|
\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>
@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"
(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t]
)*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\
".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?
:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[
\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*|(?:[^()<>@,;:\\".\[\] \000-
\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(
?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)*\<(?:(?:\r\n)?[ \t])*(?:@(?:[^()<>@,;
:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([
^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\"
.\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\
]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*(?:,@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\
[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\
r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\]
\000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]
|\\.)*\](?:(?:\r\n)?[ \t])*))*)*:(?:(?:\r\n)?[ \t])*)?(?:[^()<>@,;:\\".\[\] \0
00-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\
.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,
;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?
:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])*
(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".
\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[
^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]
]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*\>(?:(?:\r\n)?[ \t])*)(?:,\s*(
?:(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\
".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(
?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[
\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t
])*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t
])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?
:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|
\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*|(?:
[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\
]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)*\<(?:(?:\r\n)
?[ \t])*(?:@(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["
()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)
?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>
@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*(?:,@(?:(?:\r\n)?[
\t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,
;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t]
)*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\
".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*)*:(?:(?:\r\n)?[ \t])*)?
(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".
\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(?:
\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\[
"()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])
*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])
+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\
.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z
|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*\>(?:(
?:\r\n)?[ \t])*))*)?;\s*)
Validating e-mail addresses isn't really very helpful anyway. It will not catch common typos or made-up email addresses, since these tend to look syntactically like valid addresses.
If you want to be sure an address is valid, you have no choice but to send a confirmation mail.
If you just want to be sure that the user inputs something that looks like an email rather than just "asdf", then check for an @. More complex validation does not really provide any benefit.
(I know this doesn't answer your questions, but I think it's worth mentioning anyway)
There is a context-free grammar in BNF that describes valid email addresses in RFC 2822. It is complex. For example:
" @ "@example.com
is a valid email address. I don't know of any regexps that do it fully; the examples usually given require comments to be stripped first. I wrote a recursive descent parser to do it fully once.
I've now collated test cases from Cal Henderson, Dave Child, Phil Haack, Doug Lovell and RFC 3696. 158 test addresses in all.
I ran all these tests against all the validators I could find. The comparison is here: http://www.dominicsayers.com/isemail
I'll try to keep this page up-to-date as people enhance their validators. Thanks to Cal, Dave and Phil for their help and co-operation in compiling these tests and constructive criticism of my own validator.
People should be aware of the errata against RFC 3696 in particular. Three of the canonical examples are in fact invalid addresses. And the maximum length of an address is 254 or 256 characters, not 320.
It's not all nonsense though, as allowing characters such as '+' can be highly useful for users combating spam, e.g. myemail+sketchysite@gmail.com (instant disposable Gmail addresses).
Only when a site accepts it though.
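As a small illustration of the '+' convention, here is a sketch that separates the tag from the rest of the address. The function name is made up for this example, and treating everything after the first '+' as a tag is a provider convention (Gmail is the one named above), not something the spec defines.

```python
def split_plus_tag(address):
    """Split 'local+tag@domain' into (local, tag, domain); tag is '' if absent."""
    local, _, domain = address.rpartition("@")   # split on the last '@'
    base, _, tag = local.partition("+")          # split on the first '+'
    return base, tag, domain
```

For myemail+sketchysite@gmail.com this yields the base `myemail`, the tag `sketchysite` and the domain `gmail.com`, which is only possible if the sign-up form accepted the '+' in the first place.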
Whether or not to accept bizarre, uncommon email address formats depends, in my opinion, on what one wants to do with them.
If you're writing a mail server, you have to be very exact and excruciatingly correct in what you accept. The "insane" regex quoted above is therefore appropriate.
For the rest of us, though, we're mainly just interested in ensuring that something a user types in a web form looks reasonable and doesn't have some sort of SQL injection or buffer overflow in it.
Frankly, does anyone really care about letting someone enter a 200-character email address with comments, newlines, quotes, spaces, parentheses, or other gibberish when signing up for a mailing list, newsletter, or web site? The proper response to such clowns is "Come back later when you have an address that looks like username@domain.tld".
The validation I do consists of ensuring that there is exactly one '@'; that there are no spaces, nulls or newlines; that the part to the right of the '@' has at least one dot (but not two dots in a row); and that there are no quotes, parentheses, commas, colons, exclamations, semicolons, or backslashes, all of which are more likely to be attempts at hackery than parts of an actual email address.
Yes, this means I'm rejecting valid addresses with which someone might try to register on my web sites - perhaps I "incorrectly" reject as many as 0.001% of real-world addresses! I can live with that.
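The rules listed above translate fairly directly into code. The following is a sketch of one possible reading, not the answerer's actual implementation: the function name is invented, and "quotes" is assumed to mean double quotes only, so apostrophes are still allowed.

```python
# Characters rejected by the rules above: space, NUL, newlines, double quote
# (assumption: single quotes stay allowed), parentheses, comma, colon,
# exclamation mark, semicolon, backslash.
FORBIDDEN = set(' \0\r\n"(),:!;\\')

def looks_reasonable(address):
    """Deliberately loose check: rejects some valid but exotic addresses."""
    if address.count("@") != 1:            # exactly one '@'
        return False
    if any(ch in FORBIDDEN for ch in address):
        return False
    domain = address.split("@")[1]
    # at least one dot to the right of the '@', but never two in a row
    return "." in domain and ".." not in domain
```

By design this rejects valid oddities such as `" @ "@example.com`, while ordinary addresses like username@domain.tld pass.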
Quoting and various other rarely used but valid parts of the RFC make it hard. I don't know enough about this topic to comment definitively, other than "it's hard" - but fortunately other people have written about it at length.
As to a valid regex for it, the Perl Mail::RFC822::Address module contains a regular expression which will apparently work - but only if any comments have been replaced by whitespace already. (Comments in an email address? You see why it's harder than one might expect...)
Of course, the simplified regexes which abound elsewhere will validate almost every email address which is genuinely being used...
Some flavours of regex can actually match nested brackets (e.g., Perl compatible ones). That said, I have seen a regex that claims to correctly match RFC 822 and it was two pages of text without any whitespace. Therefore, the best way to detect a valid email address is to send email to it and see if it works.
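To see why nested comments are awkward for a plain regular expression, note that stripping them requires counting nesting depth. Below is a simplified sketch of the "replace comments by whitespace" preprocessing step; the function name is made up, and it is an illustration only, not a full RFC 822 tokenizer (it handles quoted strings and backslash escapes in the simplest possible way).

```python
def strip_comments(text):
    """Replace each (possibly nested) parenthesised comment with one space."""
    out = []
    depth = 0          # current comment nesting level
    in_quotes = False  # inside a "quoted string" parentheses are literal
    escaped = False    # previous character was a backslash
    for ch in text:
        if escaped:                      # escaped character: take it literally
            escaped = False
            if depth == 0:
                out.append(ch)
        elif ch == "\\":
            escaped = True
            if depth == 0:
                out.append(ch)
        elif depth == 0 and ch == '"':
            in_quotes = not in_quotes
            out.append(ch)
        elif in_quotes:
            out.append(ch)
        elif ch == "(":
            if depth == 0:
                out.append(" ")          # the whole comment becomes one space
            depth += 1
        elif ch == ")" and depth > 0:
            depth -= 1
        elif depth == 0:
            out.append(ch)
    return "".join(out)
```

The `depth` counter is exactly the piece of state that a classical regular expression lacks, which is why regex-based validators either rely on recursive extensions or expect comments to be removed first.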
Just to add a regex that is less crazy than the one listed by @mmaibaum:
^[a-zA-Z]([.]?([a-zA-Z0-9_-]+)*)?@([a-zA-Z0-9\-_]+\.)+[a-zA-Z]{2,4}$
It is not bulletproof, and certainly does not cover the entire email spec, but it does do a decent job of covering most basic requirements. Even better, it's somewhat comprehensible, and can be edited.
Cribbed from a discussion at HouseOfFusion.com, a world-class ColdFusion resource.
An easy and good way to check email addresses in Java is to use the EmailValidator of the Apache Commons Validator library.
I would always check an email address in an input form against something like this before sending an email - even if you only catch some typos. You probably don't want to write an automated scanner for "delivery failed" notification mails. :-)
It's really hard because there are a lot of things that can be valid in an email address according to the email spec, RFC 2822. Things that you don't normally see, such as +, are perfectly valid characters for an email address, according to the spec.
There's an entire section devoted to email addresses at http://regexlib.com, which is a great resource. I'd suggest that you determine which criteria matter to you and find one that matches. Most people really don't need full support for all possibilities allowed by the spec.
If you're running on the .NET Framework, just try instantiating a MailAddress object and catching the FormatException if it blows up, or pulling out the Address if it succeeds. Without getting into any nonsense about the performance of catching exceptions (really, if this is just on a single Web form it is not going to make that much of a difference), the MailAddress class in the .NET Framework goes through a quite complete parsing process (it doesn't use a RegEx). Open up Reflector and search for MailAddress and MailBnfHelper.ReadMailAddress() to see all of the fancy stuff it does. Someone smarter than me spent a lot of time building that parser at Microsoft, and I'm going to use it when I actually send an e-mail to that address, so I might as well use it to validate the incoming address, too.
Try this one:
"(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])"
Have a look here for the details.
However, rather than implementing the RFC822 standard, maybe it would be better to look at it from another viewpoint. It doesn't really matter what the standard says if mail servers don't mirror the standard. So I would argue that it would be better to imitate what the most popular mail servers do when validating email addresses.
Many have tried, and many come close. You may want to read the Wikipedia article, and some others.
Specifically, you'll want to remember that many websites and email servers have relaxed validation of email addresses, so essentially they don't implement the standard fully. It's good enough for email to work all the time though.
This class for Java has a validator in it: http://www.leshazlewood.com/?p=23
This is written by the creator of Shiro (formerly Ki, formerly JSecurity)
The pros and cons of testing for e-mail address validity:
There are two types of regexes that validate e-mails:
- Ones that are too loose.
- Ones that are too strict.
It is not possible for a regular expression to match all valid e-mail addresses and no e-mail addresses that are not valid, because some strings might look like valid e-mail addresses but do not actually go to anyone's inbox. The only way to test whether an e-mail is actually valid is to send an e-mail to that address and see if you get some sort of response. With that in mind, regexes that are too strict at matching e-mails don't really seem to have much of a purpose.
I think that most people who ask for an e-mail regex are looking for the first option, regexes that are too loose. They want to test a string and see if it looks like an e-mail; if it is definitely not an e-mail, then they can say to the user: "Hey, you are supposed to put an e-mail here and this definitely is not a valid e-mail. Perhaps you didn't realize that this field is for an e-mail or maybe there is a typo".
If a user puts in a string that looks a lot like a valid e-mail, but it actually is not one, then that is a problem that should be handled by a different part of the application.
Can anyone provide some insight as to why that is?
Yes, it is an extremely complicated standard that allows lots of stuff that no one really uses today. :)
Are there any known and proven regexps that actually do this fully?
Here is one attempt to parse the whole standard fully...
http://ex-parrot.com/~pdw/Mail-RFC822-Address.html
What are some good alternatives to using regexps for matching email addresses?
Using an existing framework for it in whatever language you are using, I guess? Though those will probably use regexp internally. It is a complex string. Regexps are designed to parse complex strings, so that really is your best choice.
Edit: I should add that the regexp I linked to was just for fun. I do not endorse using a complex regexp like that - some people say that "if your regexp is more than one line, it is guaranteed to have a bug in it somewhere". I linked to it to illustrate how complex the standard is.
For completeness of this post: PHP also has a built-in function to validate e-mails.
In PHP, use the nice filter_var with the specific EMAIL validation type :)
No more insane email regexes in PHP :D
var_dump(filter_var('bob@example.com', FILTER_VALIDATE_EMAIL));
There always seems to be an unaccounted-for format when trying to create a regular expression to validate emails. Though there are some characters that are not valid in an email, the basic format is local-part@domain, with roughly 64 chars max on the local part and roughly 253 chars on the domain. Besides that, it's kind of like the wild wild west.
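The length limits mentioned above are easy to check without any regex. A minimal sketch, assuming the approximate figures quoted in this answer (64 and 253) and a made-up function name; it says nothing about which characters are allowed.

```python
MAX_LOCAL = 64     # approximate limit quoted above for the local part
MAX_DOMAIN = 253   # approximate limit quoted above for the domain

def within_length_limits(address):
    """Check only the local-part@domain shape and the two length limits."""
    local, sep, domain = address.rpartition("@")  # split on the last '@'
    if not sep:                                   # no '@' at all
        return False
    return 0 < len(local) <= MAX_LOCAL and 0 < len(domain) <= MAX_DOMAIN
```

Splitting on the last '@' is a deliberate choice here, since a quoted local part may itself contain an '@'.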
I think the answer depends on your definition of a validated email address and what your business process has tolerance for. Regular expressions are great for making sure an email is formatted properly and, as you know, there are many variations of them that can work. Here are a couple of variations:
Variant 1:
(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])
Variant 2:
\A(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])\z
Just because an email is syntactically correct doesn't mean it is valid.
An email can adhere to RFC 5322 and pass the regex, but there will be no true insight into the email's actual deliverability. What if you wanted to know if the email was a bogus email, or if it was disposable, or not deliverable, or a known bot? What if you wanted to exclude emails that were vulgar or in some way factious or problematic? By the way, in the interest of full disclosure: I work for Service Objects, a data validation company, but, being a professional in the email validation field, I feel the solution we offer provides better validation than a regex. Feel free to give it a look; I think it can help a lot. You can see more info about this in our dev guide. It actually does a lot of cool email checks and verifications.
Here's an example:
Email: mickeyMouse@gmail.com
{
"ValidateEmailInfo":{
"Score":4,
"IsDeliverable":"false",
"EmailAddressIn":"mickeyMouse@gmail.com",
"EmailAddressOut":"mickeyMouse@gmail.com",
"EmailCorrected":false,
"Box":"mickeyMouse",
"Domain":"gmail.com",
"TopLevelDomain":".com",
"TopLevelDomainDescription":"commercial",
"IsSMTPServerGood":"true",
"IsCatchAllDomain":"false",
"IsSMTPMailBoxGood":"false",
"WarningCodes":"22",
"WarningDescriptions":"Email is Bad - Subsequent checks halted.",
"NotesCodes":"16",
"NotesDescriptions":"TLS"
}
}