Catastrophic backtracking in validation regexes #3

Closed
GoogleCodeExporter opened this issue Apr 17, 2015 · 7 comments

Comments

@GoogleCodeExporter

There are some email addresses that behave *very* poorly with the validation 
done in EmailValidationUtil.  I think it might be due to the nested quantifiers 
in the complex regexes there.  They literally take hours to finish the 
validation, using 100% CPU.
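To illustrate why nested quantifiers can be this slow, here is a toy model of a backtracking matcher. It is not the actual `java.util.regex` implementation and not the EmailValidationUtil regexes. It counts the states visited when `(a+)+$` fails on n `a` characters followed by `!`. Every way of cutting a prefix of the run into nonempty groups is a separate state, so the nested pattern visits 2^n - 1 states, while the equivalent flat `a+$` visits only n. The class and method names are made up for this sketch.

```java
/**
 * Toy model (hypothetical, not java.util.regex internals) of a backtracking
 * engine matching the nested pattern (a+)+$ and the flat pattern a+$ from
 * position 0. It counts visited states instead of measuring time.
 */
final class BacktrackingModel {
    private final String input;
    private long states;

    private BacktrackingModel(String input) {
        this.input = input;
    }

    /** Exclusive end of the run of 'a' characters starting at pos. */
    private int runEnd(int pos) {
        int end = pos;
        while (end < input.length() && input.charAt(end) == 'a') {
            end++;
        }
        return end;
    }

    /** States visited by (a+)+$; on "a...a!" with n a's this is 2^n - 1. */
    static long nestedStates(String input) {
        BacktrackingModel m = new BacktrackingModel(input);
        // The first inner a+ is greedy: take the whole run, then give back
        // one character at a time on failure.
        for (int end = m.runEnd(0); end >= 1; end--) {
            if (m.afterGroup(end)) {
                break;
            }
        }
        return m.states;
    }

    /** One state: an inner a+ group has just ended at pos. */
    private boolean afterGroup(int pos) {
        states++;
        // The outer + is greedy too: try another a+ group of every length...
        for (int end = runEnd(pos); end > pos; end--) {
            if (afterGroup(end)) {
                return true;
            }
        }
        // ...and only then the $ anchor.
        return pos == input.length();
    }

    /** States visited by the flat a+$; linear in n on the same input. */
    static long flatStates(String input) {
        long visited = 0;
        for (int end = new BacktrackingModel(input).runEnd(0); end >= 1; end--) {
            visited++;
            if (end == input.length()) {
                break;
            }
        }
        return visited;
    }
}
```

With n = 30 the nested model already visits more than a billion states. Real engines differ in the details, but this growth is the mechanism behind catastrophic backtracking.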

Is there any way to fix this, and barring that, can an option be added to skip 
validation?

To reproduce:
1. Try to send an email to an address like 
[email protected]
2. Wait for computer to explode

(Using Java 1.6.0_31)

Original issue reported on code.google.com by [email protected] on 20 Apr 2012 at 10:13

@GoogleCodeExporter
Author

The regular expressions come from another open source project, 
http://code.google.com/p/emailaddress/source/browse/.

The problem is that the class has exploded over there, so I would have to patch 
our own version. Alas, regular expressions are not my specialty. Any thoughts on 
how to fix this?
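One common way to defuse nested quantifiers in `java.util.regex` is to make the inner and outer quantifiers possessive (`++`) or wrap them in atomic groups (`(?>...)`). The engine then never revisits how characters were split between groups. The sketch below applies this to a made-up, email-like pattern, not the actual EmailValidationUtil regexes. The class and field names are hypothetical.

```java
import java.util.regex.Pattern;

/** Hypothetical example patterns; not taken from EmailValidationUtil. */
final class PossessiveFix {
    private PossessiveFix() {}

    /** Nested greedy quantifiers: a run like "abc" can be split into groups many ways. */
    static final Pattern NESTED =
            Pattern.compile("(?:[a-z]+\\.?)+@example\\.com");

    /** Same language, but possessive quantifiers never give characters back. */
    static final Pattern POSSESSIVE =
            Pattern.compile("(?:[a-z]++\\.?)++@example\\.com");

    /** Equivalent spelling using atomic groups instead of ++. */
    static final Pattern ATOMIC =
            Pattern.compile("(?>(?:(?>[a-z]+)\\.?)+)@example\\.com");
}
```

Possessive quantifiers are only a safe drop-in when the text after the group can never start with a character the group could have consumed. Here that character is `@`. Otherwise they change what matches: `a++a` never matches `aa`.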

Original comment by b.bottema on 9 Aug 2012 at 7:36

@seanf

seanf commented Apr 29, 2015

It looks like the email validator might move here (eventually): https://github.com/lhazlewood/jeav

@bbottema
Owner

Hmm, not anytime soon I'm afraid, considering the last commit was from 2011 and the validation logic was initially created in 2006. I see you're trying to get that to move. Good. Let's see.

@seanf

seanf commented Apr 29, 2015

> Hmm, not anytime soon I'm afraid, considering the last commit was from 2011 and the validation logic was initially created in 2006. I see you're trying to get that to move. Good. Let's see.

True, I just thought I'd put in the link in case it's hard to find later on (in the hope that the migration does in fact happen at some point).

@bbottema What exactly did you mean by "the class has exploded" in #3 (comment)? (assuming you can remember what you meant back that far!) Perhaps just the mess of untested regexes? Since the code seems to be unmaintained anyway, it might be worth bringing it in, assuming it's worth keeping at all.

It looks like the performance problems were known even before the validator project was set up, and are not likely to be fixed any time soon: http://leshazlewood.com/2006/11/06/emailaddress-java-class/comment-page-1/#comment_count
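If the upstream regexes stay as they are, a caller could at least cap the CPU time spent per address. A widely used trick with `java.util.regex` is to give the matcher a `CharSequence` wrapper whose `charAt` throws once a deadline has passed, since the engine reads its input through `charAt`. The class names below are hypothetical. The sketch bounds wall-clock time per match rather than fixing the regex, and it adds a `System.nanoTime()` call for every character read.

```java
import java.util.regex.Pattern;

/** Hypothetical helper that aborts a regex match after a time budget. */
final class TimeLimitedMatch {
    private TimeLimitedMatch() {}

    /** Thrown when a match runs past its deadline. */
    static final class RegexTimeoutException extends RuntimeException {
        RegexTimeoutException() {
            super("regex evaluation exceeded its time budget");
        }
    }

    /** CharSequence wrapper that throws from charAt once the deadline has passed. */
    private static final class DeadlineCharSequence implements CharSequence {
        private final CharSequence inner;
        private final long deadlineNanos;

        DeadlineCharSequence(CharSequence inner, long deadlineNanos) {
            this.inner = inner;
            this.deadlineNanos = deadlineNanos;
        }

        @Override
        public char charAt(int index) {
            // Overflow-safe comparison of nanoTime values.
            if (System.nanoTime() - deadlineNanos > 0) {
                throw new RegexTimeoutException();
            }
            return inner.charAt(index);
        }

        @Override
        public int length() {
            return inner.length();
        }

        @Override
        public CharSequence subSequence(int start, int end) {
            return new DeadlineCharSequence(inner.subSequence(start, end), deadlineNanos);
        }

        @Override
        public String toString() {
            return inner.toString();
        }
    }

    /** Full-input match that throws RegexTimeoutException if it exceeds timeoutMillis. */
    static boolean matchesWithin(Pattern pattern, String input, long timeoutMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        return pattern.matcher(new DeadlineCharSequence(input, deadline)).matches();
    }
}
```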

In any case, there's a school of thought that client code shouldn't even try to validate email addresses perfectly, for instance http://davidcel.is/posts/stop-validating-email-addresses-with-regex/. It's too easy to get it wrong and reject email addresses which are actually valid, before you even start worrying about things like catastrophic backtracking.

It's probably enough to check for @ or <somename> something@something, and let the SMTP server worry about anything more sophisticated if need be. In view of the potential for erroneous rejections or performance problems and the general unmaintained and untested state of the validation library, I would suggest that Simple Java Mail shouldn't bother, or should at least make the address validation optional. (Or did I miss an option which already exists?)
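Here is a sketch of the minimal check described above, assuming that "something@something" is all that gets verified locally. The class and method names are hypothetical. The check runs in linear time and has nothing to backtrack over. It deliberately accepts many invalid addresses, leaving the final verdict to the SMTP server.

```java
/** Hypothetical lenient check: only requires something@something. */
final class LenientAddressCheck {
    private LenientAddressCheck() {}

    static boolean looksLikeAddress(String address) {
        if (address == null) {
            return false;
        }
        // Use the last '@': a quoted local part may itself contain '@'.
        int at = address.lastIndexOf('@');
        // Require at least one character before and after the '@'.
        return at > 0 && at < address.length() - 1;
    }
}
```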

Some users may care about RFC-compliant bodies, but not about 100% RFC compliant addresses. Personally, I would prefer a risk of letting a non-compliant address through (perhaps to be rejected by the SMTP server) over the risk of wasting hours of CPU.

@bbottema
Owner

> What exactly did you mean by "the class has exploded" in #3 (comment)? (assuming you can remember what you meant back that far!) Perhaps just the mess of untested regexes? Since the code seems to be unmaintained anyway, it might be worth bringing it in, assuming it's worth keeping at all.

@seanf I think at the time it was impossible to easily update to a newer version of the validation library, as their version had diverged too much. I think they were halfway through switching from an overgrown regex version to doing it in native Java without regex.

> In any case, there's a school of thought that client code shouldn't even try to validate email addresses perfectly, for instance http://davidcel.is/posts/stop-validating-email-addresses-with-regex/. It's too easy to get it wrong and reject email addresses which are actually valid, before you even start worrying about things like catastrophic backtracking.

This is a valid point. The point was to provide early detection and friendly errors. It's a balance between helping the end user with an abstract, friendly API layer and letting them deal with the technical depths of native libraries and errors. It's why simple-java-mail exists in the first place, but maybe we should draw the line at email validation, simply because we haven't mastered the subject and it is completely untested.

> I would suggest that Simple Java Mail shouldn't bother, or should at least make the address validation optional. (Or did I miss an option which already exists?)

Currently not, but it is worth adding. There is a way to configure the validation criteria by setting them on a Mailer instance. I will look into it.
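One possible shape for such an option is a plain `java.util.function.Predicate` hook. The `ValidatingMailer` class, its methods, and the `NO_VALIDATION` constant are hypothetical and do not reflect Simple Java Mail's actual `Mailer` API. Validation defaults to off here purely for illustration.

```java
import java.util.Objects;
import java.util.function.Predicate;

/** Hypothetical stand-in for a mailer; not Simple Java Mail's real API. */
final class ValidatingMailer {
    /** Turns address validation off, so the SMTP server gets the final say. */
    static final Predicate<String> NO_VALIDATION = address -> true;

    private Predicate<String> addressValidator = NO_VALIDATION;

    /** Plug in any check: a regex, a lenient test, or NO_VALIDATION. */
    void setAddressValidator(Predicate<String> validator) {
        this.addressValidator = Objects.requireNonNull(validator, "validator");
    }

    /** Fails fast with a friendly message before anything is sent. */
    void checkRecipient(String address) {
        if (!addressValidator.test(address)) {
            throw new IllegalArgumentException("Invalid recipient address: " + address);
        }
    }
}
```

Keeping the hook a single-method interface lets callers choose between strict, lenient, or no validation without the library owning an email regex at all.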

> Some users may care about RFC-compliant bodies, but not about 100% RFC compliant addresses. Personally, I would prefer a risk of letting a non-compliant address through (perhaps to be rejected by the SMTP server) over the risk of wasting hours of CPU.

I agree: the purpose of this library is to provide an easy way to handle complex mail bodies that behave consistently across the many email readers. Email validation is secondary to that. However, if there is a good library out there that properly validates email addresses, I would still like to have a facility like that.

@bbottema
Owner

@seanf This library hopefully runs better: https://github.com/bbottema/email-rfc2822-validator
