Validating e-mails with display-name: John Doe <[email protected]> #116

Closed
ThorstenEngel opened this issue Oct 4, 2023 · 12 comments

Comments

@ThorstenEngel

Hi,

In my use case I need to validate the syntax of e-mail addresses that include a display name. I think RFC 5322, section 3.4 (https://www.rfc-editor.org/rfc/rfc5322#section-3.4) fully allows addresses like "John Doe <[email protected]>" or, in my case, something like "ACME Corp. <[email protected]>".

I did not find a way to verify these addresses with your library or any other. It would be great if your library could validate this too ;-).

Warm regards,
thorsten

@JoshData
Owner

JoshData commented Oct 4, 2023

https://github.com/mailgun/flanker can do that (and you could combine it with this library). (We link to flanker at the top of our README.)

I think parsing display names could be a useful addition.

@ThorstenEngel
Author

Thanks, this helped!

@salty-horse

Flanker is unmaintained (and depends on unmaintained packages), and it is beginning to break on modern versions of Python (3.13 specifically).

For extracting the email from the display name you can use Python's built-in email.utils.parseaddr.
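
A minimal sketch of that suggestion, using only the standard library. The address below is a made-up placeholder; the address part returned by parseaddr is what you would then hand to a validator.

```python
from email.utils import parseaddr

# parseaddr splits a "Display Name <address>" string into a
# (display_name, address) tuple. The input here is a placeholder.
display_name, address = parseaddr("John Doe <john.doe@example.com>")

print(display_name)  # John Doe
print(address)       # john.doe@example.com
```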

@JoshData
Owner

JoshData commented Feb 4, 2024

Great suggestion. 😀

@JoshData
Owner

JoshData commented Feb 4, 2024

I was thinking of replacing flanker with parseaddr in the README's recommendation, but I see that parseaddr is a little flaky with edge cases. Just from a minute of playing, I see it drops parts of the input it doesn't like:

>>> email.utils.parseaddr("Test <@x>")
('Test', '')

>>> email.utils.parseaddr("Test <a@xx>, X <b@b>")
('Test', 'a@xx')

So it's not something I would necessarily recommend to use with a strict validation tool like this library.
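
One way to guard against the two cases above is to reject input unless the parser reports exactly one mailbox with a non-empty address. This is only a sketch: parse_single_address is a hypothetical helper, not part of email.utils or this library, and it will not catch every case where the parser silently discards input.

```python
from email.utils import getaddresses


def parse_single_address(text):
    """Return (display_name, address) if `text` holds exactly one mailbox
    with a non-empty address; otherwise return None.

    Hypothetical helper for illustration only.
    """
    parsed = getaddresses([text])
    if len(parsed) != 1:
        # Several mailboxes were found, or nothing parseable at all.
        return None
    name, address = parsed[0]
    if not address:
        # The parser gave up on the address part.
        return None
    return name, address
```

With this, "Test <a@xx>, X <b@b>" and "Test <@x>" are both rejected instead of being partially accepted. The returned address still needs real validation afterwards.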

@salty-horse

salty-horse commented Feb 4, 2024

Flanker doesn't accept Test <@x>, and from Test <a@xx>, X <b@b> it extracts b@b.

For my specific use, I don't care much about those cases, so I think it's a good enough solution :)

@JoshData
Owner

JoshData commented Feb 4, 2024

Fair point!

JoshData added a commit that referenced this issue Feb 27, 2024
Per request in #116, parse display name syntax also, but don't allow it unless a new allow_display_name option is set. Parsing according to the MIME specification probably isn't what's generally wanted since the use case is probably parsing inputs in email composition-like user interfaces. So it's in the spirit of a MIME message but not the letter.

If display name syntax is used, return the unquoted/unescaped display name in the returned object.
JoshData added a commit that referenced this issue Apr 14, 2024
JoshData added a commit that referenced this issue Apr 19, 2024
Per request in #116, parse display name syntax also, but don't allow it unless a new allow_display_name option is set. Parsing according to the MIME specification probably isn't what's generally wanted since the use case is probably parsing inputs in email composition-like user interfaces. So it's in the spirit of a MIME message but not the letter.

If display name syntax is permitted, return the unquoted/unescaped display name in the returned object.
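
For reference, a minimal sketch of how the option described in the commit message might be used. The call mirrors the one shown later in this thread. The display_name and normalized attributes on the returned object are assumptions based on the library's README, so check them against your installed version, which must be a release that includes allow_display_name. split_display_name is a hypothetical helper name, and the address is a placeholder.

```python
try:
    from email_validator import validate_email
except ImportError:  # third-party package, may not be installed
    validate_email = None


def split_display_name(text):
    """Hypothetical helper: return (display_name, normalized_address),
    or None if email_validator is not available."""
    if validate_email is None:
        return None
    # Display name syntax is rejected unless allow_display_name is set.
    # check_deliverability=False skips the DNS lookup (syntax check only).
    result = validate_email(
        text,
        allow_display_name=True,
        check_deliverability=False,
    )
    # Assumed attribute names; see the lead-in above.
    return result.display_name, result.normalized


parts = split_display_name("John Doe <john.doe@example.com>")
print(parts)
```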
@jplusc

jplusc commented Jun 18, 2024

Hi, just wanted to let you know that I moved from v2.1.2 to the current git branch to test out the allow_display_name option and ran into a difference between it and the previously mentioned workarounds.

I've been using email.utils.parseaddr and then sending just the email portion to email_validator. However, email.utils.parseaddr works with email addresses like [email protected] (Kevin Martin), whereas email_validator raises the exception EmailSyntaxError: The part after the @-sign contains invalid characters: '(', ')', SPACE.

I know you may not want to handle emails in this format, but thought the difference should be documented somewhere.

thanks for all you do!

>>> import email.utils
>>> import email_validator
>>> from flanker.addresslib import address
>>>
>>> #parseaddr
>>> s = "[email protected] (Kevin Martin)"
>>> email.utils.parseaddr(s)
('Kevin Martin', '[email protected]')
>>>
>>> #flanker
>>> address.parse(s).address
'[email protected]'
>>>
>>> #email_validator
>>> email_validator.validate_email(s, allow_display_name = True, check_deliverability = False)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python311\Lib\site-packages\email_validator\validate_email.py", line 124, in validate_email
    domain_name_info = validate_email_domain_name(domain_part, test_environment=test_environment, globally_deliverable=globally_deliverable)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Python311\Lib\site-packages\email_validator\syntax.py", line 441, in validate_email_domain_name
    raise EmailSyntaxError("The part after the @-sign contains invalid characters: " + ", ".join(sorted(bad_chars)) + ".")
email_validator.exceptions_types.EmailSyntaxError: The part after the @-sign contains invalid characters: '(', ')', SPACE.

@JoshData
Owner

>>> s = "[email protected] (Kevin Martin)"

Huh. What I implemented follows RFC 2822's name <email> format:

name-addr       =       [display-name] angle-addr
angle-addr      =       [CFWS] "<" addr-spec ">" [CFWS] / obs-angle-addr
display-name    =       phrase

I'm not sure what the source of the email (name) format is. Is it commonly used?

@jplusc

jplusc commented Jun 18, 2024

I'm not sure what the source of the email (name) format is. Is it commonly used?

It might just be a qmail, older Postfix, or FreeBSD thing.

I don't see it often, but when I do, and the address came from a message, that message normally also has a Received header from either qmail or Postfix.
Here is one I happened to have handy: Received: by six.pairlist.net (Postfix, from userid 0) id CC6D26ED5C

I also used to rent a FreeBSD server, and whenever I used its mailing list functions, my outgoing emails looked like that as well
(though I don't know whether they also used Postfix or qmail to send them).

It's not super common, but it is common enough that I would have to work around the exceptions, so I am probably going to stick with email.utils.parseaddr. (I know it also has weird edge-case behavior, but its mishandling of edge cases hasn't affected my dataset in a meaningful way yet.)

Hmm, just stumbled across this (from 2014):
https://wordtothewise.com/2014/12/friendly-email-addresses/
" parentheses isn't really a display name at all, rather it's a human readable comment. "

I thought I saw another mention around here about ignoring comments in parentheses.
I've never seen anything other than a name or mailing list name in the parens.

@ThorstenEngel
Author

We recently had an e-mail with the display name "TIERE (gemeinnütziger Verein) Max Müller". As it contains umlauts and parentheses, it did not work with email.utils.formataddr (it did not add the necessary parentheses). So I successfully rewrote my code to replace formataddr((friendlyname, r_mail)) with

from email.headerregistry import Address

fullmail = str(Address(display_name=friendlyname, addr_spec=r_mail))

It looked to me as if email.headerregistry is better maintained than email.utils. getaddresses worked in my cases.
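
A self-contained version of the snippet above, as a sketch. The address is a made-up placeholder, and friendlyname and r_mail are the variable names from the comment. When the display name contains RFC 5322 special characters such as parentheses, str() on an Address wraps it in double quotes.

```python
from email.headerregistry import Address

friendlyname = "TIERE (gemeinnütziger Verein) Max Müller"
r_mail = "max@example.org"  # placeholder address

# str() on an Address yields "display name <addr-spec>". Because the
# display name contains the specials "(" and ")", it is emitted quoted.
fullmail = str(Address(display_name=friendlyname, addr_spec=r_mail))

print(fullmail)
# "TIERE (gemeinnütziger Verein) Max Müller" <max@example.org>
```

Note that the result keeps the non-ASCII characters as-is. Encoding them for transport is a separate step, which the email package handles when the value is assigned to a message header.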

@JoshData
Owner

" parentheses isn't really a display name at all, rather it's a human readable comment. "

Ahha! That makes sense. Comments came up in #77. As fun as it has been to implement display names, I probably am not going to get motivated to support comments.

It looked to me as if email.headerregistry is better maintained than email.utils.

Good to know!


4 participants