Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SMTP Conversation Broken - MAIL FROM invalid #317

Closed
fukawi2 opened this issue Feb 11, 2020 · 13 comments
Closed

SMTP Conversation Broken - MAIL FROM invalid #317

fukawi2 opened this issue Feb 11, 2020 · 13 comments

Comments

@fukawi2
Copy link

fukawi2 commented Feb 11, 2020

When using the SMTP lib, if the 'From' header is set to an RFC 5322 formatted address with a display name (eg, Phillip Smith - Example<[email protected]>), the SMTP conversation breaks because the MAIL FROM: command is invalid

Example code:

$smtp = new \SMTP('localhost', 25);
$smtp->set('From', 'Phillip Smith - Example<[email protected]>');
$smtp->set('CC', 'Phillip Smith - Example<[email protected]>');
$smtp->set('To', '[email protected]');
$smtp->set('Subject', 'Test 123');
$smtp->set('Content-type', 'text/html; charset=UTF-8');
$smtp->send($email_message, TRUE);

Results in this MAIL FROM: value in the SMTP transaction:

220 localhost ESMTP Postfix
EHLO mysite.example.com
250-localhost
250-PIPELINING
250-SIZE 20971520
250-VRFY
250-ETRN
250-ENHANCEDSTATUSCODES
250-8BITMIME
250 DSN
MAIL FROM: <Phillip>, <Smith>, <->, Example<[email protected]>
555 5.5.4 Unsupported option: <Smith>,

It is being caused by the "stringify headers" loop: https://github.com/bcosca/fatfree/blob/master/lib/smtp.php#L253

Appears to have been introduced in 3.6.0 by this change: https://github.com/bcosca/fatfree/blame/495a97c9957a9edfbaea362927be6f3faa96fc52/lib/smtp.php#L267

Workaround is to specify a plain email address in the "Sender" header:

$smtp->set('Sender', '[email protected]');

Also, send() does NOT return an error in response to this failure, so it goes quietly undetected until someone notices they haven't received the expected emails for several months ;)

I suggest either the "Sender" header is made mandatory, and valided/massaged into a plain email address without display name, or the stringify headers loop is adjusted (I'm not 100% understanding the purpose of this loop though)

@ikkez
Copy link
Member

ikkez commented Feb 11, 2020

it should work when you put the receivers in quotes:
$smtp->set('To', '"User 1" <[email protected]>, "User 2" <[email protected]>');
see https://fatfreeframework.com/3.7/smtp#set

@pauljherring
Copy link

pauljherring commented Feb 11, 2020 via email

@fukawi2
Copy link
Author

fukawi2 commented Feb 11, 2020

@pauljherring That is talking about the addr-spec part of the format. See section 3.4:

Normally, a mailbox is composed of two parts: (1) an optional display name that indicates the name of the recipient (which can be a person or a system) that could be displayed to the user of a mail application, and (2) an addr-spec address enclosed in angle brackets ("<" and ">"). There is an alternate simple form of a mailbox where the addr-spec address appears alone, without the recipient's name or the angle brackets.

Spaces are perfectly valid in the "display name", as long as the addr-spec (ie, "[email protected]") is wrapped in < and >

@ikkez That is another workaround I didn't realize, but the quotes are not required in the RFC, so F3 should not require them (IMHO).

@fukawi2
Copy link
Author

fukawi2 commented Feb 11, 2020

Thanks for the quick responses by the way :)

@KOTRET
Copy link
Contributor

KOTRET commented Feb 11, 2020

According to what @pauljherring linked, the rfc clearly states that quotes are required as long as the sender cannot be represented as dot-atom.

address         =   mailbox / group
mailbox         =   name-addr / addr-spec
name-addr       =   [display-name] angle-addr
angle-addr      =   [CFWS] "<" addr-spec ">" [CFWS] /obs-angle-addr
display-name    =   phrase

and as per Section 3.2.5, phrase is defined as

word            =   atom / quoted-string
phrase          =   1*word / obs-phrase

A word is either an atom or a quoted string.

@fukawi2
Copy link
Author

fukawi2 commented Feb 12, 2020

I think you're mixing up the "localpart" component of the email address, with the "display name" component of the address format. display-name is a "phrase"; the RFC does not require a "phrase" to be quoted.

These are all valid RFC5322 addresses:

As real-world examples, the notification emails from GitHub to myself regarding updates on this issue contain the following headers (unquoted display-name):

From: Christian Knuth <[email protected]>
---
From: PJH <[email protected]>

Another sample from Google in my inbox has this unquoted display-name header:

From: Google Local Guides <[email protected]>

One more sample from Hotmail also shows no quoting around the display-name:

From: Gale Boss <[email protected]>

So quoting the 'display-name' is optional (ie, not required) by the RFC, and not using quotes is valid and common, regardless of the presence of spaces within the display-name.

@pauljherring
Copy link

In the absence of unambiguously being able to determine whether a separator character is part of a display name, or genuinely separating two email addresses, how would you suggest changing the following function call, modified slightly, from @ikkez's earlier post?

 $smtp->set('To', 'Comma, one <[email protected]>, User 2 <[email protected]>');

Presuming, of course that there are two email addresses in the string, and not just one with a display name containing something that looks like an email address.

Or perhaps there's three with one being a local address.

The To header: https://tools.ietf.org/html/rfc5322#section-3.6.3

    to              =   "To:" address-list CRLF

address-list: https://tools.ietf.org/html/rfc5322#section-3.4

    address         =   mailbox / group

   mailbox         =   name-addr / addr-spec

   name-addr       =   [display-name] angle-addr

   angle-addr      =   [CFWS] "<" addr-spec ">" [CFWS] /
                       obs-angle-addr

   group           =   display-name ":" [group-list] ";" [CFWS]

   display-name    =   phrase

   mailbox-list    =   (mailbox *("," mailbox)) / obs-mbox-list

   address-list    =   (address *("," address)) / obs-addr-list

phrase: https://tools.ietf.org/html/rfc5322#section-3.2.5

   word            =   atom / quoted-string

   phrase          =   1*word / obs-phrase

.. and we're back to atoms and quoted-strings. https://tools.ietf.org/html/rfc5322#section-3.2.3

   atext           =   ALPHA / DIGIT /    ; Printable US-ASCII
                       "!" / "#" /        ;  characters not including
                       "$" / "%" /        ;  specials.  Used for atoms.
                       "&" / "'" /
                       "*" / "+" /
                       "-" / "/" /
                       "=" / "?" /
                       "^" / "_" /
                       "`" / "{" /
                       "|" / "}" /
                       "~"

   atom            =   [CFWS] 1*atext [CFWS]

I'm honestly not trying to be difficult here; it just sometimes comes across that way :)

@fukawi2
Copy link
Author

fukawi2 commented Feb 13, 2020

I'm honestly not trying to be difficult here; it just sometimes comes across that way :)

I enjoy collaborative problem solving, so all good! :)

The imap_rfc822_parse_headers function in the imap library appears to handle this correctly based on my testing:

$headers = "From: Foobar<[email protected]>\n";
$headers .= "To: Mr Example 1<[email protected]>, Mrs Example 2<[email protected]>\n";
$objHeaders = imap_rfc822_parse_headers($headers);
var_dump($objHeaders);

Results:

object(stdClass)#1 (8) {
  ["toaddress"]=>
  string(63) "Mr Example 1 <[email protected]>, Mrs Example 2 <[email protected]>"
  ["to"]=>
  array(2) {
    [0]=>
    object(stdClass)#2 (3) {
      ["personal"]=>
      string(12) "Mr Example 1"
      ["mailbox"]=>
      string(3) "one"
      ["host"]=>
      string(11) "example.com"
    }
    [1]=>
    object(stdClass)#3 (3) {
      ["personal"]=>
      string(13) "Mrs Example 2"
      ["mailbox"]=>
      string(3) "two"
      ["host"]=>
      string(11) "example.com"
    }
  }
  ["fromaddress"]=>
  string(27) "Foobar <[email protected]>"
.....

F3 doesn't necessarily need a new dependency though.

There is also mailparse_rfc822_parse_addresses, but that requires an even more obscure non-standard PHP module.

mail-mime-parser might be an option to avoid reinventing the wheel. Composer installs it without any additional PHP module requirements from what I can tell. Perhaps F3 could abstract some of this email handling functionality off to this? Email is deceivingly complex for such a 'simple' protocol, and it isn't the primary purpose of F3. mail-mime-parser is BSD 2-clause licensed, which is compatible with the GPLv3 that F3 is using.

require('vendor/autoload.php');
$headers = "From: Foobar<[email protected]>\n";
$headers .= "To: Mr Example 1<[email protected]>, Mrs Example 2<[email protected]>\n";

$mailParser = new \ZBateson\MailMimeParser\MailMimeParser();
$message = $mailParser->parse($headers);
$to = $message->getHeader('To');
foreach ($to->getAddresses() as $addr) {
  printf("To Name: '%s'\n", $addr->getName());
  printf("To Email: '%s'\n", $addr->getEmail());
}

Results:

To Name: 'Mr Example 1'
To Email: '[email protected]'
To Name: 'Mrs Example 2'
To Email: '[email protected]'

@pauljherring
Copy link

pauljherring commented Feb 13, 2020

$headers .= "To: Mr Example 1<[email protected]>, Mrs Example 2<[email protected]>\n";

Ok, introducing a comma into the first display name - I'm presuming you're not proposing banning commas from display names? ;) :

$headers .= "To: Mr, Example 1<[email protected]>, Mrs Example 2<[email protected]>\n";

object(stdClass)#1 (8) {
  ["toaddress"]=>
  string(72) "Mr@UNKNOWN, Example 1 <[email protected]>, Mrs Example 2 <[email protected]>"
  ["to"]=>
  array(3) {
    [0]=>
    object(stdClass)#2 (2) {
      ["mailbox"]=>
      string(2) "Mr"
      ["host"]=>
      string(7) "UNKNOWN"
    }
    [1]=>
    object(stdClass)#3 (3) {
      ["personal"]=>
      string(9) "Example 1"
      ["mailbox"]=>
      string(3) "one"
      ["host"]=>
      string(11) "example.com"
    }
    [2]=>
    object(stdClass)#4 (3) {
      ["personal"]=>
      string(13) "Mrs Example 2"
      ["mailbox"]=>
      string(3) "two"
      ["host"]=>
      string(11) "example.com"
    }
...

Quoting that first display name:

$headers .= 'To: "Mr, Example 1"<[email protected]>, Mrs Example 2<[email protected]>'."\n";

  ["toaddress"]=>
  string(66) ""Mr, Example 1" <[email protected]>, Mrs Example 2 <[email protected]>"
  ["to"]=>
  array(2) {
    [0]=>
    object(stdClass)#2 (3) {
      ["personal"]=>
      string(13) "Mr, Example 1"
      ["mailbox"]=>
      string(3) "one"
      ["host"]=>
      string(11) "example.com"
    }
    [1]=>
    object(stdClass)#3 (3) {
      ["personal"]=>
      string(13) "Mrs Example 2"
      ["mailbox"]=>
      string(3) "two"
      ["host"]=>
      string(11) "example.com"
    }
  }
...

Quotes needed. I think it's merely an accident of php's implementation that - as long as you aren't using commas anywhere except as a separator, then it "seems to work." Rather than "working as intended."

I'm minded of C's "undefined behaviour," when said undefined behaviour happens to match the expected behaviour by accident.

See also: Postel's law

@fukawi2
Copy link
Author

fukawi2 commented Feb 14, 2020

Yes, I see your point. But we're also slightly off-topic; the original bug is with how F3 is handling the From header to set the MAIL FROM envelope address - it's definitely invalid for the From header to have multiple addresses, so it would be reasonable to not treat commas specially when trying to extract the address.

@KOTRET
Copy link
Contributor

KOTRET commented Feb 14, 2020

Of course most servers probably can handle unquoted addresses, but as discussed:
this is different for the header: the from-header may omit these quotes whereas the to-header may contain multiple addresses. In this case you can get into trouble having commas in unquoted display names.

For a comprehensive solution you could add a lib, also this php-function (for me this is a perfect case of "know your api") is nice.
But this will require to add a new dependency (imap or composer or whatever) - 👎

Using a different parser for one field is not very reasonable and far from "fatfree".


As there _is_ an easy workaround, I'd flag this with "wontfix".

NB: the RFC really states to quote the displayname if it does not consist of atoms 😉

@fukawi2
Copy link
Author

fukawi2 commented Feb 21, 2020

I appreciate the "wontfix" suggestion - it would be nice to handle it gracefully is some way, even if it throws an error.

As I mentioned in my original post, currently F3 just silently ignores the corresponding error-condition (the 554 status code from the SMTP server), so perhaps the 'solution' to this problem is for F3 to properly handle the SMTP conversation error (instead of silently continuing and ignoring the error). At least future people like me will find the problem a lot sooner ;)

@KOTRET
Copy link
Contributor

KOTRET commented Feb 24, 2020

Ah yes, I totally agree with that.
I did not thought about this, because I used a modified version, where I extended the dialog-function to throw an Exception:

if (preg_match('/^(4|5)\d{2}\s.*$/', $reply)) {
    throw new SMTPException($cmd, $reply);
}

But I've seen there is no errorhandling yet.

@ikkez ikkez transferred this issue from bcosca/fatfree Dec 13, 2020
ikkez added a commit that referenced this issue Dec 13, 2020
@ikkez ikkez closed this as completed Dec 13, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants