Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Error regex to getBoundary() passing by getAttachment() #126

Closed
MouMoutMan opened this issue Apr 9, 2021 · 5 comments
Closed

Error regex to getBoundary() passing by getAttachment() #126

MouMoutMan opened this issue Apr 9, 2021 · 5 comments
Labels
bug Something isn't working validating

Comments

@MouMoutMan
Copy link

MouMoutMan commented Apr 9, 2021

Describe the bug
The getAttachment function cannot find the attachment because the returned boundary is not properly parsed.
Only using the getAttachment () function.
In the mail header, when there is a string after the boundary, the regex also takes that string. This creates a bad boundary, and the attachment is not found.

Code to Reproduce
The following code returns:
"----=EDF_mixed"; charset="ISO-8859-1"
and after clearBoundaryString, this code return the bad boundary:
----=EDF_mixed charset=ISO-8859-1

<?php
$part_of_header = 'Content-Type: multipart/mixed; boundary="----=EDF_mixed"; charset="ISO-8859-1"';

// Header.php, function getBoundary() - l.176
$actual_pattern = "/boundary\=(.*)/i"; 

// Header.php, function find() - l.159
 if (preg_match_all($actual_pattern, $part_of_header, $matches)) {
            if (isset($matches[1])) {
                if(count($matches[1]) > 0) {
                    var_dump( $matches[1][0]);
                }
            }
        }

The following code returns:
"----=EDF_mixed"
and after clearBoundaryString, this code return the good boundary:
----=EDF_mixed

<?php
$part_of_header = 'Content-Type: multipart/mixed; boundary="----=EDF_mixed"; charset="ISO-8859-1"';

// Header.php, l.176
$actual_pattern = "/boundary=\"?([^\"]*)[\";\s]/i"; 

 if (preg_match_all($actual_pattern, $part_of_header, $matches)) {
            if (isset($matches[1])) {
                if(count($matches[1]) > 0) {
                    var_dump( $matches[1][0]);
                }
            }
        }

Expected behavior
Change the regex to:
/boundary=\"?([^\"]*)[\";\s]/i
like the example in the documentation:
https://www.php-imap.com/api/header

Desktop / Server (please complete the following information):

  • OS: Ubuntu
  • PHP: 7.4.13
  • Version: v2.4.0
@MouMoutMan MouMoutMan changed the title Error regex to getBoundary() Error regex to getBoundary() passing by getAttachment() Apr 9, 2021
@MouMoutMan
Copy link
Author

This issues would solve #121

MouMoutMan added a commit to MouMoutMan/php-imap that referenced this issue Apr 9, 2021
@MouMoutMan
Copy link
Author

MouMoutMan commented Apr 16, 2021

#27 is managed with this code 😁

@MouMoutMan
Copy link
Author

After many tries, the problem is not just with the regex, because the current one (/boundary=(.*)/i) covers a larger amount of mails.
However, I have several problems because of her.
It gets the semicolon and the function clearBoundaryString() therefore transforms the boundary.

Here is my suggestion:
Since RFC2046 (on page 22) does not allow the semicolon in the boundary, you might as well remove anything after the semicolon, if there is one:

    private function clearBoundaryString($str) {
        $pos = strpos($str, ';');
        if($pos) {
            $str = (strstr($str, ';', true));
        }
        return str_replace(['"', '\r', '\n', "\n", "\r", ";", "\s"], "", $str);
    }

@Webklex Webklex added bug Something isn't working validating labels Jun 18, 2021
@Webklex
Copy link
Owner

Webklex commented Jun 18, 2021

Hi @MouMoutMan ,
many thanks for your report and your time spent looking into this issue. I've had so many problems with this little regex, I'm really afraid to touch it...

RFCs are nice and all, but unfortunately almost no mail client follows it 100% - so in the end its all over the place..

I'm not sure how to handle this issue. Any additional comments or ideas are welcome :)

Best regards,

@Webklex
Copy link
Owner

Webklex commented Nov 4, 2021

Hi @MouMoutMan ,
please give v3.0.0-alpha a try. The default boundary regex used to detect the message parts in a multi-part message has been changed and can now also be modified via the config options.boundary parameter inside your configuration.

Best regards,

@Webklex Webklex closed this as completed Nov 20, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working validating
Projects
None yet
Development

No branches or pull requests

2 participants