Replacing Cp1252 with ISO-8859-1 in lib/Document.php doesn't seem to be correct #512

dartheditous · 2023-04-02T11:42:56Z

Hi,

In lib/Document.php there is this note:

                        // Some EU reports are reporting Cp1252 charset in the download headers and not being correctly
                        // parsed by PHP. In those cases, replacing the encoding value with ISO-8859-1 allows PHP to
                        // correctly detect and convert the document to UTF-8

But that doesn't seem to be correct, at least with the reports I'm getting. For example, one of my SKU's titles in the Amazon.de marketplace contains „ and “ characters. In a JSON encoding of the result from Document::download, these are encoded as \u0084 and \u0093 respectively, which are C1 control codes, instead of \u201e and \u201c, which are the correct punctuation mark codepoints. That is what I'd expect if the bytes 0x84 and 0x93 are decoded as ISO-8859-1, which maps them straight to U+0084 and U+0093, whereas Cp1252 maps them to U+201E and U+201C.
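Here is a minimal sketch of the difference, assuming the mbstring extension is available. The sample title and variable names are made up for illustration, and this is not the library's actual download code, just the two decodings side by side:

```php
<?php
// Bytes as they would appear in a Cp1252 (Windows-1252) encoded report.
// The title text is a made-up example.
$raw = "\x84Hallo\x93";

// Treating the bytes as ISO-8859-1 maps 0x84 and 0x93 to C1 control codes.
$asLatin1 = mb_convert_encoding($raw, 'UTF-8', 'ISO-8859-1');

// Treating the same bytes as Windows-1252 yields the quotation marks.
$asCp1252 = mb_convert_encoding($raw, 'UTF-8', 'Windows-1252');

echo json_encode($asLatin1), PHP_EOL; // "\u0084Hallo\u0093"
echo json_encode($asCp1252), PHP_EOL; // "\u201eHallo\u201c"
```

If the report really is Cp1252, the second conversion looks like the one that preserves the title.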

Do you have more info on what led to this encoding switch being made? I'm wondering if it was the titles themselves on someone's report that were incorrectly encoded, not the report itself.

jlevers mentioned this issue Mar 22, 2024

V6: Major rewrite using Saloon #643

Merged

iajrz closed this as completed Jan 27, 2025
