Substitution of Cp1252 for ISO-8859-1 in lib\Document.php doesn't seem to be correct #512

Closed
dartheditous opened this issue Apr 2, 2023 · 0 comments

Comments

@dartheditous

Hi,

In lib/Document.php there is this note:

                        // Some EU reports are reporting Cp1252 charset in the download headers and not being correctly
                        // parsed by PHP. In those cases, replacing the encoding value with ISO-8859-1 allows PHP to
                        // correctly detect and convert the document to UTF-8

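But that doesn't seem to be correct, at least with the reports I'm getting. For example, one of my SKU's titles in the Amazon.de marketplace contains the „ and “ characters. In a JSON encoding of the result from Document::download, these come out as \u0084 and \u0093 respectively, which are C1 control codes, instead of \u201e and \u201c, which are the correct codepoints for those punctuation marks. That is exactly what happens when the Cp1252 bytes 0x84 and 0x93 are decoded as ISO-8859-1: the two charsets agree everywhere except the 0x80-0x9F range, where Cp1252 has printable characters and ISO-8859-1 has control codes.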
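To make the difference concrete, here is a minimal sketch in Python (used only as a neutral illustration; it is not the library's PHP code). The sample bytes and the `decode_report` helper are hypothetical, chosen to mimic a Cp1252-encoded title wrapped in „ and “.

```python
import json

# Hypothetical sample: a title wrapped in the Cp1252 bytes for „ (0x84) and “ (0x93).
RAW_TITLE = b"\x84Titel\x93"


def decode_report(raw: bytes, charset: str) -> str:
    """Decode raw report bytes using the given charset name."""
    return raw.decode(charset)


# Decoding as ISO-8859-1 maps 0x84/0x93 to the C1 control codes U+0084/U+0093.
as_latin1 = decode_report(RAW_TITLE, "iso-8859-1")

# Decoding as Cp1252 maps the same bytes to U+201E and U+201C.
as_cp1252 = decode_report(RAW_TITLE, "cp1252")

print(json.dumps(as_latin1))  # "\u0084Titel\u0093"
print(json.dumps(as_cp1252))  # "\u201eTitel\u201c"
```

If the report really is Cp1252, as its header says, only the second decoding preserves the quotation marks.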

Do you have more info on what led to this encoding switch being made? I'm wondering if it was the titles themselves on someone's report that were incorrectly encoded, not the report itself.
