Encoding problem with cli export/import #2645

Closed
X-dark opened this issue Nov 8, 2019 · 6 comments

@X-dark

X-dark commented Nov 8, 2019

Hi,

After exporting my user database from MySQL to SQLite with the CLI, I am trying to import it into PostgreSQL, also with the CLI.

It fails with the following error in PostgreSQL log:

Nov 08 14:38:18 htpc postgres[815]: 2019-11-08 14:38:18.987 CET [314058] ERROR:  invalid byte sequence for encoding "UTF8": 0xe9 0x20 0x69

Is it related to utf8mb4 in MySQL? Is there any way to clean my data to allow import or just skip the invalid value?
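
For reference, the three bytes reported in the log can be checked in isolation. This is only a sketch of why PostgreSQL rejects them: in UTF-8, `0xe9` announces a three-byte character, so the next byte must be a continuation byte, and `0x20` (a space) is not one. Reading the same bytes as ISO-8859-1 is an assumption about the original encoding, not something the log confirms.

```python
# The byte sequence quoted in the PostgreSQL error message.
data = bytes([0xE9, 0x20, 0x69])

try:
    data.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError as err:
    valid_utf8 = False
    reason = err.reason  # human-readable cause reported by the decoder

# Assumption: the text was originally ISO-8859-1 (Latin-1).
# Latin-1 decoding never fails, so this is a guess, not a proof.
as_latin1 = data.decode("latin-1")

print(valid_utf8, repr(as_latin1))
```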

@Alkarex Alkarex added this to the 1.15.2 milestone Nov 8, 2019
@Alkarex
Member

Alkarex commented Nov 8, 2019

Hum, I did not observe such a problem during my tests. Would you be able to identify the precise data that makes it fail? We could add some Unicode sanitizing before insertion.
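
One possible way to narrow down the offending data is to scan the exported SQLite file for values that are not valid UTF-8. The sketch below is a rough illustration only: the helper name `find_invalid_utf8` is hypothetical, and the table and column names passed to it (`entry`, `content`, `id`) are assumptions about the schema that should be adjusted to the actual file.

```python
import sqlite3

def find_invalid_utf8(conn, table, column, key="id"):
    """Return (key, byte_offset) pairs for rows whose column is not valid UTF-8."""
    previous_factory = conn.text_factory
    # Fetch TEXT values as raw bytes so that sqlite3 does not try to decode them.
    conn.text_factory = bytes
    try:
        bad = []
        query = f'SELECT "{key}", "{column}" FROM "{table}"'
        for row_key, raw in conn.execute(query):
            if not isinstance(raw, bytes):
                continue  # NULL or numeric value, nothing to decode
            try:
                raw.decode("utf-8")
            except UnicodeDecodeError as err:
                # err.start is the offset of the first invalid byte
                bad.append((row_key, err.start))
        return bad
    finally:
        conn.text_factory = previous_factory
```

Usage would look something like `find_invalid_utf8(sqlite3.connect("export.sqlite"), "entry", "content")`, where the file name is again a placeholder. The returned offsets make it easier to extract a small sample around the invalid bytes.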

@Alkarex
Member

Alkarex commented Nov 8, 2019

Would you be able to produce a small version of your SQLite file, exhibiting the issue?

@X-dark
Author

X-dark commented Nov 8, 2019

Invalid entries were limited to a feed that is no longer updated: http://www.hardware.fr/backend/news.xml. I was able to import my database after removing all entries from that feed.

Here is the stripped down db (just one article of this feed): https://owncloud.cedricgirard.com/s/dtmHp66jqW5bfSS/download

@Alkarex Alkarex self-assigned this Nov 8, 2019
@Alkarex
Member

Alkarex commented Nov 8, 2019

It is this comment, apparently in ISO-8859-x in the middle of some UTF-8, which seems to cause the problem, only with PostgreSQL:

<!-- Page actualit� individuelle -->
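
As an illustration of what sanitizing such mixed content could look like, here is a minimal sketch. It is not the approach taken in the candidate patch, which this thread does not describe. It decodes the value as UTF-8 and reinterprets only the invalid bytes as ISO-8859-1; both the handler name `latin1_fallback` and the choice of ISO-8859-1 are assumptions made for the example.

```python
import codecs

def _latin1_fallback(err):
    """Decode only the offending bytes as ISO-8859-1, then resume after them."""
    if not isinstance(err, UnicodeDecodeError):
        raise err
    bad = err.object[err.start:err.end]
    return bad.decode("latin-1"), err.end

# Register the handler under a hypothetical name for use with bytes.decode().
codecs.register_error("latin1_fallback", _latin1_fallback)

def sanitize(raw: bytes) -> str:
    """Return text that can be re-encoded as valid UTF-8."""
    return raw.decode("utf-8", errors="latin1_fallback")

fixed = sanitize(b"<!-- Page actualit\xe9 individuelle -->")
fixed.encode("utf-8")  # now encodes cleanly
```

A more conservative alternative is `raw.decode("utf-8", errors="replace")`, which substitutes U+FFFD for each invalid byte instead of guessing the original encoding.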

Alkarex added a commit to Alkarex/FreshRSS that referenced this issue Nov 9, 2019
@Alkarex
Member

Alkarex commented Nov 9, 2019

@X-dark Could you please check this candidate patch? #2649

@X-dark
Author

X-dark commented Nov 13, 2019

Sorry for not having been able to test this sooner. It does fix this bug. Thank you.

javerous pushed a commit to javerous/FreshRSS that referenced this issue Jan 20, 2020