Changes are breaking something commit 1f4056d #359
Comments
It seems that there are glyphs in use which are not part of
It would be really great to have a sample, otherwise I'm not sure how to debug this properly. Maybe you can remove any sensitive content and just leave in a few sentences or paragraphs that cause the issue?
Also, could you please try whether the issue happens with this branch as well? https://github.com/skyfms/pdfparser/tree/fix_encoding
I thought the problem would disappear after editing the file (I am not its author), but it is still there. The problem occurs in this branch too.
Can we add the PDF file (https://github.com/smalot/pdfparser/files/5429471/cv_Zj.pdf) to our test environment? Is it free to use, with no obligations attached?
I am not the author, but I've removed the sensitive information, so... probably.
Sorry for not responding, I'm quite busy at the moment with several work projects in parallel, but I haven't forgotten about the issue and will look at it as soon as I find the time!
…postscript lookup table
Finally got around to looking into this. I just created PR #362 to handle this issue. Please note the following, which I also added as a comment in the related test case:
|
If it reads the provided PDF, then that's fine, and it looks like it does. The original one too.
…ookup table (#362)
* hotfix for #359 and #360: fallback for glyphs not in the postscript lookup table
* test comment and assertion: actually just one character being decoded incorrectly
* added @todo keyword to test case comment so we can keep track of this
* moved comment before code as requested in the review
* fix code linting
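For readers unfamiliar with the idea, here is a minimal Python sketch of what a "fallback for glyphs not in the lookup table" can look like. It is not the code from PR #362 and not pdfparser's API: the table excerpt, the function name glyph_to_text and the choice of U+FFFD as the last resort are all assumptions made for illustration. The glyph names and the uniXXXX / uXXXX naming convention follow the Adobe Glyph List.

```python
import re

# Hypothetical excerpt of a glyph-name lookup table. A real table is much
# larger; these entries use Adobe Glyph List names for Polish letters.
GLYPH_TABLE = {
    "aogonek": "\u0105",     # ą
    "eogonek": "\u0119",     # ę
    "nacute": "\u0144",      # ń
    "sacute": "\u015b",      # ś
    "zdotaccent": "\u017c",  # ż
}

# Adobe Glyph List convention: "uni" + groups of four uppercase hex digits,
# or "u" + four to six uppercase hex digits.
_UNI_NAME = re.compile(r"uni((?:[0-9A-F]{4})+)")
_U_NAME = re.compile(r"u([0-9A-F]{4,6})")


def glyph_to_text(name, fallback="\ufffd"):
    """Map a glyph name to text, falling back when the table has no entry."""
    # 1. Normal path: the name is in the lookup table.
    if name in GLYPH_TABLE:
        return GLYPH_TABLE[name]

    # 2. Fallback: decode "uniXXXX..." names, four hex digits per character.
    match = _UNI_NAME.fullmatch(name)
    if match:
        digits = match.group(1)
        return "".join(
            chr(int(digits[i:i + 4], 16)) for i in range(0, len(digits), 4)
        )

    # 3. Fallback: decode "uXXXX" to "uXXXXXX" names, one code point each.
    match = _U_NAME.fullmatch(name)
    if match:
        code_point = int(match.group(1), 16)
        if code_point <= 0x10FFFF:
            return chr(code_point)

    # 4. Last resort: return a visible placeholder instead of a wrong letter.
    return fallback
```

With this sketch, glyph_to_text("uni0105") yields "ą" even though that name is not in the table, and an unknown name such as "nosuchglyph" yields the replacement character rather than a wrong letter.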
Unfortunately, I can't provide a sample PDF this time.
Output is correct before commit 1f4056d.
Output with latest commit / release:
text:
ż ś ą ą żą ń ą ą ą ą ą ą ą ą ę ś ę ń ą ą ą ń ę ż ą ś ś ż ą ą ż ą ą ą ż ą ę ą ń ż ń ą ą ś ą ą ą ę ż ęż ń ę ę ą ą ą Ś Ś ą ę ż żą ż ż ż ą ś ąś ń ń ń ń ą ę ś ść Ś ść ść ść ść ś ść ę ść ą ść ść ść ść ę ś ż ę ą ą ą