Need better escaping for PDF title #636
DanBloomberg/leptonica@792db025518a
Dan's Leptonica change is for a different code path and is not reusable, because it only works for ASCII. The Tesseract fix needs to use UTF-16BE, which is fortunately already used elsewhere in pdfrenderer.cpp.
Fix written, under review.
EDIT: I've revised the fix to remove a compiler warning.
We're going to get corrupt output if an open paren is passed as a title.
Proper escaping looks like this. The leading FEFF is boilerplate that
signifies the byte order; everything else is UTF-16BE. The title in this
case is "ru".
https://github.com/tesseract-ocr/tesseract/blob/master/api/pdfrenderer.cpp#L963
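As a rough illustration of the escaping described above, here is a minimal sketch that converts a UTF-8 title into a PDF hex string: a leading FEFF byte order mark followed by UTF-16BE code units. It is not the code in pdfrenderer.cpp; the names `AppendHexUnit` and `TitleToPdfHexString` are hypothetical, and the sketch assumes the title arrives as UTF-8. Malformed bytes are replaced with U+FFFD, and overlong UTF-8 forms are not rejected.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper (not Tesseract's code): append one UTF-16 code unit
// as four uppercase hex digits, high byte first (big-endian).
static void AppendHexUnit(std::string* out, uint16_t unit) {
  char buf[5];
  std::snprintf(buf, sizeof(buf), "%04X", static_cast<unsigned>(unit));
  out->append(buf);
}

// Hypothetical helper: encode a UTF-8 title as a PDF hex string,
// e.g. <FEFF...>. Because every byte is written as hex digits, characters
// such as '(' never appear literally and cannot unbalance a PDF string.
std::string TitleToPdfHexString(const std::string& utf8) {
  std::string out = "<FEFF";  // FEFF marks the text as UTF-16BE
  std::size_t i = 0;
  while (i < utf8.size()) {
    const unsigned char lead = static_cast<unsigned char>(utf8[i++]);
    uint32_t cp = 0xFFFD;  // replacement character for malformed input
    int extra = 0;         // number of continuation bytes expected
    if (lead < 0x80) {
      cp = lead;
    } else if ((lead & 0xE0) == 0xC0) {
      cp = lead & 0x1F;
      extra = 1;
    } else if ((lead & 0xF0) == 0xE0) {
      cp = lead & 0x0F;
      extra = 2;
    } else if ((lead & 0xF8) == 0xF0) {
      cp = lead & 0x07;
      extra = 3;
    }
    for (int k = 0; k < extra; ++k) {
      const bool ok =
          i < utf8.size() &&
          (static_cast<unsigned char>(utf8[i]) & 0xC0) == 0x80;
      if (!ok) {
        cp = 0xFFFD;  // truncated or broken sequence
        break;
      }
      cp = (cp << 6) | (static_cast<unsigned char>(utf8[i]) & 0x3F);
      ++i;
    }
    // Surrogate code points and values beyond U+10FFFF are not valid.
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) cp = 0xFFFD;
    if (cp >= 0x10000) {
      // Characters outside the BMP become a surrogate pair.
      cp -= 0x10000;
      AppendHexUnit(&out, static_cast<uint16_t>(0xD800 + (cp >> 10)));
      AppendHexUnit(&out, static_cast<uint16_t>(0xDC00 + (cp & 0x3FF)));
    } else {
      AppendHexUnit(&out, static_cast<uint16_t>(cp));
    }
  }
  out += ">";
  return out;
}
```

With this sketch, `TitleToPdfHexString("ru")` yields `<FEFF00720075>`, and a title consisting of a single open paren yields `<FEFF0028>`, so the paren never reaches the PDF as a bare delimiter.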