Numerals read as \u0000 when using font feature settings
#523
Comments
Thanks for the clear report and simple test files. Looking at the features-on file (font_features_on.pdf), it has a ToUnicode CMap that maps each glyph code to the Unicode codepoint \u0000, and we're honoring it.
However, the content stream is using the optional "marked content" operators (
pdf-reader currently doesn't look at marked content. Maybe we should, and maybe this suggests marked content should take precedence over ToUnicode CMaps?
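To make the ToUnicode behaviour described above concrete, here is a minimal sketch in plain Ruby of what happens when every glyph code is mapped to \u0000. It is only an illustration under stated assumptions: the CMap fragment is made up and was not extracted from font_features_on.pdf, and parse_bfchar and decode_with_cmap are hypothetical helpers, not part of the pdf-reader API.

```ruby
# Hypothetical ToUnicode CMap fragment in which every glyph code maps to U+0000.
# Made up for illustration; NOT extracted from font_features_on.pdf.
BROKEN_CMAP = <<~CMAP
  4 beginbfchar
  <0011> <0000>
  <0012> <0000>
  <0013> <0000>
  <0014> <0000>
  endbfchar
CMAP

# Parse "<src> <dst>" bfchar pairs into a { glyph_code => codepoint } hash.
# (Hypothetical helper; only handles single-codepoint bfchar entries.)
def parse_bfchar(cmap)
  cmap.scan(/<(\h+)>\s+<(\h+)>/).to_h { |src, dst| [src.hex, dst.hex] }
end

# Decode glyph codes through the map; unmapped codes become U+FFFD.
def decode_with_cmap(glyph_codes, map)
  glyph_codes.map { |code| map.fetch(code, 0xFFFD) }.pack("U*")
end

map  = parse_bfchar(BROKEN_CMAP)
text = decode_with_cmap([0x12, 0x11], map)
puts text.inspect
```

A reader that honors such a map faithfully can only return NUL characters for those glyphs, which matches the symptom in the report. Recovering the real digits would need another source of text, such as the marked content mentioned above.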
First of all, thanks for the work and effort you've put into this great library!
Bug description
We are having an issue with numerals not being read correctly by PDF::Inspector::Text.analyze. They get misinterpreted as \u0000 when we use font-feature-settings: 'tnum' as style. We are generating the PDF with Gotenberg from HTML templates.
Minimal reproducible example
<div>21.09.2023</div>
gets read as 21.09.2023, while
<div style="font-feature-settings: 'tnum'">21.09.2023</div>
gets read as \u0000\u0000.\u0000\u0000.\u0000\u0000\u0000\u0000.
PDFs
Here are two PDFs, one with the feature turned off and one with the feature turned on:
font_features_off.pdf
font_features_on.pdf
Further information
The UNIX tool pdftotext is able to read both versions correctly, so I think the PDF is alright.
The font in use is Barlow, if that makes any difference.
Any help would be appreciated!
P.S.: I'll also open an issue regarding this problem over at https://github.com/prawnpdf/pdf-inspector so feel free to close this one if you think it should be handled there.