Change text box to return remaining text as UTF-8, improve Win1252 handling internally, raise errors or warnings rather than silently replacing invalid glyphs #793
This patch almost works... we see one test failure and one manual example failure currently. I'm a little nervous about merging this even if we can fix the remaining failure, because it constitutes an API change to the 1.0 stable API. That said, we can preserve nearly perfect backwards compatibility in practice by making
@@ -101,6 +101,10 @@ def normalize_encoding(text)
    "Arguments to text methods must be UTF-8 encoded"
end

def to_utf8(text)
  text.bytes.pack("U*")
TODO: This may not actually be a correct way of converting back into UTF-8. Needs closer investigation.
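A minimal sketch of why the TODO above is worth investigating, using only core Ruby and not Prawn's code. Packing each byte as a code point agrees with real transcoding for bytes that Windows-1252 shares with Latin-1 (such as 0xE9), but differs for the 0x80–0x9F range, where Windows-1252 assigns printable characters such as the euro sign. The variable names are illustrative only.

```ruby
# Byte 0x80 is the euro sign in Windows-1252; byte 0xE9 is "e" with acute accent.
euro  = "\x80".b.force_encoding(Encoding::Windows_1252)
acute = "\xE9".b.force_encoding(Encoding::Windows_1252)

# Approach from the diff: treat every byte value as a Unicode code point.
packed_euro  = euro.bytes.pack("U*")
packed_acute = acute.bytes.pack("U*")

# Alternative: let Ruby's encoding tables do the conversion.
transcoded_euro  = euro.encode(Encoding::UTF_8)
transcoded_acute = acute.encode(Encoding::UTF_8)

puts packed_acute == transcoded_acute     # true: both give U+00E9
puts packed_euro.codepoints.inspect       # [128]  (U+0080, a control character)
puts transcoded_euro.codepoints.inspect   # [8364] (U+20AC, the euro sign)
```

If this reading is right, `String#encode` would be the safer way to get back to UTF-8, since it handles the whole Windows-1252 range.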
In theory, this should be working now! The only thing I haven't completely dealt with yet is making @airblade: Can you please try this code in your original use case and see if it solves the problem for you? It implements UTF-8 in, UTF-8 out, so you shouldn't get any encoding surprises anymore.
@packetmonkey: This is another "I'm worried it's going to break stuff" pull request. Are you using built-in fonts in any of your non-trivial Prawn code? If so, please take this for a spin.
Note to all... I am not sure why we were using our own Win1252 implementation... perhaps it wasn't supported in Ruby 1.8 back when we still needed to support 1.8? It's possible to use Ruby's M17N system now for this purpose, so this patch switches to using Ruby's implementation rather than our own under the hood.
We use both the built-in fonts and additional fonts. I have a lot of mock data with letter accents. All of my text gets added via a call to I'll give it a look over and see if I trip into anything.
@sandal This fixes the problem for me. Thank you!
a68eef6 contains a scary but probably good change: it raises an error on any attempt to render non-Windows-1252 glyphs in AFM fonts. Existing versions of Prawn replace such glyphs with an underscore, which is pretty much never what users want, and masks bugs. This commit also changes or removes tests that only passed because they were exploiting this bad behavior.
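To make the difference between the two policies concrete, here is a small sketch using Ruby's own `String#encode`. It is not Prawn's implementation; it only contrasts silent replacement with raising when a character has no Windows-1252 mapping.

```ruby
# "cafe" with an accented e, followed by a CJK character that
# Windows-1252 cannot represent.
utf8 = "caf\u00E9 \u65E5"

# Old-style policy: silently substitute an underscore for unmappable characters.
replaced = utf8.encode(Encoding::Windows_1252, undef: :replace, replace: "_")
puts replaced.encode(Encoding::UTF_8)

# New-style policy: let the conversion fail loudly.
raised = false
begin
  utf8.encode(Encoding::Windows_1252)
rescue Encoding::UndefinedConversionError => e
  raised = true
  puts "conversion failed: #{e.message}"
end
```

With replacement, the output looks plausible and the data loss goes unnoticed; with the exception, the caller learns immediately that the font cannot render the text.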
@packetmonkey: I'm going to merge this, but if anything breaks open up issues and we'll get them sorted out.
A squashed merge of #793, containing the following improvements:

* Text for all Prawn methods is now UTF-8-in, UTF-8-out, so the user does not need to handle Windows-1252 strings.
* Internally, we're now using Ruby's M17N system to handle the encoding into Windows-1252, so text.encoding will come back as Windows-1252 when `AFM#normalize_encoding` is called, rather than `ASCII-8BIT`.
* When using AFM fonts + ASCII-only text, no warning will be seen.
* When using AFM fonts + non-ASCII characters that are supported in Windows-1252, users will see a warning about the limited internationalization support, along with a recommendation to use a TTF font instead.
* The warning includes instructions on how to disable it (just set `Prawn::Font::AFM.hide_m17n_warning = true`).
* When using AFM fonts + non-ASCII characters that are NOT supported in Windows-1252, an exception will be raised rather than replacing with `_`.
* None of the above will apply to anyone using TTF fonts with sane UTF-8 support; everything should "just work" for those folks.
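The three AFM cases described in the commit message can be sketched in plain Ruby as follows. The helper `afm_text_case` and its return symbols are hypothetical names invented for illustration; they are not part of Prawn, and the real library warns or raises rather than returning a symbol.

```ruby
# Hypothetical helper (not Prawn API): reports which of the three
# AFM cases a UTF-8 string falls into.
def afm_text_case(utf8_text)
  # Case 1: pure ASCII needs no warning at all.
  return :ascii_only if utf8_text.ascii_only?

  # Case 2: non-ASCII but representable in Windows-1252 (warning case).
  utf8_text.encode(Encoding::Windows_1252)
  :win1252_with_warning
rescue Encoding::UndefinedConversionError
  # Case 3: not representable in Windows-1252 (exception case).
  :unsupported
end

puts afm_text_case("hello")              # ascii_only
puts afm_text_case("caf\u00E9")          # win1252_with_warning
puts afm_text_case("\u65E5\u672C\u8A9E") # unsupported
```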
Merged in adaf18c!
@sandal Great news!
@practicingruby I think this change makes the fallback_fonts ignored.
@isaiah Do not mix AFM fonts with fallback fonts... use TTF fonts throughout and you'll be fine!
@practicingruby How can I do that? Currently I monkey patched the Prawn::Document#font_families method and #fallback_fonts method, to add my own TTF fonts. Should I disable the original fonts completely?
Currently, if using an AFM font, Prawn's text box will return overflow text in Win1252 encoding, even though the user must provide their text in UTF-8 format. To render this overflowed text, the user needs to pass `:skip_encoding`, which is a fairly awkward workflow.

In an ideal setting, we should treat any transcoding to Win1252 (which isn't even supported by Ruby, so we implement it ourselves) as an implementation detail, and never allow the transcoded text to cross the public API boundary. In other words, Prawn should be UTF-8 in, UTF-8 out to keep things simple for users.

See related discussion on #777, and on #779.
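The "UTF-8 in, UTF-8 out" boundary can be sketched without Prawn at all. In the example below, `fill_box` is a hypothetical stand-in for a text box that "fits" only the first `capacity` characters; it transcodes to Windows-1252 internally but converts the leftover text back to UTF-8 before returning it, so the result can be fed straight back in.

```ruby
# Hypothetical stand-in for a text box (not Prawn API).
def fill_box(utf8_text, capacity)
  # Internal detail: work in Windows-1252, as an AFM font would require.
  internal = utf8_text.encode(Encoding::Windows_1252)

  # Whatever does not fit is the overflow.
  overflow = internal[capacity..-1] || ""

  # Convert back before the text crosses the public boundary.
  overflow.encode(Encoding::UTF_8)
end

remaining = fill_box("na\u00EFve caf\u00E9", 6)
puts remaining            # the last word, still valid UTF-8
puts remaining.encoding   # UTF-8

# The overflow can be passed back in with no special flag.
puts fill_box(remaining, 2)
```

Because the caller never sees a Windows-1252 string, no option like `:skip_encoding` is needed on the second call.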