Odd Unicode characters instead of real letters are now used to render texts #10205

wojtekmaj · 2018-11-01T07:38:55Z

Hello,
in v2.0.550 text rendered in SVG rendering mode used normal letters:

<tspan x="0 16.308 33.264 61.74 80.46 88.416 114.48 133.2 151.956 167.256 185.976 214.452 232.236 250.74" y="0" font-family="g_d0_f2" font-size="36px" fill="rgb(46,83,149)">
  Sampledocument
</tspan>

The same node in v2.0.943 (after #9192) looks like so:

<tspan x="0 16.308 33.264 61.74 80.46 88.416 114.48 133.2 151.956 167.256 185.976 214.452 232.236 250.74" y="0" font-family="g_d1_f2" font-size="36px" fill="rgb(47,84,150)">
  
</tspan>

I don't see how losing the ability to read the source would benefit anyone. Is there a way to get the old behavior back?

wojtekmaj · 2018-11-01T14:27:54Z

It looks like the items in the glyphs array now have a broken fontChar property; unicode is fine.

Digging further into the charToGlyph function, you'll notice that fontCharCode used to be equal to charcode in most cases. In the version that works properly, I can see this line:

      var unicode = this.toUnicode.get(charcode) || charcode;

while in the version not working properly:

      var unicode = this.toUnicode.get(charcode) || this.fallbackToUnicode.get(charcode) || charcode;

There are no other notable differences in that function, so I assume it's this line that causes the problems.
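
To make the difference between the two lines concrete, here is a minimal sketch that models both lookups with plain `Map` objects. The map contents, the value types, and the helper names `lookupOld` and `lookupNew` are assumptions made up for illustration; only the two expressions themselves come from the lines quoted above.

```javascript
// Hypothetical stand-ins for the font's maps (contents invented for the demo).
const toUnicode = new Map([[0x41, "A"]]);
const fallbackToUnicode = new Map([[0x02, "\u2022"]]);

// Lookup as in the older version: ToUnicode entry, else the raw char code.
function lookupOld(charcode) {
  return toUnicode.get(charcode) || charcode;
}

// Lookup as in the newer version: a fallback map is consulted before
// giving up and returning the raw char code.
function lookupNew(charcode) {
  return toUnicode.get(charcode) || fallbackToUnicode.get(charcode) || charcode;
}

for (const code of [0x41, 0x02, 0x7a]) {
  console.log(code, lookupOld(code), lookupNew(code));
}
```

The two functions only disagree for char codes that are missing from `toUnicode` but present in `fallbackToUnicode`. The sketch only covers the `unicode` value, so by itself it does not show where `fontChar` comes from.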

timvandermeij · 2018-11-01T22:05:44Z

From the information here I assume this is an SVG back-end specific issue, is that correct?

@brendandahl @Snuffleupagus Do you perhaps know more about what can cause this?

wojtekmaj · 2018-11-02T07:50:44Z

It's not back-end specific. It's just easiest to see the consequences when using SVG rendering (which can also be done on the front-end side). Prior to 2.0.943, SVG used a sane textContent for the text to be rendered. Have a look here:

http://projects.wojtekmaj.pl/react-pdf/test/ (this is version based on older PDF.js)

1. Click "Use imported file".
2. Choose the SVG render mode.
3. Inspect the text in the rendered SVG, e.g. "Sample document".

On this version, you'll find "Sampledocument" textContent. Alright, PDFs doing their PDF-y thingies, that's close enough to me.

Now do the same steps using
http://projects.wojtekmaj.pl/react-pdf/test/beta (this is version based on 2.0.943)

In this version you will see that while the letters appear correct, the HTML rendered is garbage.

This has two serious consequences:

1. I'm unable to programmatically create a text layer which would match the text and font of the original PDF file.
2. Copying the text from the SVG results in complete garbage.

brendandahl · 2018-11-02T15:24:49Z

I commented in the other bug, this is by design because of #9340. All the char codes are moved into the private use area unicode range. To properly do text selection w/ svg we should do something like the canvas backend and create a text layer from the unicode mappings.
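
For anyone who wants to detect this situation in rendered output, a minimal sketch follows. It checks whether a string consists of code points from Unicode's Private Use Areas (U+E000 to U+F8FF, plus the two supplementary private-use planes). The helper names `isPrivateUse` and `privateUseRatio` are made up for this example and are not part of PDF.js.

```javascript
// Returns true when a code point lies in one of Unicode's three
// Private Use Areas (BMP, plane 15, plane 16).
function isPrivateUse(codePoint) {
  return (
    (codePoint >= 0xe000 && codePoint <= 0xf8ff) ||
    (codePoint >= 0xf0000 && codePoint <= 0xffffd) ||
    (codePoint >= 0x100000 && codePoint <= 0x10fffd)
  );
}

// Fraction of non-whitespace characters in `text` that are private-use.
// Array.from iterates by code point, so surrogate pairs stay intact.
function privateUseRatio(text) {
  const chars = Array.from(text).filter((ch) => ch.trim() !== "");
  if (chars.length === 0) return 0;
  const pua = chars.filter((ch) => isPrivateUse(ch.codePointAt(0)));
  return pua.length / chars.length;
}

console.log(privateUseRatio("Sampledocument"));
console.log(privateUseRatio("\ue000\ue001\ue002"));
```

A ratio close to 1 for the textContent of a tspan suggests that the node holds remapped glyph codes rather than readable text.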

Snuffleupagus · 2018-11-02T15:48:13Z

> To properly do text selection w/ svg we should do something like the canvas backend and create a text layer from the unicode mappings.

Note that this is already done in the default viewer, when the renderer preference is set to svg.
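
For custom integrations, the Unicode text is available from `page.getTextContent()`, whose items carry a `str` and a six-element `transform` matrix. The following is a rough sketch of turning such items into positioned span descriptors. The helper `toSpanDescriptors` and the sample data are hypothetical, and the positioning is deliberately simplified: it stays in PDF user space and ignores the viewport transform, rotation, and width scaling that a real text layer has to handle.

```javascript
// Convert text-content items into simple span descriptors.
// Assumes each item has `str`, `transform` ([a, b, c, d, e, f]) and `fontName`.
function toSpanDescriptors(textContent) {
  return textContent.items
    .filter((item) => item.str.length > 0)
    .map((item) => {
      const [, , c, d, e, f] = item.transform;
      return {
        text: item.str,              // Unicode text, safe to copy and search
        left: e,                     // x offset in PDF user space
        bottom: f,                   // y offset in PDF user space
        fontSize: Math.hypot(c, d),  // vertical scale of the text matrix
        fontName: item.fontName,
      };
    });
}

// Invented sample data in the same shape, for demonstration only.
const sampleTextContent = {
  items: [
    { str: "Sample document", transform: [36, 0, 0, 36, 72, 700], fontName: "g_d0_f2" },
    { str: "", transform: [12, 0, 0, 12, 72, 680], fontName: "g_d0_f1" },
  ],
};

console.log(toSpanDescriptors(sampleTextContent));
```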

> I'm unable to programmatically create a text layer which would match the text and font of the original PDF file

Keep in mind that that will never be a complete solution for text-selection/copying/searching purposes, since the PDF format distinguishes between rendering/text-extraction; hence why e.g. ToUnicode exists.
In particular, consider the case of ligatures (e.g. fi, ff, ...) which PDF viewers generally will expand to their separate characters. Since there's no guarantee that a font will contain data for the separate characters of a ligature, attempting to use the original font for text-selection purposes will never be a complete solution.
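
The ligature point can be illustrated with plain Unicode, independent of PDF.js. In this small sketch, the single code point U+FB01 (LATIN SMALL LIGATURE FI) expands to two separate letters under NFKD normalization. PDF viewers rely on the document's own mapping data rather than on normalization, so `expandLigatures` is only a hypothetical helper showing that one glyph can correspond to several characters.

```javascript
// Expand compatibility characters such as ligatures into their parts.
// Note that NFKD also decomposes other characters (for example accented
// letters), so this is an illustration, not a text-extraction routine.
function expandLigatures(text) {
  return text.normalize("NFKD");
}

const ligature = "\uFB01"; // one code point, drawn as a single "fi" glyph
const expanded = expandLigatures(ligature);

console.log(ligature.length, expanded.length);
console.log(expandLigatures("o\uFB03ce"));
```

A font that only contains the ligature glyph has nothing to draw for the separate letters, which is the mismatch described above.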

Also, please keep in mind that the status of the SVG back-end is probably, as far as I know, best described as "experimental" and that it's thus not officially supported; #9211 (comment) is probably relevant here as well.

wojtekmaj · 2018-11-02T16:28:53Z

It's not only applicable to SVG though. It's especially harmful for SVGs for the reasons I pointed out, like copying the original text, but that can be worked around using the same text layer that's being used for canvas rendering.

I'm using the original fonts to create a text layer over the canvas in my implementation, and using the same font as the original source in the vast majority of cases gave me much more accurate results than using some default font. Moving all the char codes into the Private Use Area Unicode range without leaving them in their default positions made the fonts completely unusable.

wojtekmaj · 2018-11-13T12:26:08Z

Is there anything I could do to resolve this issue? Perhaps it could be an option, like disablePrivateUnicodeArea on page.render?

Snuffleupagus · 2018-11-13T14:08:41Z

> [...] and using the same font as the original source in the vast majority of cases gave me much more accurate results than using some default font.

A text-selection implementation that by design breaks a relatively common feature, such as ligatures, should probably not be described as a "good solution" in general; but I digress.

> Perhaps it could be an option, like disablePrivateUnicodeArea on page.render?

If glyphs are left in their original positions, and are not being re-mapped to a PUA, that is guaranteed to completely break font rendering in a very large number of PDF files; refer to PR #9340 for additional details.
Honestly, it really makes no sense whatsoever to add an option (and related code) that will knowingly break font rendering in this way.

Perhaps it may be slightly more acceptable to add an option, false by default of course ~~(to not unnecessarily bloat toFontChar), that would leave glyphs in their original position in addition to re-mapping them to a PUA.~~ Edit: D'oh, but obviously that won't work, and you'd need an additional array (e.g. originalToFontChar, naming things is hard) to hold this data.
However, before anyone attempts to implement something, it's advisable to wait for Brendan to comment.
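
The extra-array idea from the edit above could look roughly like the sketch below. Everything in it is hypothetical: `remapToPrivateUse` is not a PDF.js function, the input format is invented, and the real font conversion code is considerably more involved. It only shows the data structure, that is, one array for the remapped positions and a second one that remembers the original positions.

```javascript
const PUA_START = 0xe000; // first code point of the BMP Private Use Area

// Hypothetical sketch, not PDF.js code: give every glyph a private-use code
// point for rendering and remember where it originally lived.
// `entries` is an array of [charCode, originalFontCharCode] pairs.
function remapToPrivateUse(entries) {
  const toFontChar = [];         // charCode -> remapped (PUA) code point
  const originalToFontChar = []; // charCode -> original code point
  let nextPua = PUA_START;
  for (const [charCode, originalFontCharCode] of entries) {
    toFontChar[charCode] = nextPua++;
    originalToFontChar[charCode] = originalFontCharCode;
  }
  return { toFontChar, originalToFontChar };
}

const demoMaps = remapToPrivateUse([[0x53, 0x53], [0x61, 0x61]]);
console.log(demoMaps.toFontChar[0x53].toString(16));
console.log(demoMaps.originalToFontChar[0x53].toString(16));
```

Keeping the second array costs extra memory for every font, which is presumably why such an option would have to be off by default.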

brendandahl · 2018-11-13T23:36:07Z

If we really just want to improve text selection there are some other things we could try. One option would be to generate a font that has the same width glyphs as the original font, but each glyph would just draw a square or line and it would be assigned to the unicode value.
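
Generating a font is beyond a short example, but the per-glyph widths such a placeholder font would need can be illustrated with data from this issue. The sketch below derives advance widths from the `x` attribute of the tspan in the original report. The helper name `advancesFromX` is made up, and a real implementation would presumably read widths from the font's metrics rather than from SVG output.

```javascript
// Pair each character with the distance to the next glyph's x position.
// The last character has no following position, so it is left out.
function advancesFromX(xAttr, text) {
  const xs = xAttr.trim().split(/\s+/).map(Number);
  const chars = Array.from(text);
  const advances = [];
  for (let i = 0; i + 1 < xs.length && i < chars.length; i++) {
    advances.push({ char: chars[i], advance: xs[i + 1] - xs[i] });
  }
  return advances;
}

// x positions and text copied from the v2.0.550 tspan in the report.
const xAttribute =
  "0 16.308 33.264 61.74 80.46 88.416 114.48 133.2 151.956 167.256 185.976 214.452 232.236 250.74";
const glyphAdvances = advancesFromX(xAttribute, "Sampledocument");

console.log(glyphAdvances.slice(0, 3));
```

Note that these are distances between glyph origins, so they also include any extra spacing the PDF applied between characters.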

Snuffleupagus · 2019-01-21T09:17:49Z

> One option would be to generate a font that has the same width glyphs as the original font, but each glyph would just draw a square or line and it would be assigned to the unicode value.

In this case, it seems that this issue could just be marked as a duplicate of #1914.

timvandermeij · 2019-01-21T22:20:20Z

Yes, let's close this as such and track the issue there.

This was referenced Nov 1, 2018

How to get a font used on the page? #10204

Closed

Officially release 4.0.0 wojtekmaj/react-pdf#269

Closed

timvandermeij added the 4-svg label Nov 1, 2018

wojtekmaj changed the title ~~SVG rendering now uses odd Unicode characters instead of real letters~~ Odd Unicode characters instead of real letters are now used to render texts Nov 2, 2018

timvandermeij added font-conversion and removed 4-svg labels Nov 2, 2018

wojtekmaj mentioned this issue Nov 13, 2018

Compatibility with Webpack 4 / Create React App 2 wojtekmaj/react-pdf#179

Closed

Snuffleupagus mentioned this issue Nov 16, 2018

Wrong svg:tspan encoding #10261

Closed

wojtekmaj mentioned this issue Nov 23, 2018

Characters and orientation messed up during printing #10296

Closed

taherbert mentioned this issue Nov 28, 2018

SVG rendering displays odd unicode characters instead of letters wojtekmaj/react-pdf#309

Closed

timvandermeij closed this as completed Jan 21, 2019

kevin8479 mentioned this issue Mar 3, 2020

[SVG render] How to relate rendered svg text to actual text #11661

Closed

This was referenced Jul 3, 2020

[PDF to SVG, Text/Glyphs] How to have the reel text (remapped) into the exported svg? #12053

Closed

[PDF Merge, Header/Footer] Do we have a way to merge additional pdfs and keep the header/footer? wkhtmltopdf/wkhtmltopdf#4754

Closed

wojtekmaj mentioned this issue Aug 11, 2020

Copy/paste text selection not working correct from rendered pdf wojtekmaj/react-pdf#627

Closed

Snuffleupagus mentioned this issue Aug 5, 2022

SVG: Text cannot be displayed in the browser console. #15277

Closed

2-one-week mentioned this issue Apr 29, 2024

[react-pdf] 미지원 글꼴 지원 (cmap) 및 font overriding 대응 이슈 확인 NaverPayDev/pie#30

Closed
