-
Notifications
You must be signed in to change notification settings - Fork 10.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[api-minor] Include and use the 14 standard font files. #12726
Conversation
/botio test |
From: Bot.io (Linux m4)ReceivedCommand cmd_test from @brendandahl received. Current queue size: 0 Live output at: http://54.67.70.0:8877/5b052292e34e465/output.txt |
From: Bot.io (Windows)ReceivedCommand cmd_test from @brendandahl received. Current queue size: 0 Live output at: http://3.101.106.178:8877/9cd5c996cf5da70/output.txt |
From: Bot.io (Linux m4)FailedFull output at http://54.67.70.0:8877/5b052292e34e465/output.txt Total script time: 26.47 mins
Image differences available at: http://54.67.70.0:8877/5b052292e34e465/reftest-analyzer.html#web=eq.log |
From: Bot.io (Windows)FailedFull output at http://3.101.106.178:8877/9cd5c996cf5da70/output.txt Total script time: 28.68 mins
Image differences available at: http://3.101.106.178:8877/9cd5c996cf5da70/reftest-analyzer.html#web=eq.log |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall this seems like a very reasonable approach, and the re-use between the CMap/StandardFont factories is really nice.
Besides a few inline comments, based on a quick look at the code, I've got one "bigger picture" suggestion below.
I'm also considering disabling this feature for browsers so we use the builtin fonts. Though, we may want to use the various symbol fonts since those seem to vary per platform.
My general train of thought here (ignoring the symbol fonts for now), is that we'd only fetch/use the standard font files when actually necessary. In practice, that'd mean:
- When the
disableFontFace
API-parameter istrue
. - When handling Patterns, see e.g.
Line 855 in 00b4f86
state.fillColorSpace.name === "Pattern" || - When failing to load a font and we thus fallback, see e.g.
Line 856 in 00b4f86
font.disableFontFace ||
Edit: Thinking about my suggestion a bit more, it seems that conditional loading will probably be somewhat difficult to implement well given the existing font-parsing code and the multitude of cases to consider. However, with caching implemented in the worker (note the inline comment), I'm not sure how much of a problem always loading these font files would really be!?
(Back when I added worker-thread caching of CMap-files, overall performance improved significantly for affected documents.)
src/core/evaluator.js
Outdated
let file = null; | ||
if (standardFontName) { | ||
const data = await this.fetchStandardFontData(standardFontName); | ||
file = new Stream(data); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please let fetchStandardFontData
return the data as a Stream
, rather than doing that manually at the call-sites.
src/core/fonts.js
Outdated
@@ -243,6 +243,12 @@ function getFontType(type, subtype) { | |||
} | |||
} | |||
|
|||
function getStandardFontName(name) { | |||
const fontName = name.replace(/[,_]/g, "-").replace(/\s/g, ""); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since you're using the same regular expression at
Line 1319 in 00b4f86
let fontName = name.replace(/[,_]/g, "-").replace(/\s/g, ""); |
can you please extract the regexp-part to a new constant, used by both call-sites, to avoid these getting out of sync?
src/display/display_utils.js
Outdated
return { cMapData, compressionType }; | ||
}); | ||
} | ||
return fetchData(url, this.isCompressed).then(data => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit:
return fetchData(url, this.isCompressed).then(data => { | |
return fetchData(url, /* asTypedArray = */ this.isCompressed).then(data => { |
src/display/display_utils.js
Outdated
}); | ||
} | ||
} | ||
|
||
class DOMStandardFontDataFactory extends BaseStandardFontDataFactory { | ||
_fetchData(url) { | ||
return fetchData(url, true); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit:
return fetchData(url, true); | |
return fetchData(url, /* asTypedArray = */ true); |
web/app_options.js
Outdated
@@ -178,6 +178,14 @@ const defaultOptions = { | |||
: "../web/cmaps/", | |||
kind: OptionKind.API, | |||
}, | |||
standardFontDataUrl: { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please move this further down, since the parameters for each OptionKind
should be sorted alphabetically.
src/core/evaluator.js
Outdated
const standardFontNameToFileName = { | ||
Courier: "FoxitFixed", | ||
"Courier-Bold": "FoxitFixedBold", | ||
"Courier-BoldOblique": "FoxitFixedBoldItalic", | ||
"Courier-Oblique": "FoxitFixedItalic", | ||
Helvetica: "FoxitSans", | ||
"Helvetica-Bold": "FoxitSansBold", | ||
"Helvetica-BoldOblique": "FoxitSansBoldItalic", | ||
"Helvetica-Oblique": "FoxitSansItalic", | ||
"Times-Roman": "FoxitSerif", | ||
"Times-Bold": "FoxitSerifBold", | ||
"Times-BoldItalic": "FoxitSerifBoldItalic", | ||
"Times-Italic": "FoxitSerifItalic", | ||
Symbol: "FoxitSymbol", | ||
ZapfDingbats: "FoxitDingbats", | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might not hurt to move this to https://github.com/mozilla/pdf.js/blob/master/src/core/standard_fonts.js instead!?
Given the sheer number of long-standing issues/bugs that this PR would solve, this seems like something that would be really nice to try and fix-up and land sooner rather than later :-) Also, we're probably overdue for a new PDF.js release at this point; however I'd suggest that we hold off on that a little while longer since this PR would be really nice to include in the next release. /cc @timvandermeij |
I agree, this would be a really nice one to include. |
Still waiting on some Firefox talos runs for windows, but locally this patch is much slower loading the first page of unix01.pdf in the pdfpaint benchmark. 20 runs median 238.43ms vs 332.51ms |
Windows talos runs also show a slow down: |
I suppose that it's not entirely surprising if some slowdown is observed, since we're now having to actually parse font data that we previously didn't have to. The question is obviously how large of a regression we can accept!? Looking briefly at the tested patch, it appears that it doesn't actually implement worker-thread caching of the standard font data (something that I mentioned as a potential problem above). If that's indeed the case, I'd not be entirely surprised if that at least in part contributes to the observed regression. What would be very interesting to know, is more precisely where the regression comes from:
All-in-all, I suppose that it's going to take more work to get this landed than I'd initially hoped. |
Having played around with this patch locally, focusing on the actual standard font data loading (using Based on that observation I have one idea, however it's not a great one: What if we simply bundled the standard font data into the It's difficult to not compare with the CMaps, but I do believe that this situation is slightly different here:
[1] Many users seem to just ignore the errors/warnings, mentioning the problem, printed in the console. |
I also tried fetching the font data from the worker. It was a bit faster but not much. I'd be curious to try directly embedding the files in the worker, but 266,616 bytes seems like a big increase for something that is not really needed in the common case.
For the talos test, the web page is reloaded so the cache wouldn't help there. Also, I only see the font data being loaded once right now since we cache fonts, so I'm not sure this would help in the re-render case either. I'm planning to put this patch on the back burner though as we have some higher priority stuff for firefox (XFA and tagged PDFs). I'll push up my changes with comments addressed, then we could add a pref to have it disabled by default for the time being. |
One more idea here: How about generating a separate bundle with the standard font data, and then using Edit: Or how about simply generating one large font data file, similar to but a lot simpler than the |
34a28ab
to
bc41373
Compare
I've tried implementing the second idea in #12726 (comment), i.e. creating a bundle with all of the standard font data such that we only need to fetch a single file. Please find my patch at master...Snuffleupagus:standard-fonts @brendandahl This is probably the best that we can do, without inlining the standard font data in the |
I obviously don't question the value of either of those features. |
@brendandahl Ping, how about my idea/implementation mentioned in #12726 (comment)? Given that it's going to be impossible to add this functionality without any sort of performance overhead, since we now need to parse more font data in affected cases, the only thing that we can realistically do here is to limit the necessary data fetching/loading by only having one (larger) font data bundle. |
I imagine that will slow down the case where there's only one font used (especially for the generic version of the viewer)? Another thought, we could try to translate the fonts so they don't need to go through the full font parser. I imagine this would be quite a bit of worker though. It'd probably be good to profile some more though, and see if the font translation is slowing it down or just loading the font in the browser is the slow part. |
At least locally, the difference in loading time is quite small between loading one font file and loading the (larger) bundle. It seems to me that the overhead related to simply opening the connection, fetching the data, and finally transferring it to the worker-thread is what dominates overall.
I've also had that idea, but thinking about the practical implications of that made me quickly abandon it. Besides the overall complexity, we'd also need to add a way to automatically re-build the translated fonts for any code-changes.
With the bundle approach, it's mostly the actual font conversion code that contributes to the "regression" here. However, I do believe that the comparison is perhaps not entirely fair here: |
Another thing that I think this patch would also improve is rendering of standard fonts on Linux, since we've seen a few bugs/issues reported where non-embedded standard fonts render badly on Linux because of poor font substitution. Some examples include https://bugzilla.mozilla.org/show_bug.cgi?id=1695727 and issue #11840 |
Well, with recent development I don't really think that argument holds any more :-) The XFA functionality, which just landed, increased the size of the built |
XFA is a lot less frequent than Acroform but we see about a million unique monthly active users encountering them in our desktop telemetry. I know at least of the Canadian Gov website using them. This is a significant number of users who will benefit from this change. |
In all fairness, then we should also measure how often users encounter non-embedded standard fonts in PDF documents; I'd be extremely surprised if that number isn't at least one order of magnitude larger :-) |
There are some XFA forms on the UK and USA gov websites too. |
From: Bot.io (Linux m4)ReceivedCommand cmd_test from @brendandahl received. Current queue size: 0 Live output at: http://54.67.70.0:8877/7e9df84dd3886ac/output.txt |
From: Bot.io (Windows)ReceivedCommand cmd_test from @brendandahl received. Current queue size: 0 Live output at: http://3.101.106.178:8877/bc88cc4cd488960/output.txt |
From: Bot.io (Linux m4)FailedFull output at http://54.67.70.0:8877/7e9df84dd3886ac/output.txt Total script time: 26.19 mins
Image differences available at: http://54.67.70.0:8877/7e9df84dd3886ac/reftest-analyzer.html#web=eq.log |
From: Bot.io (Windows)FailedFull output at http://3.101.106.178:8877/bc88cc4cd488960/output.txt Total script time: 29.65 mins
Image differences available at: http://3.101.106.178:8877/bc88cc4cd488960/reftest-analyzer.html#web=eq.log |
/botio makeref |
From: Bot.io (Linux m4)ReceivedCommand cmd_makeref from @Snuffleupagus received. Current queue size: 0 Live output at: http://54.67.70.0:8877/cdfe0a1ad516c1b/output.txt |
From: Bot.io (Windows)ReceivedCommand cmd_makeref from @Snuffleupagus received. Current queue size: 0 Live output at: http://3.101.106.178:8877/029042f654db4da/output.txt |
From: Bot.io (Linux m4)SuccessFull output at http://54.67.70.0:8877/cdfe0a1ad516c1b/output.txt Total script time: 22.76 mins
|
From: Bot.io (Windows)SuccessFull output at http://3.101.106.178:8877/029042f654db4da/output.txt Total script time: 26.50 mins
|
- a lot of xfa files are using Myriad pro or Arial fonts without embedding them and some containers have some dimensions based on those font metrics. So not having the exact same font leads to a wrong display. - since it's pretty hard to find a replacement font with the exact same metrics, this patch gives the possibility to read glyf table, rescale each glyph and then write a new table. - so once PR mozilla#12726 is merged we could rescale for example Helvetica to replace Myriad Pro.
Could we get a new build of |
Not immediately, since this just landed and we (obviously) need some time to handle any possible regressions and there's also various follow-up work that should happen first. (Also, the last release was less than two weeks ago.) |
I don't github (verb) very often, so I don't know if this is correct/appropriate (or if I'm about to ping a million people, in which case I am deeply sorry) but I just read through this and wanted to add my "thank you 1e6" for the effort to get this in; at least in my case it dramatically increases the usability. |
- a lot of xfa files are using Myriad pro or Arial fonts without embedding them and some containers have some dimensions based on those font metrics. So not having the exact same font leads to a wrong display. - since it's pretty hard to find a replacement font with the exact same metrics, this patch gives the possibility to read glyf table, rescale each glyph and then write a new table. - so once PR mozilla#12726 is merged we could rescale for example Helvetica to replace Myriad Pro.
Sorry for the late comment. I just wanted to say that the fonts taken from PDFium are possibly misnamed, or their format is not immediately clear. |
- a lot of xfa files are using Myriad pro or Arial fonts without embedding them and some containers have some dimensions based on those font metrics. So not having the exact same font leads to a wrong display. - since it's pretty hard to find a replacement font with the exact same metrics, this patch gives the possibility to read glyf table, rescale each glyph and then write a new table. - so once PR mozilla#12726 is merged we could rescale for example Helvetica to replace Myriad Pro.
Fixes #4244 (at least for the standard font issue).
I need to do some more testing to see if this impacts performance for the browser use case. I'm also considering disabling this feature for browsers so we use the builtin fonts. Though, we may want to use the various symbol fonts since those seem to vary per platform.