Base font glyph + uncommon glyph system #17

yhahn · 2014-02-04T20:22:11Z

Eventually move to a system where some common glyph images are downloaded as a "base font glyph file", and only include uncommon glyph images in the vector tiles.

@kkaefer curious what you have in mind here. I at one point envisioned a "glyph tile" system where you'd download/cache whole chunks of unicode ranges as you ran into them

http://jrgraphix.net/research/unicode_blocks.php

This would separate the tiles fully from font glyphs, but not sure what other considerations should be involved here.

kkaefer · 2014-02-04T22:07:48Z

That's what I thought too initially. Upon further examination, this turns out not to be such a good idea, in particular for Asian languages: Mandarin has around 20k glyphs, and a typical tile uses ~50-200 of them. They seem to be mostly random; there is no concentration in a few Unicode ranges. This means that we'd have to download most of the font anyway (I measured ~5 MB to download all required font ranges for Mandarin).

yhahn · 2014-02-04T22:12:09Z

(╯°□°）╯︵ ┻━┻

yhahn · 2014-02-05T16:44:34Z

Next actions for me

I'd like to get a better feel for the requirements here. Next actions for me are to set up a demo that "simulates" download requirements by inspecting the loaded vector tiles as you pan around a map. Idea is to make glyph block sharding variable and see if there is a sweet spot in terms of shard size that optimizes no. of requests + amount of data to download.
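
The simulation described above can be sketched roughly as follows. This is a minimal sketch, assuming each glyph shard covers a fixed-size, aligned run of code points and that one shard costs one request. The function name `shards_needed` and the sample labels are hypothetical; the actual demo inspected loaded vector tiles in llmr.

```python
def shards_needed(labels, shard_size):
    """Return the set of shard indexes needed to render all labels.

    Assumes shard i covers code points [i * shard_size, (i + 1) * shard_size).
    """
    return {ord(ch) // shard_size for label in labels for ch in label}


# Hypothetical labels standing in for the strings found in loaded tiles.
labels = ["Berlin", "Москва", "東京", "上海"]

for shard_size in (128, 256, 512):
    shards = shards_needed(labels, shard_size)
    # One GET per shard; every glyph in a shard is downloaded, used or not.
    print(shard_size, len(shards), len(shards) * shard_size)
```

Smaller shards waste fewer downloaded glyphs but need more requests; larger shards do the opposite. Varying `shard_size` over real tile contents is one way to look for the sweet spot.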

yhahn · 2014-02-06T19:09:28Z

Rough cost of each glyph

{ size: 724727, count: 1383, avgsize: 524.0253073029646 }

Basic idea is to put a rough heuristic cost on each glyph on the deflated PBF.

- Compares final deflate size of original VT (no augment) vs VT with glyphs (augmented),
- Counts number of glyphs in augmented VT,
- Produces an "average" cost per glyph.

This includes not just the texture cost but the overhead of metadata in the PBF for glyph position, etc. Samples broader range of glyphs -- tiles are from dense areas of SF, tokyo, shanghai, tel aviv and berlin. The avgsize here is probably on the large side for calculating glyph-only tile size because fontserver is also providing info for each feature string in the tile.

Using 524 as avg bytesize per glyph, ballpark sizes for diff glyph tile shard sizes:

| Glyphs per shard | Size (KB) |
| --- | --- |
| 128 | 65.5 |
| 256 | 131.0 |
| 512 | 262.0 |
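
The arithmetic behind these figures can be reproduced with a short sketch. It assumes that `size` is the total extra deflated bytes across the sampled tiles and that "k" in the table means 1024 bytes (that reading reproduces the table values). `shard_kib` is a hypothetical helper name.

```python
total_bytes = 724727   # assumed: extra deflated bytes across the sampled tiles
glyph_count = 1383     # glyphs counted in the augmented tiles

avg_bytes = total_bytes / glyph_count  # about 524.03 bytes per glyph


def shard_kib(glyphs, bytes_per_glyph=524):
    """Estimated shard size in KiB (1 KiB = 1024 bytes)."""
    return glyphs * bytes_per_glyph / 1024


for glyphs in (128, 256, 512):
    print(glyphs, shard_kib(glyphs))  # 65.5, 131.0, 262.0
```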

Next up

I'll be using these numbers to play around with simulated glyphtile DL scenarios while panning around diff parts of the world.

cc @kkaefer let me know if I'm way off here.

yhahn · 2014-02-06T19:50:14Z

@kkaefer based on a totally rough count (20x20 glyphs, say, === 215 bytes per glyph) from https://f.cloud.github.com/assets/52399/1181283/e12a5092-2205-11e3-9ddf-e4fc20b22b39.png, it seems like the cost above is pretty large. This would mean to me that either the png compression is beating deflate a lot, the overhead of the glyph position encoding is more than I would guess, or both.

What is the reason this information is encoded with each feature string btw? It seems like it should be possible to look this up on the fly (but maybe I don't understand all the mechanics here):

https://github.com/mapbox/fontserver/blob/master/test/expected/shape.json#L703-L708

kkaefer · 2014-02-06T20:20:30Z

I think you're in a good spot with estimates. In my experience, tiles were about 30-40% larger with glyph images than without. PNG compression is zlib, but it employs additional tricks like filtering to get better run lengths and repetition.

The reason the glyph positions are included for every string rather than using the glyph advance is that for complex scripts that perform glyph substitutions, there is no 1:1 mapping from a character in the unicode string to a glyph.

While for Latin, Cyrillic and most East Asian scripts there usually /is/ a 1:1 mapping, for scripts like Arabic there isn't: ا and ل in a unicode sequence produce ﻻ when shaped, which is just one glyph (cf. http://scripts.sil.org/cms/scripts/page.php?site_id=nrsi&id=CmplxRndExamples for more background).

After we get the shaping results, we could look and employ the automatic glyph-advance-based shaper to see if this produces the same result. If it does, we could drop the shaping information and use the same algorithm on the client.
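
The check proposed above could look something like this. It is a minimal sketch, assuming shaped output is a list of `(glyph_id, x)` pairs and ignoring y offsets and bidi reordering. All glyph ids, advances and function names here are hypothetical and not taken from fontserver.

```python
def advance_layout(text, glyph_for_char, advances):
    """Naive layout: one glyph per character, pen moves by each advance."""
    x = 0
    placed = []
    for ch in text:
        glyph = glyph_for_char[ch]
        placed.append((glyph, x))
        x += advances[glyph]
    return placed


def can_drop_shaping(text, shaped, glyph_for_char, advances):
    """True if the shaper's output matches the naive layout exactly."""
    if any(ch not in glyph_for_char for ch in text):
        return False
    return shaped == advance_layout(text, glyph_for_char, advances)


# Hypothetical glyph ids and advances, for illustration only.
glyph_for_char = {"a": 1, "b": 2, "\u0644": 3, "\u0627": 4}  # a, b, lam, alef
advances = {1: 10, 2: 11, 3: 8, 4: 5, 5: 12}  # 5 = lam-alef ligature

print(can_drop_shaping("ab", [(1, 0), (2, 10)], glyph_for_char, advances))
print(can_drop_shaping("\u0644\u0627", [(5, 0)], glyph_for_char, advances))
```

The Latin string matches the naive layout, so its shaping data could be dropped. The Arabic pair shapes to a single ligature glyph, so the comparison fails and the shaping data has to stay.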

kkaefer · 2014-02-06T20:26:08Z

Oh, another issue that shaping is handling is bidirectional text: In RTL languages that use arabic numbers (like hebrew, arabic) you have text runs that go from right-to-left, then the numbers are written left-to-right, then the text continues right-to-left. Mapnik has code that handles this, Pango handles this in the shaping as well.

yhahn · 2014-02-06T20:46:23Z

Request overhead

Initial impressions/numbers from playing around with a hacked llmr to estimate glyph tile req count + dl size are not good. More than anything else, viewing anywhere outside the US leads very rapidly to a huge request count overhead in addition to the tiles being downloaded (e.g. think 20+ additional GETs for an initial map load). Per @kkaefer's original notes, the way glyphs are used just doesn't lend itself well to good sharding/clustering.

Next actions

- Look at just straight up reducing size next,
- Lower priority on a base font glyph resource approach which is likely to introduce more complexity and have weirdly hard-to-predict effectiveness on a global map

yhahn · 2014-02-07T14:51:54Z

More notes,

> Look at just straight up reducing size next,

- Bitmaps are the majority of our cost at ~80-90% of the overhead. We can leave messing with tighter metadata/encoding to later/never.
- The distance fields can be compressed better (think up to 20-30%) without loss of visual quality by reducing the granularity of the 0-255 value (e.g. steps of 2, 4, 8). At around 16 distinct values you start to notice artifacts.

No huge savings to be had here.
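
The granularity reduction mentioned above can be tried offline. This is a minimal sketch, assuming a synthetic circular distance field (the real glyph bitmaps come from fontserver) and plain zlib deflate as a stand-in for the tile compression. `quantize` and `synthetic_sdf` are hypothetical helper names, and the edge value of 192 is an arbitrary choice for the toy data.

```python
import zlib


def quantize(values, step):
    """Snap each 0-255 value down to a multiple of `step`."""
    return bytes(v - v % step for v in values)


def synthetic_sdf(size=24, radius=8.0, spread=16.0):
    """Toy distance field for a filled circle; 192 marks the edge here."""
    c = (size - 1) / 2
    out = bytearray()
    for y in range(size):
        for x in range(size):
            dist = ((x - c) ** 2 + (y - c) ** 2) ** 0.5 - radius
            out.append(max(0, min(255, round(192 - dist * spread))))
    return bytes(out)


sdf = synthetic_sdf()
for step in (1, 2, 4, 8, 16):
    q = quantize(sdf, step)
    # step, number of distinct values left, deflated size in bytes
    print(step, len(set(q)), len(zlib.compress(q, 9)))
```

A step of 16 leaves at most 16 distinct values, which is about where the artifacts noted above become visible. The sizes printed for this toy field are not representative of real glyphs.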

> Lower priority on a base font glyph resource approach

Reversing this thought. Assuming we provide base glyphs for 0020-00FF, the number of distinct glyphs per tile:

| Tile | Distinct glyphs | Distinct glyphs excluding 0-255 |
| --- | --- | --- |
| berlin.vector.pbf | 108 | 4 |
| sfo.vector.pbf | 86 | 3 |
| shanghai.vector.pbf | 389 | 316 |
| telaviv.vector.pbf | 124 | 52 |
| tokyo.vector.pbf | 722 | 647 |
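
A count like the one in this table can be approximated from label text alone. This is a minimal sketch, assuming one glyph per code point (which does not hold for shaped scripts such as Arabic) and using the 0020-00FF base range mentioned above. `glyph_counts` and the sample labels are hypothetical; the numbers in the table came from real tiles.

```python
def glyph_counts(labels, base_range=(0x0020, 0x00FF)):
    """Return (distinct code points, distinct code points outside the base range)."""
    lo, hi = base_range
    distinct = {ord(ch) for label in labels for ch in label}
    outside = {cp for cp in distinct if not lo <= cp <= hi}
    return len(distinct), len(outside)


print(glyph_counts(["Berlin", "Straße"]))  # (10, 0)
print(glyph_counts(["東京 Tokyo"]))         # (7, 2)
```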

The character sets that will benefit from this most are 0020-04FF I think (up to Cyrillic). The glyph coverage problem is just the nature of the beast, esp in the CJK languages, and delivering glyphs on a one-off basis with tiles seems most efficient there.

Conclusion: basically I did a round trip to what @kkaefer already recommended : )

Next actions:

Start with just 0-ff as our base glyph. We can come up with additional base glyphs/extensions through 0-4ff if this is successful. It looks like @kkaefer has some commented out code in fontserver for this where I can get started.

mikemorris · 2014-02-18T22:33:26Z

After spending a while getting myself oriented to the codebase and Node C++ addons in general, I managed to get an initial sketch of a base glyph pattern hacked into place. This is nowhere near being clean or good code, just a place to start from. What should next steps be in terms of testing versus optimization?

Tracking in:

/cc @yhahn @kkaefer

mikemorris · 2014-02-18T23:11:34Z

Initial observations:

- Base glyph with 0-ff shaves off about 20k per tile, replaced with an initial 60k request.
- To ensure reliable rendering, no tiles can be rendered until after the base glyph has been loaded.
- Currently adds overhead of copying base glyphs/rects into each web worker. I don't think transferable objects would be of much help here, since we need to retain the glyph atlas in the main thread.
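
As a rough break-even estimate for the first observation, here is a minimal sketch using only the two figures quoted above (about 20k saved per tile, about 60k for the one-time base glyph request). It ignores request latency and the cost of blocking rendering until the base glyph arrives. The function names are hypothetical.

```python
import math


def break_even_tiles(base_request_kb=60, saving_per_tile_kb=20):
    """Number of tiles after which the base glyph request has paid for itself."""
    return math.ceil(base_request_kb / saving_per_tile_kb)


def net_saving_kb(tiles, base_request_kb=60, saving_per_tile_kb=20):
    """Net bytes saved (in k) after loading `tiles` tiles; negative means a loss."""
    return tiles * saving_per_tile_kb - base_request_kb


print(break_even_tiles())   # 3
print(net_saving_kb(10))    # 140
```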

mikemorris · 2014-02-20T22:50:45Z

Copying even a small 0-ff glyph library into each web worker looks to be adding significant overhead, increasing the main thread to worker callback delay from single digit milliseconds to often over 100ms and in some cases over 300ms.

~~False alarm, I broke something else implementing baseglyph branch apparently, tested a more stripped down version and was back to single digit millisecond times.~~

The stripped down version wasn't actually writing glyphs to the copied object. Doh.

Next steps:

Attempt to restructure to avoid copying glyph lib into worker.

kkaefer · 2014-02-21T09:35:05Z

@mikemorris We don't need to copy the glyph images to the workers, just the position information. That should be pretty fast.

mikemorris · 2014-02-21T16:24:14Z

@kkaefer Alright, main hangup was that glyphs are expected to be on the faces object at https://github.com/mapbox/llmr/blob/master/js/text/placement.js#L228, ~~gonna try attaching them after the worker returns.~~ and the stripping at https://github.com/mapbox/fontserver/blob/baseglyph/src/tile.cpp#L681-684 completely removes them from the protobuf.

mikemorris · 2014-02-25T17:30:00Z

Tested packing rects into typed arrays and sending them to the worker as transferable objects. This ended up being slower because of the increased overhead of unpacking the typed arrays into a usable structure.

mikemorris · 2014-02-25T19:01:44Z

Okay, so I was using an ASCII hack to build the 0-FF base glyph tile, now I'm running into all sorts of confusion trying to expand the base glyph to a larger Unicode range - any suggestions?

/cc @springmeyer @artemp @kkaefer

kkaefer · 2014-02-25T22:28:40Z

https://github.com/mapbox/fontserver/blob/baseglyph/src/tile.cpp#L729-L732 looks very fishy to me. What is it supposed to do?

mikemorris · 2014-02-25T23:34:40Z

@kkaefer It's a super hacky way of iterating over a range of characters, that really only works for 0-128 ASCII. What I'm trying to do is iterate over a Unicode character range (say 0000-06FF for Latin, Cyrillic and Arabic) to build a base glyph set for each font in the stack. Haven't quite figured out how to get a reference to the actual PangoFont object, as it looks like the regular tiles are building the font list dynamically.

mikemorris · 2014-02-28T19:51:52Z

Not quite sure what's causing this issue, but it looks like FreeType is returning an error when attempting to load many glyphs in the Greek+ Unicode ranges in Open Sans. The puzzling part is that the glyph_index has been validated by g_unichar_validate and Open Sans is being picked by pango_fontset_get_font as the font that contains the best glyph (https://github.com/mapbox/fontserver/blob/8db40c458341b785b0001a66d86b632802025413/src/tile.cpp#L736).

| Unicode range | Size | Glyphs |
| --- | --- | --- |
| 0000-03A9 | 225 KB | Max range without FreeType error 6 `Invalid_Glyph_Index` |

FreeType Error Codes: http://www.freetype.org/freetype1/docs/api/freetype1.txt

mikemorris · 2014-02-28T22:01:10Z

Invalid Glyph Index was a result of passing char_code where glyph_index was needed.

| Unicode range | Size | Glyphs |
| --- | --- | --- |
| 0000-04FF | 301 KB | Latin, Greek, Cyrillic |
| 0000-06FF | 376 KB | Latin, Greek, Cyrillic, Armenian, Hebrew, Arabic |
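
The char_code vs glyph_index mix-up described above can be illustrated with a toy model. In FreeType, `FT_Get_Char_Index` maps a character code to a glyph index and `FT_Load_Glyph` expects that glyph index. The class below is only a hypothetical stand-in written in Python, not the FreeType API, and the glyph count of 938 is chosen purely so that the toy face fails just past 03A9, matching the range reported earlier. It is not a claim about the real font.

```python
class ToyFace:
    """Toy stand-in for a font face; not the FreeType API."""

    def __init__(self, cmap, num_glyphs):
        self.cmap = cmap              # char code -> glyph index
        self.num_glyphs = num_glyphs

    def get_char_index(self, char_code):
        # 0 is the "missing glyph" index, as in FreeType.
        return self.cmap.get(char_code, 0)

    def load_glyph(self, glyph_index):
        if not 0 <= glyph_index < self.num_glyphs:
            raise IndexError("invalid glyph index %d" % glyph_index)
        return "glyph#%d" % glyph_index


face = ToyFace({0x41: 36, 0x3A9: 500}, num_glyphs=938)

# Correct: translate the char code first, then load by glyph index.
print(face.load_glyph(face.get_char_index(0x3A9)))

# Bug pattern: a char code used as a glyph index. It "works" while the
# code point happens to be below num_glyphs, but loads an unrelated glyph.
print(face.load_glyph(0x3A9))

# One code point further and the index is out of range.
try:
    face.load_glyph(0x3AA)
except IndexError as err:
    print(err)
```

This would explain why low ranges appeared to load without errors: small code points are valid glyph indexes by coincidence, even though they point at the wrong glyphs.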

mikemorris · 2014-02-28T23:17:24Z

Why are all these extra fonts included in the PangoFontset?

Open Sans 24
Arial Unicode MS 24
Arial Unicode MS Bold 24
DINPro 24
GE Inspira 24
GE Inspira Small Caps 24
Helvetica LT Std 24
PT Sans Caption 24
Proxima Nova 24
League Gothic 24
TeX Gyre Heros 24
Source Sans Pro 24
Pompiere  24
Avenir 24
Freehand521 BT 24
Frutiger LT 45 Light, 24
Komika Parch 24
Visitor TT2 BRK 24
Avenir 24
Ubuntu Mono 24
Ubuntu Condensed, 24
Ubuntu 24
Crimson 24
Chennai Medium 24
Merriweather Light 24
Crimson Semi-Bold 24
Crimson Bold 24
Crimson Italic 24

mikemorris · 2014-03-07T23:59:22Z

I think the eventual goal should be to move fontserver to HarfBuzz/FreeType, same stack as Mapnik and Firefox (Chromium/Blink is using HarfBuzz directly too but still uses fontconfig as well). Pango simply isn't designed to offer enough control. All of these projects use FriBidi for bidirectional text.

Before removing Pango, though, the short-term goal is to fix base glyph loading in llmr so that the base glyph set is loaded into the glyph atlas before tiles are rendered.

springmeyer · 2014-03-08T00:52:52Z

To be clear: The browsers may use Fribidi but Mapnik does not: we moved from fribidi to icu for bidi in 2008: http://mapnik.org/news/2008/02/20/mapnik_unicode/

mikemorris · 2014-03-10T15:11:58Z

> fribidi was choking in multi-threaded rendering

Ah, because of the above you retained ICU bidi even after switching to HarfBuzz @springmeyer?

springmeyer · 2014-03-10T20:39:19Z

We first replaced fribidi with ICU. Because ICU only supports shaping for arabic, we then added harfbuzz for better shaping. For now we've kept ICU for a variety of things: notably for text itemization which uses the ICU bidi algo: https://github.com/mapnik/mapnik/blob/master/include/mapnik/text/itemizer.hpp

mikemorris · 2014-03-18T22:42:10Z

Welp, moved the Font class using Pango to Pango_Font and got a FT_Font class cobbled together (pulling in useful-looking pieces from Mapnik), only to discover that this class was being used in exactly one spot in src/shaping.cpp.

It looks like src/tile.cpp is using PangoFont directly instead of the wrapper class, does it make sense to switch this to the new FT_Font class @kkaefer?

Work is in the harfbuzz branch.

mikemorris · 2014-03-18T22:48:48Z

As a followup, under what circumstances is that use case of Font actually triggered @kkaefer? I added some logging and didn't notice that path ever running, even when panning around areas with shaped text.

kkaefer · 2014-03-19T13:38:11Z

@mikemorris There used to be a separate interface for just creating fonts and inspecting these fonts (like enumerating all glyphs in that font and getting their metrics) which I used for debugging purposes. We don't need that interface anymore.

mikemorris · 2014-03-19T15:02:05Z

Thanks @kkaefer. After all this wrangling I at least have a pretty solid understanding of FreeType now to tackle the rest.

mikemorris · 2014-06-02T14:52:42Z

Glyph ranges implemented, next steps for base + uncommon in #36

mikemorris added a commit to mapbox/mapbox-gl-js that referenced this issue Feb 18, 2014

pass ALL THE GLYPHS to the workers, ref mapbox/node-fontnik#17

1ca08ee

mikemorris added a commit to mapbox/mapbox-gl-js that referenced this issue Feb 21, 2014

pass glyph offsets with rects to workers, ref mapbox/node-fontnik#17

29a9f25

mikemorris added a commit that referenced this issue Feb 27, 2014

iterate over glyphs, print glyph_index to std::cerr, ref #17

3c58d20

mikemorris added a commit that referenced this issue Feb 28, 2014

build base glyph with optimal fonts, add debug logging, ref #17

8db40c4

mikemorris added a commit that referenced this issue Feb 28, 2014

iterate char_code, pass glyph_index, ref #17

5febf40

mikemorris added a commit that referenced this issue Feb 28, 2014

log PangoFontset font list, ref #17

f20d683

mikemorris added a commit to mapbox/mapbox-gl-js that referenced this issue Mar 5, 2014

don't remove base glyphs from glyph atlas, ref mapbox/node-fontnik#17

8183f2b

mikemorris added a commit to mapbox/mapbox-gl-js that referenced this issue Mar 11, 2014

update buckets when base glyph set is loaded, ref mapbox/node-fontnik#17

78f0196

mikemorris added a commit to mapbox/mapbox-gl-js that referenced this issue Mar 11, 2014

disable source until base glyph set it loaded, ref mapbox/node-fontnik#17

65c0352

mikemorris added a commit that referenced this issue Mar 11, 2014

abstracting Pango engine from Font class, ref #17

73c96f8

mikemorris added a commit that referenced this issue Mar 13, 2014

move shaping to pango_shaper, ref #17

b14f811

mikemorris added a commit that referenced this issue Mar 13, 2014

initial skeleton of freetype_engine, ref #17

803edea

mikemorris added a commit that referenced this issue Mar 14, 2014

adding FT_Font class to inherit from ObjectWrap and wrap FT_Face, ref #17

7c4ab88

mikemorris added a commit that referenced this issue Mar 17, 2014

switch freetype engine to be factory of wrapped objects, ref #17

3188f3e

mikemorris added a commit that referenced this issue Mar 18, 2014

use FT_Font in pango_shaper instead of PangoFont, ref #17

de7b6bf

mikemorris added a commit that referenced this issue Mar 18, 2014

select FreeType font from string, ref #17

b111cd9

mikemorris added a commit that referenced this issue Mar 19, 2014

add freetype include, remove v8 namespace, ref #17

bc3bd30

mikemorris added a commit that referenced this issue Mar 19, 2014

remove unused includes from fontserver.cpp and binding.gyp, ref #17

175f511

mikemorris added a commit that referenced this issue Mar 19, 2014

back to tile.cpp, add FT_Library to constructor and destructor, ref #17

a3d463f

mikemorris added a commit that referenced this issue Mar 19, 2014

switch from PangoFont handling to FT_Face handling, ref #17

e339a2c

mikemorris added a commit that referenced this issue Mar 24, 2014

add mapnik font stack, ref #17

c58cb13

mikemorris added a commit that referenced this issue Mar 25, 2014

switch harfbuzz_shaper from struct to class, ref #17

b691833

mikemorris added a commit that referenced this issue Mar 25, 2014

compile with harfbuzz_shaper.cpp, ref #17

35a149a

mikemorris added a commit that referenced this issue Mar 27, 2014

add register_fonts v8 wrapper from node-mapnik, ref #17

542e5bd

mikemorris added a commit that referenced this issue Mar 27, 2014

why does boost::filesystem::exists segfault? ref #17

21091eb

mikemorris added a commit that referenced this issue Mar 28, 2014

add boost libs to binding.gyp, remove cerr logging, ref #17

8bbd282

mikemorris self-assigned this Apr 23, 2014

mikemorris closed this as completed Jun 2, 2014

yhahn mentioned this issue Jun 2, 2014

Backlog #16

Closed

10 tasks

kkaefer mentioned this issue Aug 28, 2016

Complex Text Rendering mapbox/DEPRECATED-mapbox-gl#4

Closed
