Extended Latin and Viet subsets missing many characters #6

jvgaultney · 2023-02-02T17:33:31Z

This is the fourth place I've submitted this issue in the last few months, as there is still no progress. See also google/fonts#5385 google/fonts#3756 googlefonts/lang#30

A large number of extended Latin and Vietnamese characters are not displaying properly. They are rendered with fallback fonts even when the fonts themselves support them.

In the following screenshots, LPR = local path-referenced font, GF = Google Fonts with subset=latin-ext,cyrillic-ext,vietnamese, FLO = our own internal font server. Screenshots are from current Chrome on Windows 10.

Three specific examples:

Vietnamese text properly renders the Vietnamese diacritic forms when lang='vi' is set. However, certain combinations with dot below fall back to other fonts.
Character string in example: Ấấ Ầầ Ẩẩ Ẫẫ Ắắ Ằằ Ẳẳ Ẵẵ Ếế Ềề Ểể Ễễ Ốố Ồồ Ổổ Ỗỗ Phải áp dụng chế độ giáo dục miễn phí, ít nhất là ở bậc tiểu học và giáo dục cơ sở

Extended Latin does not seem to include some important diacritics, such as U+0329 (COMBINING VERTICAL LINE BELOW), so again fallback fonts are used. The example is from the Yoruba UDHR.
Character string in example: E̩nì kò̩ò̩kan ló ní è̩tó̩ láti kó̩ è̩kó̩. Ó kéré tán, è̩kó̩ gbo̩dò̩ jé̩ ò̩fé̩ ní àwo̩n ilé‐è̩kó̩ alákò̩ó̩bè̩rè̩. E̩kó̩ ní ilé‐è̩kó̩ alákò̩ó̩bè̩rè̩ yìí sì gbo̩dò̩ jé̩ dandan. A gbo̩dò̩ pèsè è̩kó̩ is̩é̩‐o̩wó̩, àti ti ìmò̩‐è̩ro̩ fún àwo̩n ènìyàn lápapò̩. Àn fàní tó dó̩gba ní ilé‐è̩kó̩ gíga gbo̩dò̩ wà ní àró̩wó̩tó gbogbo e̩ni tó bá tó̩ sí.
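To see exactly which combining marks a sample needs from a subset, it can help to list them. The sketch below uses only Python's standard `unicodedata` module; `combining_marks` is a hypothetical helper, and the sample is the opening of the Yoruba text above written with explicit escapes, on the assumption that it was entered as base letters plus U+0329.

```python
import unicodedata


def combining_marks(text: str, form: str = "") -> list[str]:
    """List the distinct combining marks (general category M*) in `text`.

    If `form` is a normalization form such as "NFD", the text is normalized
    first, which exposes marks hidden inside precomposed letters.
    """
    if form:
        text = unicodedata.normalize(form, text)
    marks = sorted({ch for ch in text if unicodedata.category(ch).startswith("M")})
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in marks]


# "E̩nì kò̩ò̩kan" written with explicit escapes (assumed encoding).
sample = "E\u0329n\u00ec k\u00f2\u0329\u00f2\u0329kan"
print(combining_marks(sample))         # ['U+0329 COMBINING VERTICAL LINE BELOW']
print(combining_marks(sample, "NFD"))  # additionally lists U+0300 COMBINING GRAVE ACCENT
```

The first list is what the text as written needs alongside its base letters; the second shows what a fully decomposed copy of the same text would also require.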

Many common diacritics, such as the ogonek, are not displaying properly.
Character string in example: ọ o̧ ǫ ô o˞ o̝̠̣ ô͑ n f i fi f l fl ˥ ˦ ˧ ˨ ˩ ˥˥ ˥˦ ˥˧ ˥˨ ˥˩ ˥˨˥ ˥˨˦ ˥˨˧ ˥˨˨ ˥˨˩

simoncozens · 2023-03-13T13:31:53Z

Very few of the U+03XX combining marks appear in any of the Google Fonts glyphsets, so they will all be stripped out of fonts served via GF. We could make piecemeal PRs adding combining marks to the Latin, extended Latin, Vietnamese, and whatever other script sets use them, but it feels really yucky; it's clearly symptomatic of a larger problem. However, the engineering team sees a lot of benefit in subsetting fonts, so I'm not sure how to solve that larger problem.

simoncozens · 2023-03-13T13:38:10Z

(See also #7. There are a huge number of fonts on GF which offer these combining marks, but they can't be used.)
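The situation described here can be checked mechanically: compare the codepoints a font maps (its cmap) with the union of the subsets it is served in, and anything left over is unreachable via GF. This is a minimal sketch, not the fontbakery check referenced later in the thread; the subset tables and font coverage below are hypothetical stand-ins. In practice the cmap could come from fontTools, e.g. `TTFont(path).getBestCmap()`.

```python
import unicodedata

# Hypothetical, heavily abbreviated subset definitions (codepoint sets).
# These are NOT the real Google Fonts glyphsets.
SUBSETS = {
    "latin": set(range(0x0020, 0x007F)) | set(range(0x00A0, 0x0100)),
    "latin-ext": set(range(0x0100, 0x0250)),
}

# Hypothetical font coverage: Basic Latin plus a few combining marks.
font_cmap = set(range(0x0020, 0x007F)) | {0x0300, 0x0301, 0x0327, 0x0328, 0x0329}


def inaccessible(cmap: set[int], subsets: dict[str, set[int]]) -> list[str]:
    """Return codepoints the font supports that no subset would ever serve."""
    served = set().union(*subsets.values())
    return [f"U+{cp:04X} {unicodedata.name(chr(cp), '?')}" for cp in sorted(cmap - served)]


for line in inaccessible(font_cmap, SUBSETS):
    print(line)
```

With these made-up tables, every combining mark the font carries is reported, which is the pattern this thread describes.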

jvgaultney · 2023-03-14T23:35:18Z

Well that's a non-answer. We know it's not working, and that the combining marks are not getting included, and that it's one symptom of a larger systemic problem with GF.

However we just need something that works, even if it feels yucky to you. Even if only the more common combining diacritics were added it would make GF useful for many more languages. The lack of basic Vietnamese support is really embarrassing, when the fix is trivial.

thlinard · 2023-04-28T13:58:55Z

This is a screenshot of Roboto on https://fonts.google.com/specimen/Roboto?subset=vietnamese&noto.script=Latn (sample in Vietnamese):

The same thing happens with every font that supports Vietnamese (.notdef is displayed for ịửỡ in the standard sample text).

garretrieger · 2023-04-28T20:34:03Z

FYI I made an update for this issue in googlefonts/glyphsets#102. Since this affects many families it may take a bit to get the fix rolled out to each family. For now I've already updated Noto Sans, Andika, Charis SIL, and Gentium Plus with the fixed subset definitions.

thlinard · 2023-06-19T10:15:29Z

> FYI I made an update for this issue in googlefonts/glyphsets#102. Since this affects many families it may take a bit to get the fix rolled out to each family. For now I've already updated Noto Sans, Andika, Charis SIL, and Gentium Plus with the fixed subset definitions.

Hi @garretrieger

The fix is incomplete:

Example with Andika, from the API:

Andika downloaded and displayed on desktop:

Displaying other fonts is still problematic:

garretrieger · 2023-06-19T22:42:31Z

We had to partially roll back some of the fixes due to google/fonts#6245. The problem is that the combining marks are present in the latin, latin extended, and vietnamese subsets. Which subset gets loaded and used for a particular occurrence of a combining mark is up to the browser, and sometimes it doesn't pick the right one.
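A rough way to picture the problem: when the same mark appears in several `unicode-range` blocks, each codepoint in a cluster can match more than one face, and, as noted above, the choice is left to the browser. The sketch below uses hypothetical subset tables (the shared marks mirror the situation described, but the exact ranges are illustrative only) and lists the candidate subsets for each codepoint of ử written as U+01B0 U+0309.

```python
# Hypothetical unicode-range coverage for three subsets (illustrative only).
MARKS = {0x0300, 0x0301, 0x0303, 0x0309, 0x0323}
SUBSETS = {
    "latin": set(range(0x0020, 0x007F)) | MARKS,
    "latin-ext": set(range(0x0100, 0x0180)) | MARKS,
    "vietnamese": {0x0102, 0x0103, 0x01A0, 0x01A1, 0x01AF, 0x01B0} | MARKS,
}


def candidates(cluster: str, subsets: dict[str, set[int]]) -> dict[str, list[str]]:
    """Map each codepoint in `cluster` to every subset whose range contains it."""
    return {
        f"U+{ord(ch):04X}": sorted(name for name, cov in subsets.items() if ord(ch) in cov)
        for ch in cluster
    }


print(candidates("\u01b0\u0309", SUBSETS))
# {'U+01B0': ['vietnamese'], 'U+0309': ['latin', 'latin-ext', 'vietnamese']}
```

Only one face can supply the base letter, but three claim the mark. If a browser takes the mark from a different file than the base, the font's mark-positioning rules cannot be applied across the two.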

We're experimenting with different subset definitions + unicode range setups to try and find something that works for all cases, but this is difficult. You end up fixing one case but breaking another.

I'm currently working on assembling a test suite that tries to cover as many of the different cases as possible, so we can evaluate potential fixes and make sure we don't regress anything.
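One possible shape for such a test, sketched with hypothetical subset tables: for each test cluster, require that at least one subset covers every codepoint in it, so the whole cluster could be rendered from a single file. The cluster list and the `uncovered_clusters` helper are illustrative, not the actual suite.

```python
# Hypothetical subset tables (illustrative, not the real glyphsets).
MARKS = {0x0300, 0x0301, 0x0303, 0x0309, 0x0323}
SUBSETS = {
    "latin": set(range(0x0020, 0x007F)) | MARKS,
    "vietnamese": {0x0102, 0x0103, 0x00E2, 0x01AF, 0x01B0} | MARKS,
}

# Test clusters: base letter plus combining marks, as they appear in text.
CLUSTERS = [
    "i\u0323",        # ị  i + COMBINING DOT BELOW
    "\u0103\u0301",   # ắ  ă + COMBINING ACUTE ACCENT
    "\u01b0\u0309",   # ử  ư + COMBINING HOOK ABOVE
    "u\u031b\u0309",  # ử  fully decomposed: u + HORN + HOOK ABOVE
]


def uncovered_clusters(clusters, subsets):
    """Return clusters that no single subset can render on its own."""
    return [
        c for c in clusters
        if not any({ord(ch) for ch in c} <= cov for cov in subsets.values())
    ]


for c in uncovered_clusters(CLUSTERS, SUBSETS):
    print(" ".join(f"U+{ord(ch):04X}" for ch in c))
```

Here only the fully decomposed ử fails, because none of the made-up tables includes U+031B.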

Could you provide the specific codepoint sequences you used for the above iuo case? I'll add it to the test suite.

For Roboto, we haven't pushed updated subset definitions yet and likely won't until it's upgraded to the variable version. Unfortunately, the way the layout rules are set up in the static version of Roboto causes its subsets to increase massively in size when the additional combining marks are introduced. This issue has been fixed in the upcoming variable version of the font.

thlinard · 2023-06-19T23:28:43Z

Thanks for the information.

For the sequences, I simply copied the problematic characters in the sample text from "Select preview text > Asia > Vietnamese", i.e.:

ị (0069 LATIN SMALL LETTER I + 0323 COMBINING DOT BELOW)
ĩ (0069 LATIN SMALL LETTER I + 0303 COMBINING TILDE)
ỉ (0069 LATIN SMALL LETTER I + 0309 COMBINING HOOK ABOVE)
ắ (0103 LATIN SMALL LETTER A WITH BREVE + 0301 COMBINING ACUTE ACCENT)
ẫ (00E2 LATIN SMALL LETTER A WITH CIRCUMFLEX + 0303 COMBINING TILDE)
ụ (0075 LATIN SMALL LETTER U + 0323 COMBINING DOT BELOW)
ử (01B0 LATIN SMALL LETTER U WITH HORN + 0309 COMBINING HOOK ABOVE)
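Each of these sequences is the decomposed form of a single precomposed Vietnamese letter. A quick standard-library sketch (`SEQUENCES` simply restates the list above) prints the NFC equivalent of each, which may be handy when building test cases that cover both forms:

```python
import unicodedata

# The decomposed sequences listed above, written with explicit escapes.
SEQUENCES = [
    "i\u0323", "i\u0303", "i\u0309", "\u0103\u0301",
    "\u00e2\u0303", "u\u0323", "\u01b0\u0309",
]

for seq in SEQUENCES:
    nfc = unicodedata.normalize("NFC", seq)
    parts = " + ".join(f"{ord(ch):04X}" for ch in seq)
    # Every sequence here composes to exactly one precomposed codepoint.
    print(f"{parts} -> U+{ord(nfc):04X} {unicodedata.name(nfc)}")
```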

Results vary from font to font. For example, with Lora (a variable font), the results are good in Italic but bad in Roman:

garretrieger · 2023-06-20T00:00:25Z

I've been trying to reproduce your Andika example and haven't been able to: https://codepen.io/garretrieger/pen/XWyKaZq

What browser are you using?

garretrieger · 2023-06-20T00:05:01Z

This is what I get for that example:

moyogo · 2023-06-20T05:07:08Z

@garretrieger U+031B is used in ử (0075 031B 0309) but it is not in the vietnamese set in https://fonts.googleapis.com/css?family=Andika. Chrome shows the example correctly but Safari and Firefox do not.
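This can be reproduced with the standard library: the same letter ử can reach the browser in different forms, and only the fully decomposed (NFD) one contains U+031B COMBINING HORN. A subset meant to cover Vietnamese text robustly would presumably need the codepoints of all of them. A minimal sketch:

```python
import unicodedata

forms = {
    "precomposed (NFC)": "\u1eed",
    "as in the sample text": "\u01b0\u0309",
    "fully decomposed (NFD)": unicodedata.normalize("NFD", "\u1eed"),
}

needed = set()
for label, text in forms.items():
    cps = [f"U+{ord(ch):04X}" for ch in text]
    needed.update(cps)
    print(f"{label:24} {' '.join(cps)}")

print("codepoints needed for all forms:", sorted(needed))
```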

Firefox:

Safari:

There also seem to be others missing: googlefonts/glyphsets#110 (comment)

thlinard · 2023-06-20T07:12:44Z

> What browser are you using?

Firefox 114.0.1 on macOS 13.4.

simoncozens mentioned this issue Mar 15, 2023

[nam] Add combining marks to Latin, ext Latin and Vietnamese googlefonts/glyphsets#102

Closed

felipesanches mentioned this issue Mar 16, 2023

New check: Give us advance warning of glyphs which would be inaccessible due to subsetting fonttools/fontbakery#4097

Closed

moyogo mentioned this issue Jun 20, 2023

[name] Add combining marks to the latin, latin ext, and vietnamese glyphsets. googlefonts/glyphsets#110

Merged

garretrieger mentioned this issue Aug 16, 2023

Incremental Font Transfer: Patch Subset w3ctag/design-reviews#849

Closed

davelab6 mentioned this issue Sep 25, 2023

Do all characters in all gf glyphsets latin lists exist in latin + latin-ext? googlefonts/glyphsets#133

Open

simoncozens transferred this issue from googlefonts/glyphsets Jan 26, 2024

simoncozens mentioned this issue Jan 9, 2025

Add coverage for U+0326, U+0327, and U+0328 #23

Open