glyphs/case mapping between caps and lowercases #3230

Closed
RosaWagner opened this issue Apr 1, 2021 · 16 comments
Labels
  • GF's priority list: List of high priority issues for google/fonts CI
  • New check proposal: We expect new check proposals to include a detailed rationale description and a suggested check-id
  • P0 Urgent
  • Severity 5 (Highest): Font problems that must be addressed urgently!

@RosaWagner
Contributor

For example, a font I am trying to onboard has ydieresis, but not Ydieresis.
We need a case mapping check, because if someone capitalises ÿ… then there won't be any Ÿ.
The same problem can occur with small caps too.

There will be a few exceptions, like Dz, which doesn't have a lowercase relative.

@RosaWagner RosaWagner added the New check proposal label Apr 1, 2021
@davelab6 davelab6 added this to the 0.7.35 milestone Apr 2, 2021
@davelab6
Contributor

davelab6 commented Apr 2, 2021

Agreed, this is EXTREMELY important, and should be quite simple to implement.

@felipesanches
Collaborator

@RosaWagner mentioned "very few exceptions". Can we come up with a list of them all?

@chrissimpkins
Member

chrissimpkins commented Apr 2, 2021

Can we come up with a list of them all?

This Unicode chart might be a starting point: https://www.unicode.org/charts/case/chart_NoCaseMapping.html

It is defined as follows:

If characters have a decomposition containing a cased character, but do not have a case mapping (lower, title, upper, or fold), then they are listed in NoCaseMapping.
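The NoCaseMapping criterion can be roughly approximated in Python. This is only a sketch, not Unicode's own tooling: it assumes that "decomposition" includes compatibility decompositions (NFKD) and approximates "cased" by the general categories Lu, Ll and Lt, so its output may not match the chart exactly. The function name is hypothetical.

```python
import unicodedata

# Approximation of the Unicode "Cased" property (the real property is broader).
CASED_CATEGORIES = {"Lu", "Ll", "Lt"}


def in_no_case_mapping(ch: str) -> bool:
    """Rough test of the NoCaseMapping criterion, assuming NFKD decompositions."""
    decomposition = unicodedata.normalize("NFKD", ch)
    if decomposition == ch:
        return False  # the character has no decomposition at all
    has_cased = any(unicodedata.category(c) in CASED_CATEGORIES for c in decomposition)
    # lower, title, upper and fold mappings must all leave the character unchanged
    unchanged = all(f(ch) == ch for f in (str.lower, str.title, str.upper, str.casefold))
    return has_cased and unchanged


print(in_no_case_mapping("\U0001D400"))  # MATHEMATICAL BOLD CAPITAL A
print(in_no_case_mapping("\uFB00"))      # LATIN SMALL LIGATURE FF uppercases to "FF"
```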

Also relevant from the Unicode case mapping docs:

There are a number of complications to case mappings that occur once the repertoire of characters is expanded beyond ASCII.

  • In most cases, the titlecase is the same as the uppercase, but not always. For example, the titlecase of U+01F1 "DZ" capital dz is U+01F2 "Dz" capital d with small z.
  • Case mappings may produce strings of different length than the original.
    For example, the German character U+00DF "ß" small letter sharp s expands when uppercased to the sequence of two characters "SS". This also occurs where there is no precomposed character corresponding to a case mapping, such as with U+0149 "ʼn" latin small letter n preceded by apostrophe.
  • There are some characters that require special handling, such as U+0345 combining iota subscript.
  • Characters may also have different case mappings, depending on the context.
    For example, U+03A3 "Σ" capital sigma lowercases to U+03C3 "σ" small sigma if it is followed by another letter, but lowercases to U+03C2 "ς" small final sigma if it is not.
  • Characters may have case mappings that depend on the locale.
    For example, in Turkish the letter U+0049 "I" capital letter i lowercases to U+0131 "ı" small dotless i.

Since many characters are really caseless (most of the IPA block, for example) and have no matching uppercase, the process of uppercasing a string does not mean that it will no longer contain any lowercase letters.
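Several of these complications can be reproduced with Python's built-in string methods, which apply Unicode's default full case mappings (including the final-sigma context rule) but no locale-specific tailoring. A quick illustration, assuming Python 3:

```python
# Python's str methods use Unicode's default, locale-independent full case mappings.
examples = [
    ("\u00df".upper(), "sharp s expands to two letters"),
    ("\u0149".upper(), "n preceded by apostrophe has no precomposed uppercase"),
    ("\u03a3\u0391\u03a3".lower(), "capital sigma lowercases differently at the end of a word"),
    ("I".lower(), "no Turkish dotless-i tailoring is applied"),
]
for result, note in examples:
    print(repr(result), "-", note)
```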

It might be possible to pull these data out of the ICU lib using something like Cased or Changes_When_* properties?
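Without pulling in ICU, the Changes_When_* properties can be approximated from their definitions in the Unicode Standard; for instance, Changes_When_Uppercased(X) holds when toUppercase(toNFD(X)) != toNFD(X). A sketch, assuming Python's str.upper()/str.lower() are acceptable stand-ins for toUppercase/toLowercase; results follow the Unicode version bundled with the interpreter (unicodedata.unidata_version).

```python
import unicodedata


def changes_when_uppercased(ch: str) -> bool:
    """Approximate Changes_When_Uppercased: toUppercase(toNFD(X)) != toNFD(X)."""
    nfd = unicodedata.normalize("NFD", ch)
    return nfd.upper() != nfd


def changes_when_lowercased(ch: str) -> bool:
    """Approximate Changes_When_Lowercased the same way, using str.lower()."""
    nfd = unicodedata.normalize("NFD", ch)
    return nfd.lower() != nfd


for ch in ["A", "\u00ff", "\u0237", "1"]:
    print(f"U+{ord(ch):04X}", changes_when_uppercased(ch), changes_when_lowercased(ch))
```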

@chrissimpkins
Member

chrissimpkins commented Apr 2, 2021

Python's str.islower(), str.isupper(), and str.istitle() have interesting definitions: they depend on whether the string contains any cased characters (per the Python docs, characters whose general category is Lu, Ll, or Lt).

>>> case_str = "A"
>>> nocase_str = "1"
>>> case_str.islower()
False
>>> case_str.isupper()
True
>>> case_str.istitle()
True
>>> nocase_str.islower()
False
>>> nocase_str.isupper()
False
>>> nocase_str.istitle()
False
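One consequence is that a character can be lowercase without having any uppercase counterpart, so the predicates alone do not answer the question this check needs. A small sketch contrasting the predicates with the actual mappings (the helper name describe is purely illustrative):

```python
def describe(ch: str) -> str:
    """Summarize what Python's case predicates and mappings say about one character."""
    return (
        f"U+{ord(ch):04X} islower={ch.islower()} isupper={ch.isupper()} "
        f"upper={ch.upper()!r} lower={ch.lower()!r}"
    )


# U+0237 (dotless j) is a lowercase letter, yet uppercasing leaves it unchanged,
# so islower() alone does not tell us whether a capital counterpart exists.
for ch in ["A", "1", "\u0237"]:
    print(describe(ch))
```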

@felipesanches felipesanches modified the milestones: 0.7.35, 0.7.36 May 12, 2021
@felipesanches felipesanches modified the milestones: 0.7.37, 0.7.x May 20, 2021
@felipesanches felipesanches modified the milestones: 0.7.x, 0.8.x series Jul 14, 2021
@RosaWagner RosaWagner added the Severity 5 (Highest) label Sep 13, 2021
@RosaWagner
Contributor Author

To what @chrissimpkins mentioned, I would add another exception:
uni0237 (dotless j) doesn't have a capital counterpart either.

  • For the small caps mapping, it logically has to be the same as the uppercase mapping.
  • IMO severity=10 / FAIL, because if a font shows tofu in caps but not in lowercase, it can be considered broken.
  • Would it be a problem to implement this check before we have an exhaustive list of exceptions? Case mapping is really hard to check by eye, and the exception list could be completed as cases come up.

@felipesanches felipesanches modified the milestones: 0.8.x series, 0.8.4 Oct 14, 2021
@felipesanches felipesanches modified the milestones: 0.8.4, 0.8.8 Nov 19, 2021
@felipesanches felipesanches modified the milestones: 0.8.8, 0.8.9 Mar 14, 2022
@felipesanches felipesanches modified the milestones: 0.8.9, 0.8.11 Jun 10, 2022
@felipesanches felipesanches modified the milestones: 0.8.11, 0.8.12 Aug 19, 2022
@felipesanches felipesanches modified the milestones: 0.8.12, 0.8.14 Jun 2, 2023
@RosaWagner RosaWagner added the GF's priority list label Jun 14, 2023
@felipesanches felipesanches modified the milestones: 0.10.9, 0.10.10 Jan 12, 2024
@felipesanches
Collaborator

I suspect we won't need an exception list. Python's Unicode string methods seem to be enough for the task.
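A minimal sketch of how such a check could rely only on Python's string methods. This is not fontbakery's actual implementation: the function names are hypothetical, it assumes the font's mapped codepoints are already available as a set, and it deliberately ignores multi-character mappings such as ß → SS.

```python
import unicodedata
from typing import List, Optional, Set, Tuple


def case_counterpart(cp: int) -> Optional[int]:
    """Return the single-codepoint opposite-case partner of cp, if any."""
    ch = chr(cp)
    for mapped in (ch.upper(), ch.lower()):
        # Skip identity mappings and expansions such as U+00DF -> "SS".
        if mapped != ch and len(mapped) == 1:
            return ord(mapped)
    return None


def missing_case_counterparts(codepoints: Set[int]) -> List[Tuple[int, int]]:
    """List (present, missing) pairs for characters whose case partner is absent."""
    missing = []
    for cp in sorted(codepoints):
        partner = case_counterpart(cp)
        if partner is not None and partner not in codepoints:
            missing.append((cp, partner))
    return missing


cmap = {0x0041, 0x0061, 0x00B5, 0x00FF, 0x0237}  # hypothetical font coverage
for present, absent in missing_case_counterparts(cmap):
    print(f"U+{present:04X} {unicodedata.name(chr(present))} lacks U+{absent:04X}")
```

With a real font, the codepoint set could come from the keys of fontTools' `TTFont(path).getBestCmap()`.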

@felipesanches
Collaborator

Also, I think this check is sufficiently generic to be included as a FAIL in the Universal profile.

@felipesanches
Collaborator

The screenshot below shows two different renderings of the results from my initial implementation:

  • a bullet-list
  • a table

I think I'll use the table, as it seems more readable, and delete the bullet-list.

Screenshot from 2024-01-18 18-57-35

@felipesanches
Collaborator

Here's how it will look on a markdown report:

🔥 FAIL: Ensure the font supports case swapping for all its glyphs. (com.google.fonts/check/case_mapping)

Ensure that no glyph lacks its corresponding upper or lower counterpart (but only when unicode supports case-mapping).

  • 🔥 FAIL The following glyphs lack their case-swapping counterparts:
| Glyph present in the font | Missing case-swapping counterpart |
| --- | --- |
| U+00B5: MICRO SIGN | U+039C: GREEK CAPITAL LETTER MU |
| U+0192: LATIN SMALL LETTER F WITH HOOK | U+0191: LATIN CAPITAL LETTER F WITH HOOK |
| U+0394: GREEK CAPITAL LETTER DELTA | U+03B4: GREEK SMALL LETTER DELTA |
| U+03A3: GREEK CAPITAL LETTER SIGMA | U+03C3: GREEK SMALL LETTER SIGMA |
| U+03C0: GREEK SMALL LETTER PI | U+03A0: GREEK CAPITAL LETTER PI |
| U+2126: OHM SIGN | U+03C9: GREEK SMALL LETTER OMEGA |
| U+24CA: CIRCLED LATIN CAPITAL LETTER U | U+24E4: CIRCLED LATIN SMALL LETTER U |

[code: missing-case-counterparts]
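Each row of that table pairs a present codepoint with its missing partner, labelled by its Unicode character name. A sketch of how such rows could be formatted with unicodedata.name(); the helper names and the Markdown row layout are assumptions, not fontbakery's reporting code.

```python
import unicodedata


def describe_codepoint(cp: int) -> str:
    """Format a codepoint like the report does, e.g. 'U+00B5: MICRO SIGN'."""
    return f"U+{cp:04X}: {unicodedata.name(chr(cp), '<unnamed>')}"


def report_row(present: int, missing: int) -> str:
    """Build one Markdown table row for a (present, missing) pair."""
    return f"| {describe_codepoint(present)} | {describe_codepoint(missing)} |"


print(report_row(0x2126, 0x03C9))
```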

@felipesanches
Collaborator

Unfortunately, the Google Fonts library is mostly in bad shape regarding this new check:

Screenshot from 2024-01-18 18-40-01

@felipesanches
Collaborator

This is where we could consider adding exceptions. But for that, I think we would need some statistics on which missing case-mapping counterparts are the most common. I'll try to come up with the numbers.

@felipesanches
Collaborator

felipesanches commented Jan 19, 2024

These are the most common occurrences in the Google Fonts library (the first number indicates how many times fontbakery detected that specific missing case-mapping counterpart):

  • 2281 - U+0192: ƒ - Latin Small Letter F with Hook
  • 2263 - U+00B5: µ - Micro Sign
  • 1612 - U+03C0: π - Greek Small Letter Pi
  • 1272 - U+2126: Ω - Ohm Sign
  • 1162 - U+03BC: μ - Greek Small Letter Mu
  • 970 - U+03A9: Ω - Greek Capital Letter Omega
  • 912 - U+0394: Δ - Greek Capital Letter Delta
  • 407 - U+0251: ɑ - Latin Small Letter Alpha
  • 245 - U+0261: ɡ - Latin Small Letter Script G
  • 167 - U+00FF: ÿ - Latin Small Letter Y with Diaeresis
  • 158 - U+0250: ɐ - Latin Small Letter Turned A
  • 150 - U+025C: ɜ - Latin Small Letter Reversed Open E
  • 149 - U+0252: ɒ - Latin Small Letter Turned Alpha
  • 146 - U+0271: ɱ - Latin Small Letter M with Hook
  • 146 - U+0282: ʂ - Latin Small Letter S with Hook
  • 141 - U+029E: ʞ - Latin Small Letter Turned K
  • 136 - U+0287: ʇ - Latin Small Letter Turned T
  • 134 - U+0127: ħ - Latin Small Letter H with Stroke
  • 132 - U+0140: ŀ - Latin Small Letter L with Middle Dot
  • 124 - U+023F: ȿ - Latin Small Letter S with Swash Tail
  • 121 - U+0240: ɀ - Latin Small Letter Z with Swash Tail
  • 151 - U+026B: ɫ - Latin Small Letter L with Middle Tilde
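A tally like this could be produced by aggregating per-font results with collections.Counter. A sketch, assuming each font's results are available as (present, missing) codepoint pairs; the file names and data below are made up for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def tally_missing(results: Dict[str, Iterable[Tuple[int, int]]]) -> Counter:
    """Count, across fonts, how often each present codepoint lacks its partner."""
    counts: Counter = Counter()
    for pairs in results.values():
        counts.update(present for present, _missing in pairs)
    return counts


# Hypothetical per-font results in (present, missing) form.
results = {
    "A-Regular.ttf": [(0x0192, 0x0191), (0x00B5, 0x039C)],
    "B-Regular.ttf": [(0x0192, 0x0191)],
}
for cp, n in tally_missing(results).most_common():
    print(f"{n} - U+{cp:04X}")
```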

@felipesanches
Collaborator

If we list these as exceptions, then the situation improves a bit:
Screenshot from 2024-01-18 21-29-18

(note: I am running this against all *-Regular.ttf files in the full library, instead of all *.ttf, because the latter was eating up all the RAM on my laptop, which sounds like a bug to investigate; but this at least gives us an overall idea of the state of the library)
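Treating the list above as exceptions amounts to filtering the reported pairs against a fixed set of codepoints. A sketch under that assumption; the constant and function names are hypothetical, and whether each entry deserves to be an exception is a separate question.

```python
from typing import FrozenSet, List, Tuple

# Codepoints from the most-common list above, treated as exceptions (illustrative only).
EXCEPTIONS: FrozenSet[int] = frozenset({
    0x0192, 0x00B5, 0x03C0, 0x2126, 0x03BC, 0x03A9, 0x0394, 0x0251, 0x0261, 0x00FF,
    0x0250, 0x025C, 0x0252, 0x0271, 0x0282, 0x029E, 0x0287, 0x0127, 0x0140, 0x023F,
    0x0240, 0x026B,
})


def filter_exceptions(pairs: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Drop (present, missing) pairs whose present codepoint is an exception."""
    return [(p, m) for p, m in pairs if p not in EXCEPTIONS]


print(filter_exceptions([(0x0192, 0x0191), (0x0265, 0xA78D)]))
```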

felipesanches added a commit to felipesanches/fontbakery that referenced this issue Jan 19, 2024
Ensure that no glyph lacks its corresponding upper or lower counterpart (but only when unicode supports case-mapping).

com.google.fonts/check/case_mapping (EXPERIMENTAL)
Added to the Universal profile.
(issue fonttools#3230)
felipesanches added a commit to felipesanches/fontbakery that referenced this issue Jan 19, 2024
But we need to inspect them more carefully
(issue fonttools#3230)
felipesanches added a commit to felipesanches/fontbakery that referenced this issue Feb 1, 2024
Ensure that no glyph lacks its corresponding upper or lower counterpart (but only when unicode supports case-mapping).

com.google.fonts/check/case_mapping (EXPERIMENTAL)
Added to the Universal profile.

(issue fonttools#3230)
felipesanches added a commit that referenced this issue Feb 1, 2024
Ensure that no glyph lacks its corresponding upper or lower counterpart (but only when unicode supports case-mapping).

com.google.fonts/check/case_mapping (EXPERIMENTAL)
Added to the Universal profile.

(issue #3230)
@moyogo
Contributor

moyogo commented Feb 23, 2024

@felipesanches

167 - U+00FF: ÿ - Latin Small Letter Y with Diaeresis

That one as an exception doesn’t make sense.
It’s not a symbol. It’s used in French or German names, sometimes in names of Hungarian origin.

The uppercase Ÿ is likely missing in many fonts because ÿ is in the Latin-1 Supplement block, which most Latin fonts cover, while its uppercase is in the Latin Extended-A block, which fewer fonts cover.
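Python's unicodedata module does not expose block names, so a quick way to see where each member of this pair lives is to hard-code the relevant block ranges. A sketch; only the four Latin blocks listed are covered, and the function name is hypothetical.

```python
# A few Unicode block ranges (start, end inclusive), hard-coded for illustration.
LATIN_BLOCKS = [
    (0x0000, 0x007F, "Basic Latin"),
    (0x0080, 0x00FF, "Latin-1 Supplement"),
    (0x0100, 0x017F, "Latin Extended-A"),
    (0x0180, 0x024F, "Latin Extended-B"),
]


def latin_block(cp: int) -> str:
    """Return the name of the Latin block containing cp, if it is one listed above."""
    for start, end, name in LATIN_BLOCKS:
        if start <= cp <= end:
            return name
    return "outside the Latin blocks listed here"


for cp in (0x00FF, 0x0178):
    print(f"U+{cp:04X} {chr(cp)}: {latin_block(cp)}")
```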

@moyogo
Contributor

moyogo commented Feb 29, 2024

@felipesanches @simoncozens This should be reopened. The exceptions are inconsistent or should raise a WARN.

Roughly, there are orthographic characters, phonetic characters and historical characters. These categories sometimes overlap: for example, the lowercase is phonetic and the uppercase is historical, or the lowercase is phonetic and both the lowercase and uppercase are orthographic.

For example:

  • ÿ 00FF is used in French and German names, so Ÿ 0178 should be present.
  • ß 00DF is used in German and ẞ 1E9E is an alternate uppercase to SS.
  • ᶎ 1D8E and ꞔ A794 (not currently exceptions) are phonetic symbols whose case pairs Ᶎ A7C6 and Ꞔ A7C4 are historical (proposed Hanyu Pinyin, used in a few documents), but ȿ 023F and ɀ 0240 (currently exceptions) are historical phonetic symbols whose case pairs are historical-orthographic.
  • ↄ 2184 and Ↄ 2183 are historical (not currently exceptions).
  • ɥ 0265, ɦ 0266, ɪ 026A, ɬ 026C and ʝ 029D are currently not exceptions, but ƒ 0192, ɑ 0251, ɐ 0250, ɱ 0271, ħ 0127 and ɫ 026B are; both sets are phonetic or other kinds of symbols with case pairs used in orthographies.

The fontbakery check should probably determine whether a case pair is orthographic (for example, as reported by shaperglot) and then either FAIL or at least WARN. For the FAIL, there could be a heuristic such as whether the character is decomposable with unicodedata.normalize("NFD", char).
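The suggested NFD heuristic can be sketched in a few lines; the function name is hypothetical. Note that NFD covers only canonical decompositions: the micro sign has just a compatibility decomposition and is not flagged, while the Ohm sign has a canonical singleton decomposition to U+03A9 and is flagged, so the heuristic would likely need refinement before driving a FAIL.

```python
import unicodedata


def is_canonically_decomposable(ch: str) -> bool:
    """Heuristic suggested above: does NFD normalization change the character?"""
    return unicodedata.normalize("NFD", ch) != ch


for ch in ["\u00ff", "\u0192", "\u00b5", "\u2126"]:
    print(f"U+{ord(ch):04X}", is_canonically_decomposable(ch))
```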

@felipesanches felipesanches reopened this Feb 29, 2024
felipesanches added a commit that referenced this issue Apr 12, 2024
After being marked as **experimental** for 9 weeks since the v0.11.1 release, these checks are now made effective.
For more details, see their previous entries on the changelog.

Made effective on the OpenType profile
  * **com.typenetwork/check/varfont/ital_range** (PR #4402)
  * **com.google.fonts/check/varfont/family_axis_ranges** (issue #4554)

Made effective on the Universal profile
  * **com.google.fonts/check/tabular_kerning** (issue #4440)
  * **com.google.fonts/check/case_mapping** (issue #3230)

6 participants