Question about the glyph of U+5DD5 巕 #169

Closed
tamcy opened this issue Feb 23, 2017 · 9 comments

tamcy commented Feb 23, 2017

Below are the glyphs of U+5DD5 巕 in the CN and TW versions of SHS:

[Image: U+5DD5 巕 in the CN and TW versions of SHS]

As shown, the CN glyph has a ⿰山㜸-like composition (with 艹 replaced by 䒑), whereas the TW glyph is ⿰山孽.

At first glance I thought that either the CN or the TW glyph must be wrong, since the 㜸 and 孽 components (or at least 子 and 女) are obviously not unifiable. However, the CN glyph matches the form shown in the Unihan database, while the TW glyph matches the listing on the CNS11643 website. The TW glyph is also used by fonts that comply with the TW standard, such as 新細明體 (MingLiU) and Microsoft JhengHei. Interestingly, the CNS11643 website lists "女" as a component of the character, even though its composition according to the standard is ⿰山孽. So the situation looks somewhat messy.

巕 U+5DD5 corresponds to code point F6DD in Big5, so it seems to me that two characters are sharing the same code point in Unicode for some historical reason. I could find only very limited information about these two code points here. The document claims that the form in MingLiU is "in error", but I'm not sure what that really means. From the same document I also learned that the form ⿰山孽 is now actually encoded in Unicode as U+21FD2, but given the lack of information I don't know whether this was a measure to "right the wrong", in which case the ⿰山㜸-like form should be used for U+5DD5 in all locales, including TW.
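
For readers less familiar with the encodings involved, here is a minimal Python sketch (standard library only) that decodes the Big5 bytes F6DD and prints basic facts about the two Unicode code points discussed above. The helper names `describe` and `block_of` are hypothetical, and `block_of` only recognizes the two blocks relevant here. Note that this only shows code points: which of the two glyph shapes actually appears on screen is decided by the font and the language setting, not by the encoding.

```python
import unicodedata

# Big5 code mentioned in this thread (the thread states it maps to U+5DD5).
BIG5_BYTES = bytes([0xF6, 0xDD])

# The two Unicode code points discussed in this thread.
CODE_POINTS = {"U+5DD5": 0x5DD5, "U+21FD2": 0x21FD2}


def block_of(cp: int) -> str:
    """Hypothetical helper: classify only the two blocks relevant here."""
    if 0x4E00 <= cp <= 0x9FFF:
        return "CJK Unified Ideographs (URO)"
    if 0x20000 <= cp <= 0x2A6DF:
        return "CJK Unified Ideographs Extension B"
    return "other"


def describe(cp: int) -> dict:
    """Hypothetical helper: name, block, and UTF-8 bytes of a code point."""
    ch = chr(cp)
    return {
        "char": ch,
        "name": unicodedata.name(ch),
        "block": block_of(cp),
        "utf8": ch.encode("utf-8").hex(" ").upper(),  # hex(sep) needs Python 3.8+
    }


if __name__ == "__main__":
    # Decode the legacy Big5 bytes with Python's built-in big5 codec.
    decoded = BIG5_BYTES.decode("big5")
    print(f"Big5 F6DD -> U+{ord(decoded):04X} {decoded}")
    for label, cp in CODE_POINTS.items():
        print(label, describe(cp))
```

The point of the comparison is that U+5DD5 sits in the URO while U+21FD2 sits in Extension B, so they are separate code points even though their TW-style shapes can look alike.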

SHS doesn't include U+21FD2, and U+5DD5 is rendered as either ⿰山㜸-like or ⿰山孽 depending on the language setting. My question is: is this the intended behavior in SHS (and thus not a bug)? Thanks.

kenlunde self-assigned this Feb 24, 2017

kenlunde (Contributor) commented:

This is indeed messy. The representative glyph changed on two fronts.

The first front is Big Five and CNS 11643. Big Five and the 1986 and 1992 versions of CNS 11643 use a representative glyph that looks like the CN form shown above: ⿰山㜸 (the upper-right component is not Radical 140, but rather a non-penetrating three-stroke component like the one used in the CN form shown above). The 2007 version of CNS 11643 uses a representative glyph that looks like the TW form shown above: ⿰山孽.

Also, the Taiwan MOE glyph standards include this character as serial number 200911, and its glyph matches the structure of the TW glyph shown above.

The second front is Unicode (and ISO/IEC 10646). Unicode up through Version 5.1 uses a representative glyph that looks like the CN form shown above, which matches Big Five and the first two versions of CNS 11643. Version 5.2 (10/2009) introduced the multi-column code chart for the URO, and this is when the T-Source glyph started to look like the TW form shown above.

The fact that the TW form of U+5DD5 is unifiable with U+21FD2 makes this even more messy.

Lastly, please note that the "GE" sources are fill-in characters for the G-Source in order to provide complete coverage of the URO, initially for GBK, which morphed into GB 18030. In other words, the GE representative glyphs cannot be trusted, and for the most part are historical.

I will bring this up for discussion on the Unihan Mailing List, and will point them to this issue for the background.

hfhchan commented Feb 24, 2017

@kenlunde this was discussed on the Unihan list before, and no consensus was reached. But given the number of other places where CNS 11643 has changed its glyphs, the CNS 11643 standard is undermining the stability of Unicode. IRG has previously ordered a pair (礴礡) to be restored; I agree with this approach.

kenlunde (Contributor) commented:

Because the current T-Source source reference of U+5DD5 corresponds to a CNS 11643 Plane 2 character, meaning that it is of higher frequency than characters in the higher planes, TCA will be (understandably) reluctant to move the T-Source source reference to U+21FD2.

tamcy (Author) commented Feb 25, 2017

I did some checking, and here's what Big5 F6DD looks like on legacy systems:

[Image: Big5 F6DD as rendered on legacy systems]

So early Chinese systems used the ⿰山㜸 form up until Windows 98.

kenlunde (Contributor) commented:

According to the CNS 11643 website, U+21FD2 𡿒 corresponds to CNS 11643 Plane 10 (0xA) 0x3E79.

This reminds me of the 2016-09-13 CJK Type Blog article in that we're not likely to find the answer staring at these standards. The answer will be elsewhere, such as in dictionaries or other references that do not correspond to standards. At some point, in the late 1990s, Taiwan changed the glyph for this character, and we're simply unsure whether it was intentional (and if so, why) or an error.

hfhchan commented Feb 27, 2017

Should I raise this in the next IRG meeting document?

kenlunde (Contributor) commented:

@hfhchan: Yes, please.

kenlunde (Contributor) commented:

We will change the lower-right component of the TW glyph for U+5DD5 巕, uni5DD5-TW, from 子 to 女. I am in the process of opening new consolidated issues for the Version 2.000 update, and this will be included.

kenlunde (Contributor) commented:

Consolidated with Issue #178.
