Implements Traditional Chinese Characters to Pinyin Support #16

ujamer · 2020-09-11T12:58:02Z

Hey there,

Are you accepting PRs at this time? I've got a website here that requires Traditional Chinese support, and when I tried your plugin, a lot characters are lost on the slug output. So I went ahead and updated the dictionary file with Traditional Chinese characters. I generated the new dictionary file from the Unicode Consortium's Unihan database, with kMandarin Reading field as the pinyin keys. I've tested the new data set on my site, and it appears to work well.

The downside of this PR is that some of the characters fell under different pinyin keys, and it's probably because the source of the original data set used an alternate pronunciation compared to the Unihan database; thus the new slugs would give different pinyin for some characters compared to the old slugs. I suppose we can maintain "backwards compatibility" by searching for misplaced characters in the new dictionary; but I've found that most, if not all of these differences are context sensitive, and the correctness of the pinyin depends on how it's used in a particular phrase. Given that we don't have pinyin conversion on a phrase level here, and only on a character level, it's really difficult to gauge the correctness for either data set. Unicode.org does claim that kMandarin is the best pronunciation to use with Mandarin, so it's got that going for it.

There is also a ton of rarely used character in the Unihan data set, many of which my browser won't display at all, so the file size is about 5 times larger than before. I'm not sure about the performance impact of the extra character set, but from my limited testing, I don't feel much of a difference with my test strings.

Let me know if you have any concerns.

Generated a new dictionary file via the Unihan database, using the kMandarin Reading field as pinyin phonetic keys

senlin · 2020-09-14T08:17:08Z

Hi Yun-yu Shen (@ujamer),

Many thanks for your PR.

The file you updated indeed seems to work well for both Simplified as well as Traditional Chinese, so that certainly is a big improvement.

I do not care too much for the downside you mention, as if we update the plugin, it will only affect the slugs going forward, so the chance on conflicts is minimal and does not outweigh the advantage of now being able to use the plugin for both Traditional and Simplified Chinese.

What I do care about is the non-existent characters. Not only do they don't show in the browser, I also don't get them to show in my text-editor(s).

Would there be an issue removing all those you think?

ujamer · 2020-09-15T01:13:28Z

Hm. The only issue I can think of would be for those users who are using more complete sets of Chinese fonts, these characters would in fact render correctly on their machines, but the plugin would then fail to transform them into pinyin. I think this mainly occurs when the user is dealing with centuries old Chinese documents, poetry, and such.

I don't mind removing them, since it's unlikely that I'd personally hit that particular use case, but it's something to keep in mind.

senlin · 2020-09-15T08:19:10Z

I think this mainly occurs when the user is dealing with centuries old Chinese documents, poetry, and such.

Understood, let's take our chances on that.

I don't mind removing them

Yes, please do. After that I can merge your PR :)

Removed all unrendered characters that are missing from system fonts. (Tested on macOS Catalina)

ujamer · 2020-09-15T19:07:01Z

Done.

I had to remove the characters manually, because when I tried to filter them with apple's system fonts, (ie, Heiti and PingFang) my script ended up removing many rare characters that are actually rendered in Firefox. A bit of a head-scratcher on that one. I suppose this means that things might be different under Windows as well, depending on whats available in the system fonts. Let me know if I missed anything.

senlin · 2020-09-16T12:48:08Z

Many thanks for your initiative Yun-yu Shen!

ujamer · 2020-09-16T13:17:24Z

No problem. Thanks for the plugin!

@ujamer

* date: September 16, 2020 * dictionary update: UniHan database using the kMandarin Reading field as pinyin phonetic keys (https://unicode.org/charts/unihan.html); with many thanks to [Yun-yu Shen @ujamer](#16) * edits in readme files and on Settings page

@ujamer

* date: September 16, 2020 * dictionary update: UniHan database using the kMandarin Reading field as pinyin phonetic keys (https://unicode.org/charts/unihan.html); with many thanks to [Yun-yu Shen @ujamer](#16) * edits in readme files and on Settings page

Updated Dictionary.php

aca9c0c

Generated a new dictionary file via the Unihan database, using the kMandarin Reading field as pinyin phonetic keys

Removed rare and unrendered characters

56f5094

Removed all unrendered characters that are missing from system fonts. (Tested on macOS Catalina)

senlin merged commit 63faa87 into senlin:master Sep 16, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Implements Traditional Chinese Characters to Pinyin Support #16

Implements Traditional Chinese Characters to Pinyin Support #16

ujamer commented Sep 11, 2020

senlin commented Sep 14, 2020

ujamer commented Sep 15, 2020

senlin commented Sep 15, 2020

ujamer commented Sep 15, 2020

senlin commented Sep 16, 2020

ujamer commented Sep 16, 2020

Implements Traditional Chinese Characters to Pinyin Support #16

Implements Traditional Chinese Characters to Pinyin Support #16

Conversation

ujamer commented Sep 11, 2020

senlin commented Sep 14, 2020

ujamer commented Sep 15, 2020

senlin commented Sep 15, 2020

ujamer commented Sep 15, 2020

senlin commented Sep 16, 2020

ujamer commented Sep 16, 2020