Better zh_TW and zh_CN conversion #18
Comments
Thank you very much for your kind words! Indeed, the current Simplified-to-Traditional converter doesn't handle cases where a single Simplified character maps to multiple Traditional characters. I've modified both the conversion dataset and the codebase. When tested on the taibun dataset, the accuracy improved by 10% (2.17% higher than OpenCC's conversion), and it's currently 32% more efficient than OpenCC's conversion. I'll think about how to further boost efficiency, and I plan to release the new version by week's end at the latest. I deeply appreciate your valuable feedback! |
Thank you Andrei! How do you measure efficiency, is it the execution time of the function? |
Yes, I measure the time it takes to convert all items in words.json from Simplified to Traditional. The converter I've developed only handles the characters found in words.json rather than all Chinese characters, which accounts for its faster execution. |
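For readers who want to reproduce this kind of measurement, here is a minimal sketch of timing a converter over a word list. It is not the benchmark actually used in this thread: `to_traditional`, `SIMPLIFIED_TO_TRADITIONAL`, `time_conversion`, and the three-word list are all hypothetical stand-ins for the real converter and the contents of words.json.

```python
import time

# Hypothetical, tiny mapping used only so the sketch runs on its own.
SIMPLIFIED_TO_TRADITIONAL = {"湾": "灣", "语": "語", "国": "國"}


def to_traditional(text):
    """Stand-in converter: plain character-by-character lookup."""
    return "".join(SIMPLIFIED_TO_TRADITIONAL.get(ch, ch) for ch in text)


def time_conversion(convert, words, repeats=5):
    """Return the best wall-clock time in seconds over several full passes."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        for word in words:
            convert(word)
        best = min(best, time.perf_counter() - start)
    return best


# Assumption: in a real run this list would be loaded from words.json.
words = ["国语", "语言", "海湾"]
elapsed = time_conversion(to_traditional, words)
print(f"best of 5 passes: {elapsed:.6f} s")
```

Taking the best of several passes reduces noise from other processes. Comparing two converters this way is only fair if both are given the same word list.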
@andreihar Thank you Andrei! I made a simple Gradio app to make it easier for non-technical people to use taibun here https://huggingface.co/spaces/tddschn/taibun-converter , do you think you can include it in your README? |
Sorry for the late reply! It seems GitHub doesn't notify about messages in closed issues. The live demo of Taibun can be currently accessed via this link: https://taibun.vercel.app/. I plan to change domains for all my web projects very soon, hence I don't have a link to it in the README. I hope I'll get to it in the near future. |
Your web app looks great!
|
May I ask how you learned the language? My written and spoken Hokkien is
really bad and I'm curious to learn how people learn it.
|
I currently live in the Metro Vancouver area, so I have quite a lot of Taiwanese friends. Besides that, the main grammar resource I use is Taiwanese Grammar: A Concise Reference by Philip T. Lin. It's written in English and explains many grammar points by comparing them with both English and Mandarin grammar, so it makes it very easy to understand the Taiwanese language. When it comes to Written Taiwanese, pretty much nobody knows it since in schools Taiwanese is taught primarily as a spoken language. When I ask my friends to translate something into Taiwanese, they will usually use iTaigi and the Taiwanese Ministry of Education Dictionary to find Chinese characters for Taiwanese words. |
Thank you Andrei! |
Thank you for making this! As a native Hokkien speaker I find it very professionally done.
However, when converting between zh_TW and zh_* (to_traditional & to_simplified), the context (the word and the sentence a character is in) should be considered; simple char-to-char mapping can be problematic in some cases. OpenCC (https://github.com/BYVoid/OpenCC) seems to be better at handling the subtleties of conversion.
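To illustrate the concern, here is a minimal sketch of phrase-first (longest-match) conversion compared with plain char-to-char mapping. It is neither taibun's nor OpenCC's actual implementation; the tables `PHRASES` and `CHARS` and both function names are hypothetical and contain only the entries needed for the example. The Simplified character 发 corresponds to 發 in 发展 (development) but to 髮 in 头发 (hair), so a single per-character table cannot get both right.

```python
# Hypothetical tables for illustration only.
PHRASES = {"头发": "頭髮", "发展": "發展"}  # word-level entries, checked first
CHARS = {"发": "發", "头": "頭"}            # per-character fallback


def char_to_char(text, chars=CHARS):
    """Context-free conversion: every character is mapped on its own."""
    return "".join(chars.get(ch, ch) for ch in text)


def phrase_first(text, phrases=PHRASES, chars=CHARS):
    """Try the longest matching phrase at each position, then fall back to chars."""
    max_len = max(map(len, phrases), default=1)
    out = []
    i = 0
    while i < len(text):
        # Try candidate phrases from longest to shortest (length >= 2).
        for size in range(min(max_len, len(text) - i), 1, -1):
            chunk = text[i:i + size]
            if chunk in phrases:
                out.append(phrases[chunk])
                i += size
                break
        else:
            # No phrase matched here: convert a single character.
            out.append(chars.get(text[i], text[i]))
            i += 1
    return "".join(out)


print(char_to_char("头发"))   # 頭發 (wrong character for "hair")
print(phrase_first("头发"))   # 頭髮
print(phrase_first("发展"))   # 發展
```

Greedy longest-match is only one simple way to bring word context in; it does not resolve ambiguity that depends on the whole sentence.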