-
Notifications
You must be signed in to change notification settings - Fork 540
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(synced-lyrics): romanization #2790
base: master
Are you sure you want to change the base?
feat(synced-lyrics): romanization #2790
Conversation
This comment was marked as resolved.
This comment was marked as resolved.
This brings an issue to the surface, should I run only one romanizer and not all of them? If I run the pinyin romanization first, it will affect japanese lyrics, but if I run the japanese romanization first, it will affect chinese lyrics..... I'll decide what to do about this later, for now I'll ignore this issue |
Ok, for now I ended up with the following solution to the problem described earlier:
That fixed the issues I encountered in chinese and japanese lyrics, unfortunately, if the lyrics are entirely in kanji, I can't do much about it 😔 More creative solutions are always welcome! |
You can use lingua for language detection https://github.com/pemistahl/lingua-js or |
That sounds a little bit overkill, and honestly, I doubt it would work unless it knows the difference between Chinese and Japanese grammar, because a lot of kanji characters are 1:1 the same as Chinese characters |
If a sentence contains shinjitai (新字体), it can be considered Japanese (although there are cases where it uses kanji (旧字体), but in those cases I think it's mostly distinguishable by the presence of hiragana and katakana) |
Also, if a sentence contains Korean hanja (한자), it can be considered Korean. |
Hmm, I think it should be safe to assume that if at least one line contains katakana/hiragana, then we are dealing with japanese for all lines. If no japanese/korean is detected anywhere, that's when I should use the pinyin library, I think that should work fine for most cases? I really don't want to add more dependencies than needed |
It should be much more reliable now, I doubt there are many japanese songs that are exclusively in kanji, if at least one line contains one character that is hiragana/katagana, then all lines are treated as japanese otherwise, it is treated as chinese |
@JellyBrick es-hangul doesn't have a method to test if a given text contains hangul, so I found some regex to do the job |
@kimjammer could you send a list of 2-3 songs that have lyrics and contain chinese+korean+japanese? |
By 3 languages I meant Japanese + English + Korean, but I think it would be good to have a list of mixed language songs for testing so: J, K, JE, KE: LIT - STAYC I don't think any song has Chinese and Japanese on the same line I totally understand if you just draw the line somewhere, there's probably not too many of these super multilingual songs. |
@kimjammer can you confirm if this is correct? |
I pulled the pr to test it and I can confirm that the top 4 songs look right to me. |
Started working on romaji support!
TODO:
Preview:
Related discussion #2382
Blocked by #2788