Inconsistent results (Korean) #18

Open
MyronChiu opened this issue Oct 29, 2013 · 2 comments

Comments

@MyronChiu

require 'whatlanguage'
=> true
"펀 치 히 어 로".language
=> :spanish
"이 어떤 언어인가?".language
=> :korean
"이 어떤 언어인가 히 어 로?".language
=> :korean
"이 히 어 로?".language
=> :russian
"펀치히어로".language
=> :italian
"한국드라마".language
=> :russian
"한 국 드 라 마".language
=> :korean

@peterc
Owner

peterc commented Oct 29, 2013

Unfortunately, this isn't an issue we're going to be able to fix in the current version. Because of the technique it uses, it gives very poor results for languages or orthographies that either don't use spaces between words (Arabic, Chinese, Japanese) or whose words are all relatively short (Korean).

The next version of WhatLanguage, which is currently in planning, will resolve this issue entirely, as one of its detection techniques is character analysis, which would be able to identify most of the graphemes above as Korean (except the question marks).

I would leave this issue open, however, as it's an important point for other people to know about and for us to remember for future releases. Thanks!
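For readers wondering what character analysis could look like, here is a minimal sketch, not the planned WhatLanguage implementation. It assumes Ruby's built-in `\p{...}` Unicode script properties; `SCRIPTS` and `dominant_script` are hypothetical names that do not exist in the gem. It counts which script each character belongs to and ignores spaces and punctuation such as the question marks.

```ruby
# Hypothetical helper for illustration; not part of the whatlanguage gem.
# Each entry maps a script name to a Unicode script property pattern.
SCRIPTS = {
  hangul:   /\p{Hangul}/,
  han:      /\p{Han}/,
  hiragana: /\p{Hiragana}/,
  katakana: /\p{Katakana}/,
  cyrillic: /\p{Cyrillic}/,
  arabic:   /\p{Arabic}/,
  latin:    /\p{Latin}/
}.freeze

# Returns the script most characters belong to, or nil if none match.
def dominant_script(text)
  counts = Hash.new(0)
  text.each_char do |char|
    # Characters that belong to no listed script (spaces, "?") are skipped.
    pair = SCRIPTS.find { |_name, pattern| char.match?(pattern) }
    counts[pair.first] += 1 if pair
  end
  winner = counts.max_by { |_name, count| count }
  winner && winner.first
end

puts dominant_script("펀치히어로").inspect
puts dominant_script("이 히 어 로?").inspect
```

A script is not a language (Han or Latin characters are shared by many), so this only narrows the candidates. For Hangul, though, it narrows them to Korean in practice.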

@sixtyfive

Peter, with all due respect to the work you're putting into this, that's a ridiculous argument. Identifying which language or languages a set of characters can represent is trivial, and it would be easy to use that at least as a sanity check. I don't understand what's preventing you from doing so.
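The sanity check suggested here could be sketched roughly as follows. This is an illustration under assumptions, not something the gem provides: `PLAUSIBLE_LANGUAGES` and `plausible_guess?` are hypothetical names, and the table is a deliberately tiny subset covering only scripts that point to a single language in practice. The guess would come from whatever detector is in use, such as the `String#language` results shown at the top of this issue.

```ruby
# Hypothetical table for illustration; a real one would cover far more scripts.
PLAUSIBLE_LANGUAGES = {
  /\p{Hangul}/   => [:korean],
  /\p{Hiragana}/ => [:japanese],
  /\p{Katakana}/ => [:japanese]
}.freeze

# Returns false when the text contains a script that rules out the guess.
# The first matching script wins, so mixed-script text is handled crudely.
def plausible_guess?(text, guess)
  PLAUSIBLE_LANGUAGES.each do |pattern, languages|
    return languages.include?(guess) if text.match?(pattern)
  end
  true # no listed script found, so nothing contradicts the guess
end

puts plausible_guess?("펀치히어로", :italian)
puts plausible_guess?("한 국 드 라 마", :korean)
```

A detector could run a check like this after its usual guess and fall back to the script's own language whenever the check fails.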
