Inconsistent results (Korean) #18

MyronChiu · 2013-10-29T01:18:18Z

require 'whatlanguage'
=> true
"펀 치 히 어 로".language
=> :spanish
"이 어떤 언어인가?".language
=> :korean
"이 어떤 언어인가 히 어 로?".language
=> :korean
"이 히 어 로?".language
=> :russian
"펀치히어로".language
=> :italian
"한국드라마".language
=> :russian
"한 국 드 라 마".language
=> :korean

peterc · 2013-10-29T12:42:21Z

Unfortunately, this isn't an issue we're going to be able to fix in the current version. Because of the detection technique it uses, the library performs poorly on languages or orthographies that either don't use spaces between words (Arabic, Chinese, Japanese) or whose words are all relatively short (Korean).

The next version of WhatLanguage, which is currently in planning, will resolve this issue entirely, as one of its detection techniques is character analysis, which would be able to identify most of the graphemes above as Korean (except the question marks).
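
To illustrate what character analysis could look like, here is a minimal sketch in plain Ruby. It is not the planned WhatLanguage implementation: `hangul_ratio`, `probably_korean?`, and the 0.5 threshold are hypothetical names and values chosen for this example. It relies only on Ruby's built-in Unicode script properties in regular expressions (`\p{Hangul}`, `\p{L}`).

```ruby
# Hypothetical helpers; not part of the whatlanguage gem.

# Fraction of the letters in `text` that belong to the Hangul script.
# Punctuation, digits, and whitespace are ignored, so "?" does not count.
def hangul_ratio(text)
  letters = text.scan(/\p{L}/)
  return 0.0 if letters.empty?

  letters.count { |ch| ch.match?(/\p{Hangul}/) }.fdiv(letters.size)
end

# Treat the text as Korean when at least `threshold` of its letters are Hangul.
# The default threshold is an assumption made for this sketch.
def probably_korean?(text, threshold: 0.5)
  hangul_ratio(text) >= threshold
end

probably_korean?("펀 치 히 어 로")  # => true
probably_korean?("이 히 어 로?")    # => true
probably_korean?("hola mundo")      # => false
```

Because the check looks at individual characters rather than words, it gives the same answer whether or not the input contains spaces.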

I would leave this issue open, however, as it's an important point for other people to know about and for us to remember for future releases. Thanks!

sixtyfive · 2017-11-11T16:28:33Z

Peter, with all due respect to the work you're putting into this, that's a ridiculous argument. Identifying what language or languages are represented by a character set is more than trivial and it would be easy to use that at least as a sanity check. I don't understand what's preventing you from doing so?
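
A sanity check of the kind suggested here might be sketched as follows. This is only an illustration under stated assumptions: `EXPECTED_SCRIPT` and `plausible_guess?` are hypothetical names, the table covers only the languages that appear in this thread, and nothing here is part of the whatlanguage gem. The idea is to reject a guessed language when the text contains no character from the script that language is written in.

```ruby
# Hypothetical table mapping a guessed language to the script its text
# should contain. Not part of the whatlanguage gem; extend as needed.
EXPECTED_SCRIPT = {
  korean:  /\p{Hangul}/,
  russian: /\p{Cyrillic}/,
  spanish: /\p{Latin}/,
  italian: /\p{Latin}/
}.freeze

# Returns false when `text` has no character in the script expected for
# `guess`. Languages missing from the table are accepted unchecked.
def plausible_guess?(guess, text)
  pattern = EXPECTED_SCRIPT[guess]
  return true if pattern.nil?

  text.match?(pattern)
end

plausible_guess?(:spanish, "펀 치 히 어 로")  # => false
plausible_guess?(:russian, "한국드라마")      # => false
plausible_guess?(:korean,  "한 국 드 라 마")  # => true
```

A caller could run the existing detector first and fall back to a script-based answer whenever `plausible_guess?` returns false.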

jm3 mentioned this issue Nov 9, 2017

Library doesn't seem to take character sets into account #43

Open
