dictionary inconsistency #115
Comments
The data used to build the dictionaries is pulled from the OpenSubtitles project, and the build process is automated by script/build_dictionaries.py. I don't know of a good method to automatically find and flag each of these "cross-overs", but any help making build_dictionaries.py more robust would be appreciated.
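One possible heuristic, offered only as a sketch and not something build_dictionaries.py currently does, is to compare each word's relative frequency in the target-language data with its relative frequency in another language's data, and flag words that are far more common in the other language. The function name `flag_crossovers`, the default `ratio` of 10, and the toy counts in the usage note are all hypothetical.

```python
from typing import Dict, List


def flag_crossovers(
    target: Dict[str, int],
    other: Dict[str, int],
    ratio: float = 10.0,
) -> List[str]:
    """Return words in `target` that look like they belong to `other`.

    Hypothetical heuristic: a word is flagged when its relative frequency
    in `other` is at least `ratio` times its relative frequency in `target`.
    """
    # Guard against division by zero for empty dictionaries.
    target_total = sum(target.values()) or 1
    other_total = sum(other.values()) or 1

    flagged = []
    for word, count in target.items():
        other_count = other.get(word, 0)
        if other_count == 0:
            continue  # never seen in the other language, so keep it
        target_rel = count / target_total
        other_rel = other_count / other_total
        if other_rel >= ratio * target_rel:
            flagged.append(word)
    return sorted(flagged)
```

With toy data `en = {"the": 1000, "house": 50, "casa": 2}` and `es = {"casa": 80, "de": 900, "the": 1}`, `flag_crossovers(en, es)` flags only `"casa"`. The threshold trades false positives (genuine loanwords that are common in both languages) against missed cross-overs, so flagged words would still need a human review rather than automatic deletion.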
There are almost 1.5 million entries in the English dictionary, which is clearly far too many. And it's not just French, Spanish, and German words. For example, it contains the entry "開かれた": 1.
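Entries like "開かれた" are written in a different script from the target language, so a much simpler check than frequency comparison could catch them. The sketch below uses hypothetical helpers (`is_in_scripts`, `split_by_script`, `LATIN_PREFIXES`) that are not part of build_dictionaries.py. It keeps a word only if every alphabetic character's Unicode name starts with an allowed prefix such as "LATIN". It relies on `unicodedata.name` because Python's built-in `re` module does not support Unicode script properties. Note that it cannot catch Spanish, French, or German words in the English list, since those share the Latin script.

```python
import unicodedata
from typing import Dict, Iterable, List, Tuple

# Hypothetical allow-list: Unicode character-name prefixes treated as
# "native" for a Latin-script language such as English.
LATIN_PREFIXES = ("LATIN",)


def is_in_scripts(word: str, prefixes: Iterable[str]) -> bool:
    """True if every alphabetic character's Unicode name starts with an allowed prefix."""
    allowed = tuple(prefixes)
    for ch in word:
        if not ch.isalpha():
            continue  # ignore apostrophes, hyphens, and digits
        if not unicodedata.name(ch, "").startswith(allowed):
            return False
    return True


def split_by_script(
    freqs: Dict[str, int], prefixes: Iterable[str] = LATIN_PREFIXES
) -> Tuple[Dict[str, int], List[str]]:
    """Split a word -> count mapping into kept entries and rejected words."""
    allowed = tuple(prefixes)
    kept: Dict[str, int] = {}
    rejected: List[str] = []
    for word, count in freqs.items():
        if is_in_scripts(word, allowed):
            kept[word] = count
        else:
            rejected.append(word)
    return kept, rejected
```

For example, `split_by_script({"hello": 5, "café": 2, "開かれた": 1, "don't": 3})` keeps "hello", "café", and "don't" and rejects "開かれた".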
Python 3.9.5
Windows 10 x64
Expected Behavior
Each language's dictionary contains only words from that language. Words written in a different language, deliberately or by mistake, are reported as unknown.
Observed Behavior
Each language's dictionary appears to contain its own words plus words from one or more other languages.
en contains English words, as expected, but also Spanish, French, and German words.
es contains Spanish words, as expected, but also English and French words.
fr contains French words, as expected, but also English words.
pt contains Portuguese words, as expected, but also English words.
de contains German words, as expected, but also English words.
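To put numbers on the overlap listed above, one could load two of the word-frequency files and intersect their keys. This sketch assumes each dictionary is stored as a gzip-compressed JSON object mapping words to counts; check the package's resources for the actual file names and format. The helper names `load_frequency_json` and `shared_words` are hypothetical.

```python
import gzip
import json
from typing import Dict, Set


def load_frequency_json(path: str) -> Dict[str, int]:
    """Load a word -> count mapping from a gzip-compressed JSON file (assumed format)."""
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        return json.load(fh)


def shared_words(a: Dict[str, int], b: Dict[str, int], min_count: int = 1) -> Set[str]:
    """Words present in both dictionaries with at least `min_count` occurrences in each."""
    return {w for w in a.keys() & b.keys() if a[w] >= min_count and b[w] >= min_count}
```

A shared word is not necessarily an error: short words such as "a" or "no" are valid in several languages, so the intersection is a list to review, not a list to delete. Raising `min_count` focuses the review on the more frequent overlaps.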
Impact
Typos in the selected language go undetected if they happen to match a word from one of the extra languages (see the console output below).
Steps to Reproduce
spellCheckerTest.py
console output