Common misspellings are not included #124

Closed
aMiss-aWry opened this issue Jun 8, 2022 · 2 comments · Fixed by #128

Comments

@aMiss-aWry

I understand that the word-frequency method doesn't handle common misspellings well (the source data is likely contaminated with common typos), but is there any way to add a 'blacklist' of common misspellings? One is easily sourced from: https://en.wikipedia.org/wiki/Wikipedia:Lists_of_common_misspellings/For_machines

For example, "taht" and "tiem" are both not considered misspellings by pyspellchecker.

@aMiss-aWry
Author

I found a workaround which was pretty straightforward: making Wikipedia's list of common misspellings into a dictionary and checking words against it afterwards. It would be nice if this were incorporated into pyspellchecker itself, though.
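For anyone landing here, a minimal sketch of that kind of workaround might look like the following. It assumes the Wikipedia "For machines" page uses one `misspelling->correction` entry per line, with multiple corrections separated by commas. The function names and the two sample entries are hypothetical, made up for illustration rather than copied from the page or taken from pyspellchecker.

```python
def parse_misspellings(text):
    """Parse 'misspelling->correction[, correction...]' lines into a dict.

    Assumed input format: one entry per line; lines without '->' are skipped.
    """
    table = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "->" not in line:
            continue
        wrong, _, right = line.partition("->")
        corrections = [c.strip() for c in right.split(",") if c.strip()]
        table[wrong.strip().lower()] = corrections
    return table


def find_common_misspellings(words, table):
    """Return {word: corrections} for every word found in the table."""
    return {w: table[w.lower()] for w in words if w.lower() in table}


# Illustrative sample only; the real list would be downloaded from the page.
SAMPLE = "taht->that\ntiem->time, item\n"

table = parse_misspellings(SAMPLE)
flagged = find_common_misspellings(["taht", "tiem", "word"], table)
print(flagged)
```

Running this second pass after the normal spell check catches words that the frequency dictionary treats as known.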

@barrust
Owner

barrust commented Jun 13, 2022

There is a way to fix these issues in future builds of the dictionary. Words added to scripts/data/{lang}_exclude.txt will be removed from the next build of the dictionaries.

As always, PRs or code to generate the list of common typos to add to this file are welcome. Thanks!
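A rough sketch of a helper that merges new typos into an exclude file is below. It assumes the file holds one lowercase word per line, which is not confirmed in this thread. The function names are hypothetical and not part of the repository's scripts, and the path is left for the caller to supply.

```python
from pathlib import Path


def merge_exclusions(existing_lines, new_words):
    """Combine existing entries with new words, de-duplicated and sorted."""
    merged = {line.strip().lower() for line in existing_lines if line.strip()}
    merged.update(word.strip().lower() for word in new_words if word.strip())
    return sorted(merged)


def update_exclude_file(path, new_words):
    """Rewrite the exclude file at `path` with the merged word list.

    Assumption: the file format is one word per line.
    """
    path = Path(path)
    existing = []
    if path.exists():
        existing = path.read_text(encoding="utf-8").splitlines()
    merged = merge_exclusions(existing, new_words)
    path.write_text("\n".join(merged) + "\n", encoding="utf-8")
    return merged
```

Note that this sorts the whole file, so it would reorder any existing entries. Appending only the missing words would be the gentler choice if the current order matters.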
