Building and Using a Custom Dictionary #90
-
Hello! I have a couple of questions:
{
    "word1": 123,
    "word2": 321
}
from spellchecker import SpellChecker
spell = SpellChecker(language='ru')
spell.word_frequency.load_dictionary("my_custom_dict.json")
-
hello
-
How can I create a dictionary that is not just a sequence of words but uses the format shown below, so that when I give any of the misspelled words it displays the correct word?
-
This is very helpful. I need the custom dictionary to take priority over the default language dictionary.
-
Hello! I have a question. We plan to use this package to build a dictionary for the Filipino language, and I want to test the accuracy of the dictionary. How do I go about getting the numbers?
-
Building and Using a Custom Dictionary
Building a custom dictionary is a great way to utilize the pyspellchecker library. This thread is based on issues that I have run across multiple times, and I figured I would put a summary of all the information in a single location. This thread can be updated and added to as needed!
Building a custom dictionary is more preference than anything. Using external data sources can cause misspellings to appear unexpectedly. This is even the case with the default dictionaries, as they are built using open source subtitle projects. That being said, here are a few tips.
There are several methods to load data into a dictionary:
Using one of the above functions, you can now build your custom dictionary.
Stand alone dictionary
This is a dictionary for a new language, or one specific to a person, that is not built on an existing language. Make sure not to load a dictionary before building the new one!
Adding to a default dictionary
In this case, it is recommended to treat it like a stand-alone dictionary: set the language parameter to None and load the default dictionary before loading the custom dictionary. This allows the user to pick up any future changes to the default dictionary.
Modifying a default dictionary
If you are planning to modify a default dictionary for personal use (not to be sent back to the library), load the correct dictionary, modify it, and export it. When you later use the updated, custom dictionary, do not load the original dictionary again.
Once the data is initially loaded, run some tests to ensure the data is correct and as expected. Check that commonly misspelled words are not present in your dictionary, and that known words are.
Once the data is set, the data can then be exported to be used in the future.
Using the Custom dictionary
The first major thing that trips up users is loading the custom dictionary when another dictionary is already loaded. As a result, the user gets both a default dictionary and their custom dictionary.
For example:
Unless your custom dictionary is meant to augment the base dictionary, remember to set up the SpellChecker object without a language. There are two methods to easily accomplish this: pass language=None and then load the custom dictionary, or pass the custom dictionary's filename as local_dictionary when creating the object.
The two options are identical! If a filename is provided as local_dictionary when the object is created, then the default languages are not loaded.
Adding a New Supported Language
It is possible to provide a new language to pyspellchecker. The goal is that dictionaries packaged with the library can be re-created as needed. To do this, you will need to update the scripts/build_dictionary.py script to support the language. Once that is updated, including all the necessary information, a new dictionary can be created and updated. This is preferred over submitting a PR with a new dictionary that cannot be re-created.
Issues Creating New Dictionaries
If you are experiencing issues creating a new, custom dictionary, then please submit an issue. It helps to include the code you used, so that the steps in this discussion can be verified. You may be pointed here for "guidance" if the issue is not clear.
If you would like to share your experiences and tips, please add to the discussion!
Thanks!!
@barrust