The `kata-spell-check.sh` tool is used to check a markdown file for typographical (spelling) mistakes.
The spell check tool is based on `hunspell`. It uses standard Hunspell English dictionaries and supplements these with a custom Hunspell dictionary.
The document is cleaned of several entities before the spell-check begins.
These entities include the following:
- URLs
- Email addresses
- Code blocks
- Most punctuation
- GitHub userids
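
The cleaning step can be pictured with a minimal shell sketch. This is an illustration only, not the code `kata-spell-check.sh` actually runs: the function name `clean_text` and every regular expression in it are assumptions chosen for the example, and they handle only simple cases (fenced code blocks, `http`/`https` URLs, plain email addresses, `@name` userids).

```shell
#!/bin/sh
# Hypothetical helper: NOT the code used by kata-spell-check.sh.
# Reads markdown on stdin and writes roughly cleaned text to stdout.
clean_text() {
    # 1. Delete fenced code blocks (lines between two fence lines).
    # 2. Remove URLs.
    # 3. Remove email addresses.
    # 4. Remove GitHub-style userids such as "@name".
    # 5. Replace most remaining punctuation with a space.
    sed -E \
        -e '/^[`]{3}/,/^[`]{3}/d' \
        -e 's#https?://[^[:space:])>]+##g' \
        -e 's/[[:alnum:]._%+-]+@[[:alnum:].-]+\.[[:alpha:]]+//g' \
        -e 's/(^|[^[:alnum:]])@[[:alnum:]-]+/\1/g' \
        -e "s/[^[:alnum:][:space:]'-]+/ /g"
}

# Example: only the ordinary words survive for the spell checker to see.
printf '%s\n' 'Thanks @octocat, see https://example.com/docs.' | clean_text
```

Email addresses are removed before userids so that the domain part of an address is not mistaken for a userid. Apostrophes and hyphens are kept because they can be part of a word.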
A custom dictionary is required to accept specific words that are either well understood by the community or are defined in various document files, but do not appear in standard dictionaries. The custom dictionaries allow those words to be accepted as correct. The following lists common examples of such words:
- Abbreviations
- Acronyms
- Company names
- Product names
- Project names
- Technical terms
$ ./kata-spell-check.sh check /path/to/file
Note: If you have made local edits to the dictionaries, you may re-create the master dictionary files, as documented in the Adding a new word section, for your local edits to take effect.
To list all available options and commands:
$ ./kata-spell-check.sh -h
A Hunspell dictionary comprises two text files:
- A word list file
  This file defines a list of words (one per line). Each entry may include references to one or more rules defined in the rules file, as well as an optional comment. Specify fixed words (e.g. company names) verbatim. Enter "normal" words in their root form.
  The root form of a "normal" word is the simplest and shortest form of that word. For example, the following words are all formed from the root word "computer":
  - Computers
  - Computer’s
  - Computing
  - Computed
  Each word in the previous list is constructed from the root word "computer" by applying a combination of the following manipulations:
  - Remove one or more characters from the end of the word.
  - Add a new ending.
  Therefore, you list the root word "computer" in the word list file.
- A rules file
  This file defines named manipulations to apply to root words to form new words. For example, a rule can make a root word plural.
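
The two manipulations (remove characters from the end of a root word, then add a new ending) can be illustrated with a small shell sketch. The function `apply_suffix` and its arguments are hypothetical names invented for this example. Hunspell applies such rules itself from the rules file, so this only shows the idea and is not how the dictionary is processed.

```shell
#!/bin/sh
# Hypothetical illustration of a suffix rule; Hunspell reads real rules
# from the rules file rather than from shell code.
# Usage: apply_suffix ROOT STRIP ADD
apply_suffix() {
    root=$1
    strip=$2
    add=$3
    case "$root" in
        *"$strip")
            # Remove STRIP from the end of ROOT, then append ADD.
            printf '%s%s\n' "${root%"$strip"}" "$add"
            ;;
        *)
            # ROOT does not end in STRIP, so the rule does not apply.
            return 1
            ;;
    esac
}

apply_suffix computer "" s     # prints "computers"
apply_suffix computer er ing   # prints "computing"
apply_suffix computer er ed    # prints "computed"
```

Because every derived form can be produced this way, only the root word "computer" needs an entry in the word list.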
The rules file and the word list file for the custom dictionary are generated from "source" fragment files in the `data` directory.
All the fragment files allow comments using the hash (`#`) comment symbol, and all files contain a comment header explaining their content.
The `*.txt` files are word list file fragments. Splitting the word list into fragments makes updates easier and clearer as each fragment is a grouping of related terms. The name of the file gives a clue as to the contents but the comments at the top of each file provide further detail.
Every line that does not start with a comment symbol contains a single word. An optional comment for a word may appear after the word and is separated from the word by whitespace followed by the comment symbol:
word # This is a comment explaining this particular word list entry.
You may suffix each word by a forward slash followed by one or more upper-case letters. Each letter refers to a rule name in the rules file:
word/AC # This word references the 'A' and 'C' rules.
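
As a rough sketch of how fragments in this format could be combined into a single Hunspell word list, the following shell function strips comments, removes duplicates, and prepends the word count that Hunspell expects on the first line of a word list file. The function name `build_word_list` is hypothetical, and this is not the implementation of the `make-dict` command. It only assumes the fragment format described above.

```shell
#!/bin/sh
# Hypothetical sketch: NOT the implementation of "make-dict".
# Usage: build_word_list FRAGMENT_DIR
# Writes a Hunspell-style word list to stdout: a count line, then one
# entry per line (each entry keeps any "/RULES" suffix).
build_word_list() {
    dir=$1
    cat "$dir"/*.txt |
        sed -e 's/^[[:space:]]*#.*$//' \
            -e 's/[[:space:]][[:space:]]*#.*$//' \
            -e 's/[[:space:]]*$//' \
            -e '/^$/d' |
        LC_ALL=C sort -u |
        awk '{ w[NR] = $0 } END { print NR; for (i = 1; i <= NR; i++) print w[i] }'
}

# Example (the output file name is an assumption):
#   build_word_list data > custom.dic
```

The first `sed` expression drops whole-line comments, the second drops a trailing comment that follows whitespace, and the last two trim trailing whitespace and delete the lines left empty.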
The rules file contains a set of general rules that can be applied to one or more root words in the word list files. You can make comments in the rules file.
For an explanation of the format of this file see `man 5 hunspell` (source).
To add a new word to the dictionary:
- Check that you do need to add the word.
  Is the word valid and correct? If the word is a project, product, or company name, is the capitalization correct?
- Add the new word to the appropriate word list fragment file.
  Specifically, if it is a general word, add the root of the word to the appropriate fragment file.
- To add rule references, append a `/` suffix followed by the letter of each rule to apply.
It should not generally be necessary to update the rules file since it already contains rules for most scenarios. However, if you need to update the file, read the documentation carefully.
Every time you change the dictionary files you must recreate the master dictionary files:
$ ./kata-spell-check.sh make-dict
As a convenience, checking a file automatically creates the master dictionary files.
You must test any changes to the word list file fragments or the rules file by doing the following:
- Recreate the master dictionary files.
- Run the spell checker on a file containing the words you have added to the dictionary.