Allow for pluggable spell-checking modelled after Linter #74
Comments
I started messing with this, mainly because I really want to use it for some of my upcoming projects that I'm stalling on because I don't want to use Emacs (it just isn't pretty enough anymore). Looking at some of the other items (#11 and #21 in particular), I can see two ways of doing this, but I'm not entirely sure which one would make sense in the long run. In both cases, I'm suggesting making

The reason I'm having the system dictionary as a separate package is that there are places where I think it is good not to have the system dictionary involved (some publishers/customers dictate which dictionary is allowed and you don't want pollution; translators might want to see only the language they are translating to).

One Package Per Language

The first approach is to create one

Also, I'm proposing multiple packages to handle the other word sources (

The drawback of this approach is that it would be harder to use system-specific dictionaries unless you also added a

This would also let us add words more easily. With separate packages, I think it would let us have some dictionaries that add words and others that don't (package-based ones) and have the provider give flags for those (canAddWords).

One Package for Spellchecker

The second approach is to have a single package

It would just be more complicated code to maintain and may add complexity to everything else.

Considerations

One of the goals is to let someone turn off a dictionary for a given project. So, if I have the German, English, and fantasy dictionaries involved, I want to be able to turn off any or all of them depending on the needs of the project. I figured the default would be "use all of them" unless there is something to turn it off. (Or have a config for each package in the first option that determines whether it is automatic or not.)

Not for this, I was thinking a separate APM project that provides feedback (
The first approach sounds better to me.
I figured it was a good time for an update on my work over at dmoonfire/spell-check. I will be squashing the commits before I submit the PR, but I'm a very noisy/frequent committer.

The system can now identify incorrect words from multiple system dictionaries. I implemented the

Dictionaries can now be positive matches (ignore words) or negative matches (incorrect words).

The drawback of using a lot of dictionaries is the ~500 ms/dictionary loading time. I haven't solved that yet, but I have some ideas.

It does use the listener, so it reloads the dictionaries when those configuration values change. It doesn't recheck the document yet.

I also have it so the user can list dictionary paths. I don't have the Windows one in yet and I haven't tested the Windows 8 logic, but Linux works pretty well with

I'm also using

I have a second plugin for ignoring known words. We had "GitHub" and "github" both listed in the single dictionary. That is now a configuration option, so anyone can add other entries, such as their name (most dictionaries don't like my last name of "Moonfire"). The ignore is based on regexp, so if you put "GitHub" it converts it into

Current plans until this weekend when I have to actually use it:
Finished my development for the week, so here is the status until I can wander back (hopefully in a week or so). I'm behind on deadlines, but I got the two packages up to the point where I can write a bunch of words and see where it's painful to use.

Projects

Project plugin (dmoonfire/spell-check-project) is now functional. This uses a

{
"localWords": [
"word",
"/wordmustbelowercase/",
"/wordCanBeLowercase/i"
]
}

The project files are aware of the multiple project paths, so a given file will use its own project If Rewriting the

Suggestions

The biggest improvement is that suggestions now work across all dictionaries. They are gathered from every dictionary that provides suggestions and then interspersed together based on the plugin

Suggestions for regex-based items (ignoreWords and project) will fake what will be replaced with the suggestion. Eventually, it should do the Emacs thing (if the compared word starts with a capital, make the suggestion start with one). Right now, whatever you put in the regex is given as a replacement.

Both ignoreWords and the project dictionary use natural to calculate the Jaro–Winkler string distance so only "similar" words are suggested. I picked an arbitrary distance of 0.90 or higher (1.00 is an exact match, 0 is a non-match).

Adding

While I wasn't planning on adding to the system dictionary, both ignoreWords and project allow adding to their dictionaries. In both cases, the option to do so shows up as the last few items in the suggestion list (such as "Add to Project (case-sensitive)") in italic. If that is selected, it will either add it to the

In both cases, the file is not rechecked for spelling because I haven't figured out how to do it yet.

Performance

It still adds a noticeable amount of time to the startup, about 250 ms + 400 ms/dictionary. With 89 project words and editing a 6k word file on my laptop, performance was pretty reasonable (no really obvious delays or slowdowns) at about 80 wpm. It also handled accented characters fairly well, which is good because I use them heavily in my novels.
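For anyone following along, here is a rough sketch of how the three entry shapes in the `localWords` example above could be interpreted. This is not the code from spell-check-project; `parseLocalWord` and `isLocalWord` are hypothetical names, and two behaviours are assumptions on my part: plain entries are matched exactly (case-sensitive), and regex entries must match the whole word.

```typescript
// Hypothetical sketch: interpret "localWords" entries as word matchers.
type WordMatcher = (word: string) => boolean;

function parseLocalWord(entry: string): WordMatcher {
  // Entries shaped like /pattern/ or /pattern/i are treated as regular expressions.
  const regexForm = /^\/(.+)\/([imu]*)$/.exec(entry);
  if (regexForm) {
    const pattern = regexForm[1] ?? "";
    const flags = regexForm[2] ?? "";
    // Assumption: the pattern has to cover the whole word, so anchor it.
    const regex = new RegExp(`^(?:${pattern})$`, flags);
    return (word) => regex.test(word);
  }
  // Assumption: plain entries match exactly, including case.
  return (word) => word === entry;
}

function isLocalWord(word: string, entries: string[]): boolean {
  return entries.some((entry) => parseLocalWord(entry)(word));
}

// The entries from the example file above.
const localWords = ["word", "/wordmustbelowercase/", "/wordCanBeLowercase/i"];
```

With those assumptions, `isLocalWord("WORDCANBELOWERCASE", localWords)` is accepted because of the `i` flag, while `isLocalWord("WordMustBeLowercase", localWords)` is not.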
* Changed the package to allow for external packages to provide additional checking. (Closes atom#74)
  - Disabled the task-based handling because of passing plugins.
  - Two default plugins are included: system-based dictionaries and "known words".
  - Suggestions and "add to dictionary" are also provided via interfaces. (Closes atom#11)
  - Modified various calls so they are aware of where the buffer is located.
* Modified system to allow for multiple plugins/checkers to identify correctness.
  - Incorrect words must be incorrect for all checkers.
  - Any checker that treats a word as valid is considered valid for the buffer.
* Extracted system-based dictionary support into separate checker.
  - System dictionaries can now check across multiple system locales.
  - Locale selection can be changed via package settings. (Closes atom#21)
  - External search paths can be used for Linux and OS X.
  - Default language is based on Chromium settings.
* Extracted hard-coded approved list into a separate checker.
  - User can add additional "known words" via settings.
  - Added an option to add more known words via the suggestion dialog.
* Updated ignore files and added EditorConfig settings for development.
* Various coffee-centric formatting.
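The two correctness rules in that commit message (a word is incorrect only if every checker rejects it, and any single checker accepting it makes it valid) can be sketched roughly as below. The `Checker` interface, `makeListChecker`, and `isMisspelled` are hypothetical names for illustration, not the package's actual plugin interface, and treating a word as fine when no checkers are loaded is an assumption.

```typescript
// Hypothetical checker shape: returns true when the checker accepts the word.
interface Checker {
  name: string;
  check(word: string): boolean;
}

// Small helper so the example is self-contained: a checker backed by a word list.
function makeListChecker(name: string, words: string[]): Checker {
  return { name, check: (word) => words.indexOf(word) !== -1 };
}

function isMisspelled(word: string, checkers: Checker[]): boolean {
  // Assumption: with no checkers loaded, nothing can flag the word.
  if (checkers.length === 0) return false;
  // Incorrect only when every checker rejects the word.
  return checkers.every((checker) => !checker.check(word));
}

const exampleCheckers = [
  makeListChecker("system", ["hello", "world"]),
  makeListChecker("known-words", ["GitHub"]),
];
```

Here `isMisspelled("GitHub", exampleCheckers)` is false because the known-words checker accepts it, even though the stand-in system list does not.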
* Changed the package to allow for external packages to provide additional checking. (Closes atom#74)
  - Disabled the task-based handling because of passing plugins.
  - Two default plugins are included: system-based dictionaries and "known words".
  - Suggestions and "add to dictionary" are also provided via interfaces. (Closes atom#10)
  - Modified various calls so they are aware of where the buffer is located.
* Modified system to allow for multiple plugins/checkers to identify correctness.
  - Incorrect words must be incorrect for all checkers.
  - Any checker that treats a word as valid is considered valid for the buffer.
* Extracted system-based dictionary support into separate checker.
  - System dictionaries can now check across multiple system locales.
  - Locale selection can be changed via package settings. (Closes atom#21)
  - Multiple locales can be selected. (Closes atom#11)
  - External search paths can be used for Linux and OS X.
  - Default language is based on the process environment, with a fallback to the browser, before finally using `en-US` as a fallback.
* Extracted hard-coded approved list into a separate checker.
  - User can add additional "known words" via settings.
  - Added an option to add more known words via the suggestion dialog.
* Updated ignore files and added EditorConfig settings for development.
* Various coffee-centric formatting.
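As an illustration of the default-language fallback chain described in that commit (process environment, then browser, then `en-US`), here is a minimal sketch. The function names are hypothetical, and reading only the `LANG` variable and skipping the `C`/`POSIX` locales are assumptions; the actual commit may look at other variables.

```typescript
// Hypothetical helper: turn something like "de_DE.UTF-8" into "de-DE".
function normalizeLocale(raw: string | undefined): string | undefined {
  if (!raw) return undefined;
  // Drop encoding or modifier suffixes such as ".UTF-8" or "@euro".
  const base = raw.split(/[.@]/)[0] ?? "";
  // Assumption: "C" and "POSIX" carry no language information.
  if (base === "" || base === "C" || base === "POSIX") return undefined;
  return base.replace("_", "-");
}

// Process environment first, then the browser language, then "en-US".
function pickDefaultLocale(
  env: Record<string, string | undefined>,
  browserLanguage?: string
): string {
  return (
    normalizeLocale(env["LANG"]) ??
    normalizeLocale(browserLanguage) ??
    "en-US"
  );
}
```

In an Atom-like environment this might be called as `pickDefaultLocale(process.env, navigator.language)`, though which globals are available there is also an assumption.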
I wasn't entirely sure where the best place to put this was, but this seems like the most likely one. I'm curious about thoughts and designs on making `spell-check` pluggable, much like how linter provides a framework for linting and different lint packages provide the actual processing. This is related to using other dictionaries but is more than just choosing a different file.

This would allow supporting Emacs-style "LocalWords" in a file or directory-specific word lists (both via Atom packages). The latter is important to me because I write novels and short stories. In most of my novels, there are hundreds of project-specific words that I don't want in my system dictionary but do want checked into Git (it saves me adding the words back in every time my machine explodes). I also have genre- and world-specific lists ("mage", for example, for the fantasy genre; "Fedran" for my world).
For Emacs, I wrote caspell which got me the bulk of this functionality but I'd like to switch over to Atom. I already miss it.
After working with the linter framework, it seems like the same could be done for spell-check. Create a framework that gathers up all the packages that provide the spell-check service, and then query them to see if a given word is correct among any of them. If it isn't, then gather up suggestions with a given weight and display the top X unique values. Most of `spell-check` already provides that; I'd just like a framework to write my per-directory word lists that coordinates with the system dictionary.

Linter could also be used for spell-checking in general, and I considered writing the framework in that to just call spell-check, but I wasn't sure how closely you'd want those packages tied together. Linter doesn't have the ability to display fixes (Correct Spelling..., context menu), but it is possible to add that if that would be a better framework to consider.
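To make the framework idea above a bit more concrete, here is a minimal sketch of the "query every provider, then merge weighted suggestions and keep the top X unique values" flow. Everything here is hypothetical: the `SpellProvider` and `Suggestion` shapes, the function name, and the tie-breaking rules (highest weight wins for duplicates, ties sorted alphabetically) are assumptions, not an existing `spell-check` or linter API.

```typescript
interface Suggestion {
  text: string;
  weight: number; // Assumption: higher means a better suggestion.
}

// Hypothetical provider shape that each spell-check package would expose.
interface SpellProvider {
  isCorrect(word: string): boolean;
  suggest(word: string): Suggestion[];
}

function topSuggestions(word: string, providers: SpellProvider[], limit: number): string[] {
  // If any provider accepts the word, there is nothing to suggest.
  if (providers.some((p) => p.isCorrect(word))) return [];

  // Keep the best weight seen for each unique suggestion text.
  const best: Record<string, number> = Object.create(null);
  for (const provider of providers) {
    for (const s of provider.suggest(word)) {
      const previous = best[s.text];
      if (previous === undefined || s.weight > previous) best[s.text] = s.weight;
    }
  }

  // Sort by weight (descending), then alphabetically for stable output.
  return Object.keys(best)
    .sort((a, b) => (best[b] ?? 0) - (best[a] ?? 0) || a.localeCompare(b))
    .slice(0, limit);
}

// Two stand-in providers: a "system" dictionary and a per-project word list.
const systemProvider: SpellProvider = {
  isCorrect: (w) => w === "hello",
  suggest: () => [{ text: "hello", weight: 0.9 }, { text: "help", weight: 0.5 }],
};
const projectProvider: SpellProvider = {
  isCorrect: (w) => w === "Fedran",
  suggest: () => [{ text: "hello", weight: 0.7 }, { text: "Fedran", weight: 0.95 }],
};
```

With these stand-ins, `topSuggestions("helo", [systemProvider, projectProvider], 2)` gives `["Fedran", "hello"]`, and a word that either provider accepts yields an empty list.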
These are other things I'm aiming for in the future, which may or may not be applicable for the framework (but probably not `spell-check` directly):

`providesSpelling` or `providesDictionary` function), but that would allow for things like dictionary.com or Google lookup of definitions.

This is something I want, so I'm willing to do coding toward it. But, better to get feedback before blindly writing something. :)