- added and improved rules
- updated dictionary (catalan-pos-dict-2.14)
- added and improved rules
- extended spelling dictionary
- additional tags for personal pronouns, e.g. `us[we/PRP,we/PRP_O1P]`; `mine[mine/PRP$,I/PRP$_P1S]`
- added and improved rules
- small rule improvements
- added and improved rules
- the sentence length rule is now active in 'picky' mode
- added and improved rules
- added words and POS data
- fixed tons of false positives
- small rule improvements
- added and improved rules
- updated dictionary (spanish-pos-dict-1.2)
- new words in the POS dictionary
- added and improved rules
- improved tagging and disambiguation
- The sentence length rule is now a text-level rule and it underlines the whole sentence, not just the position where the threshold is reached.
- added and improved rules
- improve tagger and synthesizer to better tag pronouns
- add ArabicTransVerbRule and Arabic Punctuations Whitespace Rules
- added and improved rules
- updated dictionary (catalan-pos-dict-2.14)
- added and improved rules
- updated en_US spellchecker dictionary from http://wordlist.aspell.net (Version 2020.12.07)
- updated en_CA spellchecker dictionary from http://wordlist.aspell.net (Version 2020.12.07)
- updated en_AU spellchecker dictionary from http://wordlist.aspell.net (Version 2020.12.07)
- updated en_GB spellchecker dictionary from https://github.com/marcoagpinto/aoo-mozilla-en-dict (Version 2021.03.01)
- updated en_ZA spellchecker dictionary from https://github.com/marcoagpinto/aoo-mozilla-en-dict (Version 2021.02.15)
- changes in the word tokenizer for contractions and possessives, e.g. `does[do/VBZ]n't[not/RB]`; `Harper[Harper/NNP,harper/NN]'s['s/POS]`
- added and improved rules
- added and improved rules
- added and improved rules
- added words and POS data
- fixed tons of false positives
- added and improved rules
- added and improved rules
- updated dictionary (spanish-pos-dict-1.1)
- over 6000 new words in the POS dictionary
- added and improved rules
- improved tagging and disambiguation
- added and improved rules
- updated dictionary (catalan-pos-dict-2.13)
- added and improved rules
- There's now support for Belgian Dutch (`nl-BE`). "Dutch" (`nl`) is still the default. nl-BE-specific rules can be added to `nl-BE/grammar.xml`
- added and improved rules
- updated en_GB spellchecker dictionary from https://github.com/marcoagpinto/aoo-mozilla-en-dict (Version 2.91 - 2020-12-01)
- added and improved rules
- added and improved rules
- updated spell checker and POS dictionary (unified in one dictionary) to lexique-grammalecte 7-0 (source: https://grammalecte.net/download.php?prj=fr), as an external dependency (source: https://github.com/languagetool-org/french-pos-dict)
- added and improved rules
- added words and POS data
- fixed tons of false positives
- added and improved rules
- added new words and POS data for it
- improved suggestion algorithms for spellchecking
- added and improved rules
- updated dictionary (spanish-pos-dict-0.9)
- over 7000 new words in the POS dictionary
- added and improved rules
- improved tagging and disambiguation
- There's now `RegexAntiPatternFilter`, which can be used to have antipatterns for `<regexp>` rules. Use like this:
  `<regexp>my regex</regexp> <filter class="org.languagetool.rules.patterns.RegexAntiPatternFilter" args="antipatterns:regex1|regex2"/>`
  Note that the regex after `antipatterns:` cannot contain spaces.
- German, French, Dutch, and Spanish have had ngram-based false friends for some time already, meaning that a German/Dutch/... native speaker will get an error if (probably) using an English word incorrectly in an English text. The change in this version is that for all other language pairs that also have false friends, these rules are now active only in picky mode (`--level PICKY` on the command line, `level=picky` with the HTTP API).
- fixed https://github.com/languagetool-org/languagetool/issues/3666 ("... not a language code known to LanguageTool")
- fixed a NullPointerException crash in the LibreOffice/OpenOffice add-on
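The `level=picky` setting mentioned above for the HTTP API can be illustrated with a small request builder. This is a minimal sketch, not the project's client code: the endpoint URL and the helper name `build_check_body` are assumptions made for illustration, and the sketch only assembles the form-encoded body without sending it.

```python
from urllib.parse import urlencode

# Assumed endpoint for illustration; a self-hosted server would use its own host.
CHECK_URL = "https://api.languagetool.org/v2/check"


def build_check_body(text: str, language: str, picky: bool = False) -> str:
    """Return a form-encoded body for a check request (hypothetical helper)."""
    params = {"text": text, "language": language}
    if picky:
        # Picky mode also activates rules that are off at the default level.
        params["level"] = "picky"
    return urlencode(params)


# The resulting body would be sent as a POST request to CHECK_URL.
body = build_check_body("My handy is broken.", "en-US", picky=True)
```

Leaving `level` out keeps the default behavior, so clients can add picky mode as an opt-in.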
- added and improved rules
- added and improved rules
- added and improved rules
- updated en_GB spellchecker dictionary from https://github.com/marcoagpinto/aoo-mozilla-en-dict (Version 2.88 - 2020-09-01)
- added and improved rules
- Updated the German part-of-speech dictionary (https://github.com/languagetool-org/german-pos-dict) to version 1.2.2.
- each pair of `ProhibitedCompoundRule` has its own ID now, so it can be separately turned on/off
- added and improved rules
- small rule improvements
- added and improved rules
- added words and POS data
- fixed tons of false positives
- added and improved rules
- added and improved rules
- dictionary update
- many new punctuation rules
- many new styling rules
- tokenization and tagging improvements
- disambiguation improvements
- each pair of `ConfusionProbabilityRule` has its own ID now, so it can be separately turned on/off
- new XML attribute `chunk_re` for `<token>`, which specifies a chunk as a regular expression
- added and improved rules
- updated POS dictionary (Arramooz #e33794e)
- remove the Algerian variant (ar-DZ)
- add support of ngram data (languagetool-tools-ar)
- add Darja, Diacritics, Redundancy, WrongWordInContext, Wordiness, Homophones and WordCoherency rules.
- added and improved rules
- updated dictionary (catalan-pos-dict-2.10)
- added and improved rules
- updated en_GB spellchecker dictionary from https://github.com/marcoagpinto/aoo-mozilla-en-dict (Version 2.85 - 2020-06-01)
- added and improved rules
- added and improved rules
- added and improved rules
- rules that apply to de-DE and de-AT (but not de-CH) can now be placed in `de/de-DE-AT/grammar.xml`
- Updated the German part-of-speech dictionary (https://github.com/languagetool-org/german-pos-dict) to version 1.2.1.
- Special chars `_` and `/` can now be escaped in `spelling.txt` and `spelling_custom.txt` using the backslash. For example, `foo\/s` will add `foo/s` to the dictionary.
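The backslash escaping described above can be sketched as a small unescape step. This is only an illustration of the stated behavior (`foo\/s` becomes `foo/s`); the function name `unescape_entry` is hypothetical and the real parser may handle more cases.

```python
import re


def unescape_entry(line: str) -> str:
    """Replace an escaped underscore or slash with the literal character.

    Hypothetical helper: only the two special characters named in the
    changelog entry are handled; all other text is returned unchanged.
    """
    return re.sub(r"\\([_/])", r"\1", line)


print(unescape_entry(r"foo\/s"))
```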
- commented out rules that caused many false alarms
- added and improved rules
- added words and POS data
- fixed tons of false positives
- added and improved rules
- added new Java rules
- rebuilt and improved main spellchecker dictionary, added many new words
- new variant (only yo "ё") spellchecker dictionary and new java rule for it (set off by default)
- new `filter` arguments `prefix` and `suffix`, to be used for matching the part-of-speech of parts of words with prefix and suffix added to original token, e.g.:
<filter class="org.languagetool.rules.ru.RussianPartialPosTagFilter"
args="no:2 regexp:(.*) postag_regexp:(ADV) prefix:не suffix: "/>
- commented out rules that caused many false alarms
- added and improved rules
- new tagger dictionary by Jaume Ortolà, LGPL, source: https://github.com/jaumeortola/spanish-dict-tools
- the spelling rule is enabled in LibreOffice using the tagger dictionary (no other spelling dictionary is needed)
- dictionary update, including many rare and slang words
- new rules
- tokenization and tagging improvements
- disambiguation improvements
- added `replace_custom.txt` for several languages so users can have their own very simple replace rules without worrying about updates (they still need to copy the file to the new LT version, though).
- Updated dependency `com.gitlab.dumonts:hunspell` to 1.1.1 to make spell checking work on older Linux distributions like RHEL 7.
- Added initial support for Arabic, contributed by Sohaib Afifi (#2219)
- added and improved rules
- updated dictionary (catalan-pos-dict-2.7)
- added and improved rules
- added and improved rules
- added new part-of-speech tag `ORD` for ordinal numbers (e.g., first, second, twenty-third etc.)
- updated en_GB spellchecker dictionary from https://github.com/marcoagpinto/aoo-mozilla-en-dict (Version 2.82 - 2020-03-01)
- improved rules
- added and improved rules
- `compounds.txt` now automatically expands `ß` to `ss` when using German (Switzerland)
- German `spelling.txt` now supports `prefix_verb` syntax like `vorüber_eilen` so the speller will accept all forms of "eilen" prefixed by "vorüber"
- Added initial support for Irish, contributed by Jim Regan (#2260)
- added and improved rules
- added words and POS data
- small improvements
- dictionary update
- new rules
- tokenization and tagging improvements
- added and improved rules
- updated dictionary (catalan-pos-dict-2.6)
- Now using https://github.com/hankcs/HanLP for tokenization (PR 1981)
- corrections are now offered for spell check errors
- updated spell checker to version 2.4 (2018-04-15) (source: https://extensions.libreoffice.org/extensions/stavekontrolden-danish-dictionary)
- added and improved rules
- added and improved rules
- updated en_GB spellchecker dictionary from https://github.com/marcoagpinto/aoo-mozilla-en-dict (Version 2.79 - 2019-12-01)
- updated en_US spellchecker dictionary from http://wordlist.aspell.net (Version 2019.10.06)
- updated en_CA spellchecker dictionary from http://wordlist.aspell.net (Version 2019.10.06)
- updated en_AU spellchecker dictionary from http://wordlist.aspell.net (Version 2019.10.06)
- corrections are now offered for spell check errors
- improved rules
- updated spell checker (Grammalecte·dic/Dicollecte) to version 6.4.1 (2019-04-05) (source: https://grammalecte.net/download.php?prj=fr)
- updated part-of-speech dictionaries to dicollecte-6.4.1 (#1963)
- added and improved rules
- updated spelling dictionary to el_GR 0.9 (14/03/2019), by George Zougianos
- updated spell checker to version 1.82 (2015-10-23) (source: https://extensions.libreoffice.org/extensions/khmer-spelling-checker-sbbic-version)
- added and improved rules
- added words and POS data
- added new words
- improve java rule
- updated spelling dictionary to version 2.42 (Released Feb 03, 2019) (source: https://extensions.libreoffice.org/extensions/swedish-spelling-dictionary-den-stora-svenska-ordlistan)
- dictionary update
- new rules
- tokenization improvements
- The unmaintained code from package `org.languagetool.dev.wikipedia.atom` has been removed. It hadn't been maintained for years and didn't work properly anymore.
- `spelling_global.txt` has been added. Words or phrases added here will be accepted for all languages.
- `prohibit_custom.txt` and `spelling_custom.txt` can be used to make your own additions to `spelling.txt` and `prohibit.txt` without having to edit those files after a LanguageTool update (you will still need to manually copy those files). Paths to these files (`xx` = language code):
  `./org/languagetool/resource/xx/hunspell/prohibit_custom.txt`
  `./org/languagetool/resource/xx/hunspell/spelling_custom.txt`
  Note that you can simply create these files if they don't exist for your language yet.
- The dynamic languages feature (`lang-xx=...` and `lang-xx-dictPath=...`) now also supports hunspell dictionaries. Just let `lang-xx-dictPath` point to the absolute path of the `.dic` file. Note that hunspell is quite slow when it comes to offering suggestions for misspelled words.
- `AbstractSimpleReplaceRule2` has been fixed so that it's now case-insensitive. If you implement a sub class of it and you want the old behavior, please implement `isCaseSensitive()` and have it return `true`. (Issue #2051)
- The internal hunspell has been updated from 1.3 to 1.7, now using https://gitlab.com/dumonts/hunspell-java as the project providing the bindings. For Portuguese, this speeds up generating suggestions for misspellings by a factor of about 3 (but it's still slow compared to Morfologik). 32-bit systems are not supported anymore (only affects languages like German and French).
- Experimental: the new `default="temp_off"` attribute in `grammar.xml` files will turn off a rule/rulegroup, but keep it activated for our nightly regression tests.
- Many external dependencies have been updated to new versions.
- added and improved rules
- updated dictionary (catalan-pos-dict-2.5)
- added and improved rules
- added and improved rules
- updated en_GB spellchecker dictionary from https://github.com/marcoagpinto/aoo-mozilla-en-dict (Version 2.76 - 2019-09-01)
- improved rules
- improved rules
- added and improved rules
- small rule improvements
- added rules and significantly improved accuracy
- disambiguation improvements
- POS and spelling improvements
- improved rules
- added new words to spellchecker dictionary
- added and improved rules
- 2k of new words in the dictionary
- improved tokenization
- improved dynamic tagging
- added and improved rules
- Spell suggestion improvements: for many cases of a misplaced space, the suggestions are now better. For example, "thef eedback" can now be corrected to "the feedback" in one step. (#1729)
- The synthesizer now considers entries in `added.txt` and `removed.txt` (except for Catalan and Polish; for German removing compounds in `removed.txt` might not work) (#884)
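The misplaced-space improvement mentioned above ("thef eedback" to "the feedback") can be pictured as moving the space by one character and checking both halves against a dictionary. The following is a simplified sketch of that idea, not the actual suggestion code; the function name and the tiny word set are made up for illustration, and only the two-token case is handled.

```python
def fix_misplaced_space(phrase: str, known_words: set[str]) -> list[str]:
    """Suggest corrections where the space between two tokens is off by one."""
    left, sep, right = phrase.partition(" ")
    if not sep or " " in right:
        return []  # this sketch only handles exactly two tokens
    candidates = []
    # Move the last character of the left token to the start of the right token.
    if len(left) > 1:
        candidates.append((left[:-1], left[-1] + right))
    # Move the first character of the right token to the end of the left token.
    if len(right) > 1:
        candidates.append((left + right[0], right[1:]))
    # Keep only candidates where both parts are known words.
    return [f"{a} {b}" for a, b in candidates if a in known_words and b in known_words]


print(fix_misplaced_space("thef eedback", {"the", "feedback"}))
```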
- added and improved rules
- updated dictionary (catalan-pos-dict-2.4) with more health terminology
- added and improved rules
- added and improved rules
- introduced new part-of-speech tag `PCT` for punctuation marks (`.,;:…!?`)
- updated en_GB spellchecker dictionary from https://github.com/marcoagpinto/aoo-mozilla-en-dict (Version 2.73 - 2019-06-01)
- added and improved rules
- added and improved rules
- Rule `FRENCH_WHITESPACE` has been split into `FRENCH_WHITESPACE` (on by default) and `FRENCH_WHITESPACE_STRICT` (off by default). `FRENCH_WHITESPACE` only complains if there's no space at all before `?`, `!`, `;`, `:`, or `»`. `FRENCH_WHITESPACE_STRICT` complains if there's no space or a common space instead of a non-breaking space before these characters.
- added some popular names to dictionary
- added verbal agreement rules
- added and improved rules
- The false friend rule has been modified to use ngrams: Now false friends cause error messages if they are used in a wrong context, according to ngram statistics. Note that some pairs from `false-friends.xml` are not supported anymore because their precision isn't good enough. See `confusion_sets_l2_de.txt` for active DE/EN pairs. Use `My handy is broken.` to test the rule. As before, this will only create an error if `motherTongue` is set to a German language code.
- `prohibit.txt`: lines starting with `.*` will prohibit all words ending with the subsequent string (e.g., `.*artigel` will prohibit `Versandartigel`)
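The `prohibit.txt` wildcard described above can be sketched as a simple matcher. This is an illustrative reading of the entry rather than the real implementation: the function name is hypothetical, the trailing `.*` form is the one described elsewhere in this changelog, and treating `#` lines as comments is an assumption.

```python
def is_prohibited(word: str, prohibit_lines: list[str]) -> bool:
    """Check a word against prohibit.txt-style entries (simplified sketch)."""
    for line in prohibit_lines:
        entry = line.strip()
        if not entry or entry.startswith("#"):
            continue  # assumption: blank lines and '#' comments are skipped
        if entry.startswith(".*") and len(entry) > 2:
            # ".*artigel" prohibits every word ending in "artigel"
            if word.endswith(entry[2:]):
                return True
        elif entry.endswith(".*") and len(entry) > 2:
            # "Foo.*" prohibits every word starting with "Foo"
            if word.startswith(entry[:-2]):
                return True
        elif word == entry:
            return True
    return False


print(is_prohibited("Versandartigel", [".*artigel"]))
```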
- added rules
- added popular names to dictionary
- POS and spelling improvements
- added and improved rules
- added new words to spell dictionary
- updated spell dictionary from 2.1 to 2.4
- support for new spelling rules from 2019
- thousands of new words in the dictionary
- many rule improvements
- tokenization and tagging improvements
- `altLanguages` will only be considered for words with >= 3 characters
- Cleaned up error handling: invalid parameters will now return an HTTP error 400 instead of 500.
- Fixed a bug that caused the rules in the options dialog to not appear in the text language
- added and improved rules
- updated dictionary (catalan-pos-dict-2.3) with health terminology
- `resource/en/en-US-GB.txt` contains a mapping from US to British English and vice versa. It's not used to detect correct or incorrect spellings, but only to improve error messages so that they explicitly explain that the incorrect word is actually a different variant (like 'colour' in an en-US text).
- updated en_GB spellchecker dictionary from https://github.com/marcoagpinto/aoo-mozilla-en-dict (Version 2.70 - 2019-03-01)
- spell check ignores single characters (e.g., 'α')
- added and improved rules
- disambiguation improvements
- foreign names recognition
- added and improved rules
- Simple German: added and improved rules
- improved suggestions for typos that end with a dot (typically at the end of the sentence) - the dot is not included anymore
- spell check ignores single characters (e.g., 'α') and hyphenated compounds (e.g., 'α-Strahler')
- added and significantly improved rules accuracy
- disambiguation improvements
- Chinese common names are now detected
- POS and spelling improvements
- updated Hunspell dictionaries to:
- [pt-PT pos-AO] Dicionários Portugueses Complementares 3.1
- added and improved rules
- disambiguation improvements
- added many words without "yo" letter to POS dictionary
- added new words to spell dictionary
- dictionary update
- added and improved rules
- improvements to tokenization, tagging, and disambiguation
- URLs written like `mydomain.org/` are now detected as domains and not considered spelling errors anymore. Note that the slash is still needed to avoid missing real errors.
- JSON output: The `replacements` list now has an optional new item `shortDescription` for each `value`. It can contain a short definition/hint about the word. Currently, the only words that have a short description are ones that have a description in `confusion_sets.txt` (i.e. a text after the `|` symbol).
- bug fix: don't make `interpretAs` part of `getTextWithMarkup()` (#1393)
- Experimental new attribute `raw_pos` for the `<pattern>` element in `grammar.xml`. If set to `yes`, the `postag` will refer to the part-of-speech tags before disambiguation.
- Experimental support for `<antipattern>` in `disambiguation.xml`
- Experimental new parameter `preferredLanguages`: up to a certain limit (currently 50 characters), only these languages will be considered for language detection. This has to be a comma-delimited list of language codes without variants (e.g. use 'en', not 'en-US'). This only works with fasttext configured as the language detector.
- Spellcheck-only languages can now be added dynamically from the configuration using `lang-xx=languagename` and `lang-xx-dictPath=/path/to/morfologik.dict`. `xx` needs to be the language code. The JSON result will contain `spellCheckOnly: true` for these languages.
- Fixed a bug that prevented opening the Options dialog in LibreOffice/OpenOffice
- added and improved rules
- updated dictionary
- added and improved rules, including more confusion rules for dyslectic people
- added large amount of family names to reduce false alarms in spelling
- added and improved rules
- segmentation improvements
- updated en_GB spellchecker dictionary from https://github.com/marcoagpinto/aoo-mozilla-en-dict (Version 2.67 - 2018-12-01)
- added rules for 'Oxford spelling' (applicable to British English only)
- small rule improvements
- added and improved rules
- Swiss German: improved POS tagging of words that contain 'ß' in de-DE German (e.g., 'gross' is tagged as 'gross[groß/ADJ:PRD:GRU]'); (#1147)
- Simple German: added and improved rules; restructured grammar.xml
- added and improved rules
- disambiguation improvements
- POS and spelling improvements
- added and improved rules
- disambiguation improvements
- POS and spelling dictionary improvements
- Serbian never moved beyond its "initial support" state with a tiny number of rules, and it has no active maintainer, so we have deactivated it for now. If you'd like to maintain support for Serbian, let us know in the forum (https://forum.languagetool.org). Once it's clear that a new active long-term maintainer has been found, we'll activate support for Serbian again.
- dictionary update (about 7k of new words)
- added and improved rules
- improvements to tokenization, tagging, and disambiguation
- Experimental support for `altLanguages` parameter: takes a list of language codes. Unknown words of the main language (as specified by the `language` parameter) will cause errors of type "Hint" if accepted by one of these languages. We expect clients to interpret this like style issues, e.g. these words should be underlined with a light blue instead of red. Support for this is experimental, i.e. it might be removed again or implemented in a different way.
- Experimental support for `noopLanguages` parameter: takes a list of language codes of languages that are not supported by LT but that will be detected and mapped to a no-op language without rules. Useful for clients that rely on language auto-detection and whose users might use languages not supported by LT. NOTE 1: only works with fastText configured. NOTE 2: setting languages here will worsen language detection quality on average.
- Change to language detection behavior: Removed the fallback to English when the confidence of the detection algorithm is low; the highest scoring detected language is now always returned. Added a field `confidence` to the `detectedLanguage` object in the JSON response that contains the probability score for the detected language as computed by the detection algorithm.
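A client could use the `confidence` field described above to decide whether to propose a language switch. The sketch below assumes a response shape with `detectedLanguage` nested under `language`; the sample values, the helper name, and the 0.9 threshold are invented for illustration only.

```python
import json

# Made-up sample response; only the fields used below are included.
SAMPLE_RESPONSE = """
{"language": {"name": "English (US)", "code": "en-US",
              "detectedLanguage": {"name": "German (Germany)",
                                   "code": "de-DE", "confidence": 0.98}},
 "matches": []}
"""


def suggest_language_switch(response_json: str, min_confidence: float = 0.9):
    """Return the detected language code if it differs and is confident enough."""
    language = json.loads(response_json)["language"]
    detected = language.get("detectedLanguage", {})
    code = detected.get("code")
    confident = detected.get("confidence", 0.0) >= min_confidence
    if code and code != language.get("code") and confident:
        return code
    return None


print(suggest_language_switch(SAMPLE_RESPONSE))
```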
- added and improved rules
- added and improved rules
- added and improved rules
- added and improved rules
- small rule improvements
- added and improved rules
- added and improved rules
- added and improved rules
- added and improved rules
- improvements to disambiguation, and segmentation
- updated Hunspell dictionaries to:
- [pt-PT pos-AO] Dicionários Portugueses Complementares 3.0
- added and improved rules
- added and improved rules
- Prepared support for AIX. See https://github.com/MartinKallinger/hunspell-aix for the required libraries
- Email signatures are now ignored for language detection as long as they are separated from the main text with `\n-- \n`
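The signature handling described above can be pictured as cutting the text at the delimiter before language detection. This is a minimal sketch under the assumption that everything after the first `\n-- \n` counts as the signature; the function name is hypothetical.

```python
SIGNATURE_DELIMITER = "\n-- \n"


def strip_signature(text: str) -> str:
    """Return the text before the first signature delimiter, if there is one."""
    body, _, _ = text.partition(SIGNATURE_DELIMITER)
    return body


print(strip_signature("Hallo Welt\n-- \nJohn Doe\nExample Inc."))
```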
- The server can now accept JSON as the `data` parameter that describes markup. For example:
  `{"annotation":[ {"text": "A "}, {"markup": "<b>"}, {"text": "test"}, {"markup": "</b>"} ]}`
  With this input, LT will ignore the `markup` parts and run the check only on the `text` parts. The error offset positions will still refer to the original input including the markup, so that suggestions can easily be applied. You can optionally use `interpretAs` to have markup interpreted as whitespace, like this:
  `{"markup": "<p>", "interpretAs": "\n\n"}`
  Note that HTML entities (including `&nbsp;`) still need to be converted to Unicode characters before feeding them into LT. (Issue: #757)
- The `blockedReferrers` setting now also considers the `Origin` header
- A `blockedReferrers` setting of `foobar.org` will now automatically match `http://foobar.org`, `http://www.foobar.org`, `https://foobar.org`, and `https://www.foobar.org`
- New setting `fasttextModel` (see https://fasttext.cc/docs/en/language-identification.html) and `fasttextBinary` (see https://fasttext.cc/docs/en/support.html). With these options set, the automatic language detection is much better than the built-in one.
- Experimental new `mode` parameter with `all`, `textLevelOnly`, or `allButTextLevelOnly` as value: Will check only text-level rules or all other rules. As there are fewer text-level rules, this is usually much faster and the access limit for characters per minute that can be checked is more generous for this mode.
- Improved spellchecker suggestions (not yet enabled by default). See https://forum.languagetool.org/t/gsoc-reports-spellchecker-server-side-framework-and-build-tool-tasks/2926/43
- Experimental new `type` in JSON. This is supposed to help clients choose the color with which they underline/mark errors. Please do not rely on this yet, it might change or even be removed.
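The JSON `data` parameter with `text` and `markup` parts described above can be assembled on the client side as follows. This is a sketch only: the helper names are hypothetical, and `checked_text` merely illustrates which characters the check would run on according to the changelog entry.

```python
import json


def text_part(value: str) -> dict:
    return {"text": value}


def markup_part(value: str, interpret_as=None) -> dict:
    part = {"markup": value}
    if interpret_as is not None:
        # Optional: have the markup treated as this whitespace during the check.
        part["interpretAs"] = interpret_as
    return part


def checked_text(annotation: list) -> str:
    """Text parts verbatim; markup dropped or replaced by its interpretAs value."""
    return "".join(
        part["text"] if "text" in part else part.get("interpretAs", "")
        for part in annotation
    )


annotation = [text_part("A "), markup_part("<b>"), text_part("test"), markup_part("</b>")]
data = json.dumps({"annotation": annotation})  # value for the `data` parameter
print(checked_text(annotation))
```

Because reported offsets refer to the original input including markup, a client can apply suggestions directly without mapping positions back.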
- made many messages shorter
- updated FSA spelling dictionary from An Drouizig Breton Spellchecker 0.15
- added and improved rules
- rules and updated dictionary for new diacritics rules (IEC 2017)
- added and improved rules
- added and improved rules
- updated en_GB spellchecker dictionary from https://github.com/marcoagpinto/aoo-mozilla-en-dict (Version 2018-06-01)
- updated en_US spellchecker dictionary from http://wordlist.aspell.net (Version 2018.04.16)
- updated en_CA spellchecker dictionary from http://wordlist.aspell.net (Version 2018.04.16)
- added and improved rules
- added and improved rules
- updated jwordsplitter to 4.4 to prevent excessively long processing times for artificially long compounds
- `prohibit.txt`: lines ending with `.*` will prohibit all words starting with the previous string
- added and improved rules
- added rules
- added and improved rules
- added and improved grammar and punctuation rules
- upgraded the tagging and synthesizer dictionaries from AOT.ru rev.269 (extend tags, add missing tags)
- spelling dictionary update
- added and improved a few rules
- dictionary update (more than 15k of new words)
- added and improved rules
- some improvements to tokenization, tagging and disambiguation
- The JSON contains a new section `detectedLanguage` (under `language`) that contains information about the automatically detected language. This way clients can suggest switching to that language, e.g. in cases where the user had selected the wrong language.
- New optional configuration setting `blockedReferrers`: a comma-separated list of HTTP referrers that are blocked and will not be served
- BETA: New optional configuration settings `dbDriver`, `dbUrl`, `dbUsername`, `dbPassword` to allow user-specific dictionaries
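How a comma-separated `blockedReferrers` list might be applied can be sketched as below. The server's actual matching logic is not given here; this sketch compares host names and ignores the scheme and a leading `www.`, in line with the matching behavior described for a later release in this changelog. Function names are hypothetical.

```python
from urllib.parse import urlsplit


def parse_blocked(setting: str) -> set[str]:
    """Split the comma-separated setting into a set of lower-cased hosts."""
    return {item.strip().lower() for item in setting.split(",") if item.strip()}


def is_blocked(referrer: str, blocked: set[str]) -> bool:
    """Compare the referrer's host, without a leading 'www.', to the blocked set."""
    host = (urlsplit(referrer).hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    return host in blocked


blocked = parse_blocked("foobar.org, example.com")
print(is_blocked("https://www.foobar.org/page", blocked))
```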
- The parameters of the `*SpellerRule` classes (e.g. `MorfologikRussianSpellerRule`) have changed
- `LanguageIdentifier` will now only consider the first 1000 characters when identifying the language of a text. This improves performance for long texts.
- added and improved rules
- added some rules
- added and improved rules
- added new Java rule `NL_PREFERRED_WORD_RULE` that suggests preferred words (e.g., 'fiets' for 'rijwiel')
- all-uppercase words are now also spellchecked
- added and improved rules
- added remaining collocation rules (~130) contributed by Nicholas Walker (Bokomaru)
- words written with x-sistemo now get proper POS tag so grammar mistakes can now be found in: ambaux virino (->ambaux virinoj), mi farigxis maljunan (-> mi farigxis maljuna), etc.
- added and improved rules
- added many `<url>` to rules
- improved suggestion for spelling mistakes (#912)
- added a couple of rules
- added and improved rules
- New rule that checks coherent use of Du/du, Dich/dich etc. Assumes that the first use has 'correct' capitalization and suggests the same capitalization for subsequent uses.
- New line extension `-*` for `ignore.txt`: entries ending with `-*` are ignored only if they are part of a hyphenated compound (e.g., `Fair-Trade-*` allows `Fair-Trade-Kakao`)
- Added a new rule that tries to find compounds that are probably not correct, like `Lehrzeile` instead of `Leerzeile`, requires ngram data (rule id `DE_PROHIBITED_COMPOUNDS`)
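The `-*` extension for `ignore.txt` described above can be sketched as a prefix check that requires a further compound part after the hyphen. This is a simplified reading of the entry: the function name is hypothetical, and only compounds that start with the listed part are covered.

```python
def is_ignored(word: str, ignore_entries: list[str]) -> bool:
    """Check a word against ignore.txt-style entries (simplified sketch)."""
    for entry in ignore_entries:
        if entry.endswith("-*"):
            prefix = entry[:-1]  # drop "*", keep the hyphen: "Fair-Trade-"
            # Only ignored when more of the compound follows the hyphen.
            if word.startswith(prefix) and len(word) > len(prefix):
                return True
        elif word == entry:
            return True
    return False


print(is_ignored("Fair-Trade-Kakao", ["Fair-Trade-*"]))
```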
- added and improved rules
- added and improved rules
- sentence segmentation improvements
- added and improved rules
- upgraded the tagging and synthesizer dictionaries with extended POS tags from AOT.ru rev.269
- update to the part-of-speech dictionary
- dictionary update (~5K new lemmas)
- compound word tagging improvements
- many new disambiguation rules
- several new barbarism and grammar rules
- The server now returns HTTP error code 500 in case of a timeout (it used to return 503)
- Constructors that take a `ResultCache` have been removed from `MultiThreadedJLanguageTool` as using them caused incorrect results. (#897)
- added and improved rules
- updated and renamed dictionary: ca-ES.dict (external dependency: catalan-pos-dict 1.6)
- added new dictionary for Valencian including most words from Diccionari Normatiu Valencià (AVL): ca-ES-valencia.dict (external dependency: catalan-pos-dict 1.6)
- added and improved rules
- added and improved rules
- removed the category `MISC` and moved the rules to more specific categories
- added WordCoherencyRule, to detect cases where two different variants of a word are used in the same text (e.g. archaeology and archeology)
- added approximately 70 collocation rules contributed by Nicholas Walker (Bokomaru)
- added support for locale-specific spelling suggestions (locale-specific spelling_en-XY.txt files)
- updated en_GB spellchecker dictionary from https://github.com/marcoagpinto/aoo-mozilla-en-dict
- updated en_US spellchecker dictionary from http://wordlist.aspell.net (Version 2017.08.24)
- updated en_CA spellchecker dictionary from http://wordlist.aspell.net (Version 2017.08.24)
- LT now offers suggestions for spelling errors
- added and improved rules, including:
- grammar: agreement rules added (only number and gender agreement)
- common normative errors: includes Castilianisms, Lusitanianisms, Hipergalicisms, archaisms and Anglicisms correction
- style: barbarism, redundant expressions, and wordy expressions detection added
- typography: spacing and number formatting improvements; chemical formulas; degree signs; dashes; punctuation; international system standards; and mathematical symbol formatting
- development, punctuation and repetition rules categories added
- multiword disambiguation added
- disambiguation improvements
- new word tokenizer
- significant POS tagging and synthesizing improvements
- spellchecking exceptions for:
- abbreviations;
- variables in formulas, units, and related statistical vocabulary;
- common Latin, English and French expressions;
- species scientific names;
- famous personalities
- updated Hunspell dictionaries to:
- [gl-ES] Version 12.10 "Xoán Manuel Pintos"
- added and improved rules
- New rule that checks coherency of hyphen usage in compounds, e.g. it complains when "Ärzteverband" and "Ärzte-Verband" are both used in the same text. While both spellings are correct, it's probably a good idea to stick to one spelling.
- improved POS tagging of hyphenated compounds (e.g., "CO2-arm" is recognized as a variant of "arm")
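The hyphen coherency check described above ("Ärzteverband" versus "Ärzte-Verband") can be approximated by grouping words on a hyphen-free, lower-cased key. This is a rough sketch of the idea, not the rule's implementation; the function name and the tokenizing regex are assumptions.

```python
import re
from collections import defaultdict


def inconsistent_hyphenation(text: str) -> list[set[str]]:
    """Return groups of spellings that occur both with and without hyphens."""
    variants = defaultdict(set)
    for word in re.findall(r"\w+(?:-\w+)*", text):
        # Words that differ only in hyphens and case share one key.
        variants[word.replace("-", "").lower()].add(word)
    return [
        forms
        for forms in variants.values()
        if any("-" in f for f in forms) and any("-" not in f for f in forms)
    ]


print(inconsistent_hyphenation("Der Ärzteverband tagt. Der Ärzte-Verband tagt."))
```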
- added rules
- disambiguation improvements
- added and improved rules
- LibreOffice category rules moved to other categories
- disambiguation improvements
- updated Hunspell dictionaries to:
- [pt-PT pos-AO] Dicionários Portugueses Complementares 2.2
- [pt-AO pre-AO] Dicionários Portugueses Complementares 2.2
- [pt-MZ pre-AO] Dicionários Natura 14.08.2017
- added and improved grammar and punctuation rules
- spelling dictionary update
- new Russian-English false friends added (thanks to ZakShaker)
- initial support for Serbian by Zoltán Csala
- big dictionary update (~10K new lemmas)
- improvements in tokenization
- compound word tagging improvements
- more than 350 new disambiguation rules
- several new barbarism and grammar rules
- Now runs with Java 9 (compilation with Maven still has issues with Java 9)
- The spell checker tries harder to find suggestions for misspellings that have a Levenshtein distance greater than 2. The maximum Levenshtein distance is now 3. This way you now get a suggestion for e.g. `algortherm` (algorithm) or `theromator` (thermometer). In the worst case (every single word of a text misspelled), this has a performance penalty of about 30%.
- Better support for Unicode codepoints greater than `0xFFFF`
- word2vec word embeddings (cf. http://colah.github.io/posts/2014-07-NLP-RNNs-Representations/#word-embeddings) are now supported as additional language models and currently available for English, German, and Portuguese.
- Neural network based rules for confusion pair disambiguation using the
word2vec model are available for English, German, and Portuguese. The necessary
data must be downloaded separately from https://fscs.hhu.de/languagetool/word2vec.tar.gz.
For details, please see:
- Code: https://github.com/gulp21/languagetool-neural-network
- Forum discussion: https://forum.languagetool.org/t/neural-network-rules/2225
- Paper: "Development of neural network based rules for confusion set disambiguation in LanguageTool" by Markus Brenneis and Sebastian Krings: https://fscs.hhu.de/languagetool/summary.pdf
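The spell checker entry above raises the maximum Levenshtein distance to 3. The sketch below uses the plain textbook edit distance to show why the two example misspellings need that limit; the real speller's search is dictionary-based and more elaborate, and the names `MAX_DISTANCE` and `suggestions` are illustrative.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]


MAX_DISTANCE = 3  # illustrative constant mirroring the limit named above


def suggestions(word: str, dictionary: list[str]) -> list[str]:
    """Dictionary words within MAX_DISTANCE, closest first."""
    scored = sorted((levenshtein(word, c), c) for c in dictionary)
    return [c for distance, c in scored if distance <= MAX_DISTANCE]


print(suggestions("algortherm", ["algorithm", "banana"]))
```

Both example words sit at distance 3 from their corrections, so a limit of 2 would have produced no suggestion for them.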
- show line numbers in the text area
- a directory with word2vec language model for neural network rules can now be specified in the configuration dialog, see https://forum.languagetool.org/t/neural-network-rules/2225
- Stop disposition of vertical scroll when expanding the checkbox.
- A `RuleMatch` can now have a URL, too. The URL usually points to a page that describes the error or grammar rule in more detail. Before, only the `Rule` could have a URL. A `RuleMatch` URL will overwrite the `Rule` URL in the JSON output.
- A `RuleMatch` now also has information about the sentence the error occurred in (it used to have only position information and the caller was expected to find the error context and/or sentence position in the original text).
- change in configuration: `requestLimit` and `requestLimitPeriodInSeconds` now both need to be set for the limit to work
- new property key `timeoutRequestLimit`: similar to `requestLimit`, but this one limits not all requests but blocks once this many timeouts have been caused by the IP in the time span set by `requestLimitPeriodInSeconds`
- new property key `requestLimitInBytes`: similar to `requestLimit`, but this one limits the aggregated size of requests caused by an IP in the time span set by `requestLimitPeriodInSeconds`
- New property key `maxErrorsPerWordRate`: set the maximum allowed errors per word, e.g. `0.3` if the maximum is about one error per three words. More errors will stop the check with an exception. This is useful so no processing time gets wasted for texts with a huge amount of errors that are only caused by the wrong language being selected (leading to most words being detected as spelling errors).
- The JSON output now contains a `sentence` property with the text of the sentence the error occurred in.
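
The meaning of `maxErrorsPerWordRate` can be pictured with a small Python sketch. This is an illustration under assumptions, not the server's implementation: the function and exception names are hypothetical, and treating a rate of 0 as "no limit" is an assumption of this sketch.

```python
class TooManyErrorsError(Exception):
    """Raised when a text yields more errors per word than allowed."""


def enforce_max_errors_per_word(num_errors: int, num_words: int, max_rate: float) -> None:
    # Assumption of this sketch: a rate of 0 (or less) means "no limit".
    if max_rate <= 0 or num_words == 0:
        return
    rate = num_errors / num_words
    if rate > max_rate:
        raise TooManyErrorsError(
            f"{num_errors} errors in {num_words} words (rate {rate:.2f}) exceeds {max_rate}"
        )


# 2 errors in 10 words is a rate of 0.2, which is below 0.3, so nothing happens.
enforce_max_errors_per_word(num_errors=2, num_words=10, max_rate=0.3)

# 8 errors in 10 words looks like the wrong language was selected.
try:
    enforce_max_errors_per_word(num_errors=8, num_words=10, max_rate=0.3)
except TooManyErrorsError as exc:
    print("check aborted:", exc)
```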
- small rule improvements
- added and improved rules
- added and improved rules
- added and improved rules
- added and improved rules
- added and improved rules
- upgraded dictionaries to Dicollecte-6.1
- added and improved rules
- spell checker suggestions have been improved a lot by considering more words, especially compounds (de-DE only so far, not yet active for de-AT and de-CH) (#725)
- added special dictionary extension files `spelling-de-AT.txt` and `spelling-de-CH.txt` for de-AT and de-CH that will be considered in addition to `spelling.txt`
- updates according to "Amtliches Regelwerk der deutschen Rechtschreibung aktualisiert", 6/2017 (http://www.rechtschreibrat.com/DOX/rfdr_PM_2017-06-29_Aktualisierung_Regelwerk.pdf)
- added POS tagging of alternative imperative forms such as "Geh" or "küss" (in addition to "Gehe"/"küsse")
- introduced two new line endings ('?' and '$') for the data file `compounds.txt`; these endings indicate that the mid-word parts of the compound need to be lower-cased (e.g., 'Geräte Wahl' -> 'Gerätewahl')
- added and improved grammar and style rules, including:
  - grammar: general agreement rules, pronominal collocations, paronyms and homophones detection improvements; time agreement rules added
  - punctuation: greetings and farewell punctuation
  - style: puffery, weasel words, weak expressions, and biased opinion words detection added (disabled by default)
  - syntax: new category; fragment detection improvements
  - typography: spacing, number, and mathematical symbol formatting improvements
- disambiguation improvements
- false friends added
  - Portuguese to Galician (16 new pairs)
- significant POS tagging and synthesizing improvements
- spellchecking exceptions for abbreviations, variables in formulas, units, and related statistical vocabulary
- updated Hunspell dictionaries to:
  - [pt-PT pos-AO] Dicionários Portugueses Complementares 2.0
  - [pt-AO pre-AO] Dicionários Portugueses Complementares 2.0
  - [pt-MZ pre-AO] Dicionários Natura 15.06.2017
- spelling dictionary update
- added and improved some rules
- added and improved some rules
- significant dictionary update:
  - more than 60K new words
  - some inflection adjustments
- improved dynamic tagging for compound words
- many new rules (barbarism, grammar, and spelling)
- inflection agreement rule updates
- `AnnotatedText` (built via `AnnotatedTextBuilder`) can now contain document-level meta data. This might be used by rules in the future.
- added and improved rules
- updated dictionary and rules for official names of Valencian municipalities
- added one rule
- added many rules (by Ruud Baars)
- spelling dictionary update
- added and improved rules
- added and improved rules
- improved messages for old spelling variants, e.g. `Kuß` now suggests only `Kuss` and also has a message explaining to the user that `Kuß` is an old spelling
- added rules
- added some common typos
- added and improved grammar and style rules, including:
  - grammar: general agreement rules, contractions, pronominal collocations, compounding, and paronyms detection
  - style: wordy expressions detection added and significant redundant expressions detection improvements
  - punctuation: significant improvements
  - formal speech: archaisms, cacophonies, childish language and slang detection added
  - typography: international system standards, number and mathematical symbol formatting
  - misspellings: common misspellings of famous foreign personalities
  - AO90: identify words with changed spelling
- disambiguation improvements
- false friends support added
  - Portuguese to Catalan (26 new pairs)
  - Portuguese to Spanish (7 new pairs)
- spell checking exceptions for common Latin, English, and French expressions, species scientific names, and famous personalities
- updated Hunspell dictionaries to:
  - [pt-PT pos-AO] Dicionários Portugueses Complementares 1.4
  - [pt-BR] VERO version 2.1.4
- added and improved rules
- major rule updates by Matúš Matula
- Significant dictionary update:
  - thousands of new words
  - some inflection adjustments
- Improved dynamic tagging for compound words
- Many new rules (barbarism, grammar, and spelling)
- New noun-verb agreement rule
- The deprecated AfterTheDeadline mode has been removed
- The `apiVersion` property of the JSON output is now a number instead of a string (issue #712)
- Some deprecated methods and classes have been removed.
- `spelling.txt` allows multi-word entries: the words/tokens (separated by " ") of one line are converted to a `DisambiguationPatternRule` in which each word is a case-sensitive and non-inflected `PatternToken` (result: the entire multi-word entry is ignored by the spell checker)
- When running a LT server, the enabled/disabled rules loaded from a configuration file at the startup time will be the new default rules. Previously these rules were "forgotten" when a server query used the parameters for enabling and disabling rules. Now the rules from the query will be added to the rules from the configuration file.
- small rule improvements
- added and improved rules
- updated dictionary
- added and improved rules
- improved rules
- upgraded dictionaries to Dicollecte-6.0.2
- added and improved rules
- added some common Latin, French, and English phrases that will be ignored by the spell checker
- updated Hunspell dictionary to version 2017.01.12:
- added and improved rules
- added one rule
- Lithuanian, Malayalam, and Icelandic are not part of this release anymore. They still exist in the git repository and can be re-activated as soon as a new maintainer takes care of them.
- added and improved grammar and style rules, including:
  - grammar: general agreement rules, 'crase', pronominal collocations, impersonal verbs, fragment, and paronyms detection improvements
  - capitalization: AO90 and AO45 rules
  - style: repetitions and barbarism detection
  - typography: number formatting, chemical formulas, degree signs, dash signs, and punctuation
  - semantics: wrong words in the context (22 confusion pairs), URL validator and date checker improvements
- registered brands category added
- translation errors category added
- false friends support added:
  - Portuguese to Spanish (186 new pairs)
  - Portuguese to English (156 new pairs)
  - Portuguese to French (78 new pairs)
  - Portuguese to German (16 new pairs)
  - Portuguese to Galician (9 new pairs)
- spellchecking suggestions activated
- updated Hunspell dictionary to:
  - [pt-PT pos-AO] Dicionários Portugueses Complementares 1.2
  - [pt-AO pre-AO] Dicionários Portugueses Complementares 1.2
  - [pt-MZ pre-AO] Dicionários Natura 18.02.2017
- added and improved rules
- updated tagger dictionary from AOT.ru rev.269 with extended POS tags
- Significant dictionary update:
  - many new words
  - some inflection adjustments
- Many new rules (barbarism, punctuations, and grammar)
- Improved dynamic tagging for compound words
- Options dialog now uses system theme instead of Nimbus.
- Added a `--languageModel` option to the embedded server, thanks to Michał Janik (issue #404)
- The 'AfterTheDeadline' mode has been deprecated and will be removed in the next version, unless users complain and present a valid use case.
- The old XML-based API has been removed. The migration to the new JSON-based API is documented at https://languagetool.org/http-api/migration.php
- Speed up with a cache for cases where the same sentences get checked again (e.g. due to a correction in a text that doesn't affect all sentences but causes the whole text to be re-checked)
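
The idea behind such a cache can be sketched in a few lines of Python. This is a hypothetical illustration, not the server's implementation: `SentenceCache` and `toy_check` are made-up names, and `toy_check` stands in for an expensive per-sentence check.

```python
from typing import Callable, Dict, List


class SentenceCache:
    """Memoize per-sentence results so unchanged sentences are not re-checked."""

    def __init__(self, check_sentence: Callable[[str], List[str]]):
        self._check_sentence = check_sentence
        self._results: Dict[str, List[str]] = {}
        self.misses = 0  # how often the expensive check actually ran

    def check(self, sentences: List[str]) -> List[List[str]]:
        output = []
        for sentence in sentences:
            if sentence not in self._results:
                self.misses += 1
                self._results[sentence] = self._check_sentence(sentence)
            output.append(self._results[sentence])
        return output


def toy_check(sentence: str) -> List[str]:
    # Placeholder "rule": flag the article error "a error".
    return ["use 'an error'"] if "a error" in sentence else []


cache = SentenceCache(toy_check)
cache.check(["This is a error.", "This one is fine."])
# After a correction, only the edited sentence is new to the cache.
cache.check(["This is an error.", "This one is fine."])
print(cache.misses)  # 3
```

The whole text is submitted twice, but the expensive check runs only three times because the untouched sentence is served from the cache.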
- Some deprecated methods have been removed.
- A new class `ResultCache` has been added to speed up the LT server
- `EnglishRule`, `GermanRule`, `CatalanRule`, and `FrenchRule` are now deprecated. These are empty abstract classes that never had any real use. Rules that extend these classes will directly extend `Rule` or `TextLevelRule` in a future release.
- All rules that work on the text level instead of the sentence level (e.g. word coherency) now extend `TextLevelRule` instead of `Rule`
- OpenNLP has been updated from 1.6.0 to 1.7.2 (only used for English)
- small rule improvements
- added and improved rules
- added and improved rules
- added about 131 confusion pairs like woman/women (works only with ngram data, see http://wiki.languagetool.org/finding-errors-using-n-gram-data)
- The American and Canadian English (en-US, en-CA) spelling dictionaries have been updated to the latest version from http://wordlist.aspell.net (2016.06.26)
- The Australian English (en-AU) spelling dictionary has been updated to the latest version from http://extensions.libreoffice.org/extension-center/english-dictionaries (2016-03-14 according to that page)
- added and improved rules
- upgraded dictionaries to Dicollecte-5.7
- added and improved rules
- added about 34 confusion pairs like ihm/im (works only with ngram data, see http://wiki.languagetool.org/finding-errors-using-n-gram-data)
- bugfix regarding errors in the last word of a sentence (#273)
- The internal part-of-speech dictionary has been updated with the help of Julian von Heyl of http://korrekturen.de - many entries have been fixed and added. The new data has its own Maven and git project now (https://github.com/languagetool-org/german-pos-dict)
- The `Lithuanian` class has been deprecated. Lithuanian in LT hasn't been maintained for years and there's no new maintainer in sight. It also has very low usage on languagetool.org and very few error detection rules anyway, so we'll remove its support from LT in the next release.
- The `Malayalam` class has been deprecated. Malayalam in LT hasn't been maintained for years and there's no new maintainer in sight. It also has very low usage on languagetool.org and very few error detection rules anyway, so we'll remove its support from LT in the next release.
- general agreement rules added
- number and gender words agreement
- general subject-verb agreement
- accentuated form confusion, 'dequeísmos' and many more
- new compound form detection (pt-PT recognizes all compound verbal derivations)
- duplications, redundancies, typography and semantics categories added
- style category rules added
- new word repetition rules, fragment detection, verbosity checks, passive voice and many others
- new sentence disambiguator and new word tokenizer
- sentence segmentation improvements
- former rules and messages revision, improvement and classification
- post-reform agreement support added and pre-reform components updated
- European Portuguese specific rule group added
- post-reform agreement by default
- compound verbs, possessive pronouns, reflexive forms placement, gerund and more
- pre-reform agreement locales support added
- Angola, Cape Verde, East Timor, Guinea Bissau, Macau, Mozambique and São Tomé e Principe
- base spelling dictionary and tagger update
- variants dictionaries added and many part-of-speech fixes
- European Portuguese specific rule group added
- Portuguese has been prepared to use ngram data, that means it has a `confusion_sets.txt` file where word pairs could be added. See http://wiki.languagetool.org/finding-errors-using-n-gram-data for more information but note that we cannot offer the required ngram data yet for Portuguese, as we rely on the Google ngram data and Portuguese isn't part of that.
- added and improved many rules
- added new rules with Java filter
- added new Java rule `RussianWordCoherencyRule`
- added words suggested by users
- improved disambiguation rules
- updated tagger dictionary from AOT rev.268 with extended POS tags
- improved SRX sentences segmentation
- added `removed.txt` for words that need to be removed from the dictionary
- added and improved rules
- significant dictionary update
- new adj/noun inflection rule
- dynamic tagging improvements
- disambiguation improvements
- some improvements to existing rules
- experimental noun/verb agreement rule
- The old API has been deactivated, as documented at https://languagetool.org/http-api/migration.php - it now returns a pseudo error pointing to the migration page
- A new method for removing overlapping errors has been implemented. By default, it is enabled for the HTTP API and LibreOffice outputs, and disabled for the command-line output. If necessary, priorities for rules and categories can be set in `Language.getPriorityForId(String id)`. The default value is `0`; positive integers have higher priority and negative integers have lower priority.
- `Language.getShortName()` has been deprecated, use `Language.getShortCode()` instead
- `Language.getShortNameWithCountryAndVariant()` has been deprecated, use `Language.getShortCodeWithCountryAndVariant()` instead
- `Languages.getLanguageForShortName()` has been deprecated, use `Languages.getLanguageForShortCode()` instead
- The following languages have been unmaintained for a long time. A warning has been shown for some time on languagetool.org and in the stand-alone GUI for these languages. This warning has now been extended to Java in the form of a deprecation, i.e. the constructors of the following languages have been deprecated. That does not mean they are going to be removed in the next version, but it's a warning that we cannot offer support for them or guarantee they will be included in the future:
  - Belarusian
  - Swedish
  - Icelandic
  - Tagalog
  - Asturian
  - Danish
  - Slovenian

  If you're interested in contributing to one of these languages, please post to our forum at http://forum.languagetool.org.
- The uppercase sentence start rule (id `UPPERCASE_SENTENCE_START`) now ignores immunized tokens - this way users can add lowercase words to `disambiguation.xml` so the rule won't complain about these lowercase words at the beginning of a sentence.
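
The priority-based removal of overlapping errors listed in this section can be pictured with the following Python sketch. It is a simplified illustration, not LanguageTool's algorithm: the `Match` type, the `remove_overlapping` function, and the rule that the earlier match wins a tie are all assumptions made for this example. Only the default priority of 0 and the ordering of positive and negative priorities come from the entry itself.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Match:
    start: int    # inclusive character offset
    end: int      # exclusive character offset
    rule_id: str


def remove_overlapping(matches: List[Match], priorities: Dict[str, int]) -> List[Match]:
    """Among overlapping matches, keep the one whose rule has the higher priority."""
    def priority(m: Match) -> int:
        return priorities.get(m.rule_id, 0)  # 0 is the default priority

    kept: List[Match] = []
    for match in sorted(matches, key=lambda m: (m.start, m.end)):
        if kept and match.start < kept[-1].end:   # overlaps the last kept match
            if priority(match) > priority(kept[-1]):
                kept[-1] = match                  # higher priority wins
            # On a tie the earlier match stays (an assumption of this sketch).
        else:
            kept.append(match)
    return kept


matches = [
    Match(0, 5, "SPELLING"),
    Match(3, 9, "GRAMMAR"),
    Match(12, 15, "STYLE"),
]
result = remove_overlapping(matches, {"GRAMMAR": 1, "STYLE": -1})
print([m.rule_id for m in result])  # ['GRAMMAR', 'STYLE']
```

A negative priority only matters when two matches compete for the same text: the `STYLE` match survives here because nothing overlaps it.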
- Added a `--json` option as an alternative to `--api` (deprecated XML output). See https://languagetool.org/http-api/swagger-ui/#/default for documentation of the new API.
- Apache commons-lang has been updated from 2.6 to commons-lang3 3.5
- Updated lucene-gosen-ipadic to 6.2.1 (#376)
- added and improved rules
- added words suggested by users
- added and improved rules
- added about 50 confusion pairs like talking/taking (works only with ngram data, see http://wiki.languagetool.org/finding-errors-using-n-gram-data)
- added category `MISUSED_TERMS_EU_PUBLICATIONS`
- updated en_GB spellchecker dictionary from https://github.com/marcoagpinto/aoo-mozilla-en-dict
- added and improved rules
- added and improved rules
- added rules
- fixed several false alarms
- added and improved rules
- added and improved rules
- added rules
- it is now possible to check texts that contain stress marks
- added and improved many new grammar and style rules
- added words suggested by users
- improved disambiguation rules
- thanks to Konstantin Ladutenko for reviewing, testing, and improving rules, and for feedback in the bug tracker
- added and improved rules
- added ~6k new words
- added many new grammar and styling rules
- added many new barbarism replacement suggestions
- improved dynamic word tagging
- Bugfix: avoid repeating the same suggestion
- Enhancement: ignore e-mail addresses
- `Rule.getCorrectExamples()` now returns a list of `CorrectExample`s instead of a list of `String`s.
- speed up for long texts with many errors (#530)
- add new menu item for showing/hiding the result area
- Deprecated the `--api` option - we recommend using LanguageTool in server mode (JSON API), which is faster as it has no start-up overhead for each call. See https://languagetool.org/http-api/swagger-ui/#/default for documentation of the new API.
- added and improved rules
- added words suggested by users
- added about 33 confusion pairs such as throe/throw, raps/wraps (works only with ngram data, see http://wiki.languagetool.org/finding-errors-using-n-gram-data)
- upgraded dictionaries to Dicollecte-5.6
- added 32 confusion pairs like pris/prix, quand/quant (works only with ngram data, see http://wiki.languagetool.org/finding-errors-using-n-gram-data)
- added some rules
- improved handling of hyphenated compound words
- added some rules
- added and improved rules
- removed some false alarms
- added and improved rules
- added 14 confusion pairs like tubo/tuvo, ciento/siento (works only with ngram data, see http://wiki.languagetool.org/finding-errors-using-n-gram-data)
- upgraded Hunspell dictionary to 2.1
- rebuilt spellchecker dictionary
- added words suggested by users
- added and improved rules
- big dictionary update (thousands of new words and many fixes)
- compound tagger improvements
- several new rules and many improvements to existing ones
- new token inflection agreement rule (still work-in-progress so turned off by default)
- new replacement suggestions for barbarisms
- some formerly deprecated code has been removed
- all rules now have a category ("Misc" if the rule doesn't specify a category)
- a new module `languagetool-http-client` has been added with a class `RemoteLanguageTool` that you can use to query a remote LanguageTool server via HTTP or HTTPS
- removed the public modifier from `LanguageComboBox`
- The existing HTTP/HTTPS API will be replaced by a new one that returns JSON. This version of LanguageTool supports both APIs. The new API is prefixed with `/v2/`. It is documented at https://languagetool.org/http-api/swagger-ui/#/default. Please do not use the old XML-based HTTP API anymore. Information about migrating from the old to the new API can be found at https://languagetool.org/http-api/migration.php
- Changed behaviour for OutOfMemory situations: the server process now stops instead of being in an unstable state
- Missing parameters (like `text`) now cause a `400 Bad Request` response (it used to produce `500 Internal Server Error`)
- New parameter `preferredVariants` to specify which variant is preferred when the language is auto-detected. Example: `language=auto&preferredVariants=en-GB,de-AT` - if English text is detected, British English will be used, if German text is detected, German (Austria) will be used.
- Code refactorings: methods have been removed without being deprecated first, e.g. in `LanguageToolHttpHandler`
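
As a hedged illustration of the `preferredVariants` entry above, the following Python sketch builds a request URL for the new API. The host and port (`localhost:8081`) and the `check` endpoint name under `/v2/` are assumptions not stated in this changelog; only the `text`, `language`, and `preferredVariants` parameters are taken from the entries above.

```python
from urllib.parse import urlencode

BASE_URL = "http://localhost:8081"  # assumed address of a locally running server

params = {
    "text": "A sentence with a error",
    "language": "auto",
    # Used only when the language is auto-detected.
    "preferredVariants": "en-GB,de-AT",
}

# safe="," keeps the comma between the variants readable instead of %2C.
query = urlencode(params, safe=",")
url = f"{BASE_URL}/v2/check?{query}"
print(url)
```

Leaving out `text` is the case that, according to the entry above, is now answered with `400 Bad Request` rather than `500 Internal Server Error`.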
- groups of rules and categories are now required to have non-empty names to avoid user confusion
- detect encoding of files with BOM header
- add new menu to open recent files
- add new configuration option to allow user to select the GUI language
- preserve GUI state between program restarts
- detect encoding of files with BOM header when there is no `encoding` parameter
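
BOM-based detection of this kind can be sketched in Python as follows. This is an illustration of the general technique, not LanguageTool's code; the function names are hypothetical, and falling back to UTF-8 when no BOM is present is an assumption of this sketch.

```python
import codecs
from typing import Optional

# Longer BOMs first: the UTF-32 LE BOM starts with the same bytes as the UTF-16 LE BOM.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def detect_bom_encoding(data: bytes) -> Optional[str]:
    """Return the encoding implied by a leading BOM, or None if there is none."""
    for bom, encoding in _BOMS:
        if data.startswith(bom):
            return encoding
    return None


def decode_with_bom(data: bytes, fallback: str = "utf-8") -> str:
    """Decode data, using the BOM if present and the fallback encoding otherwise."""
    for bom, encoding in _BOMS:
        if data.startswith(bom):
            return data[len(bom):].decode(encoding)  # strip the BOM before decoding
    return data.decode(fallback)
```

For example, `decode_with_bom(codecs.BOM_UTF16_LE + "hé".encode("utf-16-le"))` returns `"hé"` without the caller naming an encoding.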
- small rule improvements
- added and improved rules
- added words suggested by users
- minor change in the format of the binary dictionary: POS tag and frequency data are no longer separated by a separator character.
- small rule improvements and URL updates, thanks to Koen Vervloesem
- added and improved rules, improved categorization of rules
- added checks on date ranges
- added about 215 confusion pairs like best/bets, wand/want (works only with ngram data, see http://wiki.languagetool.org/finding-errors-using-n-gram-data)
- improved several rules
- added and improved rules
- added and improved rules, improved categorization of rules
- updated Hunspell dictionary to version 2015.12.28 (http://extensions.libreoffice.org/extension-center/german-de-de-frami-dictionaries etc.)
- added Spanish false friends
- better suggestions for some errors that involve compounds
- new rule for checking the correct spelling of ordinal numerals
- added new XML rules
- added and improved a large number of rules, largely improved disambiguation
- upgraded the tagging and synthesis dictionaries to Morfologik Polimorf 2.1
- improved tokenization of number ranges (such as 1-1234 or 1--10)
- added checks on date ranges
- added and improved rules, improved categorization of rules
- added and improved rules, improved categorization of rules
- added words suggested by users
- added German false friends
- big dictionary update:
  - more than 202K lemmas
  - homonyms have been properly split
  - vocative case for inanimates has been added
  - list of barbarisms has been updated
- improved some rules
- improved sentence tokenization
- improved dynamic tagging for compounds
- some improvements for disambiguation
- some formerly deprecated code has been removed
- added `acceptPhrases(List<String> phrases)` to `SpellingCheckRule` so you can avoid false alarms on names and technical terms that consist of more than one word.
- Speed up for input with short sentences
- Added new parameters `enabledCategories` and `disabledCategories` that take a comma-separated list of categories to enable/disable. Fixes #326.
- The output now contains a `shortmsg` attribute if available, which is a short version of the `msg` attribute.
- The output now contains a `categoryid` attribute if available. It's supposed not to change in future versions (while `category` might change).
- new parameters `--enablecategories` and `--disablecategories` to activate/deactivate all rules in a category (#66)
- Bugfix: for files >= 64,000 bytes, the position information (`fromx` and `tox`) could be wrong. Also, rules that work across paragraphs like the German word coherency rule wouldn't work. Both bugs have been fixed but with the side-effect that large files will now be loaded into memory completely. If you're using LanguageTool on large files (several MB) you might need to split these files now before you check them. If you need the old behavior, use the `--line-by-line` switch. (#254)
- Indexing: fixed an `IllegalArgumentException` for long sentences (#364)
- Fixed a bug where sentence and paragraph end tags were removed during disambiguation.
- Fixed a bug with a possible `NullPointerException` for tokens containing soft hyphens that might be disambiguated.
- Updated Morfologik library to version 2.1.0. The tools for building dictionaries (languagetool-tools) have been adapted to the new version. The format of the dictionaries has not changed, except for a minor change only in Catalan.
- LanguageTool requires Java 8 now
- new spellchecker dictionary. This dictionary is based on dict-be-official-2008-20140108.oxt from http://bnkorpus.info/download.html
- fixed false alarms
- added new rules
- added words suggested by users
- updated Hunspell dictionary to version 2.3 (2015-11-15):
  - Corrections made regarding new spelling of 2012
  - General cleanup
  - A lot of compound flags added
- fixed a bug where Hunspell flags were wrongly included in the tagger dictionary. For example:
vintrenes+F+sub:bes:plu:utr:gen/115,70,85,976,941,947
vinåndstermometrenes+F+sub:bes:plu:neu:gen/70,118,85,976
- added new tags
- updated and adjusted for the changes introduced by the new spelling of 2012 and Hunspell-da 2.3
- added/improved several rules
- added more than 150 confusion pairs like shall/shell, sheer/shear (works only with ngram data, see http://wiki.languagetool.org/finding-errors-using-n-gram-data)
- added `en/removed.txt` so incorrect readings of the POS tagger can be avoided without rebuilding the binary dictionary (#306)
- added/improved several rules
- upgraded dictionaries to Dicollecte-5.5
- added/improved several rules
- added/improved a few rules
- improved agreement rule to detect errors like `Ich gebe dir ein kleine Kaninchen.` where the determiner is indefinite but the adjective fits only for a definite determiner
- added `de/removed.txt` so incorrect readings of the POS tagger can be avoided without rebuilding the binary dictionary
- added an agreement rule
- added/improved several rules
- added/improved several rules
- added words suggested by users to spellchecker dictionary
- big dictionary update: more than 10k new words, many fixes (the dictionary source is now available at https://github.com/arysin/dict_uk)
- many new rules
- improvements for euphony rules
- improvements in dynamic compound tagger
- new disambiguation rules
- New rule syntax `<regexp>...</regexp>` as a simple alternative to `<pattern><token>...</token></pattern>`. Note that this is limited: E.g. it's not possible to address POS tags and the `<suggestion>` cannot change the case of the match. Available attributes: `type` with value `smart` (treats space in the regular expression as `\s+` or a non-breaking space) or `exact` (`smart` is the default), `mark` to specify which part of the match gets underlined (everything by default, use `1` to only underline the first group etc.)
- Non-breaking spaces (`\u00A0`) are now treated like regular spaces. Before, using a non-breaking space could cause a rule not to match.
- `<filter>` can now also be used in `disambiguation.xml`
- Speed up for testing short sentences for de-DE, de-AT, and de-CH
- `GeneralCatalan` has been removed, use `Catalan` instead
- `SuggestionExtractorTool` and `SuggestionExtractor` have been removed
- `ConfusionProbabilityRule` has been moved to package `org.languagetool.rules.ngrams`
- `ConfusionProbabilityRule.getWordTokenizer()` is now called `ConfusionProbabilityRule.getGoogleStyleWordTokenizer()`
- `RuleAsXmlSerializer` has been renamed to `RuleMatchAsXmlSerializer`
- some formerly deprecated code has been removed
- some code has been deprecated
- `StringTools.isWhitespace()` now returns `true` for a token that is a non-breaking space or a narrow non-breaking space
- `RuleFilter` is not an interface anymore but an abstract class
- the `LanguageModel` interface has been redesigned, see `BaseLanguageModel` for a class similar to the previous implementation
- Class `BerkeleyLanguageModel` was added to support BerkeleyLM language models. See https://github.com/adampauls/berkeleylm for the software and e.g. http://tomato.banatao.berkeley.edu:8080/berkeleylm_binaries/ for pre-built models. To use the new models your language class needs to override the `getLanguageModel(File)` method. For now, we recommend you continue using the Lucene-based models at http://languagetool.org/download/ngram-data/.
- fix: disabling rules that are disabled by default and had been enabled didn't work
- updated segment library to 2.0.0 (https://github.com/loomchild/segment)
- added new rules
- fixed false alarms
- added words suggested by users
- added and improved a few rules
- added several pairs of easily confused words - active only with ngram data (see http://wiki.languagetool.org/finding-errors-using-n-gram-data)
- upgraded Hunspell dictionary to Dicollecte-5.4.1
- upgraded POS tag and Synthesizer dictionaries to Dicollecte-5.4
- added/improved several rules
- new filter to be used for matching the part-of-speech of parts of words, e.g.:
<filter class="org.languagetool.rules.fr.FrenchPartialPosTagFilter"
args="no:1 regexp:(.*)-tu postag_regexp:V.*(ind|con|sub).*2\ss negate_pos:yes"/>
- added and improved several rules
- added a rule to detect word confusion by using ngram data, so far it has only a few word pairs (see http://wiki.languagetool.org/finding-errors-using-n-gram-data)
- major rule update with 700+ new rules, thanks to Shugyousha
- added some compound prepositions to avoid false alarms (thanks to Sławek Borewicz)
- added/improved several rules
- added and improved a few rules
- added a few false friends rules (Russian/English)
- significant dictionary update (fixes, lots of new adjectives and last names)
- fix: the ngram directory that turns on the confusion rule (see http://wiki.languagetool.org/finding-errors-using-big-data) was ignored in LibreOffice and OpenOffice
- Chinese, French, Italian, Russian, and Spanish have been prepared to use ngram data, that means they have a `confusion_sets.txt` file where word pairs can be added. See http://wiki.languagetool.org/finding-errors-using-n-gram-data for information on where to download the ngram data.
- if a directory with ngram data for the confusion rule is specified, this directory is now expected to have at least one sub directory `en` or `de` with the `1grams`, `2grams`, and `3grams` directories (also see http://wiki.languagetool.org/finding-errors-using-n-gram-data)
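
The expected directory layout can be checked with a short Python sketch before pointing LanguageTool at the data. This is a hypothetical helper, not part of LanguageTool; the function name `languages_with_ngrams` is made up for this example, and only the directory names come from the entry above.

```python
from pathlib import Path
from typing import List

REQUIRED_SUBDIRS = ("1grams", "2grams", "3grams")


def languages_with_ngrams(ngram_root: Path) -> List[str]:
    """Return the language sub directories that contain all three ngram directories."""
    found = []
    for lang_dir in sorted(p for p in ngram_root.iterdir() if p.is_dir()):
        if all((lang_dir / name).is_dir() for name in REQUIRED_SUBDIRS):
            found.append(lang_dir.name)
    return found
```

Called on a directory that holds a complete `en` sub directory, `languages_with_ngrams(Path("/path/to/ngrams"))` would return `["en"]`; an empty result suggests the configured path points one level too deep or too shallow.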
- new property file key `rulesFile` to use a `.languagetool.cfg` file to configure which options should be enabled/disabled in a server (#281)
- several deprecated methods and classes have been removed
- Rules can now override `getAntiPatterns()` with patterns to be ignored. See the javadoc for details of what needs to be considered to make this work. See `org.languagetool.rules.de.CaseRule` for an example.
- updated to Lucene 5.2.1
- updated to Apache OpenNLP 1.6.0
- updated FSA spelling dictionary from An Drouizig Breton Spellchecker 0.13
- updated POS dictionary from Apertium (svn r61079)
- added new rules
- fixed false alarms
- added words suggested by users
- added a few new rules
- `ConfusionProbabilityRule` (only enabled with the `--languagemodel` option) has been rewritten and `homophones.txt` has been renamed to `confusion_sets.txt` and now only has few items enabled by default; the rest is commented out to improve quality (fewer false alarms). Also see http://wiki.languagetool.org/finding-errors-using-big-data
- fixed some false alarms
- updated to jwordsplitter 4.1 for better compound splitting
- the spell checker offers correct suggestions now for incorrect past tense forms like "gehte" -> "ging" (useful mostly for non-native speakers)
- added word frequency information to improve spelling suggestions (but this won't help for compounds which are not in the dictionary)
- added new rules
- fixed dozens of false alarms
- added/improved several rules (started adding morphologic rules)
- improved rules
- updated spellchecker
- dictionary update and several new rules
- big dictionary update (thousands of new words, new tagging for pronouns)
- improved sentence and word tokenization
- improved tokenization and tagging of lowercase abbreviations
- new grammar and styling rules
- new spelling rules, especially for lowercase abbreviations with dots
- improved compound word tagging
- improved some rules coverage
- many new barbarism replacement suggestions
- `UppercaseSentenceStartRule` didn't properly reset its state so that different errors could be found when e.g. `JLanguageTool.check()` got called twice with the same text.
- `Authenticator.setDefault()` is now only called if it's allowed by the Java security manager. In rare cases, this might affect using external XML rule files as documented at http://wiki.languagetool.org/tips-and-tricks#toc9 (Github issue #255)
- fixed auto-detection of text language, which didn't work after editing text
- a directory with ngram data for the confusion rule can now be specified in the configuration dialog (English only for now), see http://wiki.languagetool.org/finding-errors-using-big-data
- performance improvements for checking small texts for the use case that creates a new `JLanguageTool` object for every check, as done by the embedded server (or multithreaded LT users in general)
- Fixed an error with the `--api` option that printed invalid XML for large documents or when the input was STDIN (Github issue #251)
- Print some information to STDERR instead of STDOUT so the `--api` option makes more sense
- added `MultiThreadedJLanguageTool.shutdown()` to clean up the thread pool
- several deprecated methods and classes have been removed, e.g.
  - `Language.REAL_LANGUAGES` is now `Languages.get()`
  - `Language.LANGUAGES` is now `Languages.getWithDemoLanguage()` - but you will probably want to use `Languages.get()`
  - Other static methods from class `Language` have also been moved to `Languages`
- `Language.addExternalRuleFile()` and `Language.getExternalRuleFiles()` have been removed. To add rules, load them with `PatternRuleLoader` and call `JLanguageTool.addRule()`.
- `getAllRules()`, `getAllActiveRules()`, and `getPatternRulesByIdAndSubId()` in class `JLanguageTool` used to call `reset()` for all rules. This is not the case anymore. `reset()` is now called when one of the `check()` methods is called. This shouldn't make a difference for common use cases.
- `Language.setName()` has been removed. If you need to set the name, override the `getName()` method instead.
- `Rule.getCorrectExamples()/getIncorrectExamples()`, `PatternToken.getOrGroup()/getAndGroup()` and `RuleMatch.getSuggestedReplacements()` now return an unmodifiable list
- `AbstractSimpleReplaceRule.getFileName()` and `AbstractWordCoherencyRule.getFileName()` have been removed, the sub classes are now themselves responsible for loading their data
- Sub classes of `AbstractCompoundRule` are now responsible for loading the compound data themselves using `CompoundRuleData`
- `AbstractCompoundRule.setShort(String)` has been removed and added as a constructor parameter instead.
- updated to language-detector 0.5
- fix `osl::Thread::Create failed` error message, see https://bugs.documentfoundation.org/show_bug.cgi?id=90740
See CHANGES.txt for changes before 2.9.1.