LanguageTool Change Log
3.0
*** See CHANGES.md for all changes after LT 2.9: ***
*** https://github.com/languagetool-org/languagetool/blob/master/languagetool-standalone/CHANGES.md ***
2.9 (2015-03-30)
-Catalan:
-updated POS tag dictionary
-added new rules
-fixed false alarms
-English:
-Added a few rules and fixed a few false alarms
-Added many new style rules contributed by Heikki Lehvaslaiho. As these
may cause false alarms, they are not activated by default. You can
activate them by turning on all rules in the new 'Plain English' category.
-Esperanto:
-added a few new rules
-French:
-updated POS tag dictionary and Hunspell dictionary to Dicollecte-5.3
-German:
-added a few new rules and fixed false alarms
-Added a new rule that checks for subject-verb agreement. For now, only cases
with 'ist', 'sind', 'war', and 'waren' are supported. Examples of errors that
are detected: 'Der Hund sind schön.', 'Die Autos ist schnell.'
To make this rule work, phrases are now unified in disambiguation.xml: for
example, 'Mann' in the phrase 'ein Mann' will only retain its nominative
reading (SUB:NOM:SIN:MAS), whereas it used to also have accusative and
dative readings (SUB:AKK:SIN:MAS, SUB:DAT:SIN:MAS).
(https://github.com/languagetool-org/languagetool/issues/233)
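The idea behind the new German agreement rule can be sketched as follows. This is a minimal illustration, not LanguageTool's implementation: the toy lexicon NOUN_NUMBER is a hypothetical stand-in for the unified POS readings from disambiguation.xml, and only a noun directly followed by one of the four covered verb forms is checked.

```python
# Hypothetical toy lexicon: noun -> grammatical number ("SIN" or "PLU"),
# standing in for the unified POS readings described above.
NOUN_NUMBER = {"Hund": "SIN", "Hunde": "PLU", "Auto": "SIN", "Autos": "PLU"}
# The four verb forms the rule covers, with the number each one requires.
VERB_NUMBER = {"ist": "SIN", "war": "SIN", "sind": "PLU", "waren": "PLU"}


def agreement_errors(sentence):
    """Return (noun, verb) pairs where a noun is directly followed by a
    covered verb form with a different grammatical number."""
    tokens = sentence.rstrip(".!?").split()
    errors = []
    for noun, verb in zip(tokens, tokens[1:]):
        if noun in NOUN_NUMBER and verb in VERB_NUMBER:
            if NOUN_NUMBER[noun] != VERB_NUMBER[verb]:
                errors.append((noun, verb))
    return errors
```

For example, agreement_errors("Der Hund sind schön.") reports ("Hund", "sind"), while "Der Hund ist schön." yields no errors.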
-Italian:
-improved a few rules
-Polish:
-added several new rules
-Portuguese:
-added/improved several rules
-3695 compound words (pre-reform) - the largest free database
-Russian:
-added and improved rules
-Ukrainian:
-big dictionary update
-new grammar rules
-new simple replace rule for soft suggestions
-disambiguator improvements
-compound tagging and spelling improvements
-initials tagging improvements
-sentence and word tokenizing improvements
-improved handling of stress symbol and soft hyphen
-Bitext rules:
-added a simple rule for checking whether translations end with the same punctuation
mark as the original (this includes only .?! characters).
-it is now possible to add external bitext rule files on the command line by using the
--bitextrule <FILE> option. The file path has to be absolute. Note: this allows
using bitext rules also for languages that have no bitext rules included by
default.
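A minimal sketch of the end-punctuation check described above. It assumes that a mismatch is reported whenever the source and the translation end differently with respect to the marks . ? and !, including the case where only one of them ends with such a mark; the function names are hypothetical.

```python
END_MARKS = ".?!"  # only these characters are considered, as described above


def final_mark(text):
    """Return the trailing '.', '?' or '!' of text (ignoring trailing spaces), or ''."""
    stripped = text.rstrip()
    return stripped[-1] if stripped and stripped[-1] in END_MARKS else ""


def same_end_punctuation(source, translation):
    """True if both texts end with the same mark from .?! (or neither does).
    Assumption: a mark present on only one side counts as a mismatch."""
    return final_mark(source) == final_mark(translation)
```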
-Spelling:
-The new files at <languageCode>/hunspell/spelling.txt can be used to add
accepted words to the spell checker that are also considered when creating
suggestions for misspelled words.
This is similar to the <languageCode>/hunspell/ignore.txt files, which list
accepted words which are *not* used when creating suggestions for
misspelled words.
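The difference between spelling.txt and ignore.txt can be modelled roughly like this. The sketch is hypothetical: it uses Python's difflib as a stand-in for the real suggestion engine, and only shows that both files make a word acceptable while only spelling.txt words can appear as suggestions.

```python
import difflib


def make_speller(dictionary, spelling_txt, ignore_txt):
    """Build (is_misspelled, suggest) functions for a toy model of the two files."""
    suggestable = set(dictionary) | set(spelling_txt)  # may be offered as suggestions
    accepted = suggestable | set(ignore_txt)           # never flagged as errors

    def is_misspelled(word):
        return word not in accepted

    def suggest(word, n=3):
        # difflib stands in for the real suggestion algorithm (an assumption);
        # words from ignore.txt are never offered.
        return difflib.get_close_matches(word, sorted(suggestable), n=n)

    return is_misspelled, suggest
```

With spelling_txt=["foobar"] and ignore_txt=["foobaz"], both words are accepted, but only "foobar" is suggested for the typo "foobax".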
-API:
-JLanguageTool.activateDefaultPatternRules() and
JLanguageTool.activateDefaultFalseFriendRules() have been removed - all
pattern rules and false friend rules (if a second language is specified)
are now activated automatically when the constructor of JLanguageTool is called.
Should you need a checker without the XML-based pattern rules, subclass your
language class (e.g. 'English') and override the getPatternRules() method so
that it returns an empty list.
-ManualTagger.lookup() has been replaced by ManualTagger.tag(); lookup() had
been deprecated since the previous release
-All static methods and fields from class 'Language' have been moved to the new
class 'Languages'. For now, the methods/fields in class Language still exist
but have been deprecated.
-LanguageIdentifierTools has been removed. Use LanguageIdentifier instead.
-Removed (Default)ResourceDataBroker.setResourceDir() and setRulesDir()
as these can be set with the constructor
-Cleaned up class Contributor, e.g. by removing getRemark()
-Category.setDefaultOff() has been removed; this can now be set via the constructor
-Renamed classes:
o.lt.rules.patterns.Element => o.lt.rules.patterns.PatternToken
o.lt.rules.patterns.ElementMatcher => o.lt.rules.patterns.PatternTokenMatcher
-Other small API cleanups that shouldn't affect the common use cases,
e.g. IncorrectExample.getCorrections() now returns an unmodifiable list, and
deprecated methods have been removed.
-Embedded server:
-Fixed XML escaping; the bug could cause invalid XML documents
to be returned
-new config file option 'maxWorkQueueSize' that lets you set the maximum
size of the request queue - if it gets larger than this, requests will
be rejected (503 Service Unavailable)
-The server now responds with more specific HTTP status codes to these
error conditions:
413 Request Entity Too Large - if text exceeds maximum text size
503 Service Unavailable - if check exceeds maximum check time
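How the limits above map to status codes can be sketched as a single decision function. This is a hypothetical illustration, not the server's code: the parameter names mirror the config options 'maxTextLength', 'maxWorkQueueSize' and 'maxCheckTimeMillis', and the order in which the conditions are tested is an assumption.

```python
def rejection_status(text_length, queue_size, elapsed_millis,
                     max_text_length, max_work_queue_size, max_check_time_millis):
    """Return the HTTP status for the error conditions described above,
    or None if the request is fine. Condition order is an assumption."""
    if queue_size > max_work_queue_size:
        return 503  # Service Unavailable: request queue is full
    if text_length > max_text_length:
        return 413  # Request Entity Too Large: text exceeds maximum size
    if elapsed_millis > max_check_time_millis:
        return 503  # Service Unavailable: check exceeded maximum check time
    return None
```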
-GUI:
-The stand-alone GUI can now take a plain text file as an argument; this
file will then be loaded on startup (GitHub issue #232).
-Command-line:
-It is now possible to add an external rule file when calling LanguageTool from the
command line. Use --rulefile <file> to add a file. If the file name has a format
that contains a language name, it will be used alongside other rules; otherwise,
it will replace the rules. You can also load an external file with false friends by
using the option --falsefriends <file>. The file name should be an absolute file
path, and false friend files are always added to the ones that are loaded for the
language. (Github issue #192)
-Rule syntax:
-A rule may now have a single example sentence as long as it has a 'correction'
attribute - this can save some redundancy if the only correct sentence is the
same as the incorrect sentence with the correction applied. Before, a rule
needed at least two example sentences.
-'example' element: type="incorrect" is now optional if there's a 'correction'
attribute. The 'correction' attribute implies that the sentence is incorrect.
-'example' element: type="correct" is now optional. No 'type' attribute and
no 'correction' attribute implies that the sentence is correct.
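The new defaults for 'example' elements can be expressed as a small inference function. This is a sketch using Python's xml.etree; the function names are hypothetical, and the "enough examples" check only encodes the two cases described above (two or more examples, or a single example with a 'correction' attribute).

```python
import xml.etree.ElementTree as ET


def example_type(example_xml):
    """Infer whether an <example> is 'correct' or 'incorrect'."""
    elem = ET.fromstring(example_xml)
    explicit = elem.get("type")
    if explicit is not None:
        return explicit
    # A 'correction' attribute implies the sentence is incorrect;
    # no type and no correction implies it is correct.
    return "incorrect" if elem.get("correction") is not None else "correct"


def has_enough_examples(rule_xml):
    """True if a rule has two or more examples, or a single example with a
    'correction' attribute."""
    examples = ET.fromstring(rule_xml).findall("example")
    if len(examples) >= 2:
        return True
    return len(examples) == 1 and examples[0].get("correction") is not None
```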
-Internal:
-We have switched from Apache Tika to language-detector
(https://github.com/optimaize/language-detector) for automatically
identifying the text language. It should be faster and results
should be more reliable.
Detection of Asturian and Galician had to be disabled because the
detection quality was too low and also affected detection of Spanish.
-Fixed a regression that made it impossible to load external rule files in the GUI.
2.8 (2014-12-30)
-Asturian:
-removed dependency on Hunspell, now uses Morfologik for spell checking
-Breton:
-added and improved a few rules
-Catalan:
-updated dictionary
-added and improved rules
-fixed false alarms
-Dutch:
-added and improved many rules
-English:
-some new rules (thanks to Nick Hough)
-updated the tagger and synthesizer dictionaries, fixing issue #202
-new filter to be used for matching the part-of-speech of parts of words, e.g.:
<pattern>
<token regexp="yes">in.*</token>
</pattern>
<filter class="org.languagetool.rules.en.EnglishPartialPosTagFilter"
args="no:1 regexp:in(.*) postag_regexp:JJ"/>
This will only keep matches for words that start with 'in' and where the
part after the 'in' is an adjective (POS tag 'JJ'). The 'no:1' is the token
position, i.e. here the first (and only) matching <token> is referred to.
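The behavior of the filter arguments above can be sketched in a few lines. This is not EnglishPartialPosTagFilter itself: TAGS is a hypothetical toy lexicon standing in for the English POS dictionary, and treating both regular expressions as full matches is an assumption.

```python
import re

# Hypothetical toy tag lexicon standing in for the English POS dictionary.
TAGS = {"correct": {"JJ"}, "valid": {"JJ"}, "put": {"VB", "VBD", "VBN"}}


def partial_pos_match(word, regexp=r"in(.*)", postag_regexp=r"JJ"):
    """Mimic args="regexp:in(.*) postag_regexp:JJ": keep the match only if the
    part captured by the group has a POS tag matching postag_regexp."""
    m = re.fullmatch(regexp, word)
    if not m:
        return False
    part = m.group(1)
    return any(re.fullmatch(postag_regexp, tag) for tag in TAGS.get(part, ()))
```

Here "incorrect" is kept because "correct" is tagged JJ, while "input" is dropped because "put" only has verb tags.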
-French:
-added and improved a few rules
-German:
-added and improved a few rules
-Polish:
-added and improved several rules
-added and improved false friends with English
-Portuguese:
-added/improved several rules
-Spanish:
-removed dependency on Hunspell, now uses Morfologik for spell checking
-Reformatted rules file
-Added more rules
-Tagalog:
-removed dependency on Hunspell, now uses Morfologik for spell checking
-the dash character ("-") is a delimiter now when tokenizing the text
-Russian:
-added and improved rules
-added a few false friend rules (Russian/English)
-Ukrainian:
-many new rules (including agreement with nouns, time expressions etc.)
-rule coverage improvement
-dictionary update (big improvements for proper nouns and vocative case)
-new tag and rule to warn about alternative spelling
-added word frequency information to improve spelling suggestions
-some new disambiguator rules
-Rule Syntax:
-<short>...</short> can now be added to a rulegroup to affect all the rules
of that group
-If you develop your own rules that are not part of LT you can now add external="yes"
to your categories to prevent the rule link to community.languagetool.org from
appearing in our stand-alone GUI (the link would not work for rules that are not part
of the main distribution of LT). (Github issue #223)
-If a rule group specifies default="off", the rules inside the rule group
may not also specify default="on"/"off".
-API:
-Removed classes and methods that had been deprecated since 2.7 or longer
-Embedded server:
-The config file options 'requestLimit' and 'requestLimitPeriodInSeconds' can
now also be used for the HTTP server (not just for the HTTPS server)
-New config file option 'trustXForwardForHeader': set this to 'true' if you
run the server behind a reverse proxy and want the request limit to
work on the original IP addresses provided by the 'X-Forwarded-For' HTTP header,
usually set by the proxy. If you run behind a proxy but don't set this
property to true, all requests appear to come from the proxy's IP address, so one
user can use up all the requests and other users will also get an error message
because of the request limit.
-Fixed the response in After the Deadline mode: <description>...</description> was
sometimes empty, confusing the text check in WordPress
-Fixed: bitext rules were not disabled properly, even if they were specified with
the proper server parameter
-Fixed problem with improper positions for some bitext rules (issue #218)
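The effect of 'trustXForwardForHeader' on which address the request limit counts against can be sketched like this. The function is hypothetical and takes headers as a plain dict with exact-case keys; it relies on the standard convention that the first entry of X-Forwarded-For is the original client and later entries are proxies.

```python
def client_ip(remote_addr, headers, trust_x_forwarded_for):
    """Return the IP address that the request limit should count against."""
    if trust_x_forwarded_for:
        forwarded = headers.get("X-Forwarded-For")
        if forwarded:
            # Comma-separated list: the first entry is the original client,
            # later entries are the proxies the request passed through.
            return forwarded.split(",")[0].strip()
    # Without the option, every request behind a proxy has the proxy's address.
    return remote_addr
```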
-GUI:
-A new 'errorColors' setting has been added to the languagetool.cfg configuration
file. It can be used to set the background color of errors. For example,
errorColors=typographical:#b8b8ff, style:#ffb8b8
will show 'typographical' errors with a blue background and 'style' errors
with a red background in the upper part of the LT window. 'typographical' and
'style' are the types that are set in grammar.xml as "type=...".
There's no user interface yet to configure these colors. Note that you should
only edit the languagetool.cfg file when LT is not running.
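The 'errorColors' value format shown above can be parsed as follows. This is a hypothetical sketch of the format only (comma-separated type:#rrggbb pairs), not the GUI's own parser.

```python
def parse_error_colors(value):
    """Parse 'type:#rrggbb, type:#rrggbb' into {type: (r, g, b)}."""
    colors = {}
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue
        error_type, _, hex_color = item.partition(":")
        hex_color = hex_color.strip().lstrip("#")
        # Two hex digits each for red, green and blue.
        colors[error_type.strip()] = tuple(int(hex_color[i:i + 2], 16) for i in (0, 2, 4))
    return colors
```

For the example setting above, this yields {'typographical': (184, 184, 255), 'style': (255, 184, 184)}, i.e. a light blue and a light red.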
-Internal:
-Bugfix: rules inside a rule group had not been activated if a previous
rule from the same rulegroup used default="off"
-Words are not ignored anymore by the spell checker just because they occur in
a rule's suggestion. If you want the spell checker to ignore words globally, add
them to hunspell/ignore.txt. To ignore them depending on the context, add a
'ignore_spelling' rule to disambiguation.xml.
-A file 'hunspell/prohibit.txt' can now be used to mark words as spelling errors
even if the spell checker would normally accept them. This is useful to improve
the LanguageTool spell checker without waiting for the upstream checker to be
updated.
The 'prohibit.txt' file is the opposite of 'ignore.txt', which causes the
spell checker to ignore words.
-The part-of-speech tagger for most languages can now be extended by adding entries
to the file org/languagetool/resource/XX/added.txt (XX being the language code).
The format is "fullform baseform postag", three columns separated by tabs.
This makes it easier for users (and developers) to extend the POS tagger, as they
don't need to export, modify, and re-create the binary dictionary for every change.
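The three-column added.txt format can be read like this. The sketch is hypothetical; skipping blank lines and lines starting with '#' is an assumption, not something stated above.

```python
def parse_added_txt(lines):
    """Parse 'fullform<TAB>baseform<TAB>postag' lines into
    {fullform: [(baseform, postag), ...]}."""
    readings = {}
    for line in lines:
        line = line.rstrip("\n")
        # Assumption: blank lines and '#' comment lines are skipped.
        if not line.strip() or line.startswith("#"):
            continue
        fullform, baseform, postag = line.split("\t")
        readings.setdefault(fullform, []).append((baseform, postag))
    return readings
```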
2.7 (2014-09-29)
-Breton:
-added and improved rules
-New rule that checks if a weekday matches a date, e.g. detects
"Gwener 28 a viz Eost 2014", as that date isn't a Friday.
-Catalan:
-added and improved rules
-fixed false alarms
-updated dictionary
-Dutch:
-added and improved many rules
-switched to Morfologik-based spell checker
-English:
-Do you want to be part of the team that develops the world's most powerful
Open Source proofreading tool? We're looking for a maintainer for the English
rules in LanguageTool. See http://wiki.languagetool.org/tasks-for-language-maintainers
for details.
-All English dictionaries have been extended to contain word frequency classes
to improve the spell checker suggestions (the frequency data is taken from
https://github.com/mozilla-b2g/gaia/tree/master/apps/keyboard/js/imes/latin/dictionaries,
as for other languages that already use this feature).
-Better suggestions for English learners: irregular verbs, nouns, and adjectives now
usually have a better suggestion. For example, 'thinked' suggests 'thought',
'womans' suggests 'women'.
-More misspellings provide suggestions now, e.g. 'garentee' (guarantee),
'greatful' (grateful). This may cause a performance decrease of ~ 10% (more for
texts with a lot of unknown words).
-New rule that checks if a weekday matches a date, e.g. detects
"Monday, 7 October 2014", as that date isn't a Monday. This rule will only
work if it detects the date format in use. So far, these formats are supported:
* "Monday, 7 October 2014"
* "Monday, 7 Oct 2014"
* "Monday, October 7, 2014"
* "Monday, Oct 7, 2014"
* (this also works with abbreviated week days like Mo or Mon for Monday)
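A minimal sketch of the weekday/date check for the four English formats listed above, using Python's datetime.strptime. It assumes the default C/English locale for month names, splits the weekday off at the first ", ", and is not the rule's actual implementation.

```python
from datetime import datetime

# strptime patterns for the four supported formats listed above.
FORMATS = ["%d %B %Y", "%d %b %Y", "%B %d, %Y", "%b %d, %Y"]
WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday"]


def weekday_matches(text):
    """Return True/False for 'Weekday, <date>' text, or None if no format fits."""
    weekday, _, date_part = text.partition(", ")
    for fmt in FORMATS:
        try:
            date = datetime.strptime(date_part.strip(), fmt)
        except ValueError:
            continue
        actual = WEEKDAYS[date.weekday()]  # weekday(): Monday == 0
        # Accept full names and abbreviations such as 'Mo' or 'Mon'.
        return len(weekday) >= 2 and actual.startswith(weekday.lower())
    return None
```

Since 7 October 2014 was a Tuesday, weekday_matches("Monday, 7 October 2014") returns False and weekday_matches("Tue, Oct 7, 2014") returns True.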
-Esperanto:
-New rule that checks if a weekday matches a date, e.g. detects
"Vendredon la 28-an de Aŭgusto 2014", as that date isn't a Friday.
-French:
-updated POS tag dictionary and Hunspell dictionary to Dicollecte-5.2
-added a synthesizer - the agreement rule can now make suggestions
for some errors
-added/improved several rules
-New rule that checks if a weekday matches a date, e.g. detects
"vendredi 28/08/2014", as that date isn't a Friday.
-German:
-Fixed a rare NullPointerException and an ArrayIndexOutOfBoundsException
-Fixed several false alarms
-Added and improved rules
-New rule that checks for sentences without a verb (turned off by default due to
the risk of false alarms)
-New rule that checks if a weekday matches a date, e.g. detects
"Dienstag, 29.9.2014", as that date isn't a Tuesday.
-Performance improvements for spell check suggestions
-Persian:
-added initial support for Persian (Farsi)
-Polish:
-added and improved some rules
-new rule that checks if a weekday matches a date
-Portuguese:
-added/improved several rules
-added many dozens of compound words
-Russian:
-added new rules
-fixed SourceForge feature request #38 (check for different quotation marks)
-added a few false friend rules (Russian/English)
-new rule that checks if a weekday matches a date, e.g. detects
"понедельник, 30 сентября 2014 г.", as that date isn't a Monday.
-expanded Russian compound rule with new words from postag dictionary
-Spanish:
-Added new POS category Z (for spelled numbers, e.g. 'uno', 'dos', ...)
-Spelled numbers can now be detected and managed both in disambiguation and rules.
-Fixed some incorrect lemmas in POS dictionary.
-Added Hybrid chunker-disambiguator.
-Tamil:
-Added initial support for Tamil. If the font for Tamil is not properly
displayed on your computer and you're using Windows, you might need
to apply the workaround described here:
https://bugs.openjdk.java.net/browse/JDK-8008572
-Ukrainian:
-big update for POS dictionary (fixes and new words)
-some POS tags renamed for consistency; new tags for abbreviations and rare words
-many new rules and fixes for existing rules
-new rule that checks if a weekday matches a date, e.g. detects
"понеділок, 7 жов 2014", as that date isn't a Monday
-token normalization performance improvement
-LibreOffice integration:
-Don't get confused by footnotes in LibreOffice 4.3 and later (it now provides us with
the footnote positions as meta data, so we can ignore them).
-API:
-Major performance improvements for the multi-thread use case, where JLanguageTool
gets created per thread, but the language object (e.g. 'German') gets
created only once. Overhead for creating JLanguageTool should now be much lower.
-Removed several classes and methods that had been deprecated since version 2.6
-Removed DutchSpellerRule - use MorfologikDutchSpellerRule instead
-The signature of Language.getRelevantRules() has changed
-The JLanguageTool and MultiThreadedJLanguageTool constructors no longer declare
that they throw an IOException
-WhitespaceRule has been renamed to MultipleWhitespaceRule (WhitespaceRule
still exists but has been deprecated)
-Deprecated some methods whose visibility will be decreased (e.g. from public
to protected)
-MorfologikSpellerRule.getRuleMatch(String, int) has been renamed to
MorfologikSpellerRule.getRuleMatches(String, int)
-The RuleMatch constructor now throws an exception if toPosition is not
larger than fromPosition
-Introduced a new abstract class TextLevelRule that extends Rule and that
can be used for rules that cover more than single sentences.
-Command line:
-Enabling and disabling specific rules at the same time is now allowed.
In order to test only some rules (disabling all the rest), which previously was done
with "--enable LIST_OF_RULES", now use "--enabledonly --enable LIST_OF_RULES"
(or "-eo -e LIST_OF_RULES").
-Embedded server:
-Two new options can be set in the properties file to make LanguageTool
return the same XML format as After the Deadline (AtD). This way it can be used
as a drop-in replacement for AtD:
* mode - 'LanguageTool' or 'AfterTheDeadline'
* afterTheDeadlineLanguage - code of default language if mode is set to 'AfterTheDeadline'
NOTE: the 'AfterTheDeadline' mode should be considered experimental for now.
-The new option 'maxCheckThreads' allows setting the maximum number of threads working
on requests in parallel. The default is 10, as it used to be.
-Internals:
-New abstract rule AbstractDateCheckFilter that allows checking whether a weekday and a date
match. For example, "Tuesday, September 29, 2014" could be detected, as September 29, 2014 is
not actually a Tuesday. This uses the new experimental RuleFilter interface that can be called
from XML with the new 'filter' element. 'filter' takes these attributes:
'class': the fully-qualified name of a Java class that implements RuleFilter, e.g.
"org.languagetool.rules.de.DateCheckFilter"
'args': a string like "year:\1 month:\2 day:\3 weekDay:\4", i.e. a space-separated list of
key/value pairs, where \x gets resolved to the pattern's token value (as in
the 'message' element)
-The compound rule now ignores tokens that have been immunized in the disambiguation.xml
-The "filter" action in the disambiguator is now applied only to POS tags that match the POS
tag given. If they don't match, the rule is not applied.
-If you're extending the XML rules as described at http://wiki.languagetool.org/tips-and-tricks#toc2,
the external rule and disambiguation files can now be hosted on a password-protected server
by specifying a URL of the form http://user:password@host/path/user-rules.xml
-The em dash ("—") is now a tokenizing character for all languages
-New feature: Use of language models
LanguageTool can now make use of ngram data. ngram data is information about
how often phrases occur in a language. Currently, this uses phrases of length 3.
The data is used by an English rule to find homophone errors, like mixing up
coarse/course or flair/flare. LanguageTool had some rules of this kind before, but
the new rule now supports about 900 such word pairs/sets.
The data needed for this is huge (7GB for English) and thus not part of LanguageTool.
The data (English only for now) and more documentation is available at
http://wiki.languagetool.org/finding-errors-using-big-data
Using ngrams makes LanguageTool slightly slower when the data is stored on an SSD.
If not stored on an SSD, the performance might drastically decrease.
Use the new --languagemodel option with the command line client to activate the rule
that uses the data. That option is not yet available for the stand-alone GUI.
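The basic idea of using 3-gram counts for homophones can be sketched as a naive comparison. This is only an illustration of the principle, not LanguageTool's scoring: TRIGRAM_COUNTS holds made-up numbers standing in for the real multi-gigabyte data, and the tie rule (keep the original word) is an assumption.

```python
# Hypothetical trigram counts; the real data is a multi-gigabyte ngram index.
TRIGRAM_COUNTS = {
    ("of", "course", "not"): 9000,
    ("of", "coarse", "not"): 3,
    ("a", "course", "in"): 1200,
    ("a", "coarse", "in"): 10,
}
CONFUSION_SET = {"course", "coarse"}


def best_alternative(left, word, right):
    """Return the member of the confusion set whose 3-gram (left, w, right) is
    most frequent, keeping the original word on ties (an assumption)."""
    if word not in CONFUSION_SET:
        return word

    def count(w):
        return TRIGRAM_COUNTS.get((left, w, right), 0)

    best = max(sorted(CONFUSION_SET), key=count)
    return word if count(word) >= count(best) else best
```

In "of coarse not", the 3-gram with "course" is far more frequent, so "course" would be suggested; without any data, the original word is kept.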
2.6 (2014-06-30)
-Breton:
-updated FSA spelling dictionary from An Drouizig Breton Spellchecker 0.12
-updated POS dictionary from Apertium (svn r53329)
-improved several rules
-Catalan:
-fixed false alarms
-added and improved rules
-updated tagger dictionary
-Morfologik spellchecking rule is enabled for use in LibreOffice/OpenOffice
extension. The Hunspell spellchecker should be manually disabled in
LibreOffice/OpenOffice for the results to be visible.
-English:
-The spelling rule now accepts words with hyphens if the parts are valid words.
For example, "web-based" is accepted now. This avoids a lot of false alarms.
-fixed a thread-safety problem in the synthesizer
-added/improved several rules
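The hyphen handling in the English spelling rule above can be sketched as follows. DICTIONARY is a hypothetical stand-in for the real spelling dictionary, and the sketch compares words case-insensitively for simplicity.

```python
# Hypothetical stand-in for the English spelling dictionary.
DICTIONARY = {"web", "based", "well", "known", "state", "of", "the", "art"}


def is_accepted(word):
    """Accept a known word, or a hyphenated word whose parts are all known."""
    if word.lower() in DICTIONARY:
        return True
    parts = word.split("-")
    return len(parts) > 1 and all(p.lower() in DICTIONARY for p in parts)
```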
-Esperanto:
-added several rules
-French:
-updated POS tag dictionary and Hunspell dictionary to Dicollecte-5.1
-added/improved several rules
-German:
-fixed false alarm for words like "Stil-" in phrases like "Stil- und
Grammatikprüfung" (issue #93)
-added/improved several rules
-detect wrong uppercase spelling in sentences like "Die Blaue Tür"
-Greek:
-added two new punctuation rules
-Japanese:
-added some rules, thanks to GitHub user Shugyousha
-Polish:
-added/improved many rules
-improved disambiguation
-improved spell-checking of foreign words with apostrophes
-Portuguese:
-added/improved several rules
-a total of 3534 compound words
-Russian:
-added a big wordlist to the spellcheck dictionary (thanks to Dmitri Gabinski for the wordlist)
-fixed wrong tag in tagger dictionary: ADJ_Comp --> ADJ_Sup, ADJ_S --> ADJ_Com
-added/improved several rules
-Spanish:
-added and improved several rules
-Asturian, Italian, Lithuanian, Malayalam, Swedish, and Tagalog have been switched
to an SRX-based sentence tokenizer implementation.
-Wikipedia:
-The deprecated options check-dump and wiki-index have been removed
from org.languagetool.dev.wikipedia.Main. Please use check-data
and index-data instead.
-API:
-Almost all deprecated methods and classes have been removed. If you're upgrading
from a version earlier than 2.5, we recommend upgrading to 2.5 first, fixing all
deprecation warnings, and then upgrading to 2.6.
-If you extend the 'Language' class and don't implement the getSentenceTokenizer()
method, your language now uses SimpleSentenceTokenizer. This is a very simple
tokenizer and you probably want to implement getSentenceTokenizer() to return
a LocalSRXSentenceTokenizer instead that's adapted to your needs.
-Rule.supportsLanguage() now also works for PatternRules (i.e. rules loaded from
XML files). It used to always return 'false' for those rules.
-The public field Language.DEMO has been removed. It is now only available
internally for tests, together with the demo language itself and DemoChunker.
-JLanguageTool.printIfVerbose() has been changed from protected to private,
as it was not used anywhere and it's not really useful to extend JLanguageTool.
-JLanguageTool.getAllActiveRules() has been fixed to not return rules that have
the default="off" attribute (unless they have been enabled explicitly)
-StringTools.isAlphabetic() has been removed as it was a workaround for Java 6
which is not supported anymore
-ContractionSpellingRule.isDictionaryBasedSpellingRule() now returns false
-Specific rules can be enabled for use in LibreOffice/OpenOffice extension with
the new useInOffice() method. This is done, for example, for enabling Catalan
Morfologik spelling rule.
-Embedded server:
-The HTTP server now also accepts a --config option so that the maximum request size
can be limited with the 'maxTextLength' parameter (this used to work only
for the HTTPS server)
-The HTTP and HTTPS servers now accept a new option 'maxCheckTimeMillis' in the
property configuration file to specify a maximum duration of a single check.
Checks that take longer (e.g. because generating spell corrections is slow for
some languages) will stop with an exception.
-GUI:
-Improved configuration dialog
-Allow the user to change the font of the main editing area
-Rule syntax:
-It is now possible to specify case sensitivity of individual exceptions
and tokens in a pattern (both in grammar and disambiguation XML rules).
Simply use case_sensitive="yes" or case_sensitive="no".
-The 'url' element can now be added to the 'rulegroup' so that all rules
of a rule group share the same URL.
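The effect of case_sensitive="yes"/"no" on a single token can be sketched like this. This is a hypothetical helper, not the pattern engine's code; it assumes case-insensitive matching by default and full-string matching for regular expressions.

```python
import re


def token_matches(pattern, token, regexp=False, case_sensitive=False):
    """Match a token roughly like <token regexp=".." case_sensitive="..">.
    Assumption: matching is case-insensitive unless case_sensitive is set."""
    flags = 0 if case_sensitive else re.IGNORECASE
    if regexp:
        return re.fullmatch(pattern, token, flags) is not None
    if case_sensitive:
        return token == pattern
    return token.lower() == pattern.lower()
```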
-Internals:
-Fixed bug with longer complex strings of tokens to be unified
-Fixed bug in the disambiguator where tokens with min="0" were mislocated
2.5 (2014-03-31)
-Breton:
-a new rule checks that there is a space character between sentences
-Catalan:
-added/improved many rules
-fixed false alarms
-added hundreds of suggestions for barbarisms in a simple replace rule
-Dutch:
-removed a hack from the word tokenizer code
-several new false friends (by Sander van Geloven)
-English:
-added a new rule to handle corrections of mistakes in standard
English contractions (wasnt, didnt)
-changed word tokenization so that the hyphen at the end of a
word is no longer its part
-removed spelling replacement pairs that made LT work very slowly
-fixed a large number of false alarms
-added new rules to handle common contextual misspellings and
redundant phrases as well as common grammatical mistakes. Now
the number of rules for English exceeds 1000.
-made it possible to use synthesizer to both add a determiner and
make different replacement operations when creating suggestions;
simply add \\+INDT or \\+DT as special keywords to the regex that
creates the POS tag in the match element
-a new rule checks that there is a space character between sentences
-Esperanto:
-added/improved several rules
-a new rule checks that there is a space character between sentences
-French:
-updated POS tag dictionary and Hunspell dictionary to Dicollecte-5.0.2
-added/improved several rules
-a new rule checks that there is a space character between sentences
-German:
-fixed several false alarms
-added/improved several rules
-a new rule checks that there is a space character between sentences
-Japanese:
-added some rules (thanks to Silvan Jegen)
-Polish:
-updated POS tag dictionary to PoliMorfologik 2.1
-cleaned up spelling rules and hyphenation rules, added some frequent
misspellings to generate proper suggestions
-added simple compounding support for compound words containing prefixes
such as "anty" or "mini", and adjectives with numerals such as "trzynasto"
-removed annoying false alarms
-added a number of new rules
-changed word tokenization so that the hyphen at the end of a
word is no longer its part; multi-word expressions with a hyphen
are also split into their component parts to avoid many false
alarms in spell-checking
-a new rule checks that there is a space character between sentences
-Portuguese:
-added/improved several rules
-added dozens of compounds: it now has 3400+ compound words
-a new rule checks that there is a space character between sentences
-Russian:
-added a few rules
-added some new rules, thanks to Julia Semenenko from WebSpellChecker.net
-added many false friends rules (Russian/English)
-updated POS tag dictionary and synthesizer dictionary from AOT.ru (Seman) rev. 242
-added frequency information for spell-checking dictionary from AOT.ru (Seman)
-Slovenian:
-fixed sentence detection so that sentences that begin with a lowercase
character can now be detected
-added common punctuation and mathematical characters (=#*∗×·+÷) to the
set of tokenizing characters in most languages
-added new element <unify-ignore/> in XML rules to mark the parts of the unified
sequence that are not checked (useful to ignore punctuation, connectives, or not
inflected words)
-added new element <antipattern> (which can include <token>, <and>, <unify>, and <marker>),
useful for marking complex multiword exceptions in rules and rulegroups (a single
antipattern can be shared by multiple rules in the same group)
-spell checking:
-Immunized tokens are now ignored and don't cause a spelling error anymore;
one can also add particular contextual expressions to ignored words by using
a new action of the disambiguator: "ignore_spelling".
-The words in ignore.txt are checked using case conversions allowed in the
speller dictionary (in MorfologikSpeller-based dictionaries).
This usually means that the word added in lowercase will be accepted
also when it is found at the beginning of the sentence; but if it is added in
uppercase, it won't be accepted when written in lowercase.
-updated Morfologik libraries to 1.9.0 to speed up suggestion generation in spell
checks, not only when using replacement pairs but also for ignoring diacritics.
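The case handling for ignore.txt entries described above can be sketched as follows. This hypothetical helper covers only the capitalization case mentioned (a lowercase entry also accepts its capitalized form at sentence start); the real Morfologik case conversions may cover more variants.

```python
def ignored(word, ignore_words):
    """True if word is covered by an ignore.txt entry. A lowercase entry also
    covers its capitalized form; an entry with uppercase letters covers only itself."""
    if word in ignore_words:
        return True
    lowered = word[:1].lower() + word[1:]  # e.g. 'Foo' -> 'foo'
    return word[:1].isupper() and lowered in ignore_words and lowered.islower()
```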
-stand-alone GUI:
-The "More..." dialog now contains a link to community.languagetool.org
with details about the matching rule
-API:
-incompatible change: changed the return type of Rule.getLocQualityIssueType()
from String to ITSIssueType
-incompatible change: changed the parameter of Rule.setLocQualityIssueType()
from String to ITSIssueType
-deprecated Rule.isSpellingRule(), please use Rule.isDictionaryBasedSpellingRule()
instead
2.4.1 (2014-01-08)
-updated Morfologik libraries to 1.8.3 to fix slow suggestions
in the spell checker, which affected at least en-US
2.4 (2013-12-30)
-Breton:
-SRX sentence tokenization
-added/improved a few rules
-fixed some false alarms
-fixed incorrect suggestions thanks to new tests for corrections
-Catalan:
-added/improved several rules
-fixed false alarms
-made additions and fixes to the tagger dictionary
-removed some words from the synthesis dictionary (see filter-archaic.txt)
-added frequency data to the tagger dictionary; the frequency wordlist comes from the Gaia
project and is licensed under the Apache License, version 2.0
(https://github.com/mozilla-b2g/gaia/tree/master/keyboard/dictionaries).
-English:
-added/improved a few rules
-fixed some false alarms
-French:
-added/improved several rules
-fixed some false alarms
-German:
-added/improved several rules
-added a synthesizer - the agreement rule can now make suggestions
for some errors (not all suggestions are correct, though)
-Polish:
-added/improved several rules, especially for hyphen and dash usage
-added frequency information to the spell-checking dictionary;
the frequency wordlist comes from the Gaia project and is licensed under
the Apache License, version 2.0
(https://github.com/mozilla-b2g/gaia/tree/master/keyboard/dictionaries).
-fixed some false alarms
-Portuguese:
-added/improved several rules (it now includes gender rules "a"/"o")
-it now has 3400+ compound words
-the JAR file formerly named languagetool-standalone.jar has been renamed to
languagetool.jar to avoid confusion about what 'standalone' means in this
context (github issue #29)
-for languages with many rules (like French or German), performance
on long texts has improved by about 20-30%
-fixed a thread-safety issue (could cause a hang in MultiWordChunker)
-fixed a bug where chunk annotations were not tested in <and> groups
-fix: \1 and <match no="..."/> were not evaluated in <short>...</short>
-fixed a bug in the unification mechanism that discarded some of the matching
interpretations prematurely
-added support for chunk annotations in the disambiguator, and fixed a bug
in filtering tokens with chunk annotations
-updated Morfologik libraries to 1.8.2 (bug fixes, stricter input sanity checking,
support for adding frequency data to dictionaries)
-added the option of including frequency data in tagging or spelling dictionaries.
The expected format of the frequency wordlists is the one used in the Gaia
project, licensed under the Apache License, version 2.0
(https://github.com/mozilla-b2g/gaia/tree/master/keyboard/dictionaries)
-new command line tools to export and create binary dictionaries:
org.languagetool.dev.DictionaryExporter
org.languagetool.dev.POSDictionaryBuilder
-LibreOffice/OpenOffice integration:
-added a workaround for incorrect sentence detection when a footnote
appears after a sentence-final full stop (Sourceforge bug #191)
-stand-alone GUI:
-The dialog opened by the "More..." item in the context menu of an error
will now also display correct and incorrect example sentences
-API:
-SentenceTokenizer is now an interface. The implementation has been moved to
RegexSentenceTokenizer, which is deprecated; SRXSentenceTokenizer
should be used instead
-Some methods from org.languagetool.tools.StringTools have been moved to
the org.languagetool.gui.Tools class in the languagetool-gui-commons project
-LanguageToolListener.languageToolEventOccured() has been renamed to
LanguageToolListener.languageToolEventOccurred()
-org.languagetool.tools.SymbolLocator isn't public anymore (shouldn't affect anybody)
-removed DanishSentenceTokenizer which had been deprecated for three years
-Rule.getCorrectExamples() and Rule.getIncorrectExamples() don't return null anymore
but an empty list if there are no examples. Consequently, setCorrectExamples() and
setIncorrectExamples() don't accept null anymore.
-Rule.getId() may return any string now, not just ASCII-only strings (actually this
has been the case before, as the ASCII-only restriction was never enforced and only
mentioned in the javadoc)
-languagetool-wikipedia: the command line options for checking a Wikipedia dump
have been simplified. The command can now be called like this:
java -jar languagetool-wikipedia.jar check-data -l en -f enwiki-20130621-pages-articles.xml
Call just "java -jar languagetool-wikipedia.jar check-data" to get a usage message.
More than one file can be specified with the -f option. In addition to Wikipedia
XML dumps, CSV files from Tatoeba (http://tatoeba.org) are now also supported;
they need to be filtered first so that they contain only the relevant language.
2.3.1 (2013-10-07, released on Maven Central only)
-fixes for thread-safety
2.3 (2013-09-30)
-Breton:
-added/improved a few rules
-fixed false alarms
-updated POS dictionary from Apertium (svn r47282)
-Catalan:
-added support for language code ca-ES-valencia (Catalan Valencian),
to be used in LibreOffice 4.2.0
-added a simple replace rule with hundreds of replacement suggestions
-added/improved several rules
-fixed false alarms
-Chinese:
-added a workaround for a StringIndexOutOfBoundsException
(http://sourceforge.net/p/languagetool/bugs/186/)
-English:
-added replacement patterns for the spelling checker to improve its
suggestions (it now offers 'taught' for 'teached')
-added/improved a few rules
-French:
-added/improved a few rules
-fixed false alarms
-updated POS tag dictionary and Hunspell dictionary to Dicollecte-4.12
-German:
-added/improved several rules
-Portuguese:
-added/improved a few rules
-it now has 3300+ compound words
-Ukrainian:
-added/improved several rules
-the source code has been moved to github:
https://github.com/languagetool-org/languagetool
-LanguageTool requires Java 7 now
-LanguageTool now uses multiple threads for text checking on multi-core
hardware, improving performance (this affects the stand-alone version, the
command line version, and the LibreOffice/OpenOffice extension)
-Rule syntax:
-preliminary support for new min/max attributes that allow matching an
element that appears a given number of times. For example:
<token min="0">foo</token> will match nothing or "foo", i.e. "foo" is optional
<token max="2">foo</token> will match "foo" or "foo foo"
<token min="0" max="2">foo</token> will match nothing, "foo", or "foo foo"
Use max="-1" to allow unlimited occurrences.
For min, only 0 or 1 is supported (1 is the default).
-support for OR-statements. For example:
<or>
<token>a</token>
<token postag="V"/>
</or>
Internally, at run time, a rule containing OR-statements is converted into
several rules without OR-statements.
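The conversion of OR-statements into several plain rules amounts to a Cartesian product over the rule's positions. A minimal sketch, assuming each position is a list of alternatives and plain strings stand in for <token> elements (e.g. "postag=V" for <token postag="V"/>); the class name is hypothetical and this is not LanguageTool's internal code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: each position holds its alternatives (a plain element
// has exactly one), and every combination becomes one OR-free sequence.
class OrExpansionSketch {
    static List<List<String>> expand(List<List<String>> positions) {
        List<List<String>> result = new ArrayList<>();
        result.add(new ArrayList<>()); // start with a single empty sequence
        for (List<String> alternatives : positions) {
            List<List<String>> next = new ArrayList<>();
            for (List<String> prefix : result) {
                for (String alt : alternatives) {
                    // Extend every partial sequence with every alternative.
                    List<String> extended = new ArrayList<>(prefix);
                    extended.add(alt);
                    next.add(extended);
                }
            }
            result = next;
        }
        return result;
    }
}
```

A rule made of the <or> block from the example followed by a token "foo" expands into two rules: ["a", "foo"] and ["postag=V", "foo"]. The number of generated rules is the product of the alternative counts, so several OR-statements in one rule multiply quickly.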
-English now has a chunker to detect, amongst others, singular and plural noun chunks.
This is documented at http://wiki.languagetool.org/using-chunks
-standalone version:
-The standalone version now underlines errors with a red (spelling errors) or
blue (other errors) line (Panagiotis Minos)
-Remember the language selection for the next start
-Improved window and dialog placement in a multi-monitor setup
-embedded server: uses default port (8081) again if started without arguments
-updated the morfologik-stemming library to version 1.7.1 to enable better suggestions,
including proper handling of diacritics and replacement patterns (equivalents of the MAP
and REP features in Hunspell dictionaries)
-OpenOffice/LibreOffice integration:
-fix: the "About" dialog didn't work in Apache OpenOffice 4.0
-fix: country specific rules (like for British English) didn't work
-API:
-In class Language, getCountryVariants() has been renamed to getCountries(), and a new method
getVariant() has been added.
-Some methods have been deprecated
-Some methods have been moved from the Tools class (languagetool-core) to the
new CommandLineTools class (languagetool-commandline)
-AbstractRuleDisambiguator has been renamed to XmlRuleDisambiguator and is no longer abstract.
The <Language>RuleDisambiguator classes have been removed; XmlRuleDisambiguator can be
used directly instead.
-A new method JLanguageTool.check(AnnotatedText) has been introduced that allows
you to check text with markup. Use AnnotatedTextBuilder to build up the input.
-Thread-safety has been improved. The recommended use case is now to
create a new JLanguageTool object for each thread, but to create the
language only once (e.g. new English()) and use that for all JLanguageTool
instances. This changed the API of some public classes, but for the standard
use case of checking texts with the JLanguageTool object it shouldn't make a
difference. (patch by Stefan Lotties)
-JLanguageTool.loadFalseFriendRules() now behaves like JLanguageTool.loadPatternRules():
it looks in the class path first, and then, if the given file is not found there, in
the filesystem
-Introduced the Chunker interface that can assign chunks (also known as phrases)
to tokens. For example, for noun phrases like "a fast computer" the chunker could assign
an 'NP-singular' (noun phrase, singular) chunk to each of the tokens in that phrase.
In the grammar.xml, such a token can then be matched with this syntax:
<token chunk="NP-singular"></token>
-The new class MultiThreadedJLanguageTool uses as many threads as the
computer has processors. In our tests, this improved text checking time
by about 70% on an Intel i7 processor for a 30KB text.
-AnalyzedTokenReadings now implements Iterable so it can be used in foreach loops
-AnalyzedGermanTokenReadings has been removed, AnalyzedTokenReadings can be used instead
-Embedded HTTP server: the server now uses 10 threads instead of 1 (thanks to
Panagiotis Minos)
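A minimal sketch of the recommended threading pattern, using hypothetical stand-in classes rather than LanguageTool's API: one read-only "language" object is created once and shared, each worker thread gets its own checker (via ThreadLocal), and the pool size comes from Runtime.availableProcessors(), mirroring the MultiThreadedJLanguageTool description. SharedLanguage, Checker, and checkAll are illustrative names only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: stand-in classes, not the JLanguageTool/Language API.
class ThreadingSketch {
    // Stand-in for a Language object: created once, immutable, safe to share.
    static final class SharedLanguage {
        final String code;
        SharedLanguage(String code) { this.code = code; }
    }

    // Stand-in for a checker with mutable state, so each thread needs its own.
    static final class Checker {
        private final SharedLanguage language;
        private int checkedSentences = 0; // per-thread mutable state
        Checker(SharedLanguage language) { this.language = language; }
        String check(String sentence) {
            checkedSentences++;
            return language.code + ":" + sentence.trim();
        }
    }

    static List<String> checkAll(List<String> sentences, SharedLanguage language) {
        // One worker per available processor.
        int threads = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        // Each pool thread lazily creates its own Checker sharing the same language.
        ThreadLocal<Checker> checker = ThreadLocal.withInitial(() -> new Checker(language));
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (String s : sentences) {
                futures.add(pool.submit(() -> checker.get().check(s)));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> f : futures) {
                results.add(f.get()); // results keep the input order
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while checking", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("check failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

Sharing only the immutable object and confining mutable state to one thread avoids locking in the checking path, which is the same trade-off the recommendation above describes.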
-text extraction from Wikipedia dumps has been improved
2.2 (2013-06-30)
-Breton:
-added/improved several rules
-fixed some false alarms
-updated POS dictionary from Apertium (svn r45122)
-Catalan:
-added/improved many rules
-fixed false alarms
-rules have been categorized according to the upcoming Internationalization Tag Set (ITS)
Version 2.0 standard from W3C.
-Dutch:
-updated rules to fix false alarms, thanks to Ruud Baars
-Dutch spell checking has been switched back to Hunspell for now to avoid too
many false alarms caused by unknown compounds. Unfortunately, Dutch spell
checking no longer provides suggestions, for performance reasons.
The dictionary used is the one at
http://www.opentaal.org/bestanden/doc_download/20-woordenlijst-v-210g-voor-openofficeorg-3
-English:
-added/improved a few rules
-Esperanto:
-added/improved several rules
-fixed some false alarms
-updated <url> links to PMEG.
-French:
-added/improved several rules
-fixed some false alarms
-updated POS tag dictionary and Hunspell dictionary to Dicollecte-4.10
-German:
-added/improved several rules
-Greek:
-added a few rules (by Panagiotis Minos)
-Italian:
-small rule improvement
-Japanese:
-avoid an ArrayIndexOutOfBoundsException in the POS tagger
-Khmer:
-added some rules (by Nathan Wells)
-Polish:
-added a few new rules
-Portuguese:
-added/improved a few rules
-it now has around 2000 compound words
-Russian:
-added some new rules (thanks to Julia Semenenko)
-fixed some false alarms
-added new segmentation rules
-added false-friend rule
-added bitext rule
-added new style rules
-Ukrainian:
-new POS dictionary
-new synthesizer dictionary
-new spelling dictionary
-new grammar rules
-updated sentence tokenizer rules
-disambiguator implemented
-word tokenizer updated to ignore accent and soft hyphen and understand different apostrophes
-HTTP server:
-enabling and disabling rules at the same time (keeping the rest of the default options) is now allowed.
To disable all the rules except those explicitly enabled, you can use the parameter enabledOnly=yes. Ex.:
http://localhost:8081/?language=en&enabled=STRANGE_RULE,ANOTHER_RULE&enabledOnly=yes&text=my+text
-Fixed bug "java.lang.StringIndexOutOfBoundsException" in DifferentLengthRule
(Ex.: http://localhost:8081/?language=ru&text=№&srctext=No.&motherTongue=en )
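A minimal sketch of building the enabledOnly request shown above with proper URL encoding, using only the parameters named in this entry (language, enabled, enabledOnly, text). The host and port are taken from the example URL; the class and method names are hypothetical. Note that java.net.URLEncoder encodes the comma between rule IDs as %2C, which standard query-string decoding turns back into a comma.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper: builds a check URL for the embedded HTTP server,
// assuming it runs on localhost:8081 as in the example above.
class ServerUrlSketch {
    static String buildUrl(String language, String enabledRules, String text) {
        try {
            return "http://localhost:8081/?language=" + URLEncoder.encode(language, "UTF-8")
                    + "&enabled=" + URLEncoder.encode(enabledRules, "UTF-8")
                    + "&enabledOnly=yes" // disable all rules except the enabled ones
                    + "&text=" + URLEncoder.encode(text, "UTF-8"); // spaces become '+'
        } catch (UnsupportedEncodingException e) {
            // UTF-8 support is required on every Java platform.
            throw new IllegalStateException("UTF-8 not supported", e);
        }
    }
}
```

Calling buildUrl("en", "STRANGE_RULE,ANOTHER_RULE", "my text") produces http://localhost:8081/?language=en&enabled=STRANGE_RULE%2CANOTHER_RULE&enabledOnly=yes&text=my+text.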
-Worked around the "There is an incompatible JNA native library installed on this system" error
-Updated Tika (used for language detection) from 0.9 to 1.3
-The "--version" parameter of languagetool-commandline.jar now also prints
the build date
-Several small bug fixes, code cleanups, and Javadoc improvements
2.1 (2013-03-31)
-Breton: