-
Notifications
You must be signed in to change notification settings - Fork 83
/
CHANGES.txt
4440 lines (3357 loc) · 195 KB
/
CHANGES.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
975
976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
991
992
993
994
995
996
997
998
999
1000
Lucene Change Log
======================= Trunk (not yet released) =======================
Changes in backwards compatibility policy
* LUCENE-1483: Removed utility class oal.util.SorterTemplate; this
class is no longer used by Lucene. (Gunnar Wagenknecht via Mike
McCandless)
* LUCENE-2123: Removed the protected inner class ScoreTerm from
FuzzyQuery. The class was never intended to be public.
(Uwe Schindler, Mike McCandless)
* LUCENE-2135: Added FieldCache.purge(IndexReader) method to the
interface. Anyone implementing FieldCache externally will need to
fix their code to implement this, on upgrading. (Mike McCandless)
* LUCENE-1923: Renamed SegmentInfo & SegmentInfos segString method to
toString. These are advanced APIs and subject to change suddenly.
(Tim Smith via Mike McCandless)
* LUCENE-2190: Removed deprecated customScore() and customExplain()
methods from experimental CustomScoreQuery. (Uwe Schindler)
* LUCENE-2286: Enabled DefaultSimilarity.setDiscountOverlaps by default.
This means that terms with a position increment gap of zero do not
affect the norms calculation by default. (Robert Muir)
Changes in runtime behavior
* LUCENE-1923: Made IndexReader.toString() produce something
meaningful (Tim Smith via Mike McCandless)
* LUCENE-2179: CharArraySet.clear() is now functional.
(Robert Muir, Uwe Schindler)
API Changes
* LUCENE-2076: Rename FSDirectory.getFile -> getDirectory. (George
Aroush via Mike McCandless)
* LUCENE-1260: Change norm encode (float->byte) and decode
(byte->float) to be instance methods not static methods. This way a
custom Similarity can alter how norms are encoded, though they must
still be encoded as a single byte (Johan Kindgren via Mike
McCandless)
* LUCENE-2103: NoLockFactory should have a private constructor;
until Lucene 4.0 the default one will be deprecated.
(Shai Erera via Uwe Schindler)
* LUCENE-2177: Deprecate the Field ctors that take byte[] and Store.
Since the removal of compressed fields, Store can only be YES, so
it's not necessary to specify. (Erik Hatcher via Mike McCandless)
* LUCENE-2200: Several final classes had non-overriding protected
members. These were converted to private and unused protected
constructors removed. (Steven Rowe via Robert Muir)
* LUCENE-2240: SimpleAnalyzer and WhitespaceAnalyzer now have
Version ctors. (Simon Willnauer via Uwe Schindler)
* LUCENE-2259: Add IndexWriter.deleteUnusedFiles, to attempt removing
unused files. This is only useful on Windows, which prevents
deletion of open files. IndexWriter will eventually remove these
files itself; this method just lets you do so when you know the
files are no longer open by IndexReaders. (luocanrao via Mike
McCandless)
* LUCENE-2281: added doBeforeFlush to IndexWriter to allow extensions to perform
operations before flush starts. Also exposed doAfterFlush as protected instead
of package-private. (Shai Erera via Mike McCandless)
* LUCENE-2282: IndexFileNames is exposed as a public class allowing for easier
use by external code. In addition it offers a matchExtension method which
callers can use to query whether a certain file matches a certain extension.
(Shai Erera via Mike McCandless)
* LUCENE-124: Add a TopTermsBoostOnlyBooleanQueryRewrite to MultiTermQuery.
This rewrite method is similar to TopTermsScoringBooleanQueryRewrite, but
only scores terms by their boost values. For example, this can be used
with FuzzyQuery to ensure that exact matches are always scored higher,
because only the boost will be used in scoring. (Robert Muir)
* LUCENE-2015: Add a static method foldToASCII to ASCIIFoldingFilter to
expose its folding logic. (Cédrik Lime via Robert Muir)
* LUCENE-2294: IndexWriter constructors have been deprecated in favor of a
single ctor which accepts IndexWriterConfig and a Directory. You can set all
the parameters related to IndexWriter on IndexWriterConfig. The different
setter/getter methods were deprecated as well. One should call
writer.getConfig().getXYZ() to query for a parameter XYZ.
Additionally, the setter/getter related to MergePolicy were deprecated as
well. One should interact with the MergePolicy directly.
(Shai Erera via Mike McCandless)
Bug fixes
* LUCENE-2119: Don't throw NegativeArraySizeException if you pass
Integer.MAX_VALUE as nDocs to IndexSearcher search methods. (Paul
Taylor via Mike McCandless)
* LUCENE-2142: FieldCacheImpl.getStringIndex no longer throws an
exception when term count exceeds doc count. (Mike McCandless)
* LUCENE-2104: NativeFSLock.release() would silently fail if the lock is held by
another thread/process. (Shai Erera via Uwe Schindler)
* LUCENE-2216: OpenBitSet.hashCode returned different hash codes for
sets that only differed by trailing zeros. (Dawid Weiss, yonik)
* LUCENE-2235: Implement missing PerFieldAnalyzerWrapper.getOffsetGap().
(Javier Godoy via Uwe Schindler)
* LUCENE-2249: ParallelMultiSearcher should shut down thread pool on
close. (Martin Traverso via Uwe Schindler)
* LUCENE-2273: FieldCacheImpl.getCacheEntries() used WeakHashMap
incorrectly and lead to ConcurrentModificationException.
(Uwe Schindler, Robert Muir)
* LUCENE-2283: Use shared memory pool for term vector and stored
fields buffers. This memory will be reclaimed if needed according to
the configured RAM Buffer Size for the IndexWriter. This also fixes
potentially excessive memory usage when many threads are indexing a
mix of small and large documents. (Tim Smith via Mike McCandless)
* LUCENE-2300: If IndexWriter is pooling reader (because NRT reader
has been obtained), and addIndexes* is run, do not pool the
readers from the external directory. This is harmless (NRT reader is
correct), but a waste of resources. (Mike McCandless)
New features
* LUCENE-2128: Parallelized fetching document frequencies during weight
creation. (Israel Tsadok, Simon Willnauer via Uwe Schindler)
* LUCENE-2069: Added Unicode 4 support to CharArraySet. Due to the switch
to Java 5, supplementary characters are now lowercased correctly if the
set is created as case insensitive.
CharArraySet now requires a Version argument to preserve
backwards compatibility. If Version < 3.1 is passed to the constructor,
CharArraySet yields the old behavior. (Simon Willnauer)
* LUCENE-2069: Added Unicode 4 support to LowerCaseFilter. Due to the switch
to Java 5, supplementary characters are now lowercased correctly.
LowerCaseFilter now requires a Version argument to preserve
backwards compatibility. If Version < 3.1 is passed to the constructor,
LowerCaseFilter yields the old behavior. (Simon Willnauer, Robert Muir)
* LUCENE-2034: Added ReusableAnalyzerBase, an abstract subclass of Analyzer
that makes it easier to reuse TokenStreams correctly. This issue also added
StopwordAnalyzerBase, which improves consistency of all Analyzers that use
stopwords, and implement many analyzers in contrib with it.
(Simon Willnauer via Robert Muir)
* LUCENE-2198: Support protected words in stemming TokenFilters using a
new KeywordAttribute. (Simon Willnauer via Uwe Schindler)
* LUCENE-2183, LUCENE-2240, LUCENE-2241: Added Unicode 4 support
to CharTokenizer and its subclasses. CharTokenizer now has new
int-API which is conditionally preferred to the old char-API depending
on the provided Version. Version < 3.1 will use the char-API.
(Simon Willnauer via Uwe Schindler)
* LUCENE-2247: Added a CharArrayMap<V> for performance improvements
in some stemmers and synonym filters. (Uwe Schindler)
* LUCENE-2314: Added AttributeSource.copyTo(AttributeSource) that
allows to use cloneAttributes() and this method as a replacement
for captureState()/restoreState(), if the state itsself
needs to be inspected/modified. (Uwe Schindler)
* LUCENE-2293: Expose control over max number of threads that
IndexWriter will allow to run concurrently while indexing
documents (previously this was hardwired to 5), using
IndexWriterConfig.setMaxThreadStates. (Mike McCandless)
Optimizations
* LUCENE-2075: Terms dict cache is now shared across threads instead
of being stored separately in thread local storage. Also fixed
terms dict so that the cache is used when seeking the thread local
term enum, which will be important for MultiTermQuery impls that do
lots of seeking (Mike McCandless, Uwe Schindler, Robert Muir, Yonik
Seeley)
* LUCENE-2136: If the multi reader (DirectoryReader or MultiReader)
only has a single sub-reader, delegate all enum requests to it.
This avoid the overhead of using a PQ unecessarily. (Mike
McCandless)
* LUCENE-2137: Switch to AtomicInteger for some ref counting (Earwin
Burrfoot via Mike McCandless)
* LUCENE-2123, LUCENE-2261: Move FuzzyQuery rewrite to separate RewriteMode
into MultiTermQuery. The number of fuzzy expansions can be specified with
the maxExpansions parameter to FuzzyQuery, but the default is limited to
BooleanQuery.maxClauseCount() as before.
(Uwe Schindler, Robert Muir, Mike McCandless)
* LUCENE-2135: On IndexReader.close, forcefully evict any entries from
the FieldCache rather than waiting for the WeakHashMap to release
the reference (Mike McCandless)
* LUCENE-2161: Improve concurrency of IndexReader, especially in the
context of near real-time readers. (Mike McCandless)
* LUCENE-2164: ConcurrentMergeScheduler has more control over merge
threads. First, it gives smaller merges higher thread priority than
larges ones. Second, a new set/getMaxMergeCount setting will pause
the larger merges to allow smaller ones to finish. The defaults for
these settings are now dynamic, depending the number CPU cores as
reported by Runtime.getRuntime().availableProcessors() (Mike
McCandless)
* LUCENE-2169: Improved CharArraySet.copy(), if source set is
also a CharArraySet. (Simon Willnauer via Uwe Schindler)
* LUCENE-2084: Change IndexableBinaryStringTools to work on byte[] and char[]
directly, instead of Byte/CharBuffers, and modify CollationKeyFilter to
take advantage of this for faster performance.
(Steven Rowe, Uwe Schindler, Robert Muir)
* LUCENE-2188: Add a utility class for tracking deprecated overridden
methods in non-final subclasses.
(Uwe Schindler, Robert Muir)
* LUCENE-2195: Speedup CharArraySet if set is empty.
(Simon Willnauer via Robert Muir)
* LUCENE-2285: Code cleanup. (Shai Erera via Uwe Schindler)
* LUCENE-2303: Remove code duplication in Token class by subclassing
TermAttributeImpl, move DEFAULT_TYPE constant to TypeInterface, improve
null-handling for TypeAttribute. (Uwe Schindler)
Build
* LUCENE-2124: Moved the JDK-based collation support from contrib/collation
into core, and moved the ICU-based collation support into contrib/icu.
(Robert Muir)
Test Cases
* LUCENE-2037 Allow Junit4 tests in our envrionment (Erick Erickson
via Mike McCandless)
* LUCENE-1844: Speed up the unit tests (Mark Miller, Erick Erickson,
Mike McCandless)
* LUCENE-2065: Use Java 5 generics throughout our unit tests. (Kay
Kay via Mike McCandless)
* LUCENE-2155: Fix time and zone dependent localization test failures
in queryparser tests. (Uwe Schindler, Chris Male, Robert Muir)
* LUCENE-2170: Fix thread starvation problems. (Uwe Schindler)
* LUCENE-2248, LUCENE-2251, LUCENE-2285: Refactor tests to not use
Version.LUCENE_CURRENT, but instead use a global static value
from LuceneTestCase(J4), that contains the release version.
(Uwe Schindler, Simon Willnauer, Shai Erera)
* LUCENE-2313, LUCENE-2322: Add VERBOSE to LuceneTestCase(J4) to control
verbosity of tests. If VERBOSE==false (default) tests should not print
anything other than errors to System.(out|err). The setting can be
changed with -Dtests.verbose=true on test invokation.
(Shai Erera, Paul Elschot, Uwe Schindler)
* LUCENE-2318: Remove inconsistent system property code for retrieving
temp and data directories inside test cases. It is now centralized in
LuceneTestCase(J4). Also changed lots of tests to use
getClass().getResourceAsStream() to retrieve test data. Tests needing
access to "real" files from the test folder itsself, can use
LuceneTestCase(J4).getDataFile(). (Uwe Schindler)
================== Release 2.9.2 / 3.0.1 2010-02-26 ====================
Changes in backwards compatibility policy
* LUCENE-2123 (3.0.1 only): Removed the protected inner class ScoreTerm
from FuzzyQuery. The change was needed because the comparator of this
class had to be changed in an incompatible way. The class was never
intended to be public. (Uwe Schindler, Mike McCandless)
Bug fixes
* LUCENE-2092: BooleanQuery was ignoring disableCoord in its hashCode
and equals methods, cause bad things to happen when caching
BooleanQueries. (Chris Hostetter, Mike McCandless)
* LUCENE-2095: Fixes: when two threads call IndexWriter.commit() at
the same time, it's possible for commit to return control back to
one of the threads before all changes are actually committed.
(Sanne Grinovero via Mike McCandless)
* LUCENE-2132 (3.0.1 only): Fix the demo result.jsp to use QueryParser
with a Version argument. (Brian Li via Robert Muir)
* LUCENE-2166: Don't incorrectly keep warning about the same immense
term, when IndexWriter.infoStream is on. (Mike McCandless)
* LUCENE-2158: At high indexing rates, NRT reader could temporarily
lose deletions. (Mike McCandless)
* LUCENE-2182: DEFAULT_ATTRIBUTE_FACTORY was failing to load
implementation class when interface was loaded by a different
class loader. (Uwe Schindler, reported on java-user by Ahmed El-dawy)
* LUCENE-2257: Increase max number of unique terms in one segment to
termIndexInterval (default 128) * ~2.1 billion = ~274 billion.
(Tom Burton-West via Mike McCandless)
* LUCENE-2260: Fixed AttributeSource to not hold a strong
reference to the Attribute/AttributeImpl classes which prevents
unloading of custom attributes loaded by other classloaders
(e.g. in Solr plugins). (Uwe Schindler)
* LUCENE-1941: Fix Min/MaxPayloadFunction returns 0 when
only one payload is present. (Erik Hatcher, Mike McCandless
via Uwe Schindler)
* LUCENE-2270: Queries consisting of all zero-boost clauses
(for example, text:foo^0) sorted incorrectly and produced
invalid docids. (yonik)
API Changes
* LUCENE-1609 (3.0.1 only): Restore IndexReader.getTermInfosIndexDivisor
(it was accidentally removed in 3.0.0) (Mike McCandless)
* LUCENE-1972 (3.0.1 only): Restore SortField.getComparatorSource
(it was accidentally removed in 3.0.0) (John Wang via Uwe Schindler)
* LUCENE-2190: Added a new class CustomScoreProvider to function package
that can be subclassed to provide custom scoring to CustomScoreQuery.
The methods in CustomScoreQuery that did this before were deprecated
and replaced by a method getCustomScoreProvider(IndexReader) that
returns a custom score implementation using the above class. The change
is necessary with per-segment searching, as CustomScoreQuery is
a stateless class (like all other Queries) and does not know about
the currently searched segment. This API works similar to Filter's
getDocIdSet(IndexReader). (Paul chez Jamespot via Mike McCandless,
Uwe Schindler)
* LUCENE-2080: Deprecate Version.LUCENE_CURRENT, as using this constant
will cause backwards compatibility problems when upgrading Lucene. See
the Version javadocs for additional information.
(Robert Muir)
Optimizations
* LUCENE-2086: When resolving deleted terms, do so in term sort order
for better performance (Bogdan Ghidireac via Mike McCandless)
* LUCENE-2123 (partly, 3.0.1 only): Fixes a slowdown / memory issue
added by LUCENE-504. (Uwe Schindler, Robert Muir, Mike McCandless)
* LUCENE-2258: Remove unneeded synchronization in FuzzyTermEnum.
(Uwe Schindler, Robert Muir)
Test Cases
* LUCENE-2114: Change TestFilteredSearch to test on multi-segment
index as well. (Simon Willnauer via Mike McCandless)
* LUCENE-2211: Improves BaseTokenStreamTestCase to use a fake attribute
that checks if clearAttributes() was called correctly.
(Uwe Schindler, Robert Muir)
* LUCENE-2207, LUCENE-2219: Improve BaseTokenStreamTestCase to check if
end() is implemented correctly. (Koji Sekiguchi, Robert Muir)
Documentation
* LUCENE-2114: Improve javadocs of Filter to call out that the
provided reader is per-segment (Simon Willnauer via Mike
McCandless)
======================= Release 3.0.0 2009-11-25 =======================
Changes in backwards compatibility policy
* LUCENE-1979: Change return type of SnapshotDeletionPolicy#snapshot()
from IndexCommitPoint to IndexCommit. Code that uses this method
needs to be recompiled against Lucene 3.0 in order to work. The
previously deprecated IndexCommitPoint is also removed.
(Michael Busch)
* o.a.l.Lock.isLocked() is now allowed to throw an IOException.
(Mike McCandless)
* LUCENE-2030: CachingWrapperFilter and CachingSpanFilter now hide
the internal cache implementation for thread safety, before it was
declared protected. (Peter Lenahan, Uwe Schindler, Simon Willnauer)
* LUCENE-2053: If you call Thread.interrupt() on a thread inside
Lucene, Lucene will do its best to interrupt the thread. However,
instead of throwing InterruptedException (which is a checked
exception), you'll get an oal.util.ThreadInterruptedException (an
unchecked exception, subclassing RuntimeException). The interrupt
status on the thread is cleared when this exception is thrown.
(Mike McCandless)
* LUCENE-2052: Some methods in Lucene core were changed to accept
Java 5 varargs. This is not a backwards compatibility problem as
long as you not try to override such a method. We left common
overridden methods unchanged and added varargs to constructors,
static, or final methods (MultiSearcher,...). (Uwe Schindler)
* LUCENE-1558: IndexReader.open(Directory) now opens a readOnly=true
reader, and new IndexSearcher(Directory) does the same. Note that
this is a change in the default from 2.9, when these methods were
previously deprecated. (Mike McCandless)
* LUCENE-1753: Make not yet final TokenStreams final to enforce
decorator pattern. (Uwe Schindler)
Changes in runtime behavior
* LUCENE-1677: Remove the system property to set SegmentReader class
implementation. (Uwe Schindler)
* LUCENE-1960: As a consequence of the removal of Field.Store.COMPRESS,
support for this type of fields was removed. Lucene 3.0 is still able
to read indexes with compressed fields, but as soon as merges occur
or the index is optimized, all compressed fields are decompressed
and converted to Field.Store.YES. Because of this, indexes with
compressed fields can suddenly get larger. Also the first merge with
decompression cannot be done in raw mode, it is therefore slower.
This change has no effect for code that uses such old indexes,
they behave as before (fields are automatically decompressed
during read). Indexes converted to Lucene 3.0 format cannot be read
anymore with previous versions.
It is recommended to optimize your indexes after upgrading to convert
to the new format and decompress all fields.
If you want compressed fields, you can use CompressionTools, that
creates compressed byte[] to be added as binary stored field. This
cannot be done automatically, as you also have to decompress such
fields when reading. You have to reindex to do that.
(Michael Busch, Uwe Schindler)
* LUCENE-2060: Changed ConcurrentMergeScheduler's default for
maxNumThreads from 3 to 1, because in practice we get the most
gains from running a single merge in the background. More than one
concurrent merge causes a lot of thrashing (though it's possible on
SSD storage that there would be net gains). (Jason Rutherglen,
Mike McCandless)
API Changes
* LUCENE-1257, LUCENE-1984, LUCENE-1985, LUCENE-2057, LUCENE-1833, LUCENE-2012,
LUCENE-1998: Port to Java 1.5:
- Add generics to public and internal APIs (see below).
- Replace new Integer(int), new Double(double),... by static valueOf() calls.
- Replace for-loops with Iterator by foreach loops.
- Replace StringBuffer with StringBuilder.
- Replace o.a.l.util.Parameter by Java 5 enums (see below).
- Add @Override annotations.
(Uwe Schindler, Robert Muir, Karl Wettin, Paul Elschot, Kay Kay, Shai Erera,
DM Smith)
* Generify Lucene API:
- TokenStream/AttributeSource: Now addAttribute()/getAttribute() return an
instance of the requested attribute interface and no cast needed anymore
(LUCENE-1855).
- NumericRangeQuery, NumericRangeFilter, and FieldCacheRangeFilter
now have Integer, Long, Float, Double as type param (LUCENE-1857).
- Document.getFields() returns List<Fieldable>.
- Query.extractTerms(Set<Term>)
- CharArraySet and stop word sets in core/contrib
- PriorityQueue (LUCENE-1935)
- TopDocCollector
- DisjunctionMaxQuery (LUCENE-1984)
- MultiTermQueryWrapperFilter
- CloseableThreadLocal
- MapOfSets
- o.a.l.util.cache package
- lot's of internal APIs of IndexWriter
(Uwe Schindler, Michael Busch, Kay Kay, Robert Muir, Adriano Crestani)
* LUCENE-1944, LUCENE-1856, LUCENE-1957, LUCENE-1960, LUCENE-1961,
LUCENE-1968, LUCENE-1970, LUCENE-1946, LUCENE-1971, LUCENE-1975,
LUCENE-1972, LUCENE-1978, LUCENE-944, LUCENE-1979, LUCENE-1973, LUCENE-2011:
Remove deprecated methods/constructors/classes:
- Remove all String/File directory paths in IndexReader /
IndexSearcher / IndexWriter.
- Remove FSDirectory.getDirectory()
- Make FSDirectory abstract.
- Remove Field.Store.COMPRESS (see above).
- Remove Filter.bits(IndexReader) method and make
Filter.getDocIdSet(IndexReader) abstract.
- Remove old DocIdSetIterator methods and make the new ones abstract.
- Remove some methods in PriorityQueue.
- Remove old TokenStream API and backwards compatibility layer.
- Remove RangeQuery, RangeFilter and ConstantScoreRangeQuery.
- Remove SpanQuery.getTerms().
- Remove ExtendedFieldCache, custom and auto caches, SortField.AUTO.
- Remove old-style custom sort.
- Remove legacy search setting in SortField.
- Remove Hits and all references from core and contrib.
- Remove HitCollector and its TopDocs support implementations.
- Remove term field and accessors in MultiTermQuery
(and fix Highlighter).
- Remove deprecated methods in BooleanQuery.
- Remove deprecated methods in Similarity.
- Remove BoostingTermQuery.
- Remove MultiValueSource.
- Remove Scorer.explain(int).
...and some other minor ones (Uwe Schindler, Michael Busch, Mark Miller)
* LUCENE-1925: Make IndexSearcher's subReaders and docStarts members
protected; add expert ctor to directly specify reader, subReaders
and docStarts. (John Wang, Tim Smith via Mike McCandless)
* LUCENE-1945: All public classes that have a close() method now
also implement java.io.Closeable (IndexReader, IndexWriter, Directory,...).
(Uwe Schindler)
* LUCENE-1998: Change all Parameter instances to Java 5 enums. This
is no backwards-break, only a change of the super class. Parameter
was deprecated and will be removed in a later version.
(DM Smith, Uwe Schindler)
Bug fixes
* LUCENE-1951: When the text provided to WildcardQuery has no wildcard
characters (ie matches a single term), don't lose the boost and
rewrite method settings. Also, rewrite to PrefixQuery if the
wildcard is form "foo*", for slightly faster performance. (Robert
Muir via Mike McCandless)
* LUCENE-2013: SpanRegexQuery does not work with QueryScorer.
(Benjamin Keil via Mark Miller)
* LUCENE-2088: addAttribute() should only accept interfaces that
extend Attribute. (Shai Erera, Uwe Schindler)
* LUCENE-2045: Fix silly FileNotFoundException hit if you enable
infoStream on IndexWriter and then add an empty document and commit
(Shai Erera via Mike McCandless)
* LUCENE-2046: IndexReader should not see the index as changed, after
IndexWriter.prepareCommit has been called but before
IndexWriter.commit is called. (Peter Keegan via Mike McCandless)
New features
* LUCENE-1933: Provide a convenience AttributeFactory that creates a
Token instance for all basic attributes. (Uwe Schindler)
* LUCENE-2041: Parallelize the rest of ParallelMultiSearcher. Lots of
code refactoring and Java 5 concurrent support in MultiSearcher.
(Joey Surls, Simon Willnauer via Uwe Schindler)
* LUCENE-2051: Add CharArraySet.copy() as a simple method to copy
any Set<?> to a CharArraySet that is optimized, if Set<?> is already
an CharArraySet. (Simon Willnauer)
Optimizations
* LUCENE-1183: Optimize Levenshtein Distance computation in
FuzzyQuery. (Cédrik Lime via Mike McCandless)
* LUCENE-2006: Optimization of FieldDocSortedHitQueue to always
use Comparable<?> interface. (Uwe Schindler, Mark Miller)
* LUCENE-2087: Remove recursion in NumericRangeTermEnum.
(Uwe Schindler)
Build
* LUCENE-486: Remove test->demo dependencies. (Michael Busch)
* LUCENE-2024: Raise build requirements to Java 1.5 and ANT 1.7.0
(Uwe Schindler, Mike McCandless)
======================= Release 2.9.1 2009-11-06 =======================
Changes in backwards compatibility policy
* LUCENE-2002: Add required Version matchVersion argument when
constructing QueryParser or MultiFieldQueryParser and, default (as
of 2.9) enablePositionIncrements to true to match
StandardAnalyzer's 2.9 default (Uwe Schindler, Mike McCandless)
Bug fixes
* LUCENE-1974: Fixed nasty bug in BooleanQuery (when it used
BooleanScorer for scoring), whereby some matching documents fail to
be collected. (Fulin Tang via Mike McCandless)
* LUCENE-1124: Make sure FuzzyQuery always matches the precise term.
([email protected] via Mike McCandless)
* LUCENE-1976: Fix IndexReader.isCurrent() to return the right thing
when the reader is a near real-time reader. (Jake Mannix via Mike
McCandless)
* LUCENE-1986: Fix NPE when scoring PayloadNearQuery (Peter Keegan,
Mark Miller via Mike McCandless)
* LUCENE-1992: Fix thread hazard if a merge is committing just as an
exception occurs during sync (Uwe Schindler, Mike McCandless)
* LUCENE-1995: Note in javadocs that IndexWriter.setRAMBufferSizeMB
cannot exceed 2048 MB, and throw IllegalArgumentException if it
does. (Aaron McKee, Yonik Seeley, Mike McCandless)
* LUCENE-2004: Fix Constants.LUCENE_MAIN_VERSION to not be inlined
by client code. (Uwe Schindler)
* LUCENE-2016: Replace illegal U+FFFF character with the replacement
char (U+FFFD) during indexing, to prevent silent index corruption.
(Peter Keegan, Mike McCandless)
API Changes
* Un-deprecate search(Weight weight, Filter filter, int n) from
Searchable interface (deprecated by accident). (Uwe Schindler)
* Un-deprecate o.a.l.util.Version constants. (Mike McCandless)
* LUCENE-1987: Un-deprecate some ctors of Token, as they will not
be removed in 3.0 and are still useful. Also add some missing
o.a.l.util.Version constants for enabling invalid acronym
settings in StandardAnalyzer to be compatible with the coming
Lucene 3.0. (Uwe Schindler)
* LUCENE-1973: Un-deprecate IndexSearcher.setDefaultFieldSortScoring,
to allow controlling per-IndexSearcher whether scores are computed
when sorting by field. (Uwe Schindler, Mike McCandless)
* LUCENE-2043: Make IndexReader.commit(Map<String,String>) public.
(Mike McCandless)
Documentation
* LUCENE-1955: Fix Hits deprecation notice to point users in right
direction. (Mike McCandless, Mark Miller)
* Fix javadoc about score tracking done by search methods in Searcher
and IndexSearcher. (Mike McCandless)
* LUCENE-2008: Javadoc improvements for TokenStream/Tokenizer/Token
(Luke Nezda via Mike McCandless)
======================= Release 2.9.0 2009-09-23 =======================
Changes in backwards compatibility policy
* LUCENE-1575: Searchable.search(Weight, Filter, int, Sort) no
longer computes a document score for each hit by default. If
document score tracking is still needed, you can call
IndexSearcher.setDefaultFieldSortScoring(true, true) to enable
both per-hit and maxScore tracking; however, this is deprecated
and will be removed in 3.0.
Alternatively, use Searchable.search(Weight, Filter, Collector)
and pass in a TopFieldCollector instance, using the following code
sample:
<code>
TopFieldCollector tfc = TopFieldCollector.create(sort, numHits, fillFields,
true /* trackDocScores */,
true /* trackMaxScore */,
false /* docsInOrder */);
searcher.search(query, tfc);
TopDocs results = tfc.topDocs();
</code>
Note that your Sort object cannot use SortField.AUTO when you
directly instantiate TopFieldCollector.
Also, the method search(Weight, Filter, Collector) was added to
the Searchable interface and the Searcher abstract class to
replace the deprecated HitCollector versions. If you either
implement Searchable or extend Searcher, you should change your
code to implement this method. If you already extend
IndexSearcher, no further changes are needed to use Collector.
Finally, the values Float.NaN and Float.NEGATIVE_INFINITY are not
valid scores. Lucene uses these values internally in certain
places, so if you have hits with such scores, it will cause
problems. (Shai Erera via Mike McCandless)
* LUCENE-1687: All methods and parsers from the interface ExtendedFieldCache
have been moved into FieldCache. ExtendedFieldCache is now deprecated and
contains only a few declarations for binary backwards compatibility.
ExtendedFieldCache will be removed in version 3.0. Users of FieldCache and
ExtendedFieldCache will be able to plug in Lucene 2.9 without recompilation.
The auto cache (FieldCache.getAuto) is now deprecated. Due to the merge of
ExtendedFieldCache and FieldCache, FieldCache can now additionally return
long[] and double[] arrays in addition to int[] and float[] and StringIndex.
The interface changes are only notable for users implementing the interfaces,
which was unlikely done, because there is no possibility to change
Lucene's FieldCache implementation. (Grant Ingersoll, Uwe Schindler)
* LUCENE-1630, LUCENE-1771: Weight, previously an interface, is now an abstract
class. Some of the method signatures have changed, but it should be fairly
easy to see what adjustments must be made to existing code to sync up
with the new API. You can find more detail in the API Changes section.
Going forward Searchable will be kept for convenience only and may
be changed between minor releases without any deprecation
process. It is not recommended that you implement it, but rather extend
Searcher.
(Shai Erera, Chris Hostetter, Martin Ruckli, Mark Miller via Mike McCandless)
* LUCENE-1422, LUCENE-1693: The new Attribute based TokenStream API (see below)
has some backwards breaks in rare cases. We did our best to make the
transition as easy as possible and you are not likely to run into any problems.
If your tokenizers still implement next(Token) or next(), the calls are
automatically wrapped. The indexer and query parser use the new API
(eg use incrementToken() calls). All core TokenStreams are implemented using
the new API. You can mix old and new API style TokenFilters/TokenStream.
Problems only occur when you have done the following:
You have overridden next(Token) or next() in one of the non-abstract core
TokenStreams/-Filters. These classes should normally be final, but some
of them are not. In this case, next(Token)/next() would never be called.
To fail early with a hard compile/runtime error, the next(Token)/next()
methods in these TokenStreams/-Filters were made final in this release.
(Michael Busch, Uwe Schindler)
* LUCENE-1763: MergePolicy now requires an IndexWriter instance to
be passed upon instantiation. As a result, IndexWriter was removed
as a method argument from all MergePolicy methods. (Shai Erera via
Mike McCandless)
* LUCENE-1748: LUCENE-1001 introduced PayloadSpans, but this was a back
compat break and caused custom SpanQuery implementations to fail at runtime
in a variety of ways. This issue attempts to remedy things by causing
a compile time break on custom SpanQuery implementations and removing
the PayloadSpans class, with its functionality now moved to Spans. To
help in alleviating future back compat pain, Spans has been changed from
an interface to an abstract class.
(Hugh Cayless, Mark Miller)
* LUCENE-1808: Query.createWeight has been changed from protected to
public. This will be a back compat break if you have overridden this
method - but you are likely already affected by the LUCENE-1693 (make Weight
abstract rather than an interface) back compat break if you have overridden
Query.creatWeight, so we have taken the opportunity to make this change.
(Tim Smith, Shai Erera via Mark Miller)
* LUCENE-1708 - IndexReader.document() no longer checks if the document is
deleted. You can call IndexReader.isDeleted(n) prior to calling document(n).
(Shai Erera via Mike McCandless)
Changes in runtime behavior
* LUCENE-1424: QueryParser now by default uses constant score auto
rewriting when it generates a WildcardQuery and PrefixQuery (it
already does so for TermRangeQuery, as well). Call
setMultiTermRewriteMethod(MultiTermQuery.SCORING_BOOLEAN_QUERY_REWRITE)
to revert to slower BooleanQuery rewriting method. (Mark Miller via Mike
McCandless)
* LUCENE-1575: As of 2.9, the core collectors as well as
IndexSearcher's search methods that return top N results, no
longer filter documents with scores <= 0.0. If you rely on this
functionality you can use PositiveScoresOnlyCollector like this:
<code>
TopDocsCollector tdc = new TopScoreDocCollector(10);
Collector c = new PositiveScoresOnlyCollector(tdc);
searcher.search(query, c);
TopDocs hits = tdc.topDocs();
...
</code>
* LUCENE-1604: IndexReader.norms(String field) is now allowed to
return null if the field has no norms, as long as you've
previously called IndexReader.setDisableFakeNorms(true). This
setting now defaults to false (to preserve the fake norms back
compatible behavior) but in 3.0 will be hardwired to true. (Shon
Vella via Mike McCandless).
* LUCENE-1624: If you open IndexWriter with create=true and
autoCommit=false on an existing index, IndexWriter no longer
writes an empty commit when it's created. (Paul Taylor via Mike
McCandless)
* LUCENE-1593: When you call Sort() or Sort.setSort(String field,
boolean reverse), the resulting SortField array no longer ends
with SortField.FIELD_DOC (it was unnecessary as Lucene breaks ties
internally by docID). (Shai Erera via Michael McCandless)
* LUCENE-1542: When the first token(s) have 0 position increment,
IndexWriter used to incorrectly record the position as -1, if no
payload is present, or Integer.MAX_VALUE if a payload is present.
This causes positional queries to fail to match. The bug is now
fixed, but if your app relies on the buggy behavior then you must
call IndexWriter.setAllowMinus1Position(). That API is deprecated
so you must fix your application, and rebuild your index, to not
rely on this behavior by the 3.0 release of Lucene. (Jonathan
Mamou, Mark Miller via Mike McCandless)
* LUCENE-1715: Finalizers have been removed from the 4 core classes
that still had them, since they will cause GC to take longer, thus
tying up memory for longer, and at best they mask buggy app code.
DirectoryReader (returned from IndexReader.open) & IndexWriter
previously released the write lock during finalize.
SimpleFSDirectory.FSIndexInput closed the descriptor in its
finalizer, and NativeFSLock released the lock. It's possible
applications will be affected by this, but only if the application
is failing to close reader/writers. (Brian Groose via Mike
McCandless)
* LUCENE-1717: Fixed IndexWriter to account for RAM usage of
buffered deletions. (Mike McCandless)
* LUCENE-1727: Ensure that fields are stored & retrieved in the
exact order in which they were added to the document. This was
true in all Lucene releases before 2.3, but was broken in 2.3 and
2.4, and is now fixed in 2.9. (Mike McCandless)
* LUCENE-1678: The addition of Analyzer.reusableTokenStream
accidentally broke back compatibility of external analyzers that
subclassed core analyzers that implemented tokenStream but not
reusableTokenStream. This is now fixed, such that if
reusableTokenStream is invoked on such a subclass, that method
will forcefully fallback to tokenStream. (Mike McCandless)
* LUCENE-1801: Token.clear() and Token.clearNoTermBuffer() now also clear
startOffset, endOffset and type. This is not likely to affect any
Tokenizer chains, as Tokenizers normally always set these three values.
This change was made to be conform to the new AttributeImpl.clear() and
AttributeSource.clearAttributes() to work identical for Token as one for all
AttributeImpl and the 6 separate AttributeImpls. (Uwe Schindler, Michael Busch)
* LUCENE-1483: When searching over multiple segments, a new Scorer is now created
for each segment. Searching has been telescoped out a level and IndexSearcher now
operates much like MultiSearcher does. The Weight is created only once for the top
level Searcher, but each Scorer is passed a per-segment IndexReader. This will
result in doc ids in the Scorer being internal to the per-segment IndexReader. It
has always been outside of the API to count on a given IndexReader to contain every
doc id in the index - and if you have been ignoring MultiSearcher in your custom code
and counting on this fact, you will find your code no longer works correctly. If a
custom Scorer implementation uses any caches/filters that rely on being based on the
top level IndexReader, it will need to be updated to correctly use contextless
caches/filters eg you can't count on the IndexReader to contain any given doc id or
all of the doc ids. (Mark Miller, Mike McCandless)
* LUCENE-1846: DateTools now uses the US locale to format the numbers in its
date/time strings instead of the default locale. For most locales there will
be no change in the index format, as DateFormatSymbols is using ASCII digits.
The usage of the US locale is important to guarantee correct ordering of
generated terms. (Uwe Schindler)
* LUCENE-1860: MultiTermQuery now defaults to
CONSTANT_SCORE_AUTO_REWRITE_DEFAULT rewrite method (previously it
was SCORING_BOOLEAN_QUERY_REWRITE). This means that PrefixQuery
and WildcardQuery will now produce constant score for all matching
docs, equal to the boost of the query. (Mike McCandless)
API Changes
* LUCENE-1419: Add expert API to set custom indexing chain. This API is
package-protected for now, so we don't have to officially support it.
Yet, it will give us the possibility to try out different consumers
in the chain. (Michael Busch)
* LUCENE-1427: DocIdSet.iterator() is now allowed to throw
IOException. (Paul Elschot, Mike McCandless)
* LUCENE-1422, LUCENE-1693: New TokenStream API that uses a new class called
AttributeSource instead of the Token class, which is now a utility class that
holds common Token attributes. All attributes that the Token class had have
been moved into separate classes: TermAttribute, OffsetAttribute,
PositionIncrementAttribute, PayloadAttribute, TypeAttribute and FlagsAttribute.
The new API is much more flexible; it allows to combine the Attributes
arbitrarily and also to define custom Attributes. The new API has the same
performance as the old next(Token) approach. For conformance with this new
API Tee-/SinkTokenizer was deprecated and replaced by a new TeeSinkTokenFilter.
(Michael Busch, Uwe Schindler; additional contributions and bug fixes by
Daniel Shane, Doron Cohen)
* LUCENE-1467: Add nextDoc() and next(int) methods to OpenBitSetIterator.
These methods can be used to avoid additional calls to doc().
(Michael Busch)
* LUCENE-1468: Deprecate Directory.list(), which sometimes (in
FSDirectory) filters out files that don't look like index files, in
favor of new Directory.listAll(), which does no filtering. Also,
listAll() will never return null; instead, it throws an IOException
(or subclass). Specifically, FSDirectory.listAll() will throw the
newly added NoSuchDirectoryException if the directory does not
exist. (Marcel Reutegger, Mike McCandless)
* LUCENE-1546: Add IndexReader.flush(Map commitUserData), allowing
you to record an opaque commitUserData (maps String -> String) into
the commit written by IndexReader. This matches IndexWriter's
commit methods. (Jason Rutherglen via Mike McCandless)
* LUCENE-652: Added org.apache.lucene.document.CompressionTools, to
enable compressing & decompressing binary content, external to
Lucene's indexing. Deprecated Field.Store.COMPRESS.
* LUCENE-1561: Renamed Field.omitTf to Field.omitTermFreqAndPositions
(Otis Gospodnetic via Mike McCandless)
* LUCENE-1500: Added new InvalidTokenOffsetsException to Highlighter methods
to denote issues when offsets in TokenStream tokens exceed the length of the
provided text. (Mark Harwood)
* LUCENE-1575, LUCENE-1483: HitCollector is now deprecated in favor of
a new Collector abstract class. For easy migration, people can use
HitCollectorWrapper which translates (wraps) HitCollector into
Collector. Note that this class is also deprecated and will be
removed when HitCollector is removed. Also TimeLimitedCollector
is deprecated in favor of the new TimeLimitingCollector which
extends Collector. (Shai Erera, Mark Miller, Mike McCandless)
* LUCENE-1592: The method TermsEnum.skipTo() was deprecated, because
it is used nowhere in core/contrib and there is only a very ineffective
default implementation available. If you want to position a TermEnum
to another Term, create a new one using IndexReader.terms(Term).
(Uwe Schindler)
* LUCENE-1621: MultiTermQuery.getTerm() has been deprecated as it does
not make sense for all subclasses of MultiTermQuery. Check individual
subclasses to see if they support getTerm(). (Mark Miller)
* LUCENE-1636: Make TokenFilter.input final so it's set only
once. (Wouter Heijke, Uwe Schindler via Mike McCandless).
* LUCENE-1658, LUCENE-1451: Renamed FSDirectory to SimpleFSDirectory
(but left an FSDirectory base class). Added an FSDirectory.open
static method to pick a good default FSDirectory implementation
given the OS. FSDirectories should now be instantiated using
FSDirectory.open or with public constructors rather than
FSDirectory.getDirectory(), which has been deprecated.
(Michael McCandless, Uwe Schindler, yonik)
* LUCENE-1665: Deprecate SortField.AUTO, to be removed in 3.0.
Instead, when sorting by field, the application should explicitly
state the type of the field. (Mike McCandless)
* LUCENE-1660: StopFilter, StandardAnalyzer, StopAnalyzer now
require up front specification of enablePositionIncrement (Mike
McCandless)
* LUCENE-1614: DocIdSetIterator's next() and skipTo() were deprecated in favor
of the new nextDoc() and advance(). The new methods return the doc Id they
landed on, saving an extra call to doc() in most cases.
For easy migration of the code, you can change the calls to next() to
nextDoc() != DocIdSetIterator.NO_MORE_DOCS and similarly for skipTo().
However it is advised that you take advantage of the returned doc ID and not
call doc() following those two.
Also, doc() was deprecated in favor of docID(). docID() should return -1 or
NO_MORE_DOCS if nextDoc/advance were not called yet, or NO_MORE_DOCS if the
iterator has exhausted. Otherwise it should return the current doc ID.
(Shai Erera via Mike McCandless)
* LUCENE-1672: All ctors/opens and other methods using String/File to
specify the directory in IndexReader, IndexWriter, and IndexSearcher
were deprecated. You should instantiate the Directory manually before
and pass it to these classes (LUCENE-1451, LUCENE-1658).
(Uwe Schindler)
* LUCENE-1407: Move RemoteSearchable, RemoteCachingWrapperFilter out
of Lucene's core into new contrib/remote package. Searchable no
longer extends java.rmi.Remote (Simon Willnauer via Mike
McCandless)
* LUCENE-1677: The global property
org.apache.lucene.SegmentReader.class, and
ReadOnlySegmentReader.class are now deprecated, to be removed in
3.0. src/gcj/* has been removed. (Earwin Burrfoot via Mike
McCandless)
* LUCENE-1673: Deprecated NumberTools in favour of the new
NumericRangeQuery and its new indexing format for numeric or
date values. (Uwe Schindler)
* LUCENE-1630, LUCENE-1771: Weight is now an abstract class, and adds
a scorer(IndexReader, boolean /* scoreDocsInOrder */, boolean /*
topScorer */) method instead of scorer(IndexReader). IndexSearcher uses
this method to obtain a scorer matching the capabilities of the Collector
wrt orderedness of docIDs. Some Scorers (like BooleanScorer) are much more
efficient if out-of-order documents scoring is allowed by a Collector.
Collector must now implement acceptsDocsOutOfOrder. If you write a
Collector which does not care about doc ID orderness, it is recommended
that you return true. Weight has a scoresDocsOutOfOrder method, which by
default returns false. If you create a Weight which will score documents
out of order if requested, you should override that method to return true.
BooleanQuery's setAllowDocsOutOfOrder and getAllowDocsOutOfOrder have been
deprecated as they are not needed anymore. BooleanQuery will now score docs
out of order when used with a Collector that can accept docs out of order.
Finally, Weight#explain now takes a sub-reader and sub-docID, rather than
a top level reader and docID.