Track min/max on numerics in field data per segment #5829

kimchy · 2014-04-16T12:53:36Z

If we track the min/max of numeric values in field data, we can potentially use them in several places to optimize execution.

For example, in a range filter, if the field data for a field is loaded, the min/max can be used to check whether the term / range filter needs to be executed at all, or whether it can work as a match all. Potentially, the boolean filter could also be improved to special-case match all.
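A minimal sketch of that check, to make the three outcomes concrete. All names here (`SegmentRangeCheck`, `Relation`, `relate`) are hypothetical and not Lucene or Elasticsearch APIs. It assumes inclusive bounds with `from <= to`, and that every document in the segment has a value for the field; otherwise "match all" would really mean "match all documents that have a value".

```java
// Hypothetical sketch: none of these names exist in Lucene or Elasticsearch.
final class SegmentRangeCheck {

    /** How a query range relates to the values tracked for one segment. */
    enum Relation { DISJOINT, WITHIN, INTERSECTS }

    /**
     * Compares the inclusive query range [from, to] with the segment's
     * tracked [segmentMin, segmentMax].
     */
    static Relation relate(long segmentMin, long segmentMax, long from, long to) {
        if (to < segmentMin || from > segmentMax) {
            return Relation.DISJOINT;    // no value can match: skip the filter
        }
        if (from <= segmentMin && segmentMax <= to) {
            return Relation.WITHIN;      // every value matches: act as match all
        }
        return Relation.INTERSECTS;      // partial overlap: run the real filter
    }

    public static void main(String[] args) {
        // Segment holds values in [10, 20].
        System.out.println(relate(10, 20, 30, 40)); // range lies above the segment
        System.out.println(relate(10, 20, 0, 100)); // range covers the segment
        System.out.println(relate(10, 20, 15, 25)); // range overlaps part of it
    }
}
```

Only the `INTERSECTS` case still has to pay for the actual term / range filter; the other two are decided from the two tracked numbers alone.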

Another option to use this is in aggs, where this can be used to do bucket estimations.
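One way to read "bucket estimations", sketched under assumptions: for a fixed-interval histogram, the tracked min/max give an upper bound on how many buckets the segment can produce. The class and method names are hypothetical, the bucket key is assumed to be `floor(value / interval)`, and overflow for extreme ranges is ignored.

```java
// Hypothetical sketch: not an Elasticsearch aggregation API.
final class BucketEstimate {

    /**
     * Upper bound on the number of histogram buckets for values in the
     * inclusive range [min, max], assuming bucket key = floor(value / interval).
     */
    static long estimateBuckets(long min, long max, long interval) {
        if (interval <= 0) {
            throw new IllegalArgumentException("interval must be positive");
        }
        if (min > max) {
            return 0; // no values tracked for this segment
        }
        // floorDiv rounds toward negative infinity, so negative values
        // land in the correct bucket.
        long firstKey = Math.floorDiv(min, interval);
        long lastKey = Math.floorDiv(max, interval);
        return lastKey - firstKey + 1;
    }

    public static void main(String[] args) {
        System.out.println(estimateBuckets(0, 99, 10));
        System.out.println(estimateBuckets(-5, 5, 10));
    }
}
```

The result is only a bound: buckets between the first and last key may turn out to be empty, so it is useful for pre-sizing rather than for exact counts.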

mattrco · 2014-05-09T20:15:19Z

If no one else is currently working on this, I'd like to attempt it as a first contribution.

Is there a particular milestone this would be useful for? Thanks.

mikemccand · 2014-05-09T20:17:53Z

Note that we added this to Lucene, in https://issues.apache.org/jira/browse/LUCENE-5610 which will be available when ES upgrades to Lucene 4.9. So in ES we just need to call the methods in NumericUtils and then act accordingly...

mattrco · 2014-05-10T09:02:30Z

Thanks. I'll keep an eye on this for when the 4.9 upgrade is happening.

jpountz · 2014-08-01T08:56:28Z

There seems to be activity related to this issue at https://issues.apache.org/jira/browse/LUCENE-5860

adadevoh · 2015-04-26T23:35:16Z

Hi, I just wanted to ask, what was the fix for this?

mikemccand · 2015-05-01T09:40:21Z

#10523 already exposed the min/max APIs added in LUCENE-5860, on an index level, but for this issue nothing has been done to e.g. optimize range filters based on the min/max of a segment, because it's currently too costly for Lucene's postings APIs to compute the max numeric value: it requires a binary search over the terms because of how the numeric prefix terms are encoded.

Once we cut over to auto-prefix encoding for numeric terms, this becomes much cheaper and I think optimizations like this become more realistic.

I think higher level optimizations could be very worthwhile, e.g. for time-based indices, knowing that a given index won't have any hits because there is a top-level range filter, should be a big speed up in many cases ... there is a separate issue to explore this but I can't find it right now.
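A rough sketch of that index-level idea, assuming each index already exposes a min/max for its timestamp field. The class, method, and index names are made up for illustration, and timestamps are plain `long` values with inclusive bounds.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: not how Elasticsearch resolves indices.
final class IndexPruning {

    /**
     * Returns the indices whose [min, max] overlaps the inclusive query
     * range [from, to]; all other indices cannot contain a hit.
     * Each map value is a two-element array: {min, max}.
     */
    static List<String> indicesToSearch(Map<String, long[]> minMaxByIndex, long from, long to) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, long[]> entry : minMaxByIndex.entrySet()) {
            long min = entry.getValue()[0];
            long max = entry.getValue()[1];
            if (to >= min && from <= max) {
                result.add(entry.getKey()); // ranges overlap: must be searched
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, long[]> stats = new LinkedHashMap<>();
        stats.put("logs-day-1", new long[] {0, 99});
        stats.put("logs-day-2", new long[] {100, 199});
        stats.put("logs-day-3", new long[] {200, 299});
        // Only indices overlapping [150, 250] are kept.
        System.out.println(indicesToSearch(stats, 150, 250));
    }
}
```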

jpountz · 2016-08-24T15:15:17Z

The discussed optimizations have been implemented in 5.0.

kimchy added the enhancement label Apr 16, 2014

s1monw added the low hanging fruit label Apr 16, 2014

clintongormley added the adoptme label Jul 9, 2014

jpountz assigned mikemccand Aug 1, 2014

jpountz removed the adoptme label Aug 1, 2014

clintongormley mentioned this issue May 15, 2015

Field stats filter #11187

Closed

clintongormley added :Search/Search stalled and removed good first issue labels Oct 14, 2015

jpountz closed this as completed Aug 24, 2016
