Track min/max on numerics in field data per segment #5829

Closed
kimchy opened this issue Apr 16, 2014 · 7 comments
Labels: >enhancement, :Search/Search, stalled

Comments

@kimchy
Member

kimchy commented Apr 16, 2014

If we track the min/max of numeric values in field data, we can potentially use them in several places to optimize execution.

For example, in a range filter, if the field data for a field is loaded, the min/max can be used to check whether the term / range filter needs to be executed at all, or whether it can work as a match all. Potentially, we could also improve the boolean filter with a special case for match all.
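As a rough illustration of the range filter check described above, here is a minimal sketch. The class, enum, and method names are hypothetical and not part of Elasticsearch or Lucene, and it assumes inclusive bounds on `long` values with the segment's min/max already known.

```java
// Hypothetical sketch: these names are illustrative, not an Elasticsearch or Lucene API.
final class SegmentRangeCheck {

    /** How a query range relates to the values stored in one segment. */
    enum Relation { MATCH_NONE, MATCH_ALL, INTERSECTS }

    private SegmentRangeCheck() {}

    /**
     * Classifies the inclusive query range [from, to] against the segment's
     * tracked [segMin, segMax].
     *
     * MATCH_ALL here only means "every document that has a value for the
     * field matches"; documents without a value would still need excluding.
     */
    static Relation classify(long segMin, long segMax, long from, long to) {
        // Empty query range, or query range entirely outside the segment's values.
        if (from > to || to < segMin || from > segMax) {
            return Relation.MATCH_NONE;
        }
        // Query range covers every value in the segment.
        if (from <= segMin && to >= segMax) {
            return Relation.MATCH_ALL;
        }
        // Partial overlap: the filter has to be executed normally.
        return Relation.INTERSECTS;
    }
}
```

Only the `INTERSECTS` case would need the regular filter execution; the other two could be answered from the tracked min/max alone.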

Another option is to use this in aggs, where it can be used for bucket estimations.
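One way the bucket estimation could look for a fixed-interval histogram is sketched below. This is an assumption about what such an estimate might be, not how aggs are implemented; the class and method names are hypothetical.

```java
// Hypothetical sketch: an upper bound on histogram buckets from tracked min/max.
final class HistogramBucketEstimate {

    private HistogramBucketEstimate() {}

    /**
     * Returns the number of fixed-width buckets spanned by [min, max].
     * This is an upper bound: buckets between min and max may be empty.
     */
    static long estimateBuckets(long min, long max, long interval) {
        if (interval <= 0) {
            throw new IllegalArgumentException("interval must be positive");
        }
        if (min > max) {
            return 0; // no values tracked
        }
        // floorDiv keeps bucket keys consistent for negative values.
        long firstBucket = Math.floorDiv(min, interval);
        long lastBucket = Math.floorDiv(max, interval);
        return lastBucket - firstBucket + 1;
    }
}
```

Such a bound could be used, for instance, to pre-size data structures before collecting documents.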

@mattrco

mattrco commented May 9, 2014

If no one else is currently working on this, I'd like to attempt it as a first contribution.

Is there a particular milestone this would be useful for? Thanks.

@mikemccand
Contributor

Note that we added this to Lucene in https://issues.apache.org/jira/browse/LUCENE-5610, which will be available when ES upgrades to Lucene 4.9. So in ES we just need to call the methods in NumericUtils and then act accordingly...

@mattrco

mattrco commented May 10, 2014

Thanks. I'll keep an eye on this for when the 4.9 upgrade is happening.

@jpountz
Contributor

jpountz commented Aug 1, 2014

There seems to be activity related to this issue at https://issues.apache.org/jira/browse/LUCENE-5860

@adadevoh

Hi, I just wanted to ask: what was the fix for this?

@mikemccand
Contributor

#10523 already exposed the min/max APIs added in LUCENE-5860 at the index level. For this issue, though, nothing has been done to e.g. optimize range filters based on the min/max of a segment, because it's currently too costly for Lucene's postings APIs to compute the max numeric value: it requires a binary search over the terms because of how the numeric prefix terms are encoded.

Once we cut over to auto-prefix encoding for numeric terms, this becomes much cheaper and I think optimizations like this become more realistic.

I think higher-level optimizations could be very worthwhile. For example, for time-based indices, knowing that a given index won't have any hits because of a top-level range filter should be a big speedup in many cases... There is a separate issue to explore this, but I can't find it right now.
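To make the time-based index case concrete, here is a minimal sketch of skipping whole indices using index-level min/max timestamps. Everything in it is hypothetical: the `IndexRange` holder, the method name, and the idea that the per-index min/max are already available as plain `long` values.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: prune indices whose timestamp range cannot match the filter.
final class IndexPruning {

    /** Illustrative holder for an index name and its tracked min/max timestamps. */
    static final class IndexRange {
        final String name;
        final long minTimestamp;
        final long maxTimestamp;

        IndexRange(String name, long minTimestamp, long maxTimestamp) {
            this.name = name;
            this.minTimestamp = minTimestamp;
            this.maxTimestamp = maxTimestamp;
        }
    }

    private IndexPruning() {}

    /** Returns the names of indices whose values overlap the inclusive range [from, to]. */
    static List<String> indicesToSearch(List<IndexRange> indices, long from, long to) {
        List<String> result = new ArrayList<>();
        for (IndexRange index : indices) {
            boolean disjoint = to < index.minTimestamp || from > index.maxTimestamp;
            if (!disjoint) {
                result.add(index.name); // only overlapping indices are searched
            }
        }
        return result;
    }
}
```

Indices left out of the result cannot contain hits for the top-level range filter, so the search would not need to visit them.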

@clintongormley added the :Search/Search and stalled labels and removed the good first issue low hanging fruit labels on Oct 14, 2015

@jpountz
Contributor

jpountz commented Aug 24, 2016

The discussed optimizations have been implemented in 5.0.

@jpountz jpountz closed this as completed Aug 24, 2016