Per-field boosting of the _all field is broken unless very specific conditions are met #4315
Comments
Could you also consider a wider scope of
@roytmana The two issues you are mentioning are actually quite tough to implement, so I would like to concentrate on just fixing boosting on the _all field for now.
@roytmana a similar method could indeed be applied. But I'm not fully happy with the way per-field boosting works for the _all field, so I would like us to consider improving it before applying the same logic to other places. In particular, it doesn't work with all queries (e.g. phrase queries) and is quite wasteful storage-wise: 4 bytes per occurrence of a term whose field has a boost other than 1. I wouldn't be surprised to see that it sometimes almost doubles the size of the inverted index for the _all field.
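To make the storage cost concrete, here is a minimal sketch of how a float boost could be packed into a 4-byte payload, with no payload at all when the boost is 1. The class name BoostPayload and its methods are hypothetical and are not the Elasticsearch or Lucene implementation; the byte layout (big-endian IEEE 754) is an assumption made for illustration.

```java
import java.nio.ByteBuffer;

// Hypothetical helper for illustration only; not the actual _all field code.
final class BoostPayload {
    private BoostPayload() {}

    // Returns null for the default boost so that unboosted terms cost nothing,
    // otherwise the boost encoded as 4 bytes (one float).
    static byte[] encode(float boost) {
        if (boost == 1.0f) {
            return null;
        }
        return ByteBuffer.allocate(Float.BYTES).putFloat(boost).array();
    }

    // A missing payload is read back as the default boost of 1.
    static float decode(byte[] payload) {
        if (payload == null) {
            return 1.0f;
        }
        return ByteBuffer.wrap(payload).getFloat();
    }
}
```

Every occurrence of a term from a boosted field would carry one such array, which is where the 4 bytes per occurrence come from.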
@jpountz Great, thank you for the info. I just wanted to bring these two cases up so you could consider them as you work on the _all implementation. Hopefully multifield will follow soon :-) and arbitrary snippet boosting after that.
_all boosting used to rely on the fact that the TokenStream doesn't eagerly consume the input java.io.Reader. This fixes the issue by using binary search in order to find the right boost given a token's start offset. Close elastic#4315
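The lookup described in this commit message can be sketched roughly as follows. This is an illustration under assumptions, not the code from the commit: the class OffsetBoosts and its fields are hypothetical, and it assumes each field's text starts at a known, strictly increasing character offset within the concatenated _all text.

```java
import java.util.Arrays;

// Hypothetical illustration; not the actual AllEntries implementation.
final class OffsetBoosts {
    private final int[] startOffsets; // start offset of each field's text, strictly increasing
    private final float[] boosts;     // boost of the field starting at the same index

    OffsetBoosts(int[] startOffsets, float[] boosts) {
        if (startOffsets.length != boosts.length) {
            throw new IllegalArgumentException("offsets and boosts must have the same length");
        }
        this.startOffsets = startOffsets.clone();
        this.boosts = boosts.clone();
    }

    // Finds the boost of the field whose text contains the given token start offset.
    float boostAt(int tokenStartOffset) {
        int idx = Arrays.binarySearch(startOffsets, tokenStartOffset);
        if (idx < 0) {
            // Not an exact match: binarySearch returns -(insertionPoint) - 1,
            // and the containing entry is the one just before the insertion point.
            idx = -idx - 2;
        }
        if (idx < 0) {
            return 1.0f; // offset precedes the first entry: fall back to the default boost
        }
        return boosts[idx];
    }
}
```

Because the boost is derived from the token's own start offset, the result no longer depends on how far ahead the tokenizer has read from the underlying java.io.Reader.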
@jpountz do you mind if I create another ticket with the expanded scope discussed in my first reply to your post? I feel the ability to boost individual text fragments, and particularly multifields, is a very powerful feature.
@roytmana please open a ticket. I do think the ability to boost individual text fragments is very interesting!
The _all field uses payloads in order to be able to store per-field boosts in a single index field. However, the way it is implemented relies on the fact that the token stream doesn't eagerly consume the input java.io.Reader (see AllEntries.read). So in practice, boosting on the _all field doesn't work under any of these circumstances: standard tokenizer,