A high level way of retrieving values for certain fields #49028
Comments
Pinging @elastic/es-search (:Search/Search) |
We discussed this issue in our search meeting and we've spotted two enhancements that could help to retrieve values more easily:
|
Discussed in the meeting today, adding team-discuss to clarify the remaining scope once @jimczi is back (are we okay with the current plan, or do we need a higher-level API to handle the retrieval?). |
I can imagine this as being necessary as well for feature extraction for our planned LTR work, both at training and inference time to extract document only features (i.e. features that are not query/context dependent). |
We have run into this problem in Kibana, where we are primarily asking users to interact with dotted field names like Proposal: Add a new parameter The kibana sample data contains both
The example request is easy to write for any user of Elasticsearch, and the response contains information that is from both
Limitations of current APIs
I have been testing with ECS-based schemas like metricbeat, which on my cluster contains 3904 named paths in the mapping. Not all of these fields are actively used, but because the mapping is so large it causes problems. Here are the limitations I've found
All of these limitations make it hard to avoid using |
I caught up with @jimczi offline to clarify our earlier discussion. Instead of immediately pushing ahead with the We can continue the discussion about field retrieval on this issue, building on @wylieconlon's helpful analysis. I'll remove 'team discuss' for now, but we can add it back if there's a particular item we'd like to discuss in person. |
+1 to move forward with something along the lines of @wylieconlon's proposal above. |
Great, I've assigned this to myself and am working on a design doc. Once the design is more settled I'll post it here or open a new meta-issue. |
I opened a meta-issue to track implementation details: #55363. |
Closing, since the feature branch was merged in #60100. |
Describe the feature:
More and more use cases arise that treat Elasticsearch as a data store. Yet the landscape for retrieving fields today is complex and requires expertise in many different aspects. One needs to understand mappings, doc_values, and stored fields. There are further complexities, such as becoming aware of the max doc_value field limit and then working around it by detecting that a user requested more fields than the limit allows and trying to fetch them from _source instead.
Then, of course, there are multi-fields. Which variant should I pick? How do I even detect that a field has multi-fields in order to avoid retrieving the same field multiple times? There is an answer to this, of course (check whether there is a parent field that is not an object), but this hopefully illustrates how complex this is.
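To make the multi-field check concrete, here is a minimal sketch in Python. It assumes the mapping has already been fetched and is available as a plain dict in the usual shape, where object fields carry their children under "properties" and multi-fields sit under "fields" of a non-object parent. The function name walk_mapping and the sample mapping are made up for illustration and are not part of any Elasticsearch client.

```python
from typing import Any, Dict, List, Tuple


def walk_mapping(properties: Dict[str, Any], prefix: str = "") -> List[Tuple[str, bool]]:
    """Return (full_field_name, is_multi_field) pairs for every non-object field.

    Hypothetical helper: `properties` is the "properties" dict of an index mapping.
    """
    found: List[Tuple[str, bool]] = []
    for name, spec in properties.items():
        path = f"{prefix}{name}"
        if "properties" in spec:
            # Object (or nested) field: its children are ordinary fields, so descend.
            found.extend(walk_mapping(spec["properties"], prefix=f"{path}."))
            continue
        # A concrete field such as keyword, text, or long.
        found.append((path, False))
        # Anything under "fields" has a non-object parent, so it is a multi-field.
        for sub_name in spec.get("fields", {}):
            found.append((f"{path}.{sub_name}", True))
    return found


# Made-up sample mapping: one object field and one text field with a keyword variant.
example_properties = {
    "host": {"properties": {"name": {"type": "keyword"}}},
    "message": {"type": "text", "fields": {"keyword": {"type": "keyword"}}},
}

all_fields = walk_mapping(example_properties)
# Keep only the parent variants so the same value is not retrieved twice.
without_multi_fields = [name for name, is_multi in all_fields if not is_multi]
```

Note that "host.name" and "message.keyword" look identical as dotted paths; only the mapping tells them apart, which is the complexity described above.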
Having written code to do this for ML, I have multiple stories about the complexities that arise. I think other users must have gone through a similar process.
I propose a new API that simply retrieves values given a list of fields. The API does not aim to do this in the most performant way. Rather, it aims to do it in the most user-friendly way. It targets users who do not know the inner workings of Elasticsearch and who have not yet hit a performance issue that would start them on an optimization journey (see "is it faster to retrieve from _source or doc_values" types of questions).
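For illustration, the kind of bookkeeping such an API would hide from the caller might look like the sketch below. It is not the proposed API itself, only a rough picture of the doc_values limit workaround mentioned earlier. The function name build_fetch_body is hypothetical, and the default of 100 is an assumption about the index.max_docvalue_fields_search setting; a real client would have to read the actual value from the index settings.

```python
from typing import Any, Dict, List

# Assumption: default value of the index.max_docvalue_fields_search setting.
ASSUMED_MAX_DOCVALUE_FIELDS = 100


def build_fetch_body(
    fields: List[str],
    max_docvalue_fields: int = ASSUMED_MAX_DOCVALUE_FIELDS,
) -> Dict[str, Any]:
    """Build the field-retrieval part of a search request body (hypothetical helper).

    Simplification: assumes every requested field has doc_values enabled,
    which is not true for text fields, among others.
    """
    if len(fields) <= max_docvalue_fields:
        # Few enough fields: read them from doc_values and skip _source entirely.
        return {"_source": False, "docvalue_fields": list(fields)}
    # Too many fields for doc_values: fall back to filtering _source.
    return {"_source": {"includes": list(fields)}}


small_request = build_fetch_body(["host.name", "message.keyword"])
large_request = build_fetch_body(["a", "b", "c"], max_docvalue_fields=2)
```

Even this small helper forces the caller to know about doc_values, the limit, and _source filtering, and the two branches return values in different response sections and formats.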