Field key suggester API for flattened field #73968
Comments
Pinging @elastic/es-search (Team:Search)
When I filed #43805, I initially envisioned a solution specific to flattened fields. But I think we should instead consider a general API to provide field name suggestions for all fields in the matching indices. This API would handle flattened subfields 'under the hood' and surface them as suggestions. A general API seems easier to work with for clients like Kibana, so they don't need to track which fields are non-standard like flattened and issue special requests.
I'd be happy to pick this up, if that's OK with you, because the implementation would be similar to the terms_enum API I just finished working on. I'm assuming it has the same auto-complete-as-you-type performance requirement?
I don't think we need to resort to indexing more data. We can already know field names with the data we have today: for common field types, these are field names in the mappings, and for flattened fields, these are the prefixes of terms in the terms dictionary associated with the field?
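A minimal sketch of that idea, not the Elasticsearch implementation: mapped field names come from walking the mappings, and flattened subfield names come from the key part of each term in the field's sorted terms dictionary. The helper names are hypothetical, and the `"\0"` key/value separator and the in-memory list of terms are assumptions made for illustration.

```python
from typing import Iterable

SEPARATOR = "\0"  # assumption: keyed flattened terms look like "key\0value"


def mapped_field_names(properties: dict, parent: str = "") -> list[str]:
    """Walk a mappings 'properties' dict and return dotted leaf field names."""
    names = []
    for name, spec in properties.items():
        path = f"{parent}.{name}" if parent else name
        if "properties" in spec:
            # Object field: recurse into its sub-properties.
            names.extend(mapped_field_names(spec["properties"], path))
        else:
            names.append(path)
    return names


def flattened_keys(field: str, sorted_terms: Iterable[str]) -> list[str]:
    """Return distinct logical subfield names from sorted 'key\\0value' terms."""
    keys = []
    for term in sorted_terms:
        key = term.split(SEPARATOR, 1)[0]
        name = f"{field}.{key}"
        # Terms are sorted, so equal keys are adjacent and easy to de-duplicate.
        if not keys or keys[-1] != name:
            keys.append(name)
    return keys
```

With a mapping that has an object `host.name` and a flattened field `labels`, the first helper yields the mapped names and the second yields entries such as `labels.env` from the terms alone, with no extra indexed data.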
I agree with that argument: as a user I would really like that the UI suggest
One benefit I can see of supporting this is that if we get a request for field name suggestions of numeric fields, then we could skip all
I think it would be frustrating if we required an additional call to the
I usually prefer copying code (option a), which would make it easier for field name suggestions and field value suggestions to evolve independently.
I wonder if Kibana should use a hybrid model and only rely on an elasticsearch API for looking up the flattened fields.
As I mentioned earlier, flattened fields are an uncontrolled part of the schema (docs can introduce millions of unique values) so rogue docs have the potential to flood the top suggestions with garbage, especially if we are going to rely on infix matching to surface results.
@jpountz one aspect of this has been tricky to code, so I wanted to check some assumptions: the Lucene indexed terms for flattened fields only contain the part of the field name from the object onwards, so if the flattened object is called
Update: I got the matching across dot boundaries working OK and I assume this is the desired behaviour.
To me it still seems cleanest and easiest to handle if we had a unified field suggestion API. Even with this API, we have some flexibility to determine how suggestions are produced. So to ensure diverse suggestions, maybe we could directly incorporate some of your ideas (like listing mapped fields above flattened subfields) and just document how the API makes these trade-offs. I'd be really curious if our future users in Kibana have an opinion on this point too.
Currently I have an implementation where search string patterns will match any part of the logical field name regardless of how that is held physically, i.e. there is an assumption that searching for
There is the question of performance though.
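To make "logical field name" concrete, here is a rough sketch under stated assumptions: the flattened field's own name and its keys are stored separately, but matching is done against the joined dotted name, so a pattern can span the dot boundary. `suggest_infix` and its arguments are hypothetical names for illustration, not part of any Elasticsearch API.

```python
def suggest_infix(pattern: str, mapped: list[str],
                  flattened: dict[str, list[str]]) -> list[str]:
    """Infix-match a pattern against logical field names.

    mapped:    field names taken directly from the mappings
    flattened: flattened field name -> keys found in its terms (assumed input)
    """
    hits = [name for name in mapped if pattern in name]
    for root, keys in flattened.items():
        for key in keys:
            # The physical term holds only `key`; the logical name adds the root.
            logical = f"{root}.{key}"
            if pattern in logical:
                hits.append(logical)
    return hits
```

For example, the pattern `ls.en` matches `labels.env` even though `labels` lives in the mappings and `env` only in the terms. Note that every key of every flattened field is visited, which is the cost being questioned.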
@jimczi suggested the flattened cost would be too high to bear and we should restrict matching on flattened fields to be prefix-based.
If we want to avoid the cost of things like 2b by requiring a prefix to match inside flattened fields then the user would have to type a full stop e.g.
The difference between leaf and branch nodes.
One of the usability questions raised here is that this probably introduces two types of fields when it comes to suggestions:
Wouldn't this concern be addressed by having an upper limit on the number of field names that we look at for flattened fields like I suggested in my previous comment? If possible it would be nice if moving from
We currently have time as a limiting factor, and in the terms_enum PR I originally prototyped a number-of-scanned-terms limit. Either way, @jimczi has always objected to limits like this, where the execution cost on large datasets always runs up to the worst-case upper limit of the largest tolerable setting, rather than offering functionality with fixed, small look-up costs that don't increase with data size. That fixed cost is achieved by limiting functionality, e.g. offering prefix search only.
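The trade-off can be sketched over a plain sorted list of keys standing in for a terms dictionary (an assumption; the function names and the `max_scanned` parameter are hypothetical). A prefix lookup seeks straight to the first candidate and reads at most `size` entries, while an infix search has to scan and can only be bounded by a cap that large datasets will always hit.

```python
from bisect import bisect_left


def prefix_keys(sorted_keys: list[str], prefix: str, size: int = 10) -> list[str]:
    """Seek to the first key >= prefix, then read at most `size` matches."""
    out = []
    i = bisect_left(sorted_keys, prefix)
    while i < len(sorted_keys) and len(out) < size:
        if not sorted_keys[i].startswith(prefix):
            break  # sorted order: no later key can match either
        out.append(sorted_keys[i])
        i += 1
    return out


def infix_keys(sorted_keys: list[str], pattern: str, size: int = 10,
               max_scanned: int = 1000) -> tuple[list[str], int]:
    """Scan keys in order, stopping at `size` hits or `max_scanned` keys."""
    out = []
    scanned = 0
    for key in sorted_keys:
        if scanned >= max_scanned or len(out) >= size:
            break
        scanned += 1
        if pattern in key:
            out.append(key)
    return out, scanned
```

In this sketch the prefix path touches only the matching run of keys plus a binary search, whereas the infix path with a scan cap can both do the maximum work and still miss matches that sort after the cut-off.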
Closing in favour of #74816, which provides name suggestions for all field types.
We need a new API that, for a given flattened field, lists field keys starting with a given string.
For example, the request below lists the first 10 field keys for the x_pack_telemetry flattened field that start with "x_pack_telemetry.stack_stats".
Implementation-wise this could be either: