[data.search.aggs] Support for multi-terms aggregation #77632
Comments
Pinging @elastic/kibana-app-arch (Team:AppArch) |
The tricky part here is selecting the separator to make sure it cannot exist in the concatenated fields; otherwise you encounter conflicts while concatenating fields. In the past we have discussed supporting this use case directly in Elasticsearch (elastic/elasticsearch#23818). As far as I remember, there were a few obstacles like supporting multi value fields. Perhaps, we can revisit it again. @polyfractal WDYT? |
Igor and I chatted about this offline yesterday. A few notes for posterity:
In the "concat as string" cases, we also have to worry about how different datatypes are handled now that everything looks like a string. E.g. |
Why not simply encode them as a JSON array? Or some other compact/deterministic encoding? So example above would be then |
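To illustrate the two concerns raised in the comments above (separator collisions and datatypes collapsing into strings), here is a minimal sketch that is not taken from the thread. The names `joinWithSeparator`, `encodeAsJson`, and `decodeFromJson` are hypothetical helpers, and the `|` separator is only an assumed default.

```typescript
// Hypothetical helpers, for illustration only; not Kibana or Elasticsearch APIs.
type TermValue = string | number | boolean;

// Naive approach: stringify every value and join with a separator.
function joinWithSeparator(values: TermValue[], separator = '|'): string {
  return values.map(String).join(separator);
}

// Alternative: a JSON array keeps value boundaries and primitive types.
function encodeAsJson(values: TermValue[]): string {
  return JSON.stringify(values);
}

function decodeFromJson(key: string): TermValue[] {
  return JSON.parse(key) as TermValue[];
}

// Two different tuples collide when a value contains the separator.
const left: TermValue[] = ['a|b', 'c'];
const right: TermValue[] = ['a', 'b|c'];
console.log(joinWithSeparator(left) === joinWithSeparator(right)); // true
console.log(encodeAsJson(left) === encodeAsJson(right)); // false

// The number 1 and the string '1' also become indistinguishable when joined.
console.log(joinWithSeparator([1, 'x']) === joinWithSeparator(['1', 'x'])); // true
console.log(encodeAsJson([1, 'x']) === encodeAsJson(['1', 'x'])); // false
```

The JSON form is longer, but it round-trips through `decodeFromJson` without needing a reserved character.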
@imotov it looks like support for this was added to Elasticsearch in elastic/elasticsearch#67597, documented here, which is exciting! Let's shift this issue to focus on how the integration with Kibana aggConfigs would work. I think the work needed is:
|
I'm not sure whether we want to have separate columns or a single column, somehow joining the terms of all fields. I guess it depends on how we are planning to show this feature to the user. Do we want to have a field multi select in Lens? If yes, I think a single column is more consistent (at least it's more in line with my own mental model of how the Lens UI works). If we want to merge multiple dimensions into a single multi field terms agg behind the scenes, multiple columns make sense to me. In both cases we would need to answer a bunch of questions of how it will integrate with other things (how does formatting work, how does filtering work, how do we communicate the difference to nested "top values") |
@flash1293 I've made some assumptions here that might not be clear. Do you share these assumptions?
In my previous comment I assumed that tabify needs to create multiple columns, one for each field, with a predictable ID. By writing out my assumptions here I think that's not a requirement, but we would need a more-complex cell structure to compensate. Two examples:
Now that I've written it out, I think option 2 is totally fine. This would simplify the logic needed in |
I think we have a similar situation with ranges - they are more complex than a single number but there might be a formatter defined already for just a single number we want to use (kind of higher order formatter). I think we can do the same thing here. The cell contains:
(array would be possible as well) The formatter params look like this:
The Lens rendering code has to work with a non-primitive type (something which is true today already because of ranges), and can just pass the object to the higher order formatter which will take care of the rest. Filtering can work in the same way - we just pass the cell value object as is, the filter building logic turns it into 3 filters (one per field) and shows the multi field modal. I mostly agree with your assumptions, the only thing I'm not sure about is the table rendering - if we have three columns instead of one, sorting via the column action popover becomes confusing, as well as hiding the column and other column based settings. We would need special logic for that in all places we keep state about a "column". IMHO the advantages of that are not worth the additional UI complexity (it introduces lots of edge cases like what if the user adds three fields, changes the widths of each of them to something else, then deletes one of the fields). |
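The single-column idea described above could look roughly like the following sketch. The cell shape and formatter params originally posted in this comment are not preserved here, so everything below (`MultiTermsCell`, `MultiTermsFormatParams`, `formatMultiTerms`, `buildFilters`, `PhraseFilter`) is a hypothetical stand-in rather than the actual Kibana structure.

```typescript
type Primitive = string | number | boolean;

// Hypothetical cell value: one key per field of the multi-field terms agg.
interface MultiTermsCell {
  keys: Primitive[];
}

type FieldFormatter = (value: Primitive) => string;

// Hypothetical "higher order" formatter params: one inner formatter per field.
interface MultiTermsFormatParams {
  fieldFormatters: FieldFormatter[];
  separator: string;
}

// Format each key with its own field formatter, then join for display.
function formatMultiTerms(cell: MultiTermsCell, params: MultiTermsFormatParams): string {
  return cell.keys
    .map((key, i) => {
      const format = params.fieldFormatters[i];
      return format ? format(key) : String(key);
    })
    .join(params.separator);
}

// Simplified stand-in for a filter; real filter objects carry more information.
interface PhraseFilter {
  field: string;
  value: Primitive;
}

// Turn one cell into one filter per field.
function buildFilters(cell: MultiTermsCell, fields: string[]): PhraseFilter[] {
  if (fields.length !== cell.keys.length) {
    throw new Error('Expected one field name per key');
  }
  return fields.map((field, i) => ({ field, value: cell.keys[i] as Primitive }));
}

const cell: MultiTermsCell = { keys: ['dc-1', 'host-a', 42] };
const label = formatMultiTerms(cell, {
  fieldFormatters: [String, String, (v) => `#${v}`],
  separator: ' › ',
});
console.log(label); // dc-1 › host-a › #42
console.log(buildFilters(cell, ['datacenter', 'host', 'container']).length); // 3
```

Because the display string is only produced at render time, the separator here is purely cosmetic and cannot cause key collisions.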
Looking at the es documentation, what @flash1293 is suggesting makes sense to me |
Fixed by #116928 |
Edit: This has now been implemented in Elasticsearch as a new aggregation called multi_terms.
For example, I want to be able to look at the top 10 CPU usage across each container, on each host name, on each datacenter. Because I only want the top 10 results, it's not possible to build this query without using scripted terms.
The ideal result of this query is the following table, which is not something that you can currently build with esaggs across three fields:
The terms documentation describes the tradeoffs with using scripted terms.
User input limitations
Implementation proposal
There are two possible implementations on the aggconfig level:
a) Don't deprecate the field parameter, but create a new fields parameter on the terms aggregation, with a new editor component. Introduce a new write function on the aggconfig to write the correct field or script as needed in the configuration.
b) Introduce the concept of multi-select on all fields in aggconfig, and change from field: string to field: string | string[]. Scripted fields are already supported for all aggregations, but are most useful for the Terms agg.
The other thing that needs to change is the tabify logic. Currently each aggconfig is converted into a single column, but with this proposal I would want to convert a single aggconfig into multiple columns.
Example requests
In this example, I want to see the top 5 pairs of geo.src and geo.dest based on the maximum transferred bytes between those pairs. This type of query is useful for finding outliers.
I can express this query as ES SQL:
Which returns the following result:
I can also express it as DSL:
The DSL query returns:
Notice that in the DSL query, I had to separate each term using a separator parameter. I would expect that this separator has a default implementation, but might be user-configurable.
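Since the edit note in this issue says Elasticsearch now provides a multi_terms aggregation, the geo.src / geo.dest example could be expressed without a script or separator. The sketch below is an assumption-laden illustration, not the original DSL from this issue: `buildTopValuesAgg`, the `top_pairs` and `max_metric` names, and the `bytes` field are all hypothetical. It also accepts `field: string | string[]` in the spirit of option (b), falling back to a plain terms aggregation for a single field.

```typescript
type FieldParam = string | string[];

// Build a terms or multi_terms aggregation body ordered by a max sub-aggregation.
// Hypothetical helper; not part of Kibana's aggConfigs.
function buildTopValuesAgg(
  field: FieldParam,
  metricField: string,
  size: number
): Record<string, unknown> {
  const fields = Array.isArray(field) ? field : [field];
  if (fields.length === 0) {
    throw new Error('At least one field is required');
  }
  const subAggs = { max_metric: { max: { field: metricField } } };
  const order = { max_metric: 'desc' };

  if (fields.length === 1) {
    // A single field does not need multi_terms.
    return { terms: { field: fields[0], size, order }, aggs: subAggs };
  }
  return {
    multi_terms: { terms: fields.map((name) => ({ field: name })), size, order },
    aggs: subAggs,
  };
}

// Top 5 source/destination pairs by maximum bytes (field names are assumed).
const request = {
  size: 0,
  aggs: { top_pairs: buildTopValuesAgg(['geo.src', 'geo.dest'], 'bytes', 5) },
};
console.log(JSON.stringify(request, null, 2));
```

With multi_terms, each bucket key comes back as an array of values, one per field, so the client does not have to split a concatenated string.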