Add "missing" and "other" values to terms agg #1961

Closed
ghost opened this issue Nov 18, 2014 · 80 comments · Fixed by #15525
Labels
Feature:elasticsearch Feature:Visualizations Generic visualization features (in case no more specific feature label is available) high hanging fruit release_note:enhancement

Comments

@ghost

ghost commented Nov 18, 2014

In kibana 3, in the pie chart definition, there are two check boxes for "missing" and "other" values.

It seems this option is gone in kibana 4.
If I do a terms aggregation on a field with 20 values and only select the top 7, the percentages in the pie chart will not take the remaining 13 terms into account. In this case, I would like to be able to include a slice in the pie chart for the "other" values.

Am I missing something? Is the inclusion of "other" or "missing" values planned in Kibana 4?

@rashidkpc
Contributor

Unfortunately this functionality was removed from Elasticsearch, Kibana has no way of calculating these values. You can follow the elasticsearch team's progress here: elastic/elasticsearch#5324

@ghost
Author

ghost commented Nov 18, 2014

Thanks for the quick answer.

That's a major pain. Pie charts are rarely useful without being able to work on the complete data set (and drawing the complete data set is most of the time neither doable nor interesting).
We just want to see / show the proportion of the total taken by each of the top 10 terms.

@rashidkpc
Contributor

Yep, you may wish to weigh in on the above referenced elasticsearch ticket.

@rashidkpc rashidkpc changed the title Pie chart: add "missing" and "other" values Add "missing" and "other" values to terms agg Jan 9, 2015
@bradvido

bradvido commented May 4, 2015

Couldn't this be done in Kibana4 right now?

missing
Add a missing aggregation at the same level as the terms aggregation, then append the missing bucket results to the end of the terms bucket array.
This is what we do with some custom reporting built on top of ES aggregations, and it works very well. It just requires some more logic on the client side to merge the missing bucket into the results.

other
Isn't this exactly what sum_other_doc_count of the terms aggregation is supposed to fix? It seems surprising that Kibana4 is not using this value since it's already being returned with the aggregation buckets.

As has been stated, you really need to see the whole data set for these graphs to be useful. When the data is sparse, missing and other values are really valuable metrics.
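
A minimal sketch of the merge described above, written in Python rather than Kibana's own code. The aggregation names `top_terms` and `no_value` and the bucket keys `other` and `missing` are made up for illustration; the `terms` and `missing` aggregations and the `sum_other_doc_count` response field are the Elasticsearch features named in this comment. The sample response is hand-written, not real data.

```python
def build_request(field, size):
    """Terms aggregation plus a sibling `missing` aggregation on the same field."""
    return {
        "size": 0,
        "aggs": {
            "top_terms": {"terms": {"field": field, "size": size}},
            "no_value": {"missing": {"field": field}},
        },
    }


def merge_buckets(aggs):
    """Append "other" and "missing" buckets after the top-terms buckets."""
    terms = aggs["top_terms"]
    buckets = [{"key": b["key"], "doc_count": b["doc_count"]} for b in terms["buckets"]]
    # Documents that fell outside the requested top N terms.
    other = terms.get("sum_other_doc_count", 0)
    if other > 0:
        buckets.append({"key": "other", "doc_count": other})
    # Documents that have no value for the field at all.
    missing = aggs["no_value"]["doc_count"]
    if missing > 0:
        buckets.append({"key": "missing", "doc_count": missing})
    return buckets


# Hand-written response fragment, for illustration only.
sample_aggs = {
    "top_terms": {
        "sum_other_doc_count": 13,
        "buckets": [{"key": "a", "doc_count": 50}, {"key": "b", "doc_count": 30}],
    },
    "no_value": {"doc_count": 7},
}
merged = merge_buckets(sample_aggs)
total = sum(b["doc_count"] for b in merged)
```

Because `sum_other_doc_count` is a document count, this only yields an "other" value for a count metric, not for metrics such as averages or sums.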

@jccq

jccq commented May 12, 2015

elastic/elasticsearch#11042 seems about to be closed; I wonder if it can be used to close this issue.

@janbernhardt

+1 absolutely essential

@ajrasch

ajrasch commented May 21, 2015

+1000


@celesteking

I guess people using this software in the financial sector just had a heart attack. Haha.

@qcho

qcho commented May 26, 2015

+1. must have

@dev-shubh

+1 absolutely essential

@joelsvensson

+1

@manuel-sousa

+1

@JulienPalard

+1

@tojocky

tojocky commented Jun 8, 2015

This prevents me from upgrading to version 4.0!

@pickworth

I figured this one out:

Click Advanced in your buckets field, then in the EXCLUDE PATTERN, type: !*

This will exclude missing values!

@jdanbrown

Any updates since elastic/elasticsearch#5324 (comment), e.g. along the lines proposed above in #1961 (comment)? Terms viz that don't sum up to the total is a huge feature regression for us upgrading from kibana 3.

@snarahari

Must have. Preventing migration from K3 to K4.

@bertol83

+1

@acheriat

In Kibana 4, when we use a data table visualisation, why are rows with null dates (the date field can be null) or other missing values systematically eliminated?
Do you have a solution, especially for keeping rows with null date fields in data tables?

@spalger
Contributor

spalger commented Sep 9, 2015

Now that elastic/elasticsearch#11042 is merged this is no longer an upstream issue. The "others" functionality desired here is actually not fixed by elastic/elasticsearch#11042, which simply allows defining a value for documents which do not have a value.
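
As a hedged illustration of "defining a value for documents which do not have a value": to my understanding this corresponds to the `missing` parameter of the terms aggregation. The helper name, the aggregation name `top_terms`, and the placeholder string below are made up for the sketch, and it says nothing about how Kibana itself would use the option.

```python
def terms_with_missing(field, size, placeholder="(missing)"):
    """Terms aggregation that counts documents without a value under `placeholder`."""
    return {
        "size": 0,
        "aggs": {
            "top_terms": {
                "terms": {
                    "field": field,
                    "size": size,
                    # Documents lacking `field` are treated as if they had this value.
                    # The placeholder should match the field's type
                    # (a string for keyword fields, a number for numeric fields).
                    "missing": placeholder,
                }
            }
        },
    }


body = terms_with_missing("status", 7)
```

This covers the "missing" half of the request only: documents outside the top N terms are still not grouped into an "other" bucket.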

@alxbog

alxbog commented Feb 17, 2017

+1

@jbwl

jbwl commented Feb 21, 2017

+1

@z-matth

z-matth commented Feb 23, 2017

+1

@bhatiaabhinav

+1

@boomin614

+1

@leosulake

+1

@kiblik

kiblik commented Mar 28, 2017

+1

@epixa epixa removed the P4 label Apr 25, 2017
@pjcard

pjcard commented Apr 28, 2017

+1

@exhuma

exhuma commented Jun 1, 2017

Is there any update on this? I've been running into this issue several times now and kept silent so far as it seems essential for graphing. So I'm still assuming that this is on the radar of the devs, but would love to see an update to see where we're at.

I have a lot of vertical bar-charts which use split-bars on terms. I currently set the size to something like 9999 so I get at least something usable. But that creates many splits and I am usually only interested in the "Top <n>" entries. An "other" bucket would make this much more usable. I could then use the "magnifying-glass-button" on the terms in the visualisation legend to exclude terms which will reveal more and more items from the "other" bucket.

From what I can tell googling around a bit, it seems that ES already offers this value in sum_other_doc_count so I'm a bit puzzled why this is not exposed in Kibana.

Following this thread, I can see that this only makes sense with the "Count" metric. Would it be possible to offer this in the visualisation options via a simple "Enable <other> bucket" checkbox?

@ChrisDev83

ChrisDev83 commented Jun 7, 2017

If it's of any use to others, I recently had a similar problem where I wanted to show a pie chart of terms which also took the missing values into consideration.

What I did in the end was create a chart with two filters :
_exists_:"MyField"
and
_missing_:"MyField"

I then added a subbucket for Terms on the field "MyField".

This gave me the visibility I was after.
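
Roughly, the chart described above corresponds to a filters aggregation with a terms sub-aggregation. The sketch below builds such a request body in Python; the aggregation and filter names (`presence`, `has_value`, `no_value`, `top_terms`) are made up, and the exact request Kibana generates for this chart is an assumption. The `_exists_:` and `_missing_:` query strings are copied from this comment; `_missing_:` may not be available on every Elasticsearch version.

```python
def exists_missing_split(field, size):
    """Split documents by presence of `field`, then take top terms inside each split."""
    quoted = '"%s"' % field
    return {
        "size": 0,
        "aggs": {
            "presence": {
                "filters": {
                    "filters": {
                        # Query strings as given in the comment above.
                        "has_value": {"query_string": {"query": "_exists_:" + quoted}},
                        "no_value": {"query_string": {"query": "_missing_:" + quoted}},
                    }
                },
                # Sub-bucket: top terms within each of the two filter buckets.
                "aggs": {"top_terms": {"terms": {"field": field, "size": size}}},
            }
        },
    }


body = exists_missing_split("MyField", 10)
```

The `no_value` bucket carries the count of documents without the field; its terms sub-bucket is expected to be empty.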

@zhangskd

zhangskd commented Jul 3, 2017

+1

@zhangskd

zhangskd commented Jul 3, 2017

Very disappointed in Kibana 4/5: several years have passed and such an essential feature is still unsupported.

@daiglej-LSPD

+1

@5ean

5ean commented Aug 31, 2017

really need this feature.

@leifker

leifker commented Sep 1, 2017

+1

@nakedible-p

We need both "other" and "missing". +1

@davidban77

+1

@sedelnik

sedelnik commented Nov 1, 2017

+1

@ulir

ulir commented Nov 13, 2017

It's exactly as exhuma said (June 1) - I held back from posting the 1001st +1 here, always thinking that some kind of fix would come anyway. It would need to come, given that e.g. pie charts on a field with more than a dozen distinct values simply make no sense! (unless you do what we all do: specify a size of 1000, which is a major pain for performance.)
And if the implementation is simple for count and very hard for more complex metrics, enable it for count and ease the pain of 95% of the users!
Elastic team, it would be great to get any kind of feedback on this one.

@LuigiClemente-Awin

+1

@jetnet

jetnet commented Nov 27, 2017

just a workaround for "others" slice:
use the "Filters" agg and add {"other_bucket": true} to the "Json input" field

others workaround
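
For reference, a sketch of the request body this workaround presumably produces. `other_bucket` and `other_bucket_key` are parameters of the Elasticsearch filters aggregation; the aggregation name `slices`, the helper name, and the sample queries are made up, and the way Kibana merges the "Json input" into the aggregation is an assumption.

```python
def filters_with_other(named_queries, other_key="others"):
    """Filters aggregation with an extra bucket for documents matching no filter."""
    return {
        "size": 0,
        "aggs": {
            "slices": {
                "filters": {
                    "filters": {
                        name: {"query_string": {"query": query}}
                        for name, query in named_queries.items()
                    },
                    # Collect every document that matches none of the filters above.
                    "other_bucket": True,
                    "other_bucket_key": other_key,
                }
            }
        },
    }


body = filters_with_other({"errors": "level:error", "warnings": "level:warn"})
```

The limitation is that the filters have to be written by hand, so this does not help when the top terms are not known in advance.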

@nreese
Contributor

nreese commented Dec 6, 2017

@ppisljar @thomasneirynck I think others can be implemented on the Terms aggregation with a modifyAggConfigOnSearchRequestStart that gathers the terms list before the real terms aggregation request is generated.

Then the terms aggregation can implement the function getRequestAggs that, when others is enabled, adds a sibling aggregation called others. The others aggregation would just be a filter aggregation that excludes the terms list gathered by modifyAggConfigOnSearchRequestStart and then asks for the requested metrics on that bucket.

Filtering would not affect the flow because any applied filters will get accounted for in the pre-flight request to gather the terms list. The pre-flight request is executed each time before the aggregation is created.

The histogram aggregation provides a working example of the pre-flight request. It fetches the min and max so that when the actual histogram aggregation is requested, an appropriate interval can be used to avoid requesting too many buckets.
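
The two-step flow proposed above could look roughly like this. It is a standalone Python sketch, not Kibana code: `modifyAggConfigOnSearchRequestStart` and `getRequestAggs` are the Kibana hooks named in this comment, while every function and aggregation name below is hypothetical. Adding an `exists` clause so that documents without the field stay out of "others" is a design choice of the sketch, not something stated in the proposal.

```python
def preflight_request(field, size):
    """Step 1: ask only for the top terms."""
    return {"size": 0, "aggs": {"top_terms": {"terms": {"field": field, "size": size}}}}


def others_aggregation(field, top_terms, metric_aggs=None):
    """Filter aggregation matching documents whose value is not one of the top terms."""
    agg = {
        "filter": {
            "bool": {
                # Keep documents without the field out of "others".
                "filter": [{"exists": {"field": field}}],
                "must_not": [{"terms": {field: list(top_terms)}}],
            }
        }
    }
    if metric_aggs:
        # Request the same metrics on the "others" bucket.
        agg["aggs"] = metric_aggs
    return agg


def main_request(field, size, preflight_aggs, metric_aggs=None):
    """Step 2: the real terms aggregation plus its sibling `others` aggregation."""
    top = [b["key"] for b in preflight_aggs["top_terms"]["buckets"]]
    terms = {"terms": {"field": field, "size": size}}
    if metric_aggs:
        terms["aggs"] = metric_aggs
    return {
        "size": 0,
        "aggs": {
            "top_terms": terms,
            "others": others_aggregation(field, top, metric_aggs),
        },
    }


# Hand-written pre-flight result, for illustration only.
preflight = {"top_terms": {"buckets": [{"key": "a", "doc_count": 5}, {"key": "b", "doc_count": 3}]}}
body = main_request("host", 2, preflight, {"avg_bytes": {"avg": {"field": "bytes"}}})
```

Unlike `sum_other_doc_count`, a filter bucket of this kind can carry any metric, at the cost of an extra round trip.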

@nreese
Contributor

nreese commented Dec 7, 2017

@ppisljar and I chatted about the above solution.

It does not work when dealing with nested aggregations. For example, a date_histogram containing a terms aggregation (user wants to see the top terms per day). A separate sibling filter aggregation will be required for each date_histogram bucket. How would modifyAggConfigOnSearchRequestStart know which bucket(s) it's in?

We decided that aggregations need a post-flight concept. That way, the sibling aggregation(s) can be created for each parent bucket, fetched, and then merged into the results.
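
One way to picture the post-flight idea, as a Python sketch under stated assumptions and not the eventual Kibana implementation: after the main response arrives, build one filter per parent date_histogram bucket, fetch them in a second request, and merge the counts back. All function and aggregation names are hypothetical, the parent buckets are assumed to use a fixed interval in milliseconds, and the sample data is hand-written.

```python
def postflight_request(date_field, term_field, interval_ms, parent_buckets):
    """Build one "others" filter per parent bucket, keyed by the bucket's start time."""
    filters = {}
    for parent in parent_buckets:
        start = parent["key"]  # date_histogram keys are epoch milliseconds
        top = [b["key"] for b in parent["top_terms"]["buckets"]]
        filters[str(start)] = {
            "bool": {
                "filter": [
                    {"range": {date_field: {"gte": start, "lt": start + interval_ms,
                                            "format": "epoch_millis"}}},
                    {"exists": {"field": term_field}},
                ],
                # Exclude the terms already shown for this parent bucket.
                "must_not": [{"terms": {term_field: top}}],
            }
        }
    return {"size": 0, "aggs": {"others": {"filters": {"filters": filters}}}}


def merge_others(parent_buckets, others_buckets):
    """Append an "other" bucket to each parent bucket's term buckets."""
    for parent in parent_buckets:
        count = others_buckets[str(parent["key"])]["doc_count"]
        parent["top_terms"]["buckets"].append({"key": "other", "doc_count": count})
    return parent_buckets


DAY_MS = 24 * 60 * 60 * 1000
# Hand-written main response and post-flight response, for illustration only.
parents = [
    {"key": 0, "doc_count": 10, "top_terms": {"buckets": [{"key": "a", "doc_count": 6}]}},
    {"key": DAY_MS, "doc_count": 8, "top_terms": {"buckets": [{"key": "b", "doc_count": 5}]}},
]
request = postflight_request("@timestamp", "host", DAY_MS, parents)
fake_response = {"0": {"doc_count": 4}, str(DAY_MS): {"doc_count": 3}}
merged = merge_others(parents, fake_response)
```

Each parent bucket gets its own exclusion list, which is what a single pre-flight request cannot provide.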

@exhuma

exhuma commented Dec 8, 2017

@jetnet Do you know if something similar is possible with "terms" instead of "filters"? I have a chart where the terms are unknown on a given time-slice. In this particular example they contain IP addresses of network routers causing errors on a network. What I want to know is the "Top 10" IPs in that time-slice, but would also need to see the "others" slice. Mainly, this would help me to see if (and how many) other IPs are causing issues. If I set the number of slices to 10, and see 10 slices, I could be looking at 10 failing IPs, or 12, or 5000. The only way to make this "visible" is by adding an "other" slice.

As I don't know which devices cause errors at any given time, I can't "hard-code" those values in the visualisation filters.

@ppisljar
Member

ppisljar commented Dec 9, 2017 via email

@epixa
Contributor

epixa commented Feb 6, 2018

For those following along, support for "other" and "missing" buckets has just been released in 6.2.0: https://www.elastic.co/blog/kibana-6-2-0-released
