Optionally include 'Set Difference' for filter aggregation. #7261

Kallin · 2014-08-13T18:03:04Z

A project that I'm working on involves breaking down an index via many nested filter aggregations. Imagine it being for something like a website visitor funnel:

1. Of all the visitors, bucket the ones who created an account
2. Of account creators, bucket those over age 30
3. etc.

What I also end up doing is creating the 'difference' filters, albeit manually.

1b. Of all the visitors, bucket the ones who didn't create an account.
2b. Of those account creators under 30, show me those who liked dogs.
3b. etc..

This has proven very powerful for segmenting data, but creating all these 'difference' filters manually is verbose and error-prone. It would be great if I could optionally have the difference filter created for me whenever I create a filter.

For example, in the filter agg doc it suggests:

{
    "aggs" : {
        "in_stock_products" : {
            "filter" : { "range" : { "stock" : { "gt" : 0 } } },
            "aggs" : {
                "avg_price" : { "avg" : { "field" : "price" } }
            }
        }
    }
}

What if we allowed a parameter on the filter agg like:

"filter" : { "range" : { "stock" : { "gt" : 0 } }, "includeDifference" : true },

And that would automatically create a bucket that includes all the docs not included in the main filter, perhaps automatically naming it in this case 'not_in_stock_products'.

The response might then look like:

{
    ...

    "aggs" : {
        "in_stock_products" : {
            "doc_count" : 100,
            "avg_price" : { "value" : 56.3 }
        },
        "not_in_stock_products" : {
            "doc_count" : 50
        }
    }
}
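
Until something like this exists server-side, the same pair of buckets can be generated client-side. The following is only a minimal sketch: `with_difference` is a hypothetical helper name, the `not_` prefix follows the naming suggested above, and the complement is expressed by wrapping the original filter in `bool` / `must_not`, which is an assumption about how the difference would be built. Note that documents lacking the `stock` field would also fall into the complement bucket.

```python
import copy
import json


def with_difference(name, filter_clause, sub_aggs=None, difference_aggs=None):
    """Build a filter agg plus its complement, named 'not_<name>'."""

    def bucket(flt, aggs):
        body = {"filter": flt}
        if aggs:
            body["aggs"] = copy.deepcopy(aggs)
        return body

    # Negate the original filter; docs that lack the field also land here.
    negated = {"bool": {"must_not": [copy.deepcopy(filter_clause)]}}
    return {
        name: bucket(filter_clause, sub_aggs),
        "not_" + name: bucket(negated, difference_aggs),
    }


in_stock = {"range": {"stock": {"gt": 0}}}
avg_price = {"avg_price": {"avg": {"field": "price"}}}
request = {"aggs": with_difference("in_stock_products", in_stock, sub_aggs=avg_price)}
print(json.dumps(request, indent=2))
```

This still sends two filter aggs per segment, so it only removes the manual bookkeeping, not the extra work on the server.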

Creating further sub-aggs on the auto-created agg could be done either inline with the original agg:

            "filter" : {
                "range" : { "stock" : { "gt" : 0 } },
                "includeDifference" : true,
                "differenceAggs" : {
                    "avg_price" : { "avg" : { "field" : "price" } }
                }
            },
            "aggs" : {
                "avg_price" : { "avg" : { "field" : "price" } }
            }

or perhaps simply by naming the difference agg, so that it maps to the original by convention:

{
    "aggs" : {
        "in_stock_products" : {
            "filter" : { "range" : { "stock" : { "gt" : 0 } }, "includeDifference" : true },
            "aggs" : {
                "avg_price" : { "avg" : { "field" : "price" } }
            }
        },
        "not_in_stock_products" : {
            "avg_price" : { "avg" : { "field" : "price" } }
        }
    }
}
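
To make the by-convention variant concrete, here is a rough sketch of how a client-side preprocessor might expand it into explicit aggs before sending the request. Everything here is an assumption for illustration: `expand_differences` is a hypothetical function, `includeDifference` is the flag proposed in this issue (not an existing Elasticsearch parameter), and the complement is built with `bool` / `must_not`. It only handles the top level and does not recurse into nested aggs.

```python
import copy


def expand_differences(aggs):
    """Rewrite the hypothetical 'includeDifference' flag into explicit aggs."""
    expanded = {}
    for name, body in aggs.items():
        flt = body.get("filter")
        if not (isinstance(flt, dict) and flt.get("includeDifference")):
            continue
        # Strip the flag so only the real filter clause is sent to the server.
        clause = {k: copy.deepcopy(v) for k, v in flt.items()
                  if k != "includeDifference"}
        expanded[name] = {"filter": clause}
        if "aggs" in body:
            expanded[name]["aggs"] = copy.deepcopy(body["aggs"])
        other = "not_" + name
        expanded[other] = {"filter": {"bool": {"must_not": [copy.deepcopy(clause)]}}}
        # By convention, a sibling entry named 'not_<name>' holds the sub-aggs.
        if other in aggs:
            expanded[other]["aggs"] = copy.deepcopy(aggs[other])
    # Aggs that are not part of a flagged pair pass through unchanged.
    for name, body in aggs.items():
        expanded.setdefault(name, copy.deepcopy(body))
    return expanded


proposed = {
    "in_stock_products": {
        "filter": {"range": {"stock": {"gt": 0}}, "includeDifference": True},
        "aggs": {"avg_price": {"avg": {"field": "price"}}},
    },
    "not_in_stock_products": {
        "avg_price": {"avg": {"field": "price"}},
    },
}
result = expand_differences(proposed)
```

One consequence of the convention is that a name like `not_in_stock_products` becomes reserved, which could collide with an agg the user defined deliberately.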

What do people think about something like this? I'm sure I'm not the only one who tries to segment their data like this.


Kallin · 2014-08-15T18:21:04Z

I'd be happy to contribute code for this, just wonder if the feature would be welcomed.

clintongormley · 2014-08-18T09:31:25Z

Hi @Kallin

Honestly, I don't like the includeDifference syntax. I'm wondering if this would be handled generically with the "other" option described in #5324

Kallin · 2014-08-18T14:45:29Z

This does sound similar to the changes suggested for the 'missing' or 'other' options, though from what I gather those were specific to terms aggregations or bucketing aggregations. In this particular case I'm concerned with filter aggregations. If it could be rolled into some other work, that would be ideal. I don't see why the 'not_*' dynamic aggregation I described above couldn't be replaced with 'other', though that issue (#6804) looks to have been closed. Should I push for inclusion of this in #5324?

clintongormley · 2014-10-31T11:04:50Z

It does sound like the other bucket would be a good fit to solve this.

See #5324 for discussion

Kallin · 2014-10-31T14:08:27Z

As long as #5324 will handle an '_other' bucket on filter aggs, I'm happy to close this.

clintongormley · 2014-10-31T14:11:43Z

@Kallin as I'm sure you've seen in the other thread, it's not an easy change, but we will be working on it :)

Kallin · 2014-10-31T14:14:49Z

Nothing worth having ever is :)

clintongormley added the discuss label Aug 13, 2014

clintongormley added help wanted adoptme and removed discuss labels Oct 31, 2014

Kallin closed this as completed Oct 31, 2014
