Optionally include 'Set Difference' for filter aggregation. #7261

Closed
Kallin opened this issue Aug 13, 2014 · 7 comments

Kallin commented Aug 13, 2014

A project that I'm working on involves breaking down an index via many nested filter aggregations. Imagine it being for something like a website visitor funnel:

  1. Of all the visitors, bucket the ones who created an account
  2. Of account creators, bucket those over age 30
  3. etc.

I also end up creating the complementary 'difference' filters, albeit manually:

1b. Of all the visitors, bucket the ones who didn't create an account.
2b. Of the account creators who are not over 30, show me those who like dogs.
3b. etc.

This has proven very powerful for segmenting data, but creating all these 'difference' filters manually is verbose and error-prone. It would be great if the difference filter could optionally be created for me whenever I create a filter.
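To make the manual approach concrete, here is a rough sketch of how such paired filters might be generated on the client. It is only an illustration: the helper names (`negate`, `with_difference`) and the field names (`has_account`, `age`) are hypothetical, and the complement is expressed with a `bool` filter's `must_not` clause.

```python
import json


def negate(filter_body):
    """Wrap a filter so it matches every document the original does not."""
    return {"bool": {"must_not": [filter_body]}}


def with_difference(name, filter_body, sub_aggs=None, diff_sub_aggs=None):
    """Return a filter agg plus its hand-built complement, named 'not_<name>'."""
    main = {"filter": filter_body}
    diff = {"filter": negate(filter_body)}
    if sub_aggs:
        main["aggs"] = sub_aggs
    if diff_sub_aggs:
        diff["aggs"] = diff_sub_aggs
    return {name: main, "not_" + name: diff}


# Steps 2 / 2b: split account creators by age (hypothetical field name).
over_30 = with_difference("over_30", {"range": {"age": {"gt": 30}}})

# Steps 1 / 1b: split all visitors by whether they created an account.
funnel = {
    "aggs": with_difference(
        "created_account",
        {"term": {"has_account": True}},  # hypothetical field name
        sub_aggs=over_30,
    )
}

print(json.dumps(funnel, indent=2))
```

Every level of the funnel doubles the number of aggregations that have to be declared, which is where the verbosity comes from.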

For example, the filter aggregation docs suggest:

{
    "aggs" : {
        "in_stock_products" : {
            "filter" : { "range" : { "stock" : { "gt" : 0 } } },
            "aggs" : {
                "avg_price" : { "avg" : { "field" : "price" } }
            }
        }
    }
}

What if we allowed a parameter on the filter agg like:

"filter" : { "range" : { "stock" : { "gt" : 0 } }, "includeDifference" : true },

That would automatically create a bucket containing all the docs not matched by the main filter, perhaps automatically named 'not_in_stock_products' in this case.

The response might then look like:

{
    ...

    "aggs" : {
        "in_stock_products" : {
            "doc_count" : 100,
            "avg_price" : { "value" : 56.3 }
        },
        "not_in_stock_products" : {
            "doc_count" : 50
        }
    }
}
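As a side note, when only the `doc_count` of the difference bucket is needed, it can be derived on the client without a second filter: it is the number of documents in the enclosing scope minus the filter bucket's `doc_count`. A minimal sketch, assuming the enclosing scope holds 150 documents (a made-up total chosen to be consistent with the example figures above); the function name is hypothetical.

```python
def difference_doc_count(parent_doc_count, filter_bucket):
    """Number of docs in the parent scope that the filter did not match."""
    diff = parent_doc_count - filter_bucket["doc_count"]
    if diff < 0:
        raise ValueError("filter bucket is larger than its parent scope")
    return diff


# Bucket taken from the example response above.
response_aggs = {
    "in_stock_products": {"doc_count": 100, "avg_price": {"value": 56.3}}
}

total_docs = 150  # assumption: total hits in the enclosing scope
not_in_stock = difference_doc_count(total_docs, response_aggs["in_stock_products"])
print(not_in_stock)  # 50
```

This shortcut covers counts only. Sub-aggregations on the difference bucket still need a real complementary filter, which is what the proposal is about.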

Creating further sub-aggs on the auto-created agg could be done either inline with the original agg:

            "filter" : {
                "range" : { "stock" : { "gt" : 0 } },
                "includeDifference" : true,
                "differenceAggs" : {
                    "avg_price" : { "avg" : { "field" : "price" } }
                }
            },
            "aggs" : {
                "avg_price" : { "avg" : { "field" : "price" } }
            }

or perhaps simply by declaring an agg under the conventional name, which would then be mapped to the original filter:

{
    "aggs" : {
        "in_stock_products" : {
            "filter" : { "range" : { "stock" : { "gt" : 0 } }, "includeDifference" : true },
            "aggs" : {
                "avg_price" : { "avg" : { "field" : "price" } }
            }
        },
        "not_in_stock_products" : {
            "avg_price" : { "avg" : { "field" : "price" } }
        }
    }
}
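Until something like this exists on the server, the inline variant of the proposed syntax could be emulated by a small client-side rewrite. The sketch below is only an illustration under stated assumptions: `includeDifference` and `differenceAggs` are the proposed (not existing) keys, `expand_difference` is a hypothetical helper, and the complement is built with a `bool` filter's `must_not` clause.

```python
import copy


def expand_difference(aggs):
    """Rewrite aggs that use the proposed flags into plain filter aggregations."""
    expanded = {}
    for name, body in aggs.items():
        body = copy.deepcopy(body)  # never mutate the caller's request
        if "aggs" in body:
            body["aggs"] = expand_difference(body["aggs"])
        expanded[name] = body

        filter_body = body.get("filter")
        if isinstance(filter_body, dict) and filter_body.pop("includeDifference", False):
            # Strip the proposed keys so only a standard filter remains.
            diff_aggs = filter_body.pop("differenceAggs", None)
            diff = {"filter": {"bool": {"must_not": [copy.deepcopy(filter_body)]}}}
            if diff_aggs:
                diff["aggs"] = expand_difference(diff_aggs)
            expanded["not_" + name] = diff
    return expanded


proposed = {
    "in_stock_products": {
        "filter": {
            "range": {"stock": {"gt": 0}},
            "includeDifference": True,  # proposed key, not a real option
            "differenceAggs": {"avg_price": {"avg": {"field": "price"}}},
        },
        "aggs": {"avg_price": {"avg": {"field": "price"}}},
    }
}

request = {"aggs": expand_difference(proposed)}
```

The resulting `request` contains an ordinary `in_stock_products` filter agg and a generated `not_in_stock_products` agg carrying the `differenceAggs` sub-aggregations. A name clash with an existing `not_*` agg is not handled here.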

What do people think about something like this? I'm sure I'm not the only one who tries to segment their data like this.

Kallin commented Aug 15, 2014

I'd be happy to contribute code for this; I just wonder whether the feature would be welcome.

@clintongormley
Contributor

Hi @Kallin

Honestly, I don't like the includeDifference syntax. I'm wondering whether this could be handled generically with the "other" option described in #5324.

Kallin commented Aug 18, 2014

This does sound similar to the changes suggested for the 'missing' or 'other' options, though from what I gather those were specific to terms aggregations or bucketing aggregations. In this particular case I'm concerned with filter aggregations. If it could be rolled into that other work, that would be ideal. I don't see why the 'not_*' dynamic aggregation I described above couldn't be replaced with 'other', though that issue (#6804) looks to have been closed. Should I push for inclusion of this in #5324?

@clintongormley
Contributor

It does sound like the other bucket would be a good fit to solve this.

See #5324 for discussion

clintongormley added the help wanted and adoptme labels and removed the discuss label on Oct 31, 2014

Kallin commented Oct 31, 2014

As long as #5324 will handle the '_other' bucket on filter aggs, I'm happy to close this.

@clintongormley
Contributor

@Kallin as I'm sure you've seen in the other thread, it's not an easy change, but we will be working on it :)

Kallin commented Oct 31, 2014

Nothing worth having ever is :)

Kallin closed this as completed Oct 31, 2014