Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[7.x] [ML] adding new KS test pipeline aggregation (#73334) #73782

Merged
merged 2 commits into from
Jun 7, 2021

Conversation

benwtrent
Copy link
Member

Backports the following commits to 7.x:

This adds a new pipeline aggregation for calculating Kolmogorov–Smirnov test for a given sample and buckets path.

For now, the buckets path resolution needs to be `_count`. But, this may be relaxed in the future. 

It accepts a parameter `fractions` that indicates the distribution of documents from some other pre-calculated sample. 

This particular version of the K-S test is Two-sample, meaning, it calculates if the `fractions` and the distribution of `_count` values in the buckets_path are taken from the same distribution.

This in combination with the hypothesis alternatives (`less`, `greater`, `two_sided`) and sampling logic (`upper_tail`, `lower_tail`, `uniform`) allow for flexibility and usefulness when comparing two samples and determining the likelihood of them being from the same overall distribution.

Usage:

```
POST correlate_latency/_search?size=0&filter_path=aggregations
{
  "aggs": {
    "buckets": {
      "terms": { <1>
        "field": "version",
        "size": 2
      },
      "aggs": {
        "latency_ranges": {
          "range": { <2>
            "field": "latency",
            "ranges": [
              { "to": 0.0 },
              { "from": 0, "to": 105 },
              { "from": 105, "to": 225 },
              { "from": 225, "to": 445 },
              { "from": 445, "to": 665 },
              { "from": 665, "to": 885 },
              { "from": 885, "to": 1115 },
              { "from": 1115, "to": 1335 },
              { "from": 1335, "to": 1555 },
              { "from": 1555, "to": 1775 },
              { "from": 1775 }
            ]
          }
        },
        "ks_test": { <3>
          "bucket_count_ks_test": {
            "buckets_path": "latency_ranges>_count",
            "alternative": ["less", "greater", "two_sided"]
          }
        }
      }
    }
  }
}
```
@benwtrent benwtrent added :ml Machine learning backport labels Jun 4, 2021
@elasticmachine elasticmachine added the Team:ML Meta label for the ML team label Jun 4, 2021
@elasticmachine
Copy link
Collaborator

Pinging @elastic/ml-core (Team:ML)

@benwtrent
Copy link
Member Author

@elasticmachine update branch

@benwtrent
Copy link
Member Author

run elasticsearch-ci/packaging-tests-windows-sample

@benwtrent benwtrent merged commit 3392358 into elastic:7.x Jun 7, 2021
@benwtrent benwtrent deleted the backport/7.x/pr-73334 branch June 7, 2021 15:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport :ml Machine learning Team:ML Meta label for the ML team
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants