elastic · benwtrent · Jun 4, 2021 · Apr 23, 2021 · May 24, 2021 · May 24, 2021
diff --git a/docs/reference/aggregations/pipeline/bucket-count-ks-test-aggregation.asciidoc b/docs/reference/aggregations/pipeline/bucket-count-ks-test-aggregation.asciidoc
@@ -9,11 +9,15 @@
 experimental::[]
 
 A sibling pipeline aggregation which executes a two sample Kolmogorov–Smirnov test
-(referred to as a "K-S test" from now on) against a provided distribution and
-the distribution of documents counts in the configured sibling aggregation.
+(referred to as a "K-S test" from now on) against a provided distribution and the
+distribution implied by the document counts in the configured sibling aggregation.
+Specifically, for some metric, assuming that the percentile intervals of the metric are
+known beforehand or have been computed by an aggregation, one would use a range
+aggregation for the sibling to compute the p-value of the distribution difference between
+the metric and the restriction of that metric to a subset of the documents. A natural use
+case is when the sibling aggregation is a range aggregation nested in a terms aggregation, in
+which case one compares the overall distribution of the metric to its restriction to each term.
 
-This test is useful to determine if two samples (represented by `fractions` and `buckets_path`) are
-drawn from the same distribution.
 
 [[bucket-count-ks-test-agg-syntax]]
 ==== Parameters
@@ -28,13 +32,16 @@ For syntax, see <<buckets-path-syntax>>.
 A list of string values indicating which K-S test alternative to calculate.
 The valid values are: "greater", "less", "two_sided". This parameter is key for
 determining the K-S statistic used when calculating the K-S test. Default value is
-all possible alternative hypothesis.
+all possible alternative hypotheses.
 
 `fractions`::
 (Optional, list)
 A list of doubles indicating the distribution of the samples with which to compare to the
-`buckets_path` results. The default is a uniform distribution of the same length as the
-`buckets_path` buckets.
+`buckets_path` results. In typical usage this is the overall proportion of documents in
+each bucket, which is compared with the actual document proportions in each bucket
+from the sibling aggregation counts. The default is to assume that documents overall
+are uniformly distributed across these buckets, which they would be if one used equally
+spaced percentiles of a metric to define the bucket end points.
 
 `sampling_method`::
 (Optional, string)
@@ -71,7 +78,7 @@ The uniform distribution reflects the `latency` percentile buckets. Not shown is
 which was done utilizing the
 <<search-aggregations-metrics-percentile-aggregation,percentiles>> aggregation.
 
-This example is only using the 10s percentiles.
+This example uses only the deciles of `latency`.
 
 [source,console]
 -------------------------------------------------
@@ -205,7 +212,7 @@ And the following may be the response:
           "ks_test" : {
             "less" : 2.248673241788478E-4,
             "greater" : 1.0,
-            "two_sided" : 2.248673241788478E-4
+            "two_sided" : 5.791639181800257E-4
           }
         },
         {
@@ -282,7 +289,7 @@ And the following may be the response:
           "ks_test" : {
             "less" : 0.9642895789647244,
             "greater" : 4.58718174664754E-9,
-            "two_sided" : 4.58718174664754E-9
+            "two_sided" : 5.916656831139733E-9
           }
         }
       ]
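
The comparison described in the documentation change above (reference `fractions` versus the document proportions from the sibling buckets) can be illustrated with a minimal Python sketch. This is not the Elasticsearch implementation and does not use commons-math3. The function name `ks_statistics` and the result keys are hypothetical, chosen for illustration. The sketch computes only the K-S distance statistics from cumulative proportions. It does not compute p-values, and it does not claim which one-sided distance Elasticsearch reports as "less" or "greater".

```python
from itertools import accumulate


def ks_statistics(doc_counts, fractions=None):
    """Compare bucket document counts with reference fractions (illustrative only).

    doc_counts: document counts per bucket from the sibling aggregation.
    fractions:  reference proportion per bucket; assumed uniform when omitted,
                mirroring the documented default.
    """
    n = len(doc_counts)
    total = sum(doc_counts)
    if n == 0 or total == 0:
        raise ValueError("doc_counts must contain at least one document")
    if fractions is None:
        # Uniform reference: what deciles of the metric would give overall.
        fractions = [1.0 / n] * n
    if len(fractions) != n:
        raise ValueError("fractions must have one entry per bucket")
    frac_total = sum(fractions)
    if frac_total <= 0:
        raise ValueError("fractions must sum to a positive value")

    # Cumulative proportions act as the two empirical CDFs over the buckets.
    observed_cdf = list(accumulate(c / total for c in doc_counts))
    reference_cdf = list(accumulate(f / frac_total for f in fractions))
    diffs = [o - r for o, r in zip(observed_cdf, reference_cdf)]

    above = max(0.0, max(diffs))   # largest amount observed CDF exceeds reference
    below = max(0.0, -min(diffs))  # largest amount observed CDF falls short
    return {
        "observed_above_reference": above,
        "observed_below_reference": below,
        "two_sided": max(above, below),
    }


if __name__ == "__main__":
    # Counts skewed toward the low buckets, compared with a uniform reference.
    print(ks_statistics([40, 30, 20, 10]))
```

With counts `[40, 30, 20, 10]` and the default uniform reference, the cumulative proportions differ most at the second bucket (0.7 versus 0.5), so the two-sided distance is about 0.2. Turning such a distance into a p-value additionally depends on the sample sizes and the chosen alternative, which this sketch leaves out.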

diff --git a/x-pack/plugin/ml/build.gradle b/x-pack/plugin/ml/build.gradle
@@ -67,6 +67,7 @@ dependencies {
   // ml deps
   api project(':libs:elasticsearch-grok')
   api "net.sf.supercsv:super-csv:${versions.supercsv}"
+  api "org.apache.commons:commons-math3:3.6.1"
   nativeBundle("org.elasticsearch.ml:ml-cpp:${project.version}@zip") {
     changing = true
   }

diff --git a/x-pack/plugin/ml/licenses/commons-math3-3.6.1.jar.sha1 b/x-pack/plugin/ml/licenses/commons-math3-3.6.1.jar.sha1
@@ -0,0 +1 @@
+e4ba98f1d4b3c80ec46392f25e094a6a2e58fcbf