Apply suggestions from review

- count-distinct.asciidoc
  - Content restructured, moving the section about approximate counts to end of doc.
- count.asciidoc
  - Clarified that omitting the `expression` parameter in `COUNT` is equivalent to `COUNT(*)`, which counts the number of rows.
- percentile.asciidoc
  - Moved the note about `PERCENTILE` being approximate and non-deterministic to end of doc.
- stats.asciidoc
  - Clarified the `STATS` command
  - Added a note indicating that individual `null` values are skipped during aggregation
abdonpijpelink · Jan 30, 2024 · bdb1ac9
1 parent 631d414

Showing 4 changed files with 44 additions and 39 deletions.
diff --git a/docs/reference/esql/functions/count-distinct.asciidoc b/docs/reference/esql/functions/count-distinct.asciidoc
@@ -23,29 +23,6 @@ same effect as a threshold of 40000. The default value is 3000.
 
 Returns the approximate number of distinct values.
 
-[discrete]
-[[esql-agg-count-distinct-approximate]]
-==== Counts are approximate
-
-Computing exact counts requires loading values into a set and returning its
-size. This doesn't scale when working on high-cardinality sets and/or large
-values as the required memory usage and the need to communicate those
-per-shard sets between nodes would utilize too many resources of the cluster.
-
-This `COUNT_DISTINCT` function is based on the
-https://static.googleusercontent.com/media/research.google.com/fr//pubs/archive/40671.pdf[HyperLogLog++]
-algorithm, which counts based on the hashes of the values with some interesting
-properties:
-
-include::../../aggregations/metrics/cardinality-aggregation.asciidoc[tag=explanation]
-
-The `COUNT_DISTINCT` function takes an optional second parameter to configure
-the precision threshold. The precision_threshold options allows to trade memory
-for accuracy, and defines a unique count below which counts are expected to be
-close to accurate. Above this value, counts might become a bit more fuzzy. The
-maximum supported value is 40000, thresholds above this number will have the
-same effect as a threshold of 40000. The default value is `3000`.
-
 *Supported types*
 
 Can take any field type as input.
@@ -83,3 +60,26 @@ include::{esql-specs}/stats_count_distinct.csv-spec[tag=docsCountDistinctWithExp
 |===
 include::{esql-specs}/stats_count_distinct.csv-spec[tag=docsCountDistinctWithExpression-result]
 |===
+
+[discrete]
+[[esql-agg-count-distinct-approximate]]
+==== Counts are approximate
+
+Computing exact counts requires loading values into a set and returning its
+size. This doesn't scale when working on high-cardinality sets and/or large
+values as the required memory usage and the need to communicate those
+per-shard sets between nodes would utilize too many resources of the cluster.
+
+This `COUNT_DISTINCT` function is based on the
+https://static.googleusercontent.com/media/research.google.com/fr//pubs/archive/40671.pdf[HyperLogLog++]
+algorithm, which counts based on the hashes of the values with some interesting
+properties:
+
+include::../../aggregations/metrics/cardinality-aggregation.asciidoc[tag=explanation]
+
+The `COUNT_DISTINCT` function takes an optional second parameter to configure
+the precision threshold. The `precision_threshold` option lets you trade memory
+for accuracy, and defines a unique count below which counts are expected to be
+close to accurate. Above this value, counts might become a bit more fuzzy. The
+maximum supported value is 40000; thresholds above this number have the
+same effect as a threshold of 40000. The default value is `3000`.
diff --git a/docs/reference/esql/functions/count.asciidoc b/docs/reference/esql/functions/count.asciidoc
@@ -12,12 +12,14 @@ COUNT([expression])
 *Parameters*
 
 `expression`::
-Expression that outputs values to be counted. If omitted, returns a count all
-(the number of rows).
+Expression that outputs values to be counted.
+If omitted, equivalent to `COUNT(*)` (the number of rows).
+
 
 *Description*
 
 Returns the total number (count) of input values.
+See also <<esql-agg-count-distinct-approximate>>.
 
 *Supported types*
 

diff --git a/docs/reference/esql/functions/percentile.asciidoc b/docs/reference/esql/functions/percentile.asciidoc
@@ -23,18 +23,6 @@ Returns the value at which a certain percentage of observed values occur. For
 example, the 95th percentile is the value which is greater than 95% of the
 observed values and the 50th percentile is the <<esql-agg-median>>.
 
-[discrete]
-[[esql-agg-percentile-approximate]]
-==== `PERCENTILE` is (usually) approximate
-
-include::../../aggregations/metrics/percentile-aggregation.asciidoc[tag=approximate]
-
-[WARNING]
-====
-`PERCENTILE` is also {wikipedia}/Nondeterministic_algorithm[non-deterministic].
-This means you can get slightly different results using the same data.
-====
-
 *Example*
 
 [source.merge.styled,esql]
@@ -58,3 +46,15 @@ include::{esql-specs}/stats_percentile.csv-spec[tag=docsStatsPercentileNestedExp
 |===
 include::{esql-specs}/stats_percentile.csv-spec[tag=docsStatsPercentileNestedExpression-result]
 |===
+
+[discrete]
+[[esql-agg-percentile-approximate]]
+==== `PERCENTILE` is (usually) approximate
+
+include::../../aggregations/metrics/percentile-aggregation.asciidoc[tag=approximate]
+
+[WARNING]
+====
+`PERCENTILE` is also {wikipedia}/Nondeterministic_algorithm[non-deterministic].
+This means you can get slightly different results using the same data.
+====
diff --git a/docs/reference/esql/processing-commands/stats.asciidoc b/docs/reference/esql/processing-commands/stats.asciidoc
@@ -6,7 +6,8 @@
 
 [source,esql]
 ----
-STATS [column1 =] expression1[, ..., [columnN =] expressionN] [BY grouping_expression1[, ..., grouping_expressionN]]
+STATS [column1 =] expression1[, ..., [columnN =] expressionN] 
+[BY grouping_expression1[, ..., grouping_expressionN]]
 ----
 
 *Parameters*
@@ -21,6 +22,8 @@ An expression that computes an aggregated value.
 `grouping_expressionX`::
 An expression that outputs the values to group by.
 
+NOTE: Individual `null` values are skipped when computing aggregations.
+
 *Description*
 
 The `STATS ... BY` processing command groups rows according to a common value
@@ -86,7 +89,7 @@ include::{esql-specs}/stats.csv-spec[tag=statsGroupByMultipleValues]
 ----
 
 Both the aggregating functions and the grouping expressions accept other
-functions. This can come in useful for using `STATS...BY` on multivalue columns.
+functions. This is useful for using `STATS...BY` on multivalue columns.
 For example, to calculate the average salary change, you can use `MV_AVG` to
 first average the multiple values per employee, and use the result with the
 `AVG` function:
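
The last hunk stops before the example that the docs include at this point. As a rough illustration only, the Python sketch below mimics the two-step averaging described in the hunk, together with the new note that individual `null` values are skipped. The helper names `mv_avg` and `avg` and the sample data are hypothetical stand-ins for this sketch; they are not the ES|QL implementation, and treating a missing multivalue field as `None` is an assumption.

```python
from statistics import fmean
from typing import Optional, Sequence


def mv_avg(values: Optional[Sequence[float]]) -> Optional[float]:
    """Average the values of one row's multivalue field.

    Assumption: a missing or empty field yields None (standing in for null).
    """
    if not values:
        return None
    return fmean(values)


def avg(values: Sequence[Optional[float]]) -> Optional[float]:
    """Average across rows, skipping individual None values."""
    present = [v for v in values if v is not None]
    return fmean(present) if present else None


# Hypothetical salary-change data: one entry per employee, possibly multivalued.
salary_changes = [[1.0, 3.0], [10.0], None]

# Step 1: collapse each employee's multiple values into a single number.
per_employee = [mv_avg(v) for v in salary_changes]

# Step 2: aggregate the per-employee results; the None entry is skipped.
overall = avg(per_employee)

print(per_employee, overall)
```

With this data, the third employee contributes nothing to the final average, so the result is the mean of the first two per-employee values.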