diff --git a/docs/RFCS/distributed_sql.md b/docs/RFCS/distributed_sql.md index 5a7545193e93..dbfd8ca99734 100644 --- a/docs/RFCS/distributed_sql.md +++ b/docs/RFCS/distributed_sql.md @@ -17,6 +17,7 @@ * [Example 1](#example-1) * [Example 2](#example-2) * [Example 3](#example-3) + * [Types of aggregators](#types-of-aggregators) * [From logical to physical](#from-logical-to-physical) * [Processors](#processors) * [Joins](#joins) @@ -431,6 +432,52 @@ AGGREGATOR final: Composition: src -> countdistinctmin -> final ``` +### Types of aggregators + +- `TABLE READER` is a special agregator, with no input stream. It's configured + with spans of a table or index and the schema that it needs to read. + Like every other aggregator, it can be configured with a programmable output + filter. +- `PROGRAM` is a fully programmable no-grouping aggregator. It runs a "program" + on each individual row. The program can drop the row, or modify it + arbitrarily. +- `JOIN` performs a join on two streams, with equality constraints between + certain columns. The aggregator is grouped on the columns that are + constrained to be equal. See [Stream joins](#stream-joins). +- `JOIN READER` performs point-lookups for rows with the kyes indicated by the + input stream. It can do so by performing (potentially remote) KV reads, or by + setting up remote flows. See [Join-by-lookup](#join-by-lookup) and + [On-the-fly flows setup](#on-the-fly-flows-setup). +- `MUTATE` performs insertions/deletions/updates to KV. See section TODO. +- `SET OPERATION` takes several inputs and performs set arithmetic on them + (union, difference). +- `AGGREGATOR` is the one that does "aggregation" in the SQL sense. It groups + rows and computes an aggregate for each group. The group is configured using + the group key. `AGGREGATOR` can be configured with one or more aggregation + functions: + - `SUM` + - `COUNT` + - `COUNT DISTINCT` + - `DISTINCT` + `AGGREGATOR`'s output schema consists of the group key, plus a configurable + subset of the the generated aggregated values. The optional output filter has + access to the group key and all the aggregagated values (i.e. it can use even + values that are not ultimately outputted). +- `SORT` sorts the input according to a configurable set of columns. Note that + this is a no-grouping aggregator, hence it can be distributed arbitrarily to + the data producers. This means, of course, that it doesn't produce a global + ordering, instead it just guarantees an intra-stream ordering on each + physical output streams). The global ordering, when needed, is achieved by an + input synchronizer of a grouped processor (such as `LIMIT` or `FINAL`). +- `LIMIT` is a single-group aggregator that stops after reading so many input + rows. +- `INTENT-COLLECTOR` is a single-group aggregator, scheduled on the gateway, + that receives all the intents generated by a `MUTATE` and keeps track of them + in memory until the transaction is committed. +- `FINAL` is a single-group aggregator, scheduled on the gateway, that collects + the results of the query. This aggregator will be hooked up to the pgwire + connection to the client. + ## From logical to physical To distribute the computation that was described in terms of aggregators and