From e290d32101a6bf21020fa4bf22e555e6398345d3 Mon Sep 17 00:00:00 2001
From: David Sisson
Date: Wed, 25 Sep 2024 01:41:58 -0700
Subject: [PATCH] add some documentation for RelCommon and saved computations

---
 site/docs/relations/_config          |  3 ++-
 site/docs/relations/common_fields.md | 28 ++++++++++++++++++++++++++++
 2 files changed, 30 insertions(+), 1 deletion(-)
 create mode 100644 site/docs/relations/common_fields.md

diff --git a/site/docs/relations/_config b/site/docs/relations/_config
index 5a13776e1..b3a6085b8 100644
--- a/site/docs/relations/_config
+++ b/site/docs/relations/_config
@@ -1,6 +1,7 @@
 arrange:
   - basics.md
+  - common_fields.md
   - logical_relations.md
   - physical_relations.md
   - user_defined_relations.md
-  - embedded_relations.md
\ No newline at end of file
+  - embedded_relations.md
diff --git a/site/docs/relations/common_fields.md b/site/docs/relations/common_fields.md
new file mode 100644
index 000000000..240bff28a
--- /dev/null
+++ b/site/docs/relations/common_fields.md
@@ -0,0 +1,28 @@
+# Common Fields
+
+Every relation contains a common section that holds optional hints and the relation's emit behavior.
+
+
+## Emit
+
+A relation with a direct emit kind outputs all of its columns without reordering or selection. A relation that specifies an emit output mapping can emit its output columns in any order and may omit some of them.
+
+???+ note "Relation Output"
+
+    * By default, a relation outputs all of its input columns plus any generated columns. One notable exception is aggregations, which output only new columns.
+
+
+## Hints
+
+Hints provide information that can improve performance but cannot be used to control behavior. Table statistics, runtime constraints, name hints, and saved computations all fall into this category.
+
+???+ note "Hint Design"
+
+    * If a hint is not present or has incorrect data, the consumer should still be able to arrive at the correct result.
+
+
+### Saved Computations
+
+Computations can be used to save a data structure for use elsewhere. For instance, let's say we have a plan with a HashEquiJoin and an AggregateDistinct operation. The HashEquiJoin could save its hash table as saved computation id #1, and the AggregateDistinct could read in computation id #1.
+
+Now let's try a more complicated example. We have a relation that constructs two hash tables, and we'd still like one of them to go to our aggregate relation but the other to go elsewhere. We can use the computation number to select which data structure goes where. For instance, computation #1 could be hash table number 1 and computation #2 could be hash table number 2. The receiving entity just needs to know which of its own data structures should hold that computation. So if it has 5 hash table data structures, the LoadedComputation record needs to identify which of them the incoming data should go to.
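
The two-hash-table scenario described in the Saved Computations section can be sketched in a few lines of Python. This is only an illustration placed after the patch, not part of it and not the specification's protobuf definitions: `ComputationStore`, `computation_id_reference`, and `target_slot` are hypothetical names chosen for the sketch, and the hash tables are plain dictionaries. It also shows the hint design rule in action: a missing computation is reported rather than treated as an error, so the consumer can rebuild the structure itself.

```python
from dataclasses import dataclass


@dataclass
class SavedComputation:
    """Hypothetical producer-side hint: publish a structure under an id."""
    computation_id: int


@dataclass
class LoadedComputation:
    """Hypothetical consumer-side hint: read a saved id into a local slot."""
    computation_id_reference: int
    target_slot: int


class ComputationStore:
    """Hypothetical registry shared by the relations of one plan."""

    def __init__(self):
        self._saved = {}

    def save(self, hint, structure):
        # The producer registers its data structure under the hint's id.
        self._saved[hint.computation_id] = structure

    def load(self, hint, slots):
        # Return False when the computation is unavailable; the caller is
        # then expected to build the structure itself, since a hint must
        # never be required for a correct result.
        structure = self._saved.get(hint.computation_id_reference)
        if structure is None:
            return False
        slots[hint.target_slot] = structure
        return True


store = ComputationStore()

# The producing relation builds two hash tables and saves them as #1 and #2.
hash_table_1 = {42: ["row-a", "row-b"]}
hash_table_2 = {"west": ["row-a"]}
store.save(SavedComputation(computation_id=1), hash_table_1)
store.save(SavedComputation(computation_id=2), hash_table_2)

# The receiving relation has 5 hash table slots and wants #1 in slot 3.
aggregate_slots = [None] * 5
loaded = store.load(
    LoadedComputation(computation_id_reference=1, target_slot=3),
    aggregate_slots,
)

# A reference to an id that was never saved simply reports a miss.
missing = store.load(
    LoadedComputation(computation_id_reference=9, target_slot=0),
    aggregate_slots,
)
```

Keeping the slot choice on the loading side means the producer only has to number what it saves; each consumer decides independently where an incoming structure belongs.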