-
Notifications
You must be signed in to change notification settings - Fork 3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Calculate Iceberg NDV with a Theta sketch #14290
Conversation
6584896
to
4e250f4
Compare
CI #14239 (pinot failure considered unrelated) |
Is there any performance difference from using that implementation? I realize we don't have choice, even if the performance is worse, but I'm just curious. |
plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergTypes.java
Outdated
Show resolved
Hide resolved
i don't know From user perspective, this is an experimental and opt-in feature still, so perf regressions are fine. |
CI #13946 but -- hold your breath -- other faulttolerant failure is actually related. |
plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/aggregation/DataSketchState.java
Outdated
Show resolved
Hide resolved
plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergMetadata.java
Show resolved
Hide resolved
c481861
to
b64bd26
Compare
The faulttolerant tests expected that ANALYZE results are stable, and with the new implementation they are not. |
916783a
to
148b0ee
Compare
CI -- |
I discussed the problem of lack of stability with David. It's seems fine to roll with current implementation and improve stability (eg by calculating HLL as well) later, if needed. |
Reduce code noise by moving Symbol -> SymbolReference converstion to the construction method.
A hypothetical new format value should automatically get the same coverage (same number of assertions) as all the existing ones. Before the change, some assertions were run conditionally only if format is one of the known ones, which wasn't future-proof.
The test is analyzing _a table_ to calculate statistics.
ANALYZE can compute data size and number distinct values (NDV) using approximations and the result is not guaranteed to be stable. When testing ANALYZE with fault tolerance, allow the results to be differ between test run and the control run by some small percentage.
b989476
to
81cb1bd
Compare
(rebased to resolve conflict; build was green https://github.com/trinodb/trino/actions/runs/3142623927/jobs/5106437131) |
plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergTypes.java
Outdated
Show resolved
Hide resolved
|
||
public static CompactSketch deserialize(Block block, int index) | ||
{ | ||
checkArgument(!block.isNull(index), "Value is null"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks asymmetrical from the serialize
method, it writes a null Block if there is no sketch set, but here you throw.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Non-static deserialize(Block block, int index, DataSketchState state)
is a counter part to non-static serialize(DataSketchState state, BlockBuilder out)
This method is a helper only used
- in
deserialize(Block block, int index, DataSketchState state)
, conditionally after checking for non-null - in IcebergMetadata to pull final results (where it's guaranteed to be non-null, by means of the
@CombineFunction
@TypeParameter("T") | ||
public static void input(@TypeParameter("T") Type type, @AggregationState DataSketchState state, @BlockPosition @SqlType("T") Block block, @BlockIndex int index) | ||
{ | ||
verify(!block.isNull(index), "Input function is not expected to be called on a NULL input"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is ensured by the engine?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think SQL spec says that NULL values do not contribute to aggregations and engine enforces that, but whole truth may be more complex :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, it's enforced by the engine.
i thought it's still nice to check for null before reading some dummy/stupid value, so that code is self-descriptive.
For reference, I run a small experiment comparing For that, I've changed the output function of the
i was running on a 3-node cluster, should the distribution matter
Test query SELECT
format(
'SELECT
column_name, exact_count,
approx_count, round(abs(approx_count-exact_count)*1e2/exact_count, 1) approx_err,
theta_count, round(abs(theta_count-exact_count)*1e2/exact_count, 1) theta_err
FROM ( SELECT %s FROM orders)
CROSS JOIN UNNEST(ARRAY[%s]) AS _(column_name, exact_count, approx_count, theta_count)
;',
LISTAGG(
format('count(DISTINCT %1$s) AS "%1$s_exact", approx_distinct(%1$s) AS "%1$s_approx", "$iceberg_theta_stat"(%1$s) AS "%1$s_theta"', column_name),
U&',\000a') WITHIN GROUP (ORDER BY ordinal_position),
LISTAGG(
format('(''%1$s'', "%1$s_exact", "%1$s_approx", "%1$s_theta")', column_name),
U&',\000a') WITHIN GROUP (ORDER BY ordinal_position)
)
FROM information_schema.columns
WHERE table_schema = CURRENT_SCHEMA
AND table_name = 'orders'; Results
Same experiment, this time for
Observations:
|
- use applicable static imports - cast to primitive directly (same behavior, but looks nicer)
Move `ExpressionConverter.getIcebergLiteralValue` to `IcebergTypes` for re-use. Name it `convertTrinoValueToIceberg` to be consistent with existing `convertIcebergValueToTrino` method in that class.
- suppress intentional "redundant casts" - exact division - cast to primitive directly (looks nicer, same meaning) - remove redundant `instanceof` - put decimals together with other numbers
Iceberg specification defines the Apache DataSketches's Theta as the common data sketch for keeping track of distinct values in a table. This change replaces the use of HLL within Iceberg's ANALYZE with Theta sketch. The follow-up work is to store the serialized compact form of the sketch inside the Iceberg Puffin statistics file, but this requires Iceberg API changes, which are still in progress. A side effect of this change is that complex types (array, map, row) can no longer be analyzed: Trino can calculate a HyperLogLog for these types, while Iceberg does not specify binary representation for these types, which is required to feed data into a Theta sketch. However, NDV for complex types is not as useful as it is for scalar types, so this shouldn't matter in practice.
81cb1bd
to
90fb612
Compare
Added some code reuse in response to #14290 (comment), no other changes. |
Iceberg specification defines the Apache DataSketches's Theta as the
common data sketch for keeping track of distinct values in a table.
This change replaces the use of HLL within Iceberg's ANALYZE with Theta
sketch. The follow-up work is to store the serialized compact form of
the sketch inside the Iceberg Puffin statistics file, but this requires
Iceberg API changes, which are still in progress (e.g. apache/iceberg#4741, apache/iceberg#5794).