Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Normalize AggregateFunction types and state representations #39420

Merged
merged 13 commits into from
Aug 4, 2022

Conversation

amosbird
Copy link
Collaborator

@amosbird amosbird commented Jul 20, 2022

Changelog category (leave one):

  • Improvement

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Normalize AggregateFunction types and state representations because optimizations like #35788 will treat count(not null columns) as count(), which might confuses distributed interpreters with the following error : Conversion from AggregateFunction(count) to AggregateFunction(count, Int64) is not supported.

@robot-clickhouse robot-clickhouse added the pr-improvement Pull request with some product improvements label Jul 20, 2022
@KochetovNicolai KochetovNicolai self-assigned this Jul 20, 2022
if (typeid(rhs) != typeid(*this))
return false;

/// All count(*) variants are of the same type.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not the best solution for this problem.

Copy link
Member

@alexey-milovidov alexey-milovidov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will send a proposal how to do it better.

@alexey-milovidov alexey-milovidov self-assigned this Jul 21, 2022
@alexey-milovidov
Copy link
Member

It should use #25015 and #24820, see IAggregateFunction::getStateType method.

Please also fix the following cases:

  1. Materialized view when a user specified count(*) instead of count().
  2. qualtile* functions. The data types for quantile(0.5), quantile(0.9) and quantiles(0.5, 0.9) should be the same.
  3. -If and -Array combinators. For example, the data types for avg and avgIf should be the same.

@amosbird
Copy link
Collaborator Author

It should use #25015 and #24820, see IAggregateFunction::getStateType method.

Please also fix the following cases:

  1. Materialized view when a user specified count(*) instead of count().
  2. qualtile* functions. The data types for quantile(0.5), quantile(0.9) and quantiles(0.5, 0.9) should be the same.
  3. -If and -Array combinators. For example, the data types for avg and avgIf should be the same.

I think all cases are resolved now.

@amosbird amosbird changed the title Normalize AggregateFunctionCount type comparison Normalize AggregateFunction types and state representations Jul 21, 2022
@amosbird amosbird force-pushed the better-projection1-fix1 branch from b0cbb9f to 0e77179 Compare July 21, 2022 15:14
@amosbird amosbird marked this pull request as draft July 21, 2022 15:51
@amosbird
Copy link
Collaborator Author

Some tests don't work anymore. Turn into draft until they are all fixed.

@amosbird amosbird force-pushed the better-projection1-fix1 branch from 9a807e2 to 1dee71a Compare July 22, 2022 14:34
@amosbird
Copy link
Collaborator Author

amosbird commented Jul 23, 2022

AST fuzzer (UBSan, actions) — Logical error: 'Different order of columns in UNION subquery: countState(in(dummy, _subquery1435)) and '.

Not related to this PR, but triggers by this PR's test. A minimal reproducible case : SELECT * FROM (SELECT sqrt(0.5) AS a UNION ALL SELECT sqrt(dummy IN (SELECT 0))). Will try to fix it in another PR.

Integration tests (release, actions) [1/2] — fail: 1, passed: 1001, flaky: 0
test_merge_tree_azure_blob_storage/test.py::test_freeze_unfreeze
E helpers.client.QueryRuntimeException: Client failed! Return code: 233, stderr: Received exception from server (version 22.8.1):
E Code: 1001. DB::Exception: Received from 172.16.25.3:9000. DB::Exception: Azure::Storage::StorageException: 404 The specified blob does not exist.
E The specified blob does not exist.

Not related to this PR.

All other cases are green. It's ready for review.

@amosbird amosbird marked this pull request as ready for review July 23, 2022 02:45
@amosbird amosbird force-pushed the better-projection1-fix1 branch from 1dee71a to 5bf609e Compare July 26, 2022 14:40
@amosbird
Copy link
Collaborator Author

Will try to fix it in another PR

I cannot figure out a proper fix yet, only a working draft #39618

Added another case of aggregate state normalization: AggregateFunction(sum, Decimal(7, 2)) and AggregateFunction(sum, Decimal(9, 2))

This PR is complete but might be blocked by some existing issue due to new fuzz test. I'm not sure if it's proper to land now.

@amosbird amosbird force-pushed the better-projection1-fix1 branch 2 times, most recently from 94a0b66 to 99f8092 Compare July 28, 2022 03:36
@@ -0,0 +1,7 @@
-- Tags: no-s3-storage
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But there is no projection, so "no-s3-storage" tag and "projection" from the name can be removed.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is minmax_count_projection which is used. I'll remove "no-s3-storage" tag as this projection works with s3 storage.

@amosbird amosbird force-pushed the better-projection1-fix1 branch from 99f8092 to 26acb37 Compare July 30, 2022 18:50
@@ -73,13 +73,19 @@ class IAggregateFunction : public std::enable_shared_from_this<IAggregateFunctio
/// Get the data type of internal state. By default it is AggregateFunction(name(params), argument_types...).
virtual DataTypePtr getStateType() const;

/// Same as the above but normalize state types so that variants with the same binary representation will use the same type.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need non-normalized State Type?
I thought, that if the states are the same, they should use the same type.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because getStateType() is used to do correct finalization of aggregate states.

-- If we return original state type for quantiles*
SELECT finalizeAggregation(quantilesTimingState(0.5)(number)) FROM numbers(10)
┌─finalizeAggregation(quantilesTimingState(0.5)(number))─┐
│ [5]                                                    │
└────────────────────────────────────────────────────────┘

-- If we return normalized state type for quantiles*, which param is always 1
SELECT finalizeAggregation(quantilesTimingState(0.5)(number)) FROM numbers(10)
┌─finalizeAggregation(quantilesTimingState(0.5)(number))─┐
│ [9]                                                    │
└────────────────────────────────────────────────────────┘

It's because of this

Other test failures if we only use normalized state types: https://s3.amazonaws.com/clickhouse-test-reports/39420/69347028c54edcedc9a43e6795c52c15ad6972ec/fast_test.html

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, understood. Let's keep it with two separate functions.

@amosbird amosbird force-pushed the better-projection1-fix1 branch from b4783a7 to 52fcf00 Compare August 1, 2022 06:05
@amosbird amosbird force-pushed the better-projection1-fix1 branch from 52fcf00 to 5580209 Compare August 1, 2022 13:04
@alexey-milovidov alexey-milovidov merged commit 9e46abc into ClickHouse:master Aug 4, 2022
den-crane added a commit to den-crane/ClickHouse that referenced this pull request Aug 12, 2022
mstetsyuk pushed a commit to mstetsyuk/ClickHouse that referenced this pull request Aug 13, 2022
…ize_aggregateFunction_types_and_state_representations

test for Decimal aggregateFunction normalization ClickHouse#39420
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
pr-improvement Pull request with some product improvements
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants