[GLUTEN-8528][CH]Support approx_count_distinct #8550
Run Gluten ClickHouse CI on ARM
@CodiumAI-Agent /review
PR Reviewer Guide 🔍 (Review updated until commit 92e9224)
Here are some key observations to aid the review process:
Native:
0: jdbc:hive2://localhost:10000/> select approx_count_distinct(id, 0.001), approx_count_distinct(id, 0.01), approx_count_distinct(id, 0.1) from range(1000);
+----------------------------+----------------------------+----------------------------+
| approx_count_distinct(id) | approx_count_distinct(id) | approx_count_distinct(id) |
+----------------------------+----------------------------+----------------------------+
| 999 | 996 | 928 |
+----------------------------+----------------------------+----------------------------+
1 row selected (5.82 seconds)
0: jdbc:hive2://localhost:10000/>
0: jdbc:hive2://localhost:10000/> set spark.gluten.enabled = false;
+-----------------------+--------+
| key | value |
+-----------------------+--------+
| spark.gluten.enabled | false |
+-----------------------+--------+
1 row selected (0.137 seconds)
0: jdbc:hive2://localhost:10000/> select approx_count_distinct(id, 0.001), approx_count_distinct(id, 0.01), approx_count_distinct(id, 0.1) from range(1000);
+----------------------------+----------------------------+----------------------------+
| approx_count_distinct(id) | approx_count_distinct(id) | approx_count_distinct(id) |
+----------------------------+----------------------------+----------------------------+
| 999 | 996 | 928 |
+----------------------------+----------------------------+----------------------------+
1 row selected (149.915 seconds)
Persistent review updated to latest commit 92e9224
Run Gluten ClickHouse CI on ARM
Let's enable the Spark HLL UTs to see what happens.
Run Gluten ClickHouse CI on ARM
Done.
Run Gluten ClickHouse CI on ARM
LGTM
Better to add some comments about HLLPP (HyperLogLog++) for future readers.
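As one possible starting point for such a comment, here is a minimal sketch of how HyperLogLog++ typically turns the `relativeSD` argument into a sketch size. It assumes the native implementation mirrors the formula used by Spark's HyperLogLog++ helper (precision `p = ceil(2 * log2(1.106 / relativeSD))`, with `2^p` registers and a minimum of 4 bits); the function names `precisionFromRelativeSD` and `registerCount` are hypothetical and do not come from this PR.

```cpp
#include <cmath>
#include <cstddef>
#include <stdexcept>

// Hypothetical helper (not part of the PR): number of index bits p needed
// to reach a target relative standard deviation, assuming Spark's formula
//   p = ceil(2 * log2(1.106 / relative_sd)).
inline int precisionFromRelativeSD(double relative_sd)
{
    if (!(relative_sd > 0.0))
        throw std::invalid_argument("relative_sd must be positive");

    const int p = static_cast<int>(std::ceil(2.0 * std::log(1.106 / relative_sd) / std::log(2.0)));

    // HLL++ bias-correction tables start at precision 4, so anything
    // smaller is rejected here.
    if (p < 4)
        throw std::invalid_argument("HLL++ requires at least 4 bits of precision");
    return p;
}

// Hypothetical helper: number of registers m = 2^p in the sketch.
inline std::size_t registerCount(int p)
{
    return std::size_t{1} << p;
}
```

Under these assumptions, the default `relative_sd = 0.05` gives `p = 9` and 512 registers, while a tighter value such as 0.01 gives `p = 14`. A smaller `relativeSD` therefore buys accuracy at the cost of a larger aggregation state.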
#include <DataTypes/DataTypeNullable.h>
#include <Poco/Logger.h>
#include <Common/logger_useful.h>
#include "DataTypes/DataTypeAggregateFunction.h"
Use <> instead of "" in the include directive.
@@ -25,6 +25,7 @@
#include <Parser/TypeParser.h>
#include <Common/CHUtil.h>
#include <Common/Exception.h>
#include <Common/logger_useful.h>
Useless header
inline static const std::vector<std::vector<double>> BIAS_DATA = {
    // precision 4
    {10,
Please reformat this so that each line has 10 elements.
struct HyperLogLogPlusPlusData
{
    explicit HyperLogLogPlusPlusData(double relative_sd_ = 0.05)
Use Float64 for consistency
What changes were proposed in this pull request?
(Please fill in changes proposed in this fix)
(Fixes: #8528)
How was this patch tested?
Newly added UTs.