-
Notifications
You must be signed in to change notification settings - Fork 3.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Feature][Variant] support support schema for inner sub types in variant type #40573
base: master
Are you sure you want to change the base?
Conversation
Thank you for your contribution to Apache Doris. Since 2024-03-18, the Document has been moved to doris-website. |
run buildall |
TPC-H: Total hot run time: 38251 ms
|
TPC-DS: Total hot run time: 192755 ms
|
ClickBench: Total hot run time: 31.83 s
|
run buildall |
TeamCity be ut coverage result: |
TPC-H: Total hot run time: 38203 ms
|
TPC-DS: Total hot run time: 191297 ms
|
ClickBench: Total hot run time: 31.46 s
|
9e8806d
to
f04ceb0
Compare
run buildall |
TeamCity be ut coverage result: |
f04ceb0
to
0e0eb12
Compare
run buildall |
TeamCity be ut coverage result: |
TPC-H: Total hot run time: 38429 ms
|
TPC-DS: Total hot run time: 197213 ms
|
ClickBench: Total hot run time: 31.33 s
|
run buildall |
TPC-H: Total hot run time: 38214 ms
|
TPC-DS: Total hot run time: 192847 ms
|
TeamCity be ut coverage result: |
ClickBench: Total hot run time: 31.69 s
|
83e2850
to
bc9c36f
Compare
run buildall |
@@ -170,6 +172,8 @@ public abstract class Type { | |||
typeMap.put("MAP", Type.MAP); | |||
typeMap.put("OBJECT", Type.UNSUPPORTED); | |||
typeMap.put("ARRAY", Type.ARRAY); | |||
typeMap.put("IPV4", Type.IPV4); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change need to be picked to branch-2.1.
virtual Field get_type_field(const IColumn& column, int row) const { | ||
Field field; | ||
column.get(row, field); | ||
field.set_type_info(get_type_id()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can call get_precision() and get_frac() for all data types.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's slow for calling virtual functions
@@ -81,6 +81,15 @@ class DataTypeJsonb final : public IDataType { | |||
return String(value.value(), value.size()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fix it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe next PR
run buildall |
82ae21b
to
98224ec
Compare
run buildall |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
clang-tidy made some suggestions
// Currently the jsonb type should be the top level type, so we should not wrap it in array, | ||
// see create_array_of_type. | ||
// TODO we need to support array<jsonb> correctly | ||
if (UNLIKELY(field.get_type_id() == TypeIndex::JSONB && info->num_dimensions > 0)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
warning: boolean expression can be simplified by DeMorgan's theorem [readability-simplify-boolean-expr]
if (UNLIKELY(field.get_type_id() == TypeIndex::JSONB && info->num_dimensions > 0)) {
^
Additional context
be/src/common/compiler_util.h:35: expanded from macro 'UNLIKELY'
#define UNLIKELY(expr) __builtin_expect(!!(expr), 0)
^
@@ -248,7 +248,8 @@ DataTypePtr DataTypeFactory::create_data_type(const TypeDescriptor& col_desc, bo | |||
return nested; | |||
} | |||
|
|||
DataTypePtr DataTypeFactory::create_data_type(const TypeIndex& type_index, bool is_nullable) { | |||
DataTypePtr DataTypeFactory::create_data_type(const TypeIndex& type_index, bool is_nullable, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
warning: function 'create_data_type' exceeds recommended size/complexity thresholds [readability-function-size]
DataTypePtr DataTypeFactory::create_data_type(const TypeIndex& type_index, bool is_nullable,
^
Additional context
be/src/vec/data_types/data_type_factory.cpp:250: 114 lines including whitespace and comments (threshold 80)
DataTypePtr DataTypeFactory::create_data_type(const TypeIndex& type_index, bool is_nullable,
^
@@ -20,6 +20,7 @@ | |||
#include <glog/logging.h> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
warning: 'glog/logging.h' file not found [clang-diagnostic-error]
#include <glog/logging.h>
^
TeamCity be ut coverage result: |
98224ec
to
11d404d
Compare
run buildall |
TeamCity be ut coverage result: |
validateNestedType(scalarType, fieldType); | ||
if (!fieldNames.add(field.getName())) { | ||
throw new AnalysisException("Duplicate field name " + field.getName() | ||
+ " in struct " + scalarType.toSql()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in variant
@@ -276,6 +276,8 @@ TabletColumn TabletReader::materialize_column(const TabletColumn& orig) { | |||
cast_type.type); | |||
} | |||
column_with_cast_type.set_type(filed_type); | |||
column_with_cast_type.set_precision_frac(cast_type.precision, cast_type.scale); | |||
column_with_cast_type.set_is_decimal(cast_type.precision > 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not rigorous to use cast_type.precision > 0
to judge is_decimal. You can use cast_type.is_decimal_v2_type() and cast_type.is_decimal_v3_type() or add a new is_decimal_type for cast_type.
@@ -992,6 +1001,12 @@ Status VerticalSegmentWriter::_append_block_with_variant_subcolumns(RowsInBlock& | |||
auto full_path = full_path_builder.append(parent_column->name_lower_case(), false) | |||
.append(entry->path.get_parts(), false) | |||
.build(); | |||
if (typed_columns.contains(entry->path.get_path())) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the purpose of typed_columns?
Field get_type_field(const IColumn& column, int row) const override { | ||
Field field; | ||
column.get(row, field); | ||
field.set_type_info(get_type_id(), 0, static_cast<int>(get_scale())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is precision == 0 right?
assert_cast<const ColumnUInt64&, TypeCheckOnRelease::DISABLE>(column); | ||
Field field; | ||
column_data.get(row, field); | ||
field.set_type_info(get_type_id(), 0, static_cast<int>(get_scale())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is precision == 0 right?
11d404d
to
a267cd5
Compare
run buildall |
TPC-H: Total hot run time: 40205 ms
|
TeamCity be ut coverage result: |
TPC-DS: Total hot run time: 197059 ms
|
ClickBench: Total hot run time: 32.62 s
|
run buildall |
9564428
to
bec313e
Compare
run buildall |
Background
Currently we support auto detect schema info from semi-structure json, but when we encounter something like '2021-01-01' we just store as string type and query like string type, if we use it as date type, it maybe slow.So in this PR, we support to specify subschemas for variant type.
Usage
Design
like struct, we store we predefined fields as static type and no need to do schema merge