Add insertMany interface for IColumn #8925

Merged (6 commits) on Apr 11, 2024
dbms/src/Columns/ColumnConst.h: 2 additions, 0 deletions
@@ -85,6 +85,8 @@ class ColumnConst final : public COWPtrHelper<IColumn, ColumnConst>
         s += position_vec.size();
     }

+    void insertMany(const Field &, size_t length) override { s += length; }
+
     void insertDefault() override { ++s; }

     void insertManyDefaults(size_t length) override { s += length; }
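ColumnConst stores one shared value plus a row count s, so appending a field length times only has to grow the count; the Field argument is not even named. The toy model below illustrates that idea under stated assumptions: ToyConstColumn is a hypothetical class holding a single int64_t, not TiFlash's ColumnConst.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical toy model of a constant column: one shared value plus a row count.
// It only mirrors the idea behind ColumnConst::insertMany; it is not TiFlash code.
class ToyConstColumn
{
public:
    explicit ToyConstColumn(int64_t value, size_t rows = 0)
        : value_(value)
        , s_(rows)
    {}

    // Every row holds the same value, so appending `length` copies only grows the count.
    // As in ColumnConst::insertMany, the appended field is never inspected.
    void insertMany(int64_t /*field*/, size_t length) { s_ += length; }

    // Defaults are handled the same way in the diff: only the count changes.
    void insertManyDefaults(size_t length) { s_ += length; }

    size_t size() const { return s_; }

    int64_t get(size_t row) const
    {
        assert(row < s_);
        return value_; // every row reads back the single stored value
    }

private:
    int64_t value_;
    size_t s_;
};
```

This also shows why callers must not pass a different value to a constant column: it would be silently ignored.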
dbms/src/Columns/ColumnVector.h: 6 additions, 1 deletion
@@ -229,6 +229,11 @@ class ColumnVector final : public COWPtrHelper<ColumnVectorHelper, ColumnVector<
             data[i + old_size] = src_container[position_vec[i]];
     }

+    void insertMany(const Field & field, size_t length) override
+    {
+        data.resize_fill(data.size() + length, static_cast<T>(field.get<T>()));
+    }
+
     void insertData(const char * pos, size_t /*length*/) override { data.push_back(*reinterpret_cast<const T *>(pos)); }

     bool decodeTiDBRowV2Datum(
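ColumnVector<T>::insertMany converts the Field to T once and then extends the buffer with a single resize_fill, instead of paying for length separate insert(field) calls. The sketch below mirrors that with std::vector::resize(count, value) standing in for PaddedPODArray::resize_fill; ToyField and ToyVectorColumn are hypothetical stand-ins, not TiFlash types.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <variant>
#include <vector>

// Hypothetical stand-in for TiFlash's dynamically typed Field.
using ToyField = std::variant<int64_t, double>;

// Hypothetical fixed-width column backed by std::vector instead of PaddedPODArray.
template <typename T>
class ToyVectorColumn
{
public:
    // Row-by-row path: unwraps the field on every call.
    void insert(const ToyField & field) { data.push_back(std::get<T>(field)); }

    // Bulk path: unwrap the field once, then grow the buffer with a single fill,
    // the way ColumnVector::insertMany uses resize_fill.
    void insertMany(const ToyField & field, size_t length)
    {
        const T value = std::get<T>(field);
        data.resize(data.size() + length, value);
    }

    std::vector<T> data;
};
```

Both paths must leave the column in the same state, which is exactly what the new test section checks.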
@@ -243,7 +248,7 @@ class ColumnVector final : public COWPtrHelper<ColumnVectorHelper, ColumnVector<
        {
            throw Exception("Invalid float value length " + std::to_string(length), ErrorCodes::LOGICAL_ERROR);
        }
-       constexpr UInt64 SIGN_MASK = static_cast<UInt64>(1) << 63;
+       constexpr UInt64 SIGN_MASK = static_cast<UInt64>(1) << 63; // NOLINT(readability-identifier-naming)
        auto num = readBigEndian<UInt64>(raw_value.c_str() + cursor);
        if (num & SIGN_MASK)
            num ^= SIGN_MASK;
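The hunk shows only the first branch of the float decoding: if the sign bit of the big-endian word is set, the original value was non-negative and the bit is simply cleared. This assumes the usual order-preserving (memcomparable) float encoding, in which non-negative values get the sign bit set and negative values have all bits inverted, so the branch not shown is assumed to be num = ~num. A self-contained round-trip sketch under that assumption; encodeFloat64 and decodeFloat64 are illustrative names, not TiFlash functions.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

constexpr uint64_t SIGN_MASK = static_cast<uint64_t>(1) << 63;

// Encode so that unsigned comparison of the results follows double ordering
// (for non-NaN inputs): set the sign bit of non-negative values, invert negative ones.
inline uint64_t encodeFloat64(double f)
{
    uint64_t u;
    std::memcpy(&u, &f, sizeof(u)); // bit cast without breaking strict aliasing
    if (f >= 0)
        u |= SIGN_MASK;
    else
        u = ~u;
    return u;
}

// Inverse of encodeFloat64. The first branch matches the lines shown in the hunk;
// the else branch is the assumed counterpart for negative values.
inline double decodeFloat64(uint64_t num)
{
    if (num & SIGN_MASK)
        num ^= SIGN_MASK;
    else
        num = ~num;
    double f;
    std::memcpy(&f, &num, sizeof(f));
    return f;
}
```

Note that -0.0 decodes as +0.0 under this scheme, which compares equal with ==.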
dbms/src/Columns/IColumn.h: 12 additions, 1 deletion
@@ -130,18 +130,29 @@ class IColumn : public COWPtr<IColumn>

     /// Appends n-th element from other column with the same type.
     /// Is used in merge-sort and merges. It could be implemented in inherited classes more optimally than default implementation.
+    /// Note: the source column and the destination column must be of the same type, can not ColumnXXX->insertFrom(ConstColumnXXX, ...)
     virtual void insertFrom(const IColumn & src, size_t n) { insert(src[n]); }

-    /// Appends range of elements from other column.
+    /// Appends range of elements from other column with the same type.
     /// Could be used to concatenate columns.
+    /// Note: the source column and the destination column must be of the same type, can not ColumnXXX->insertRangeFrom(ConstColumnXXX, ...)
     virtual void insertRangeFrom(const IColumn & src, size_t start, size_t length) = 0;

     /// Appends one element from other column with the same type multiple times.
+    /// Note: the source column and the destination column must be of the same type, can not ColumnXXX->insertManyFrom(ConstColumnXXX, ...)
     virtual void insertManyFrom(const IColumn & src, size_t position, size_t length) = 0;

     /// Appends disjunctive elements from other column with the same type.
+    /// Note: the source column and the destination column must be of the same type, can not ColumnXXX->insertDisjunctFrom(ConstColumnXXX, ...)
     virtual void insertDisjunctFrom(const IColumn & src, const std::vector<size_t> & position_vec) = 0;

+    /// Appends one field multiple times. Can be optimized in inherited classes.
+    virtual void insertMany(const Field & field, size_t length)
+    {
+        for (size_t i = 0; i < length; ++i)
+            insert(field);
+    }
+
     /// Appends data located in specified memory chunk if it is possible (throws an exception if it cannot be implemented).
     /// Is used to optimize some computations (in aggregation, for example).
     /// Parameter length could be ignored if column values have fixed size.

Review thread on virtual void insertMany(const Field & field, size_t length):

Contributor: This may reduce virtual function calls, like this:

    template <typename Derived>
    std::vector<MutablePtr> scatterImpl(ColumnIndex num_columns, const Selector & selector) const

Author: It should be called as IColumn->insertMany(v).

yibin87 (Contributor), Apr 10, 2024: Yeah, the virtual insert function is called inside the loop.

Author: The compiler can help optimize this; there is no vtable lookup here: https://gcc.godbolt.org/z/Y48rMoqqc

Contributor: I don't think so. Think of it like this: different derived instances can call insertMany, so the optimizer can't infer which insert function will be called next. It has to look up the vtable of the current instance to find the actual address of insert. That's what "call qword ptr [rax + 8]" does, I guess.

Contributor: Anyway, I think it is OK to improve it later.
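The thread is about the default IColumn::insertMany calling the virtual insert once per row. The reviewer's suggestion follows the scatterImpl<Derived> pattern: a template helper that casts this to the concrete column type so the per-row call is bound statically instead of going through the vtable on each iteration. A minimal sketch of that idea, assuming hypothetical ToyColumn, insertManyImpl and Int64Column types (not TiFlash's API); whether it beats the plain loop in practice depends on whether the compiler already devirtualizes the call, which is the point the thread left open.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical minimal column interface, not TiFlash's IColumn.
struct ToyColumn
{
    virtual ~ToyColumn() = default;
    virtual void insert(int64_t value) = 0;
    virtual size_t size() const = 0;

    // Default bulk insert: one virtual call per row, like the PR's IColumn::insertMany.
    virtual void insertMany(int64_t value, size_t length)
    {
        for (size_t i = 0; i < length; ++i)
            insert(value);
    }

protected:
    // CRTP-style helper: the caller names its concrete type, and the qualified call
    // Derived::insert is bound statically (and can be inlined) on every iteration.
    template <typename Derived>
    void insertManyImpl(int64_t value, size_t length)
    {
        auto & self = static_cast<Derived &>(*this);
        for (size_t i = 0; i < length; ++i)
            self.Derived::insert(value);
    }
};

class Int64Column final : public ToyColumn
{
public:
    void insert(int64_t value) override { data.push_back(value); }
    size_t size() const override { return data.size(); }

    // One virtual dispatch to reach this override, then statically bound inserts.
    void insertMany(int64_t value, size_t length) override { insertManyImpl<Int64Column>(value, length); }

    std::vector<int64_t> data;
};
```

Callers can keep going through the base interface (base.insertMany(v, n)); only the loop inside changes from indirect to direct calls.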
dbms/src/Columns/tests/gtest_column_insertFrom.cpp: 73 additions, 45 deletions
@@ -28,7 +28,7 @@ namespace tests
 class TestColumnInsertFrom : public ::testing::Test
 {
 public:
-    void compareColumn(
+    static void compareColumn(
         const ColumnWithTypeAndName & expected_col_with_type_name,
         const ColumnWithTypeAndName & actual_col_with_type_name)
     {
@@ -37,16 +37,16 @@ class TestColumnInsertFrom : public ::testing::Test
         if unlikely (typeid_cast<const ColumnSet *>(expected.get()) || typeid_cast<const ColumnSet *>(actual.get()))
         {
             /// ColumnSet compares size only now, since the test ensures data is equal
-            const ColumnSet * expected_set = typeid_cast<const ColumnSet *>(expected.get());
-            const ColumnSet * actual_set = typeid_cast<const ColumnSet *>(actual.get());
+            const auto * expected_set = typeid_cast<const ColumnSet *>(expected.get());
+            const auto * actual_set = typeid_cast<const ColumnSet *>(actual.get());
             ASSERT_TRUE(expected_set && actual_set);
             ASSERT_TRUE(expected_set->size() == actual_set->size());
             return;
         }
         ASSERT_COLUMN_EQ(expected, actual);
     }

-    void doTestWork(ColumnWithTypeAndName & col_with_type_and_name)
+    void doTestWork(ColumnWithTypeAndName & col_with_type_and_name) const
     {
         auto column_ptr = col_with_type_and_name.column;
         ASSERT_TRUE(rows == column_ptr->size());
@@ -57,54 +57,82 @@ class TestColumnInsertFrom : public ::testing::Test
         }

         /// Test insertManyFrom
-        for (size_t i = 0; i < 3; ++i)
-            cols[0]->insertFrom(*column_ptr, 1);
-        for (size_t i = 0; i < 3; ++i)
-            cols[0]->insertFrom(*column_ptr, 1);
-        cols[1]->insertManyFrom(*column_ptr, 1, 3);
-        cols[1]->insertManyFrom(*column_ptr, 1, 3);
         {
-            ColumnWithTypeAndName ref(std::move(cols[0]), col_with_type_and_name.type, "");
-            ColumnWithTypeAndName result(std::move(cols[1]), col_with_type_and_name.type, "");
-            compareColumn(ref, result);
+            for (size_t i = 0; i < 3; ++i)
+                cols[0]->insertFrom(*column_ptr, 1);
+            for (size_t i = 0; i < 3; ++i)
+                cols[0]->insertFrom(*column_ptr, 1);
+            cols[1]->insertManyFrom(*column_ptr, 1, 3);
+            cols[1]->insertManyFrom(*column_ptr, 1, 3);
+            {
+                ColumnWithTypeAndName ref(std::move(cols[0]), col_with_type_and_name.type, "");
+                ColumnWithTypeAndName result(std::move(cols[1]), col_with_type_and_name.type, "");
+                compareColumn(ref, result);
+            }
         }

         /// Test insertDisjunctFrom
-        for (size_t i = 0; i < 2; ++i)
         {
-            cols[i] = column_ptr->cloneEmpty();
-        }
-        std::vector<size_t> position_vec;
-        position_vec.push_back(0);
-        position_vec.push_back(2);
-        position_vec.push_back(4);
-        for (size_t position : position_vec)
-            cols[0]->insertFrom(*column_ptr, position);
-        for (size_t position : position_vec)
-            cols[0]->insertFrom(*column_ptr, position);
-        cols[1]->insertDisjunctFrom(*column_ptr, position_vec);
-        cols[1]->insertDisjunctFrom(*column_ptr, position_vec);
-        {
-            ColumnWithTypeAndName ref(std::move(cols[0]), col_with_type_and_name.type, "");
-            ColumnWithTypeAndName result(std::move(cols[1]), col_with_type_and_name.type, "");
-            compareColumn(ref, result);
+            for (size_t i = 0; i < 2; ++i)
+            {
+                cols[i] = column_ptr->cloneEmpty();
+            }
+            std::vector<size_t> position_vec;
+            position_vec.push_back(0);
+            position_vec.push_back(2);
+            position_vec.push_back(4);
+            for (size_t position : position_vec)
+                cols[0]->insertFrom(*column_ptr, position);
+            for (size_t position : position_vec)
+                cols[0]->insertFrom(*column_ptr, position);
+            cols[1]->insertDisjunctFrom(*column_ptr, position_vec);
+            cols[1]->insertDisjunctFrom(*column_ptr, position_vec);
+            {
+                ColumnWithTypeAndName ref(std::move(cols[0]), col_with_type_and_name.type, "");
+                ColumnWithTypeAndName result(std::move(cols[1]), col_with_type_and_name.type, "");
+                compareColumn(ref, result);
+            }
         }

         /// Test insertManyDefaults
-        for (size_t i = 0; i < 2; ++i)
         {
-            cols[i] = column_ptr->cloneEmpty();
+            for (size_t i = 0; i < 2; ++i)
+            {
+                cols[i] = column_ptr->cloneEmpty();
+            }
+            for (size_t i = 0; i < 3; ++i)
+                cols[0]->insertDefault();
+            for (size_t i = 0; i < 3; ++i)
+                cols[0]->insertDefault();
+            cols[1]->insertManyDefaults(3);
+            cols[1]->insertManyDefaults(3);
+            {
+                ColumnWithTypeAndName ref(std::move(cols[0]), col_with_type_and_name.type, "");
+                ColumnWithTypeAndName result(std::move(cols[1]), col_with_type_and_name.type, "");
+                compareColumn(ref, result);
+            }
         }
-        for (size_t i = 0; i < 3; ++i)
-            cols[0]->insertDefault();
-        for (size_t i = 0; i < 3; ++i)
-            cols[0]->insertDefault();
-        cols[1]->insertManyDefaults(3);
-        cols[1]->insertManyDefaults(3);
+
+        /// Test insertMany
         {
-            ColumnWithTypeAndName ref(std::move(cols[0]), col_with_type_and_name.type, "");
-            ColumnWithTypeAndName result(std::move(cols[1]), col_with_type_and_name.type, "");
-            compareColumn(ref, result);
+            for (size_t i = 0; i < 2; ++i)
+                cols[i] = column_ptr->cloneEmpty();
+            for (size_t i = 0; i < 6; ++i)
+                cols[0]->insertFrom(*column_ptr, 1);
+            if (unlikely(
+                    typeid_cast<const ColumnNothing *>(column_ptr.get())
+                    || typeid_cast<const ColumnSet *>(column_ptr.get())))
+            {
+                /// ColumnNothing and ColumnSet are not allowed to insertMany
+                return;
+            }
+            auto v = (*column_ptr)[1];
+            cols[1]->insertMany(v, 6);
+            {
+                ColumnWithTypeAndName ref(std::move(cols[0]), col_with_type_and_name.type, "");
+                ColumnWithTypeAndName result(std::move(cols[1]), col_with_type_and_name.type, "");
+                compareColumn(ref, result);
+            }
         }
     }
     const size_t rows = 6;
@@ -232,10 +260,10 @@ try
 {
     auto string_col = createColumn<String>({"1", "2", "3", "4", "5", "6"}).column;
     auto int_col = createColumn<UInt64>({1, 2, 3, 4, 5, 6}).column;
-    MutableColumns mutableColumns;
-    mutableColumns.push_back(string_col->assumeMutable());
-    mutableColumns.push_back(int_col->assumeMutable());
-    auto col = ColumnTuple::create(std::move(mutableColumns));
+    MutableColumns mutable_columns;
+    mutable_columns.push_back(string_col->assumeMutable());
+    mutable_columns.push_back(int_col->assumeMutable());
+    auto col = ColumnTuple::create(std::move(mutable_columns));
     auto col_with_type_and_name
         = ColumnWithTypeAndName{std::move(col), std::make_shared<DataTypeString>(), String("col")};
     doTestWork(col_with_type_and_name);
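The new insertMany test section follows the same recipe as the others: build a reference column with row-by-row insertFrom, build a second column with the bulk call, and compare them, returning early for ColumnNothing and ColumnSet. The sketch below reproduces that recipe on a hypothetical ToyStringColumn, with a supports_insert_many flag standing in for the typeid_cast checks. Like the insertManyFrom and insertManyDefaults sections, it performs each step twice so the second bulk call appends to a non-empty column.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical column used only to illustrate the test recipe.
struct ToyStringColumn
{
    std::vector<std::string> data;
    bool supports_insert_many = true;

    void insertFrom(const ToyStringColumn & src, size_t n) { data.push_back(src.data.at(n)); }
    void insertMany(const std::string & field, size_t length) { data.insert(data.end(), length, field); }
    ToyStringColumn cloneEmpty() const { return ToyStringColumn{{}, supports_insert_many}; }
};

// Returns true when the bulk path produced the same rows as the row-by-row path,
// or when the column kind does not support insertMany (mirroring the early return).
bool insertManyMatchesLoop(const ToyStringColumn & src, size_t row, size_t times)
{
    if (!src.supports_insert_many)
        return true;
    ToyStringColumn expected = src.cloneEmpty();
    ToyStringColumn actual = src.cloneEmpty();
    for (int round = 0; round < 2; ++round) // the second round appends to non-empty columns
    {
        for (size_t i = 0; i < times; ++i)
            expected.insertFrom(src, row);
        actual.insertMany(src.data.at(row), times);
    }
    return expected.data == actual.data;
}
```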
tests/tidb-ci/fullstack-test-dt/timestamp_with_timezone.test: 0 additions, 53 deletions
@@ -107,59 +107,6 @@ mysql> use test; set time_zone='Asia/Shanghai'; set tidb_enable_chunk_rpc=0; set
 | 1 | 2020-01-02 00:11:11 | a | 1 | 2020-01-02 00:11:11 | a |
 | 2 | 2020-01-03 05:11:11 | b | 2 | 2020-01-03 05:11:11 | b |
 +------+---------------------+-----------+------+---------------------+-----------+
-mysql> use test; set time_zone='Asia/Shanghai'; set tidb_enable_chunk_rpc=0; set session tidb_isolation_read_engines='tiflash'; set session tidb_opt_broadcast_join=1; select /*+ broadcast_join(t1,t2) */ count(t1.set_value), count(t2.set_value), t1.value from t1 join t2 on t1.value = t2.value group by t1.value order by value;
-+---------------------+---------------------+---------------------+
-| count(t1.set_value) | count(t2.set_value) | value |
-+---------------------+---------------------+---------------------+
-| 1 | 1 | 2020-01-02 00:11:11 |
-| 1 | 1 | 2020-01-03 05:11:11 |
-+---------------------+---------------------+---------------------+
-# default encode in tidb, chunk encode in tiflash, utc timezone
-mysql> use test; set time_zone='UTC'; set tidb_enable_chunk_rpc=0; set session tidb_isolation_read_engines='tiflash'; set session tidb_opt_broadcast_join=1; select /*+ broadcast_join(t1,t2) */ t1.id, t1.value, t2.id, t2.value from t1 join t2 on t1.value = t2.value order by t1.id;
-+------+---------------------+------+---------------------+
-| id | value | id | value |
-+------+---------------------+------+---------------------+
-| 1 | 2020-01-01 16:11:11 | 1 | 2020-01-01 16:11:11 |
-| 2 | 2020-01-02 21:11:11 | 2 | 2020-01-02 21:11:11 |
-+------+---------------------+------+---------------------+
-mysql> use test; set time_zone='UTC'; set tidb_enable_chunk_rpc=0; set session tidb_isolation_read_engines='tiflash'; set session tidb_opt_broadcast_join=1; select /*+ broadcast_join(t1,t2) */ count(*), t1.value from t1 join t2 on t1.value = t2.value group by t1.value order by value;
-+----------+---------------------+
-| count(*) | value |
-+----------+---------------------+
-| 1 | 2020-01-01 16:11:11 |
-| 1 | 2020-01-02 21:11:11 |
-+----------+---------------------+
-# chunk encode in tidb, chunk encode in tiflash, non-utc timezone
-mysql> use test; set time_zone='Asia/Shanghai'; set tidb_enable_chunk_rpc=1; set session tidb_isolation_read_engines='tiflash'; set session tidb_opt_broadcast_join=1; select /*+ broadcast_join(t1,t2) */ t1.id, t1.value, t2.id, t2.value from t1 join t2 on t1.value = t2.value order by t1.id;
-+------+---------------------+------+---------------------+
-| id | value | id | value |
-+------+---------------------+------+---------------------+
-| 1 | 2020-01-02 00:11:11 | 1 | 2020-01-02 00:11:11 |
-| 2 | 2020-01-03 05:11:11 | 2 | 2020-01-03 05:11:11 |
-+------+---------------------+------+---------------------+
-mysql> use test; set time_zone='Asia/Shanghai'; set tidb_enable_chunk_rpc=1; set session tidb_isolation_read_engines='tiflash'; set session tidb_opt_broadcast_join=1; select /*+ broadcast_join(t1,t2) */ count(*), t1.value from t1 join t2 on t1.value = t2.value group by t1.value order by value;
-+----------+---------------------+
-| count(*) | value |
-+----------+---------------------+
-| 1 | 2020-01-02 00:11:11 |
-| 1 | 2020-01-03 05:11:11 |
-+----------+---------------------+
-
-# chunk encode in tidb, chunk encode in tiflash, utc timezone
-mysql> use test; set time_zone='UTC'; set tidb_enable_chunk_rpc=1; set session tidb_isolation_read_engines='tiflash'; set session tidb_opt_broadcast_join=1; select /*+ broadcast_join(t1,t2) */ t1.id, t1.value, t2.id, t2.value from t1 join t2 on t1.value = t2.value order by t1.id;
-+------+---------------------+------+---------------------+
-| id | value | id | value |
-+------+---------------------+------+---------------------+
-| 1 | 2020-01-01 16:11:11 | 1 | 2020-01-01 16:11:11 |
-| 2 | 2020-01-02 21:11:11 | 2 | 2020-01-02 21:11:11 |
-+------+---------------------+------+---------------------+
-mysql> use test; set time_zone='UTC'; set tidb_enable_chunk_rpc=1; set session tidb_isolation_read_engines='tiflash'; set session tidb_opt_broadcast_join=1; select /*+ broadcast_join(t1,t2) */ count(*), t1.value from t1 join t2 on t1.value = t2.value group by t1.value order by value;
-+----------+---------------------+
-| count(*) | value |
-+----------+---------------------+
-| 1 | 2020-01-01 16:11:11 |
-| 1 | 2020-01-02 21:11:11 |
-+----------+---------------------+

 mysql> drop table if exists test.t1;
 mysql> drop table if exists test.t2;