Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: aggregator recovery #1622

Merged
merged 40 commits into from
Apr 25, 2022
Merged

Conversation

nautaa
Copy link
Collaborator

@nautaa nautaa commented Apr 13, 2022

support aggregator recovery from tablet restart.

  1. recreate aggregator from persistent metadata.
  2. reconstruct aggr buffer map from aggr table.
  3. update aggr from base table binlog.
    • recovery binlog offset starts from the minimize offset of all keys latest offset.
  4. add DecodeAggrval.
  5. refactor AggrBuffer.

recover logic

  • if aggr table recovers first
  1. create the aggregator
  2. init will fail
  3. when base table recovers, re-set base replicator and re-init
  • if base table recovers first
  1. when the aggr recovers, create the aggregator and the init will succeed

src/storage/aggregator.cc Outdated Show resolved Hide resolved
@github-actions
Copy link
Contributor

github-actions bot commented Apr 13, 2022

SDK Test Report

  73 files    73 suites   6m 57s ⏱️
173 tests 170 ✔️ 3 💤 0
213 runs  210 ✔️ 3 💤 0

Results for commit abe6314.

♻️ This comment has been updated with latest results.

@github-actions
Copy link
Contributor

github-actions bot commented Apr 13, 2022

Linux Test Report

       56 files       185 suites   1h 7m 43s ⏱️
  8 386 tests   8 378 ✔️ 8 💤 0
12 302 runs  12 294 ✔️ 8 💤 0

Results for commit abe6314.

♻️ This comment has been updated with latest results.

@nautaa
Copy link
Collaborator Author

nautaa commented Apr 14, 2022

  • recovery binlog offset starts from the minimize offset of all keys latest offset.

e.g.

key = 1, binlog offset = 2,4,6. the latest offset is 6. and then offset 7 is cached in volatile aggr.
key = 2, binlog offset = 9, 10. the latest offset is 10.

we need to scan the binlog after 6.

src/replica/log_replicator.h Outdated Show resolved Hide resolved
src/tablet/tablet_impl.cc Show resolved Hide resolved
src/tablet/tablet_impl.cc Outdated Show resolved Hide resolved
@nautaa nautaa marked this pull request as ready for review April 15, 2022 10:50
@codecov
Copy link

codecov bot commented Apr 15, 2022

Codecov Report

Merging #1622 (abe6314) into main (1afcf58) will increase coverage by 5.25%.
The diff coverage is 77.36%.

@@             Coverage Diff              @@
##               main    #1622      +/-   ##
============================================
+ Coverage     67.82%   73.08%   +5.25%     
  Complexity      323      323              
============================================
  Files           609      610       +1     
  Lines        118679   119206     +527     
  Branches       1000     1000              
============================================
+ Hits          80491    87117    +6626     
+ Misses        37981    31882    -6099     
  Partials        207      207              
Impacted Files Coverage Δ
src/storage/disk_table.h 88.07% <ø> (+0.91%) ⬆️
src/storage/mem_table.h 96.15% <ø> (+3.84%) ⬆️
src/storage/snapshot_test.cc 99.79% <ø> (+4.45%) ⬆️
src/storage/table.h 91.66% <ø> (+2.08%) ⬆️
src/tablet/tablet_impl.h 100.00% <ø> (ø)
src/storage/aggregator.cc 74.57% <54.87%> (-10.60%) ⬇️
src/tablet/tablet_impl.cc 57.95% <57.53%> (+14.39%) ⬆️
src/sdk/sql_cluster_router.cc 53.95% <75.00%> (+0.01%) ⬆️
src/storage/aggregator_test.cc 96.59% <94.02%> (-1.13%) ⬇️
src/replica/log_replicator.cc 81.87% <100.00%> (+2.07%) ⬆️
... and 45 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 309144e...abe6314. Read the comment docs.

src/storage/aggregator.cc Outdated Show resolved Hide resolved
src/storage/aggregator.cc Show resolved Hide resolved
src/storage/aggregator.h Outdated Show resolved Hide resolved
src/tablet/tablet_impl.cc Show resolved Hide resolved
src/tablet/tablet_impl.cc Outdated Show resolved Hide resolved
src/storage/aggregator.cc Show resolved Hide resolved
src/storage/aggregator.cc Show resolved Hide resolved
@zhanghaohit
Copy link
Collaborator

src/storage/aggregator.cc Outdated Show resolved Hide resolved
src/storage/aggregator.h Outdated Show resolved Hide resolved
src/storage/aggregator.cc Show resolved Hide resolved
src/storage/aggregator.h Outdated Show resolved Hide resolved
src/storage/aggregator.h Outdated Show resolved Hide resolved
src/tablet/tablet_impl.cc Outdated Show resolved Hide resolved
@zhanghaohit
Copy link
Collaborator

The reason why we are using binlog to do the recover of agg table, is the out of order issue which updates the already flushed results. Binlog cleaning makes it not easy to do it correct. As the deleted binlog may contains the data of the volatile AggrBuffer. FlushAll is heavy and definitely blocking the normal update.

Why not we do it in two steps:

  • recover the volatile AggrBuffer using the base table directly (can use WindowIterator)
  • recover the out of order cases using the binlog for the data that are already persistent in the aggr table (only recover the data before the start_ts of current AggrBuffer) - Here the binlog only is enough as the deleted binlog are guaranteed already applied and persistent in the aggr table.

@nautaa nautaa force-pushed the feat/aggregator_recovery branch from 1eb6630 to 5a19d4f Compare April 21, 2022 15:44
src/storage/aggregator.cc Show resolved Hide resolved
src/storage/aggregator.cc Show resolved Hide resolved
src/tablet/tablet_impl_test.cc Show resolved Hide resolved
src/storage/aggregator.cc Show resolved Hide resolved
@nautaa nautaa requested a review from zhanghaohit April 22, 2022 08:15
Copy link
Collaborator

@zhanghaohit zhanghaohit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some minor comments

fix the cpplint

src/storage/aggregator.cc Show resolved Hide resolved
src/storage/aggregator.cc Outdated Show resolved Hide resolved
src/tablet/tablet_impl_test.cc Show resolved Hide resolved
@@ -181,7 +181,8 @@ void AddDefaultAggregatorBaseSchema(::openmldb::api::TableMeta* table_meta) {
SchemaCodec::SetColumnDesc(table_meta->add_column_desc(), "col3", openmldb::type::DataType::kInt);
SchemaCodec::SetColumnDesc(table_meta->add_column_desc(), "col4", openmldb::type::DataType::kDouble);

SchemaCodec::SetIndex(table_meta->add_column_key(), "idx", "id", "ts_col", ::openmldb::type::kAbsoluteTime, 0, 0);
SchemaCodec::SetIndex(table_meta->add_column_key(), "idx1", "id", "ts_col", ::openmldb::type::kAbsoluteTime, 0, 0);
SchemaCodec::SetIndex(table_meta->add_column_key(), "idx2", "col3", "ts_col", ::openmldb::type::kAbsoluteTime, 0, 0);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cpplint

@zhanghaohit zhanghaohit self-requested a review April 24, 2022 01:32
Copy link
Collaborator

@zhanghaohit zhanghaohit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
build openmldb compiling and installing storage-engine openmldb storage engine. nameserver & tablet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support fault tolerance for pre-aggregation tables
3 participants