server: disable PersistStats RocksDB task for v2 #14111

Merged
merged 5 commits into from
Feb 8, 2023

Conversation

tabokie
Member

@tabokie tabokie commented Jan 31, 2023

Signed-off-by: tabokie [email protected]

What is changed and how it works?

Issue Number: Ref #12842

What's Changed:

Disable the PersistStats task for v2. The task pins a private memory buffer for each instance; if it is not disabled, it can easily cause OOM. The InfoLog task, on the other hand, is a no-op (for our logger implementation), so it's fine to let it run.
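A minimal sketch of the idea, not the actual diff: `DbOptions`, `EngineType`, and `build_db_options` are hypothetical stand-ins, and the two fields only mirror RocksDB's `stats_dump_period_sec` and `stats_persist_period_sec` options. It assumes that a period of 0 disables the corresponding periodic task.

```rust
/// Hypothetical stand-in for the engine's DB options. The field names mirror
/// RocksDB's `stats_dump_period_sec` and `stats_persist_period_sec`.
#[derive(Debug, Clone, PartialEq)]
struct DbOptions {
    stats_dump_period_sec: u32,
    stats_persist_period_sec: u32,
}

impl Default for DbOptions {
    fn default() -> Self {
        // Assumed defaults for this sketch: both tasks run every 600 seconds.
        DbOptions {
            stats_dump_period_sec: 600,
            stats_persist_period_sec: 600,
        }
    }
}

/// Hypothetical engine selector for this sketch.
#[derive(Debug, Clone, Copy, PartialEq)]
enum EngineType {
    RaftKv,
    RaftKv2,
}

/// Builds options per engine type. For v2, where many instances coexist,
/// the PersistStats period is set to 0 (assumed to mean "disabled") so that
/// no per-instance stats buffer is pinned. DumpStats is left untouched.
fn build_db_options(engine: EngineType) -> DbOptions {
    let mut opts = DbOptions::default();
    if engine == EngineType::RaftKv2 {
        opts.stats_persist_period_sec = 0;
    }
    opts
}
```

With these assumptions, only the v2 path changes, and only the persist period; the dump period keeps its default.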

Related changes

Check List

Tests

  • Unit test

Release note

None

@ti-chi-bot
Member

ti-chi-bot commented Jan 31, 2023

[REVIEW NOTIFICATION]

This pull request has been approved by:

  • BusyJay
  • tonyxuqqi

To complete the pull request process, please ask the reviewers in the list to review by filling /cc @reviewer in the comment.
After your PR has acquired the required number of LGTMs, you can assign this pull request to the committer in the list by filling /assign @committer in the comment to help you merge this pull request.

The full list of commands accepted by this bot can be found here.

Reviewer can indicate their review by submitting an approval review.
Reviewer can cancel approval by submitting a request changes review.

@ti-chi-bot ti-chi-bot added release-note-none Denotes a PR that doesn't merit a release note. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jan 31, 2023
@tabokie tabokie changed the title server: offload engine background work server: offload engine periodic work Jan 31, 2023
@tabokie tabokie requested review from tonyxuqqi and BusyJay January 31, 2023 10:32
Signed-off-by: tabokie <[email protected]>
@ti-chi-bot ti-chi-bot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jan 31, 2023
tablet_registry.for_each_opened_tablet(|_, cached| {
    if let Some(tablet) = cached.latest() {
        // rocksdb::kDefaultFlushInfoLogPeriodSec = 10.
        let _ = tablet
Member

Why does it need to be flushed explicitly?

Member Author

Apparently, flushing when a certain size limit is reached could cause foreground latency jitter. The periodic background flush is a safety net in case the existing manual flush sites (mostly during compaction and similar operations) aren't hit for a long time.

Member

Are we talking about the info log or about flushing memtables? How can it cause foreground latency? What I mean is: isn't the log just written to a file? Why does it need to be flushed?

Member Author

FWIW, RocksDB's native logger buffers log lines before writing them to a file. Even for a direct pwrite, there's the fflush to flush the OS page buffer.
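To illustrate the buffering point in general terms, here is a small sketch using Rust's standard `BufWriter`. `BufferedLogger` is a hypothetical type for this example and is not RocksDB's logger; it only shows that a buffered writer holds log lines in memory until something flushes it, which is why a periodic flush acts as a safety net.

```rust
use std::io::{self, BufWriter, Write};

/// Hypothetical buffered logger: lines accumulate in an in-memory buffer
/// and reach the underlying sink only on flush (or when the buffer fills).
struct BufferedLogger<W: Write> {
    out: BufWriter<W>,
}

impl<W: Write> BufferedLogger<W> {
    fn new(sink: W) -> Self {
        BufferedLogger {
            out: BufWriter::with_capacity(4096, sink),
        }
    }

    /// Appends one line to the buffer; nothing is forced to the sink here.
    fn log(&mut self, line: &str) -> io::Result<()> {
        writeln!(self.out, "{}", line)
    }

    /// Pushes everything buffered so far down to the sink.
    fn flush(&mut self) -> io::Result<()> {
        self.out.flush()
    }

    /// Read-only view of the sink, to inspect what has actually been written.
    fn written(&self) -> &W {
        self.out.get_ref()
    }
}
```

A short line logged through this type stays invisible in the sink until `flush` is called, so a logger that is rarely flushed can lag arbitrarily far behind.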

        );
        // rocksdb::DBOptions::stats_dump_period_sec = 600.
        if count % 60 == 0 {
            let _ = tablet
Member

How about triggering it on demand, e.g. via an explicit debug request? It seems useless in most cases.

Member Author

Statistics account for a period of time. Printing them on demand is mostly useless because the time span they cover is too large.
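For readers following the `count % 60` line in this thread, the arithmetic can be sketched as below. This is an illustration only: `PeriodicWork` and `run_ticks` are hypothetical names, and the sketch assumes a single ticker firing every 10 seconds with the counter starting at 1, which the reviewed fragment does not show.

```rust
// Periods taken from the comments in the reviewed fragment.
const FLUSH_INFO_LOG_PERIOD_SEC: u64 = 10;
const STATS_DUMP_PERIOD_SEC: u64 = 600;

/// Hypothetical tally of how often each periodic task has fired.
#[derive(Debug, Default, PartialEq)]
struct PeriodicWork {
    flushes: u64,
    dumps: u64,
}

/// Simulates `ticks` firings of a 10-second ticker. The info log is flushed
/// on every tick; stats are dumped on every 60th tick (600 / 10), so each
/// dump covers a fixed 10-minute window.
fn run_ticks(ticks: u64) -> PeriodicWork {
    let dump_every = STATS_DUMP_PERIOD_SEC / FLUSH_INFO_LOG_PERIOD_SEC;
    let mut work = PeriodicWork::default();
    // Assumption: the counter starts at 1, so the first dump is on tick 60.
    for count in 1..=ticks {
        work.flushes += 1;
        if count % dump_every == 0 {
            work.dumps += 1;
        }
    }
    work
}
```

Because each dump covers the same fixed window, consecutive dumps are comparable, which is the property an on-demand dump would lose.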

@BusyJay
Member

BusyJay commented Jan 31, 2023

Note rocksdb logger should be created for each tablet instead of sharing. We also need to add the {region_id}_{tablet_index} to every log line; otherwise we will never know which RocksDB instance produced which log.
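One way to picture the per-tablet tagging suggested here is the sketch below. `TabletLogger` and its bracketed line format are hypothetical and chosen only for illustration; the PR itself leaves logger creation out of scope.

```rust
/// Hypothetical per-tablet logger that tags each line with
/// `{region_id}_{tablet_index}` so log lines can be traced to an instance.
struct TabletLogger {
    prefix: String,
}

impl TabletLogger {
    fn new(region_id: u64, tablet_index: u64) -> Self {
        TabletLogger {
            prefix: format!("{}_{}", region_id, tablet_index),
        }
    }

    /// Returns the message with the tablet tag prepended. The bracket
    /// style is an arbitrary choice for this sketch.
    fn format_line(&self, msg: &str) -> String {
        format!("[{}] {}", self.prefix, msg)
    }
}
```

For example, `TabletLogger::new(2, 5).format_line("flush done")` yields a line tagged with `2_5`.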

@tabokie
Member Author

tabokie commented Jan 31, 2023

Note rocksdb logger should be created for each tablet instead of sharing.

However we choose to create loggers (which is out of scope for this PR), we will be flushing them in TiKV. The only difference is whether to traverse the entire tablet_registry or return on the first hit. (Updated to set a config option instead of delegating the scheduler, because it turns out DumpStats is still needed for each instance, and RocksDB's native code seems more appropriate.)
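The "traverse the entire registry or return on first hit" distinction can be sketched as follows. The `Tablet` type, the `BTreeMap` registry, and `flush_loggers` are hypothetical stand-ins for this example and do not reflect the real `tablet_registry` API.

```rust
use std::collections::BTreeMap;

/// Hypothetical tablet that counts how often its info log was flushed.
#[derive(Default)]
struct Tablet {
    flush_count: u32,
}

impl Tablet {
    fn flush_info_log(&mut self) {
        self.flush_count += 1;
    }
}

/// Flushes loggers across the registry and returns how many tablets were
/// visited. With a shared logger, flushing through the first tablet found
/// is enough, so the loop stops early; with per-tablet loggers, every
/// tablet has to be visited.
fn flush_loggers(registry: &mut BTreeMap<u64, Tablet>, shared_logger: bool) -> usize {
    let mut visited = 0;
    for tablet in registry.values_mut() {
        tablet.flush_info_log();
        visited += 1;
        if shared_logger {
            break;
        }
    }
    visited
}
```

Either way the flush is driven from the caller's side; the logger layout only changes how much of the registry is walked.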

@tabokie tabokie changed the title server: offload engine periodic work server: disable PersistStats RocksDB task for v2 Feb 2, 2023
@ti-chi-bot ti-chi-bot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Feb 2, 2023
Signed-off-by: tabokie <[email protected]>
@tabokie tabokie force-pushed the 230131-offload-periodic-work branch from cd589da to 2420245 Compare February 7, 2023 03:10
@ti-chi-bot ti-chi-bot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. and removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Feb 7, 2023
@tabokie tabokie requested a review from BusyJay February 7, 2023 03:16
@ti-chi-bot ti-chi-bot added the status/LGT1 Indicates that a PR has LGTM 1. label Feb 7, 2023
@ti-chi-bot ti-chi-bot added status/LGT2 Indicates that a PR has LGTM 2. and removed status/LGT1 Indicates that a PR has LGTM 1. labels Feb 8, 2023
@tabokie
Member Author

tabokie commented Feb 8, 2023

/merge

@ti-chi-bot
Member

@tabokie: It seems you want to merge this PR, I will help you trigger all the tests:

/run-all-tests

You only need to trigger /merge once, and if the CI test fails, you just re-trigger the test that failed and the bot will merge the PR for you after the CI passes.

If you have any questions about the PR merge process, please refer to pr process.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository.

@ti-chi-bot
Member

This pull request has been accepted and is ready to merge.

Commit hash: 52e5c36

@ti-chi-bot ti-chi-bot added status/can-merge Indicates a PR has been approved by a committer. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Feb 8, 2023
@ti-chi-bot ti-chi-bot merged commit dda37a4 into tikv:master Feb 8, 2023
@ti-chi-bot ti-chi-bot added this to the Pool milestone Feb 8, 2023
Labels
release-note-none Denotes a PR that doesn't merit a release note. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. status/can-merge Indicates a PR has been approved by a committer. status/LGT2 Indicates that a PR has LGTM 2.
4 participants