Refine algorithm for WriteAmpBasedRateLimiter #213

Merged
merged 69 commits into tikv:6.4.tikv from limiterv3-pr on Dec 18, 2020

Conversation

@tabokie (Member) commented Nov 23, 2020

Bugfix

Only compaction triggers the auto-tuner to collect the data needed for training the rate limit. When compaction frequency is low, data from a long period of time is fused into one sample, causing inaccurate estimation. Fix this issue by looping through the missing timeslices (a sketch follows this list).

The recent window size (10s) is too small; enlarge it to 30s.
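
As an illustration of the timeslice fix, here is a minimal sketch. It is not the actual WriteAmpBasedRateLimiter code; `FlowSampler`, `kSliceMs`, and `kWindowSlices` are made-up names, and the constants are assumptions.

```cpp
#include <cstddef>
#include <cstdint>
#include <deque>

constexpr int64_t kSliceMs = 1000;    // hypothetical sampling granularity
constexpr size_t kWindowSlices = 30;  // ~30s recent window, per this PR

struct FlowSampler {
  std::deque<int64_t> window;  // per-slice compaction bytes
  int64_t last_slice_ms = 0;

  // Record `bytes` observed at `now_ms`. Slices that elapsed with no
  // compaction are back-filled as zero samples, instead of fusing the whole
  // idle gap into the next (oversized) sample.
  void Record(int64_t now_ms, int64_t bytes) {
    while (now_ms - last_slice_ms >= 2 * kSliceMs) {
      Push(0);
      last_slice_ms += kSliceMs;
    }
    Push(bytes);
    last_slice_ms = now_ms;
  }

  void Push(int64_t v) {
    window.push_back(v);
    if (window.size() > kWindowSlices) window.pop_front();
  }
};
```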

Better support for low-pressure scenarios

Before this PR, flush flow was padded to 20MB/s, which kept the rate limit always above 28MB/s. After removing this restriction, we noticed that pending bytes accumulate more easily under low pressure. Adjust the padding calculation to partially resolve this problem (see the sketch below).

Note that with the new formula, the minimal rate limit is still around 28MB/s.
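
The padding change can be pictured roughly as below. The helper name and the constants/formula are assumptions for illustration, not the code merged here; the point is only that the flush component is no longer clamped up to a fixed 20MB/s floor.

```cpp
#include <algorithm>
#include <cstdint>

constexpr int64_t kMB = 1024 * 1024;

// Hypothetical padding helper: instead of forcing flush flow up to 20MB/s,
// derive the padding from observed compaction flow (with a much smaller
// static floor), so the limit can drop further when write pressure is low.
int64_t PaddedFlushFlow(int64_t flush_bytes_per_sec,
                        int64_t compaction_bytes_per_sec) {
  // Old behavior, for contrast: std::max(flush_bytes_per_sec, 20 * kMB).
  int64_t padding = std::max<int64_t>(compaction_bytes_per_sec / 10, 2 * kMB);
  return flush_bytes_per_sec + padding;
}
```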

Control reshuffle

Remove the long-term sampler and instead enlarge the window of the short-term sampler. Reduce the use of `ratio_delta`, which often causes unnecessary jitter. With the algorithm simplified, the actual limit can now be deduced with the Prometheus expression `sum(rate(tikv_engine_compaction_flow_bytes{instance=~"$instance", db="kv", type="bytes_written"}[5m]))`.
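
A rough sketch of the sampler consolidation, assuming a plain moving average over the enlarged window (the actual smoother in the limiter may differ; names here are illustrative):

```cpp
#include <cstddef>
#include <cstdint>
#include <deque>
#include <numeric>

// One moving-average sampler over a larger window, standing in for the former
// short-term/long-term sampler pair.
class WindowAverage {
 public:
  explicit WindowAverage(size_t slices) : slices_(slices) {}

  // Feed one per-slice observation and return the smoothed flow.
  int64_t Observe(int64_t bytes_per_sec) {
    samples_.push_back(bytes_per_sec);
    if (samples_.size() > slices_) samples_.pop_front();
    int64_t sum =
        std::accumulate(samples_.begin(), samples_.end(), int64_t{0});
    return sum / static_cast<int64_t>(samples_.size());
  }

 private:
  size_t slices_;
  std::deque<int64_t> samples_;
};
```

With a single smoothed signal driving the limit, the Prometheus expression above (the 5m rate of compaction write flow) tracks roughly the same quantity the limiter converges to.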

@tabokie merged commit 7d209a8 into tikv:6.4.tikv on Dec 18, 2020
@tabokie deleted the limiterv3-pr branch on Dec 23, 2020
@tabokie mentioned this pull request on May 9, 2022 (39 tasks)
tabokie added a commit to tabokie/rocksdb that referenced this pull request May 10, 2022
* Bugfix
Only compaction triggers the auto-tuner to collect the data needed for training the rate limit. When compaction frequency is low, data from a long period of time is fused into one sample, causing inaccurate estimation. Fix this issue by looping through the missing timeslices.
The recent window size (10s) is too small; enlarge it to 30s.

* Better support for low-pressure scenarios
Before this PR, flush flow was padded to 20MB/s, which kept the rate limit always above 28MB/s. After removing this restriction, we noticed that pending bytes accumulate more easily under low pressure. Adjust the padding calculation to partially resolve this problem.
Note that with the new formula, the minimal rate limit is still around 28MB/s.

* Control reshuffle
Remove the long-term sampler and instead enlarge the window of the short-term sampler. Reduce the use of `ratio_delta`, which often causes unnecessary jitter. With the algorithm simplified, the actual limit can now be deduced with the Prometheus expression `sum(rate(tikv_engine_compaction_flow_bytes{instance=~"$instance", db="kv", type="bytes_written"}[5m]))`.

* Normal pace up
Add a normal pace-up in addition to the critical pace-up to reduce the pending-bytes issue (see the sketch after this commit message).

Signed-off-by: tabokie <[email protected]>
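
For the "Normal pace up" item above, a hedged sketch of what a two-tier pace-up could look like; the thresholds and multipliers are placeholders, not the values used in the commit.

```cpp
#include <cstdint>

constexpr int64_t kGB = 1024LL * 1024 * 1024;

// Illustrative two-tier pace up: the critical tier already existed; a milder
// "normal" tier kicks in earlier so pending compaction bytes are worked off
// before they become critical. Thresholds and factors are placeholders.
int64_t PaceUp(int64_t rate_limit, int64_t pending_compaction_bytes) {
  if (pending_compaction_bytes > 100 * kGB) {
    return rate_limit * 2;      // critical pace up
  }
  if (pending_compaction_bytes > 20 * kGB) {
    return rate_limit * 5 / 4;  // normal pace up
  }
  return rate_limit;
}
```
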
tabokie added a commit that referenced this pull request May 11, 2022