
High Availability: improve the HA of TiKV. #15909

Open
4 tasks done
LykxSassinator opened this issue Nov 2, 2023 · 2 comments
Labels
type/enhancement The issue or PR belongs to an enhancement.

Comments

@LykxSassinator
Contributor

LykxSassinator commented Nov 2, 2023

Development Task

This is a long-term issue for improving the HA of TiKV. We will keep tracking the problems we find and fix or mitigate them step by step.

Currently, we have found several issues that need to be tackled:

So far, we have the following tracking tasks:

  • Fine-tune slow-score so that it can be enabled by default.
    • Fine-tune the slow-score detection algorithm to reduce the false-positive rate.
    • Enable the `evict-slow-store` scheduler by default.
  • Backport the previous work on detecting network I/O jitters from raftstore-v2 to raftstore.
@LykxSassinator LykxSassinator added the type/enhancement The issue or PR belongs to an enhancement. label Nov 2, 2023
ti-chi-bot bot pushed a commit that referenced this issue Nov 16, 2023
ti-chi-bot bot pushed a commit that referenced this issue Nov 17, 2023
ref #15909, close #16011

Signed-off-by: lucasliang <[email protected]>

Co-authored-by: lucasliang <[email protected]>
Co-authored-by: tonyxuqqi <[email protected]>
ti-chi-bot bot pushed a commit that referenced this issue Nov 23, 2023
…15908)

ref #15909

Make raftstore perceive network I/O jitters by backporting the implementation from raftstore-v2.

Signed-off-by: lucasliang <[email protected]>
ti-chi-bot bot pushed a commit that referenced this issue Dec 5, 2023
ref #15909

In the previous implementation, SlowScore identified a node as slow if it had hotspot regions.
As a result, the previous SlowScore had a fairly high false-positive rate. Moreover, its
sensitivity needed adjustment to promptly detect I/O jitters.

To address this, this PR refines the algorithm by incorporating CPU usage as an additional
condition for determining whether a node is slow. Based on our testing records, this
modification significantly reduces the false-positive rate.

Additionally, this PR changes the default value of `inspect-interval` to `100ms` to increase
detection sensitivity and improve overall performance.

Signed-off-by: lucasliang <[email protected]>

Co-authored-by: tonyxuqqi <[email protected]>
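To make the CPU-usage condition described above more concrete, here is a minimal Rust sketch of a slow score that ignores inspection timeouts while the store's CPU is saturated. It is a hypothetical model, not TiKV's actual SlowScore code: the struct fields, `record`, `tick`, both thresholds, and the score-update formula are all illustrative assumptions.

```rust
/// Hypothetical, simplified slow-score model. The field names, thresholds,
/// and update formula are illustrative assumptions, not TiKV's actual code.
struct SlowScore {
    /// Current score: 1.0 means healthy, 100.0 means very slow.
    value: f64,
    /// Inspections in the current round that timed out and were not
    /// explained by high CPU usage.
    timeout_requests: u64,
    /// All inspections recorded in the current round.
    total_requests: u64,
    /// Timeout ratio treated as "clearly slow" (assumed value).
    ratio_thresh: f64,
    /// CPU utilization (0.0..=1.0) at or above which a timeout is blamed on
    /// load, such as hotspot regions, rather than on slow I/O (assumed value).
    cpu_util_thresh: f64,
}

impl SlowScore {
    fn new(ratio_thresh: f64, cpu_util_thresh: f64) -> Self {
        SlowScore {
            value: 1.0,
            timeout_requests: 0,
            total_requests: 0,
            ratio_thresh,
            cpu_util_thresh,
        }
    }

    /// Records one inspected request together with the store's CPU usage
    /// observed at that moment.
    fn record(&mut self, timed_out: bool, cpu_util: f64) {
        self.total_requests += 1;
        // Extra condition: a timeout only counts as a sign of slowness when
        // the CPU is not saturated; otherwise the node is busy, not slow.
        if timed_out && cpu_util < self.cpu_util_thresh {
            self.timeout_requests += 1;
        }
    }

    /// Closes the current inspection round, updates the score, and returns it.
    fn tick(&mut self) -> f64 {
        if self.total_requests > 0 {
            let ratio = self.timeout_requests as f64 / self.total_requests as f64;
            if ratio == 0.0 {
                // No unexplained timeouts: recover slowly toward 1.0.
                self.value = (self.value - 1.0).max(1.0);
            } else {
                // Unexplained timeouts: grow multiplicatively, capped at 100.0.
                self.value = (self.value * (1.0 + ratio / self.ratio_thresh)).min(100.0);
            }
        }
        self.timeout_requests = 0;
        self.total_requests = 0;
        self.value
    }
}

fn main() {
    let mut score = SlowScore::new(0.1, 0.8);

    // Round 1: half of the inspections time out while CPU is low,
    // which looks like an I/O jitter, so the score rises.
    for i in 0..10 {
        score.record(i % 2 == 0, 0.2);
    }
    println!("after I/O jitter: {:.1}", score.tick());

    // Round 2: every inspection times out, but CPU is saturated
    // (e.g. a hotspot), so the timeouts are ignored and the score recovers.
    for _ in 0..10 {
        score.record(true, 0.95);
    }
    println!("after hotspot load: {:.1}", score.tick());
}
```

In this sketch, timeouts observed while the CPU is busy still count toward the round's total but not as timeouts, so a hotspot dilutes the timeout ratio instead of inflating it.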
LykxSassinator added a commit to LykxSassinator/tikv that referenced this issue Dec 7, 2023
ref tikv#15909

In the previous implementation, SlowScore identified a node as slow if it had hotspot regions.
That is, previous SlowScore has fairly high false-positive rate. Moreover, this approach needs
adjustment in sensitivity to promptly detect I/O jitters.

To address this, this pr refines the algorithm by incorporating CPU usage as an additional
condition to determine whether a node is slow. And based on our testing records, this
modification significantly reduces the false-positive rate.

Additionally, this pr has updated the default value of `inspect-interval` to `100ms` to enhance
sensitivity and improve overall performance.

Signed-off-by: lucasliang <[email protected]>

Co-authored-by: tonyxuqqi <[email protected]>
Signed-off-by: lucasliang <[email protected]>
ti-chi-bot bot pushed a commit that referenced this issue Dec 8, 2023
ref #15909

Cherry-pick the fine-tuning work on SlowScore from nightly to release-7.5.

Signed-off-by: lucasliang <[email protected]>

Co-authored-by: tonyxuqqi <[email protected]>
ti-chi-bot bot pushed a commit to tikv/pd that referenced this issue Dec 18, 2023
close #7564, ref tikv/tikv#15909

Enable `evict-slow-store` scheduler by default.

Signed-off-by: lucasliang <[email protected]>
@LykxSassinator
Contributor Author

LykxSassinator commented May 31, 2024

This issue is kept open to track all HA issues found in TiKV. If you find other HA issues, please help us by linking them to this issue.
