High Availability: improve the HA of TiKV. #15909
Labels: type/enhancement (The issue or PR belongs to an enhancement.)
LykxSassinator added the type/enhancement label on Nov 2, 2023
ti-chi-bot bot pushed a commit that referenced this issue on Nov 16, 2023
ref #15909 Signed-off-by: lucasliang <[email protected]>
ti-chi-bot bot pushed a commit that referenced this issue on Nov 17, 2023
ref #15909, close #16011 Signed-off-by: lucasliang <[email protected]> Co-authored-by: lucasliang <[email protected]> Co-authored-by: tonyxuqqi <[email protected]>
ti-chi-bot bot pushed a commit that referenced this issue on Nov 23, 2023
…15908) ref #15909 Make raftstore perceive network I/O jitter by backporting the implementation from raftstore-v2. Signed-off-by: lucasliang <[email protected]>
ti-chi-bot bot pushed a commit that referenced this issue on Dec 5, 2023
ref #15909 In the previous implementation, SlowScore identified a node as slow if it had hotspot regions, so it had a fairly high false-positive rate. Its sensitivity also needed adjusting to detect I/O jitter promptly. To address this, this PR refines the algorithm by incorporating CPU usage as an additional condition when determining whether a node is slow. Based on our testing records, this modification significantly reduces the false-positive rate. Additionally, this PR changes the default value of `inspect-interval` to `100ms` to enhance sensitivity and improve overall performance. Signed-off-by: lucasliang <[email protected]> Co-authored-by: tonyxuqqi <[email protected]>
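To make the idea in this commit message concrete, here is a minimal sketch of a slow score that treats CPU usage as an additional condition. It is not TiKV's implementation: the type `SlowScoreSketch`, its fields, the doubling and decay rule, and the 1 to 100 score range are all hypothetical. It also assumes one particular reading of the message, namely that a slow inspection is blamed on the disk or network only when the CPU is not saturated, so that a hotspot-driven latency spike does not raise the score.

```rust
use std::time::Duration;

/// Hypothetical illustration only; names and update rule are assumptions,
/// not TiKV's actual SlowScore code.
pub struct SlowScoreSketch {
    /// Assumed range: 1.0 (healthy) up to 100.0 (slowest).
    score: f64,
    /// Assumed latency above which one inspection counts as timed out.
    timeout: Duration,
    /// Assumed CPU usage ratio (0.0 to 1.0) above which latency is blamed on load.
    cpu_busy_ratio: f64,
}

impl SlowScoreSketch {
    pub fn new(timeout: Duration, cpu_busy_ratio: f64) -> Self {
        SlowScoreSketch { score: 1.0, timeout, cpu_busy_ratio }
    }

    /// Record the result of one inspection round.
    pub fn record(&mut self, latency: Duration, cpu_usage: f64) {
        let timed_out = latency >= self.timeout;
        let cpu_saturated = cpu_usage >= self.cpu_busy_ratio;
        if timed_out && !cpu_saturated {
            // Slow while the CPU is idle enough: treat it as I/O jitter.
            self.score = (self.score * 2.0).min(100.0);
        } else if !timed_out {
            // A healthy inspection lets the score decay toward 1.0.
            self.score = (self.score - 1.0).max(1.0);
        }
        // Timed out with a saturated CPU: likely a hotspot, so the score is unchanged.
    }

    pub fn score(&self) -> f64 {
        self.score
    }

    pub fn is_slow(&self, threshold: f64) -> bool {
        self.score >= threshold
    }
}
```

In this sketch, a shorter inspection interval simply means `record` is called more often, so sustained jitter pushes the score up sooner.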
LykxSassinator added a commit to LykxSassinator/tikv that referenced this issue on Dec 7, 2023
ref tikv#15909 In the previous implementation, SlowScore identified a node as slow if it had hotspot regions, so it had a fairly high false-positive rate. Its sensitivity also needed adjusting to detect I/O jitter promptly. To address this, this PR refines the algorithm by incorporating CPU usage as an additional condition when determining whether a node is slow. Based on our testing records, this modification significantly reduces the false-positive rate. Additionally, this PR changes the default value of `inspect-interval` to `100ms` to enhance sensitivity and improve overall performance. Signed-off-by: lucasliang <[email protected]> Co-authored-by: tonyxuqqi <[email protected]> Signed-off-by: lucasliang <[email protected]>
This was referenced Dec 7, 2023
ti-chi-bot bot pushed a commit that referenced this issue on Dec 8, 2023
ref #15909 Cherry-pick the SlowScore fine-tuning work from nightly to release-7.5. Signed-off-by: lucasliang <[email protected]> Co-authored-by: tonyxuqqi <[email protected]>
ti-chi-bot bot pushed a commit to tikv/pd that referenced this issue on Dec 18, 2023
close #7564, ref tikv/tikv#15909 Enable `evict-slow-store` scheduler by default. Signed-off-by: lucasliang <[email protected]>
This issue tracks all HA issues found in TiKV. If you find other HA issues, please help us by linking them to this one.
Development Task
Improving the HA of TiKV is a long-term effort. We'll keep tracking the related problems and work out approaches to fix or mitigate them step by step.
Currently, we've found several issues that need to be tackled:
So far, we have the following tracking tasks: