support deterministic failover schedule for placement rules #37251
Comments
An alternative to this proposal is to use the leader-weight property that pd can set on stores. But it currently doesn't work as expected:
The reason is that in (3) the new leader is chosen by an election within the TiKV raft group, which has no knowledge of (or concern for) leader-weight. What I would like to suggest is that if a heartbeat is sent from a leader on a zero leader-weight store, a forced leader transfer occurs. I took a look at a quick hack to do this, but it didn't work :-) I'm hoping someone who knows pd better can help here.
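For context, a minimal sketch of how leader-weight is set today with pd-ctl (the PD address and store ID below are placeholders; use `pd-ctl store` to find the real store IDs):

```bash
# List stores to find their IDs and labels.
pd-ctl -u http://127.0.0.1:2379 store

# Set leader-weight to 0 (and keep region-weight at 1) for a store that
# should not host leaders; repeat for each store in the quorum-only region.
pd-ctl -u http://127.0.0.1:2379 store weight 4 0 1
```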
Hi @morgo, setting the leader weight to zero only affects the score calculation. An alternative method is to use the label scheduler (driven by the reject-leader label property).
You can check the implementation in: https://github.com/tikv/pd/blob/master/server/schedulers/label.go#L117-L124
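A rough sketch of that label-based approach with pd-ctl; the label key/value are assumptions based on the deployment described in this issue, and label-property handling can vary between versions, so treat this as illustrative only:

```bash
# Mark every store labeled region=us-east-2 as reject-leader; the label
# scheduler then creates operators to move leaders off those stores.
pd-ctl -u http://127.0.0.1:2379 config set label-property reject-leader region us-east-2

# Inspect the resulting configuration.
pd-ctl -u http://127.0.0.1:2379 config show all
```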
This is great! Thank you @nolouch
I tested this scenario with the rule and this scheduler, and found that it tries to create an operator but fails. This is caused by the placement rule explicitly specifying that this store should place followers, which is reasonable given the error. After I changed the policy, it works. For failover, the placement policy needs to change from a form that pins followers to one that allows the leader to move (see the sketch below for the kind of change involved); the difference between the two can also be seen in the raw rule in PD, which changes correspondingly.
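The exact before/after policies are not shown here, so the following is only a hypothetical illustration of that kind of change, reusing the `defaultpolicy` name and region names from this issue (it assumes the policy already exists):

```bash
# Hypothetical "before": followers are explicitly pinned to specific regions,
# which is the kind of rule that stops PD from moving the leader there.
mysql <<'SQL'
ALTER PLACEMENT POLICY defaultpolicy
  LEADER_CONSTRAINTS='[+region=us-east-1]'
  FOLLOWER_CONSTRAINTS='{"+region=us-west-2": 1, "+region=us-east-2": 1}';
SQL

# Hypothetical "after": any listed region may hold the leader, so the leader
# can be transferred away when us-east-1 fails.
mysql <<'SQL'
ALTER PLACEMENT POLICY defaultpolicy
  PRIMARY_REGION="us-east-1"
  REGIONS="us-east-1,us-west-2,us-east-2";
SQL
```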
Hi @morgo, an easier way is just to use one policy (it works across the 3 regions), like:
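A minimal sketch of such a single three-region policy, assuming the region names and the `defaultpolicy` name used elsewhere in this issue (use `ALTER PLACEMENT POLICY` instead if the policy already exists):

```bash
mysql <<'SQL'
-- One policy covering all three regions: the leader prefers us-east-1,
-- and followers are spread across the other listed regions.
CREATE PLACEMENT POLICY defaultpolicy
  PRIMARY_REGION="us-east-1"
  REGIONS="us-east-1,us-west-2,us-east-2";
SQL
```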
If you want to apply it at the cluster level, I think we can use the raw placement rule, for example:
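A rough sketch of what such a cluster-level raw rule could look like, assuming the `pd` rule group, the `region` store label, and 3 replicas; this illustrates the approach rather than reproducing the exact rule referred to above:

```bash
# Leader in us-east-1, the other two voters in us-west-2 and us-east-2.
# The entry with count 0 removes the built-in "default" rule so that only
# the rules below apply.
cat > rules.json <<'JSON'
[
  {"group_id": "pd", "id": "default", "count": 0},
  {
    "group_id": "pd",
    "id": "leader-us-east-1",
    "index": 1,
    "role": "leader",
    "count": 1,
    "label_constraints": [
      {"key": "region", "op": "in", "values": ["us-east-1"]}
    ]
  },
  {
    "group_id": "pd",
    "id": "followers",
    "index": 2,
    "role": "voter",
    "count": 2,
    "label_constraints": [
      {"key": "region", "op": "in", "values": ["us-west-2", "us-east-2"]}
    ]
  }
]
JSON

# Save the rules into PD (the PD address is an example).
pd-ctl -u http://127.0.0.1:2379 config placement-rules save --in=rules.json
```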
I'd prefer to keep it in SQL rules, so it's easier for other users on my team to change them if needed. It's okay though; the only other schema I need to change is `mysql`. It can be done with:

```bash
mysql -e "ALTER DATABASE mysql PLACEMENT POLICY=defaultpolicy;"
for TABLE in `mysql mysql -BNe "SHOW TABLES"`; do
  mysql mysql -e "ALTER TABLE $TABLE PLACEMENT POLICY=defaultpolicy;"
done
```
Well, do you think the
This is essentially this feature request: #29677. There are some strange behaviors that need to be determined, but yes, I think the feature
@morgo I confirmed that this placement policy cannot achieve the purpose of automatic switching; the problem needs the
BTW, do you need to set the placement policy for the
I really suggest using the cluster-level setting with a raw placement rule for this scenario for now; in my test it hit fewer problems. Use
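For reference, a couple of pd-ctl commands for inspecting the cluster-level rules once they are in place (the PD address is an example):

```bash
# Dump the current placement rules to a file for review or editing.
pd-ctl -u http://127.0.0.1:2379 config placement-rules load --out=current-rules.json

# Show all rules currently in effect.
pd-ctl -u http://127.0.0.1:2379 config placement-rules show
```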
@nolouch Is this follower rule enforced on the PD side or in the TiKV raft protocol?
@nolouch Happy to try with one policy. I'm getting an error with what you pasted above though :(
Using pd-ctl from v6.2.0.
@morgo sorry, I updated the comment in #37251 (comment). You can try again.
To be used in a Kubernetes environment (until pingcap/tidb-operator#4678 is implemented), "region" should be changed to "topology.kubernetes.io/region".
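As a hypothetical follow-on, assuming a rules.json file like the sketch earlier in this thread, that change is just a different label key in the rules' label_constraints:

```bash
# Swap the "region" label key for the Kubernetes topology label key
# before saving the rules to PD.
sed -i 's#"key": "region"#"key": "topology.kubernetes.io/region"#' rules.json
```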
@morgo |
@SunRunAway what is "witness"? This is not mentioned anywhere in our documentation. |
@kolbe I'm discussing a feature that is still in development, see tikv/tikv#12876
Enhancement

My deployment scenario involves two "primary" regions in AWS: `us-east-1` and `us-west-2`.

I have been experimenting with placement rules with a third region: `us-east-2`. This region should only be used for quorum, as there are no application servers hosted in it. So I define a placement policy as follows:

Because the pd-server supports a `weight` concept, when `us-east-1` fails I can have the pd-leader be deterministic and `us-west-2` will become the leader. However, there is no deterministic behavior for where the leaders of regions using `defaultpolicy` will go. They will likely balance across `us-west-2` and `us-east-2`, which is not the desired behavior.

Ideally I want the priority for the leader to follow the order of the region list. This means that `us-west-2` would become the new leader for all regions. Perhaps this could be conveyed with syntax like the sketch below.

In fact, if this worked deterministically for leader-scheduling and follower-scheduling, an extension of this is that I could create the following:
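A purely hypothetical illustration of the basic proposal and its extension, reusing the `defaultpolicy` name; the statements below parse in current TiDB, but the ordered-priority semantics described here are the enhancement being requested, not current behavior:

```bash
# Basic form: the idea is that leader priority follows the order of REGIONS,
# so if us-east-1 fails, us-west-2 deterministically takes all leaders.
mysql <<'SQL'
CREATE PLACEMENT POLICY defaultpolicy
  PRIMARY_REGION="us-east-1"
  REGIONS="us-east-1,us-west-2,us-east-2";
SQL

# Extended form: us-west-1 is listed last, so with the default of 2 followers
# it would only receive replicas after one of the other regions fails.
mysql <<'SQL'
ALTER PLACEMENT POLICY defaultpolicy
  PRIMARY_REGION="us-east-1"
  REGIONS="us-east-1,us-west-2,us-east-2,us-west-1";
SQL
```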
Since the default number of followers is 2, it would mean that `us-west-1` won't get regions scheduled unless one of the other regions fails, which suits me perfectly. It will also mean that commit latency is only initially bad when failover to `us-west-2` first occurs. Over time, as regions are migrated to `us-west-1`, performance should be ~restored since quorum can be achieved on the west coast.

This is a really common deployment pattern in the continental USA, so I'm hoping it can be implemented :-)