Inconsistent placement rules when retrying truncating tables/partitions #31540
Comments
There are two possible solutions in my mind:
I prefer solution 1: it is simple and easy, and compatibility is achievable with careful handling of the job arguments. Solution 2 is good for compatibility since it requires no job argument changes, but it does need a new middle state, which differs from previous DDLs.
For solution 1, we still need to handle some corner cases. Because DDL operations are concurrent before being enqueued, it can happen that the number of partitions changes after the new partition IDs have been allocated. Though it may not happen easily, I think we should add some checks (see the sketch below).
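A minimal sketch of the kind of check meant here, assuming hypothetical names (`checkPreAllocatedPartitionIDs`, `preAllocatedIDs`) rather than TiDB's actual identifiers: when the job finally runs, verify that the IDs pre-allocated at submit time still line up with the partitions to truncate, and fail the job otherwise.

```go
package ddl

import "fmt"

// checkPreAllocatedPartitionIDs is a hypothetical check, not TiDB's actual code:
// the partition IDs pre-allocated when the job was submitted must still match,
// one-to-one, the partitions the job is asked to truncate when it executes.
func checkPreAllocatedPartitionIDs(preAllocatedIDs []int64, partitionsToTruncate []string) error {
	if len(preAllocatedIDs) != len(partitionsToTruncate) {
		return fmt.Errorf("job pre-allocated %d partition IDs but now has %d partitions to truncate; "+
			"the partition set changed between submit and execution",
			len(preAllocatedIDs), len(partitionsToTruncate))
	}
	return nil
}
```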
If redundant PD rules for TiFlash node A eventually remain, and PD cannot schedule regions away from TiFlash node A, it may cause failures when scaling in.
From the code, the partitions to be truncated are decided at submitting/enqueuing time. Line 3480 in aa7ad03
So we don't actually truncate new partitions if they are added after submitting. I don't think there is a problem, unless you consider the current behavior a bug. EDIT: TruncateTable does have the problem, though: https://github.com/pingcap/tidb/blob/master/ddl/table.go#L603-L611
It does not affect the behavior, at least for the current code base, as long as IDs are allocated monotonically increasing. But it does hurt PD's performance if there are hundreds of redundant PD rules.
In the current version of the TiFlash Cluster Manager (the old one that is in use), all PD rules are set when the TableID is allocated, so no redundant PD rules result. However, once the Cluster Manager is moved into the TiDB server, PD rules will be set immediately while handling the DDL job, which can result in redundant PD rules. I think these two behaviors are different.
Yeah, but you misinterpret me. I mean redundant rules do not affect PD scheduling, since IDs are allocated monotonically increasing, which means redundant rules are just no-op rules.
So, the redundant rule
Yes, rules are applied to regions. If there are no regions (no finished DDL to generate regions with that ID), the rules are, well, redundant, as the name says, and they do not affect scheduling at all.
Bug Report
tidb/ddl/table.go
Lines 603 to 611 in 4f30a14
tidb/ddl/partition.go
Lines 1177 to 1192 in aa7ad03
Since truncating tables/partitions allocates new IDs just before the placement rule operations, every retried job produces placement rules for a different set of IDs. This breaks idempotence.
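A simplified, self-contained sketch of the problematic pattern described above (illustrative stand-ins only, not the real code in ddl/table.go or ddl/partition.go): because the replacement IDs are allocated inside the job handler, a retry after a write conflict allocates a fresh ID set and sets placement rules for it, leaving behind rules for IDs that will never own any region.

```go
package main

import "fmt"

// All names here are hypothetical stand-ins for the truncate job handler, the
// global ID allocator, and the placement rule update; they are not TiDB's API.

var nextGlobalID int64 = 1000

// allocateGlobalID stands in for the monotonically increasing global ID allocator.
func allocateGlobalID() int64 {
	nextGlobalID++
	return nextGlobalID
}

// runTruncatePartitionJob allocates replacement partition IDs and "sets"
// placement rules for them. Because allocation happens inside the handler,
// each retry produces a different ID set and therefore different rules.
func runTruncatePartitionJob(oldIDs []int64) []int64 {
	newIDs := make([]int64, 0, len(oldIDs))
	for range oldIDs {
		newIDs = append(newIDs, allocateGlobalID())
	}
	fmt.Println("set placement rules for IDs:", newIDs)
	return newIDs
}

func main() {
	oldIDs := []int64{41, 42}
	runTruncatePartitionJob(oldIDs) // first attempt
	// Suppose committing the schema change hits a write conflict and the job is retried:
	runTruncatePartitionJob(oldIDs) // the retry allocates different IDs, hence different rules
}
```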
It was reported by @CalvinNeo when testing TiFlash placement rules, and is caused by write conflicts of concurrent DDL. While it is a correctness problem, retrying only occurs under heavy DDL load. As an experimental feature, it does not need to be classified as a critical or major bug.
1. Minimal reproduce step (Required)
Start many concurrent sessions that execute TRUNCATE DDLs (a reproduction sketch is given below).
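A reproduction sketch in Go, assuming a local TiDB listening on 127.0.0.1:4000 and an existing table `test.t`; the DSN, table name, and concurrency level are assumptions, not taken from the report:

```go
package main

import (
	"database/sql"
	"log"
	"sync"

	_ "github.com/go-sql-driver/mysql"
)

func main() {
	db, err := sql.Open("mysql", "root@tcp(127.0.0.1:4000)/test")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Fire many concurrent TRUNCATE statements so the DDL jobs conflict and retry.
	var wg sync.WaitGroup
	for i := 0; i < 32; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			if _, err := db.Exec("TRUNCATE TABLE t"); err != nil {
				log.Println("truncate:", err)
			}
		}()
	}
	wg.Wait()
}
```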
2. What did you expect to see? (Required)
A write conflict leads to retrying, but the retried jobs use the same IDs.
3. What did you see instead (Required)
A write conflict leads to retrying, and the retried jobs allocate different IDs.
4. What is your TiDB version? (Required)