
Inconsistent placement rules when retrying truncating tables/partitions #31540

Open · Tracked by #18030
xhebox opened this issue Jan 11, 2022 · 9 comments
Assignees: xhebox
Labels: affects-5.4, feature/developing, severity/moderate, sig/sql-infra, type/bug

Comments

@xhebox (Contributor) commented Jan 11, 2022

Bug Report

tidb/ddl/table.go, lines 603 to 611 at commit 4f30a14:

    var oldPartitionIDs []int64
    if tblInfo.GetPartitionInfo() != nil {
        oldPartitionIDs = getPartitionIDs(tblInfo)
        // We use the new partition ID because all the old data is encoded with the old partition ID, it can not be accessed anymore.
        err = truncateTableByReassignPartitionIDs(t, tblInfo)
        if err != nil {
            return ver, errors.Trace(err)
        }
    }

tidb/ddl/partition.go, lines 1177 to 1192 at commit aa7ad03:

    newPartitions := make([]model.PartitionDefinition, 0, len(oldIDs))
    for _, oldID := range oldIDs {
        for i := 0; i < len(pi.Definitions); i++ {
            def := &pi.Definitions[i]
            if def.ID == oldID {
                pid, err1 := t.GenGlobalID()
                if err1 != nil {
                    return ver, errors.Trace(err1)
                }
                def.ID = pid
                // Shallow copy only use the def.ID in event handle.
                newPartitions = append(newPartitions, *def)
                break
            }
        }
    }

Since truncating a table/partition allocates the new IDs just before the placement rule operations, every retried job produces placement rules for a different set of IDs. This breaks idempotence.
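To make the failure mode concrete, here is a minimal, self-contained sketch (toy stand-ins, not the real TiDB types): because the replacement ID is generated inside the execution step, each retry of the job pushes a placement rule for a different ID, so the orphaned rules from earlier attempts pile up.

```go
package main

import "fmt"

// nextGlobalID stands in for the meta global ID allocator.
var nextGlobalID int64 = 100

func genGlobalID() int64 {
	nextGlobalID++
	return nextGlobalID
}

// runTruncateJob mimics one execution attempt of a truncate job: it allocates
// the replacement ID and then pushes a placement rule for that ID.
func runTruncateJob() int64 {
	newID := genGlobalID()
	fmt.Printf("set placement rule for id %d\n", newID)
	return newID
}

func main() {
	first := runTruncateJob()
	// A write conflict forces a retry; the rule written for `first` is now
	// orphaned because the retry allocates a fresh ID.
	second := runTruncateJob()
	fmt.Println("retry reused the same id:", first == second) // prints false
}
```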

This was reported by @CalvinNeo while testing TiFlash placement rules; the retries are caused by write conflicts between concurrent DDL jobs.

While this is a correctness problem, retries only happen under heavy DDL load, and placement rules are still an experimental feature, so it does not need to be classified as a critical or major bug.

1. Minimal reproduce step (Required)

Start many concurrent sessions that execute TRUNCATE DDL statements.

2. What did you expect to see? (Required)

Write conflicts lead to retries, but the retried jobs use the same IDs.

3. What did you see instead (Required)

Write conflicts lead to retries, and the retried jobs allocate different IDs each time.

4. What is your TiDB version? (Required)

xhebox added the type/bug, severity/moderate, and feature/developing labels on Jan 11, 2022
@xhebox (Contributor Author) commented Jan 11, 2022

There are two possible solutions in my mind:

  1. Move the ID allocation into the initial submit phase of the DDL job.
  2. Add an intermediate state to the truncation job, splitting the ID allocation and the placement rule updates into two phases.

I prefer solution 1: it is simple, and compatibility is achievable with careful handling of the job arguments. Solution 2 is better for compatibility since it needs no job argument changes, but it requires a new intermediate state, which differs from previous DDLs.
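To illustrate solution 1, here is a minimal, self-contained sketch (simplified stand-ins for the job struct and the ID allocator, not the real TiDB API): the replacement IDs are allocated once, when the job is built and submitted, and carried in the job arguments, so a retried execution reuses exactly the same IDs.

```go
package main

import "fmt"

// job is a simplified stand-in for model.Job: only the fields needed here.
type job struct {
	tp   string
	args []interface{}
}

var nextGlobalID int64 = 100

// genGlobalID stands in for the meta global ID allocator.
func genGlobalID() int64 {
	nextGlobalID++
	return nextGlobalID
}

// buildTruncatePartitionJob pre-allocates the new partition IDs at submit
// time, so the worker never has to call the allocator during execution.
func buildTruncatePartitionJob(oldIDs []int64) *job {
	newIDs := make([]int64, 0, len(oldIDs))
	for range oldIDs {
		newIDs = append(newIDs, genGlobalID())
	}
	return &job{tp: "truncate partition", args: []interface{}{oldIDs, newIDs}}
}

// runJob mimics one execution attempt: it only reads IDs from the job args,
// so a retry sends placement rules for exactly the same IDs.
func runJob(j *job) {
	newIDs := j.args[1].([]int64)
	fmt.Println("set placement rules for ids", newIDs)
}

func main() {
	j := buildTruncatePartitionJob([]int64{11, 12})
	runJob(j) // first attempt
	runJob(j) // retry after a write conflict: same IDs, so it is idempotent
}
```

The compatibility concern then mostly reduces to the args layout: an older job whose args carry only the old IDs would still need to fall back to the current allocate-on-execute path.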

xhebox added the affects-5.4 label on Jan 11, 2022
@lcwangchao (Collaborator) commented

> There are two possible solutions in my mind:
>
>   1. Move the ID allocation into the initial submit phase of the DDL job.
>   2. Add an intermediate state to the truncation job, splitting the ID allocation and the placement rule updates into two phases.
>
> I prefer solution 1: it is simple, and compatibility is achievable with careful handling of the job arguments. Solution 2 is better for compatibility since it needs no job argument changes, but it requires a new intermediate state, which differs from previous DDLs.

For solution 1, we still need to handle some corner cases. Because DDL operations run concurrently before they are enqueued, the number of partitions can change after the new partition IDs have been allocated. Although this may not be easy to trigger, I think we should add some checks...
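As a rough illustration of such a check (hypothetical helper name, not the real implementation), the worker could verify at execution time that every partition ID captured at submit time still exists, and cancel the job cleanly if the partition set has changed in between:

```go
package main

import "fmt"

// checkTruncateArgs is a hypothetical validation step: confirm that every
// partition ID recorded at submit time still exists when the job runs,
// since concurrent DDL may have changed the partition set in between.
func checkTruncateArgs(currentIDs, submittedIDs []int64) error {
	current := make(map[int64]struct{}, len(currentIDs))
	for _, id := range currentIDs {
		current[id] = struct{}{}
	}
	for _, id := range submittedIDs {
		if _, ok := current[id]; !ok {
			return fmt.Errorf("partition %d no longer exists; cancel the job instead of truncating blindly", id)
		}
	}
	return nil
}

func main() {
	// Partition 13 was dropped by a concurrent DDL after the job was submitted.
	fmt.Println(checkTruncateArgs([]int64{11, 12}, []int64{11, 12, 13}))
}
```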

@CalvinNeo (Member) commented

If redundant PD rules eventually accumulate for a TiFlash node A, and PD cannot schedule regions away from TiFlash A, it may cause a failure when scaling in.

xhebox self-assigned this on Jan 11, 2022
@xhebox (Contributor Author) commented Jan 11, 2022

> For solution 1, we still need to handle some corner cases. Because DDL operations run concurrently before they are enqueued, the number of partitions can change after the new partition IDs have been allocated. Although this may not be easy to trigger, I think we should add some checks...

From the code, the partitions to be truncated are decided at submit/enqueue time; pids is what carries that decision:

Args: []interface{}{pids},

So we do not actually truncate partitions that are added after the job is submitted. I don't think that is a problem, unless you consider the current behavior itself a bug.

EDIT: TruncateTable does have the problem, though: https://github.com/pingcap/tidb/blob/master/ddl/table.go#L603-L611
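To spell that out, a small self-contained sketch (toy stand-in for the job args decoding, not the real job.DecodeArgs API): whatever partition set existed at submit time is frozen into the args, so a partition added concurrently is simply never seen by this job.

```go
package main

import "fmt"

// job is a toy stand-in for model.Job.
type job struct{ args []interface{} }

// decodeArgs mimics decoding the single []int64 argument of the
// truncate-partition job.
func (j *job) decodeArgs(out *[]int64) {
	*out = j.args[0].([]int64)
}

func main() {
	// pids recorded when the DDL statement was submitted/enqueued.
	j := &job{args: []interface{}{[]int64{11, 12}}}

	// Even if a partition 13 is added concurrently before execution,
	// the worker only ever sees the frozen set below.
	var pids []int64
	j.decodeArgs(&pids)
	fmt.Println("partitions to truncate:", pids) // always [11 12]
}
```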

@xhebox (Contributor Author) commented Jan 11, 2022

> If redundant PD rules eventually accumulate for a TiFlash node A, and PD cannot schedule regions away from TiFlash A, it may cause a failure when scaling in.

It does not affect the behavior, at least for the current code base, as long as the IDs are allocated monotonically increasing. But it does hurt PD's performance if there are hundreds of redundant PD rules.

@CalvinNeo (Member) commented Jan 11, 2022

>> If redundant PD rules eventually accumulate for a TiFlash node A, and PD cannot schedule regions away from TiFlash A, it may cause a failure when scaling in.
>
> It does not affect the behavior, at least for the current code base, as long as the IDs are allocated monotonically increasing. But it does hurt PD's performance if there are hundreds of redundant PD rules.

In the current version of the TiFlash Cluster Manager (the old one that is still in use), all PD rules are set when the TableID is allocated, so no redundant PD rules result.

However, once the Cluster Manager is moved into the TiDB server, PD rules will be set immediately while handling the DDL job, which can result in redundant PD rules.

I think these two behaviors are different.

@xhebox (Contributor Author) commented Jan 11, 2022

> I think these two behaviors are different.

Yeah, but you misinterpreted me. I mean the redundant rules do not affect PD scheduling: since the IDs are allocated monotonically increasing, the redundant rules are just no-op rules.

@CalvinNeo (Member) commented

>> I think these two behaviors are different.
>
> Yeah, but you misinterpreted me. I mean the redundant rules do not affect PD scheduling: since the IDs are allocated monotonically increasing, the redundant rules are just no-op rules.

So a redundant table-%v-r rule set for TiFlash actually targets a non-existent table, and PD will not schedule any regions for it in the first place?

@xhebox (Contributor Author) commented Jan 11, 2022

> So a redundant table-%v-r rule set for TiFlash actually targets a non-existent table, and PD will not schedule any regions for it in the first place?

Yes, rules are applied to regions. If there are no regions (no finished DDL has generated regions for that ID), the rules are, well, redundant, as the name says, and they do not affect scheduling at all.

jebter added the sig/sql-infra label on Aug 1, 2023