-
Notifications
You must be signed in to change notification settings - Fork 3.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
sql: start index drop job for primary key changes immediately #47624
sql: start index drop job for primary key changes immediately #47624
Conversation
I think the alter column set data type implementation will want something similar to this too |
❌ The GitHub CI (Cockroach) build has failed on ff660830. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan. |
ff66083
to
fc96d9d
Compare
FWIW, on my laptop, |
// change operations. This is wrapped in SucceedsSoon to handle cases where | ||
// dropped indexes are expected to be GC'ed immediately after the schema | ||
// change completes. | ||
testutils.SucceedsSoon(t, func() error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was the PrimaryKeyChangeWithOperations flaky now (the old indexes got cleaned up too fast?)
@@ -2992,14 +3001,68 @@ CREATE TABLE t.test (k INT NOT NULL, v INT); | |||
}) | |||
} | |||
|
|||
// TestPrimaryKeyDropIndexNotCancelable tests that the job to drop indexes after | |||
// a primary key change is not cancelable. | |||
func TestPrimaryKeyDropIndexNotCancelable(t *testing.T) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does the job finish now in the logic test so we have to move it here?
pkg/sql/schema_changer.go
Outdated
update := func(txn *kv.Txn, descs map[sqlbase.ID]*sqlbase.MutableTableDescriptor) error { | ||
// Reset vars here because update function can be called multiple times in a retry. | ||
isRollback = false | ||
indexGCJobs = nil | ||
pkChangeIndexDropJob = nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we change this to a slice like "childJobs" or something so that it is easy to add new jobs to start immediately?
Thanks for getting to this lucy! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems sane to me. I'll let @rohany sign off.
Reviewed 2 of 3 files at r1.
Reviewable status:complete! 1 of 0 LGTMs obtained (waiting on @ajwerner, @lucy-zhang, and @rohany)
pkg/sql/schema_changer.go, line 786 at r1 (raw file):
Previously, rohany (Rohan Yadav) wrote…
Can we change this to a slice like "childJobs" or something so that it is easy to add new jobs to start immediately?
Seems like a reasonable idea to me
Previously, the job to drop old indexes after a primary key change was created at the end of the schema change and left for the job registry to adopt at some later time. This PR switches the job creation over to `CreateStartableJobWithTxn`, which allows for starting the job immediately after the transaction which creates the job is committed. This eliminates one source of waiting time before other schema changes, which are queued behind the mutation to drop the indexes, can run. Note, however, that successive schema changes don't start running immediately after the mutations ahead of them in the queue are all cleared; if they've ever hit a retry error due to not being first in line, then they will have to wait to be re-adopted by the job registry. This is why the existing primary key change tests still need to have the adopt interval set to a low value in order to finish quickly. Release note (performance improvement): The cleanup job which runs after a primary key change to remove old indexes, which blocks other schema changes from running, now starts immediately after the primary key swap is complete. This reduces the amount of waiting time before subsequent schema changes can run.
fc96d9d
to
9f647aa
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status:
complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @rohany)
pkg/sql/schema_changer.go, line 786 at r1 (raw file):
Previously, ajwerner wrote…
Seems like a reasonable idea to me
Done.
pkg/sql/schema_changer_test.go, line 411 at r1 (raw file):
Previously, rohany (Rohan Yadav) wrote…
Was the PrimaryKeyChangeWithOperations flaky now (the old indexes got cleaned up too fast?)
Yeah, but the problem wasn't the indexes being GC'ed too quickly per se, since the test previously had the default GC TTL set for the table. Without setting the GC TTL to a low value, I kept getting a value that was lower than 2 times the number of rows in the table but that didn't decrease over time. I think what's going on is that some writes to the table are now happening after the index starts being dropped, so the dropped non-public index doesn't necessarily have the expected KV count. Does that seem plausible to you?
pkg/sql/schema_changer_test.go, line 3006 at r1 (raw file):
Previously, rohany (Rohan Yadav) wrote…
Does the job finish now in the logic test so we have to move it here?
Yeah, that's why.
❌ The GitHub CI (Cockroach) build has failed on 9f647aa1. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status:
complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @lucy-zhang)
pkg/sql/schema_changer_test.go, line 411 at r1 (raw file):
Previously, lucy-zhang (Lucy Zhang) wrote…
Yeah, but the problem wasn't the indexes being GC'ed too quickly per se, since the test previously had the default GC TTL set for the table. Without setting the GC TTL to a low value, I kept getting a value that was lower than 2 times the number of rows in the table but that didn't decrease over time. I think what's going on is that some writes to the table are now happening after the index starts being dropped, so the dropped non-public index doesn't necessarily have the expected KV count. Does that seem plausible to you?
Yes, i think that makes sense. The test originally had a multiple of 2 (one multiple for the new primary key and one for the old key). Changing the multiple to 1 and having the gc trigger immediately seems correct.
TFTRs bors r+ |
(hopefully the infra flake is gone) |
Build failed (retrying...) |
Build failed (retrying...) |
Build failed (retrying...) |
Build failed (retrying...) |
Build failed |
bors r+ |
Build succeeded |
I'm wondering, is this something that is backportable to a point release of 20.1? It makes the previous problem of an apparent hang after primary key changes almost disappear. |
Yeah, I opened a backport PR #47818 (though I figured it wasn't worth getting reviews until we actually start merging to the release branch for 20.1.1, which I believe we're not doing right now). |
Previously, the job to drop old indexes after a primary key change was
created at the end of the schema change and left for the job registry to
adopt at some later time. This PR switches the job creation over to
CreateStartableJobWithTxn
, which allows for starting the jobimmediately after the transaction which creates the job is committed.
This eliminates one source of waiting time before other schema changes,
which are queued behind the mutation to drop the indexes, can run. Note,
however, that successive schema changes don't start running immediately
after the mutations ahead of them in the queue are all cleared; if
they've ever hit a retry error due to not being first in line, then they
will have to wait to be re-adopted by the job registry. This is why the
existing primary key change tests still need to have the adopt interval
set to a low value in order to finish quickly.
Addresses #45150.
Release note (performance improvement): The cleanup job which runs after
a primary key change to remove old indexes, which blocks other schema
changes from running, now starts immediately after the primary key swap
is complete. This reduces the amount of waiting time before subsequent
schema changes can run.