-
Notifications
You must be signed in to change notification settings - Fork 3.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
sql: wait to start index GC job until schema changer is done #46929
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @lucy-zhang, and @pbardea)
pkg/sql/schema_changer.go, line 981 at r1 (raw file):
) }) if err != nil {
can you add:
if indexGCJob != nil {
if rollbackErr := indexGCJob.CleanupOnRollback(ctx); rollbackErr != nil {
log.Warningf(ctx, "failed to cleanup job: %v", err)
}
}
pkg/sql/schema_changer.go, line 984 at r1 (raw file):
return nil, err } // TODO (lucy): Can we do the same thing with the PK drop index job?
same thing meaning... I get that this may make sense in the context of this PR but in general this TODO isn't adequately detailed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I screwed up and treated the index GC job as a single job, when in fact there can be multiple indexes per transaction to be dropped. Now it's a slice of jobs.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @ajwerner and @pbardea)
pkg/sql/schema_changer.go, line 981 at r1 (raw file):
Previously, ajwerner wrote…
can you add:
if indexGCJob != nil { if rollbackErr := indexGCJob.CleanupOnRollback(ctx); rollbackErr != nil { log.Warningf(ctx, "failed to cleanup job: %v", err) } }
Done. Unfortunately we still won't run the cleanup if Publish()
has to retry, but I guess this is a best-effort thing anyway.
pkg/sql/schema_changer.go, line 984 at r1 (raw file):
Previously, ajwerner wrote…
same thing meaning... I get that this may make sense in the context of this PR but in general this TODO isn't adequately detailed.
Done. (I moved it to where we actually create that index drop job.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 2 of 2 files at r2.
Reviewable status:complete! 1 of 0 LGTMs obtained (waiting on @pbardea)
pkg/sql/schema_changer.go, line 981 at r1 (raw file):
Previously, lucy-zhang (Lucy Zhang) wrote…
Done. Unfortunately we still won't run the cleanup if
Publish()
has to retry, but I guess this is a best-effort thing anyway.
yeah... about that.. I've had that on my TODO list for a bit now.
The non-idempotent part of all of this is the fact that we create a new job id every time. If we created the ID above the call to create then it'd all be good. I'll go refactor this eventually. Until then it's potentially a very slow leak that does make me sad.
Or in this case I think that would look like having a slice of ids which we allocate the first pass through and then reuse on retries. That's a bit nasty but I don't have a better answer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status:
complete! 2 of 0 LGTMs obtained (waiting on @lucy-zhang)
pkg/sql/schema_changer.go, line 1415 at r2 (raw file):
descriptorIDs = append(descriptorIDs, table.ID) } }
I think we may have a missing case here to properly handle the empty database case. Something like
} else if details.ParentID != sqlbase.InvalidID {
descriptorIDs = []sqlbase.ID{details.ParentID}
}
so that jobs dropping empty databases aren't cleaned up accidentally.
Also happy to dig into the drop empty database case in a separate PR.
9d9e89b
to
e27c69b
Compare
Previously, we were creating and starting the index GC job in a separate transaction from the one used in `PublishMultiple()` in `SchemaChanger.done()`, meaning that the index GC job could be run before the original transaction to finalize all schema changes on the table descriptor had been committed. This led to unpredictable behavior while testing. It could also potentially cause multiple GC jobs to be created if the `PublishMultiple()` closure were retried. In this PR, the GC job is now created as a `StartableJob` that is started only after the table descriptor in the finalized state has been published. Release note (bug fix): Fixed a bug introduced in 20.1 that could cause multiple index GC jobs to be created for the same schema change in rare cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status:
complete! 0 of 0 LGTMs obtained (and 2 stale) (waiting on @ajwerner and @pbardea)
pkg/sql/schema_changer.go, line 1415 at r2 (raw file):
Previously, pbardea (Paul Bardea) wrote…
I think we may have a missing case here to properly handle the empty database case. Something like
} else if details.ParentID != sqlbase.InvalidID { descriptorIDs = []sqlbase.ID{details.ParentID} }
so that jobs dropping empty databases aren't cleaned up accidentally.
Also happy to dig into the drop empty database case in a separate PR.
Dropping an empty database doesn't cause a GC job to be created (either in 19.2 or now; in 19.2, there's no job at all).
I think an earlier iteration of this PR might have accidentally changed this (or maybe that was on my local branch, I don't remember). But in any case I've now added a test to ensure there's no GC job for an empty database.
❌ The GitHub CI (Cockroach) build has failed on f8582f1e. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan. |
Hmm. I managed to also repro the |
Build failed |
The flake was #47546. bors r+ |
Build failed |
This |
bors r+ |
??? bors r+ |
Build succeeded |
Previously, we were creating and starting the index GC job in a separate
transaction from the one used in
PublishMultiple()
inSchemaChanger.done()
, meaning that the index GC job could be runbefore the original transaction to finalize all schema changes on the
table descriptor had been committed. This led to unpredictable behavior
while testing. It could also potentially cause multiple GC jobs to be
created if the
PublishMultiple()
closure were retried.In this PR, the GC job is now created as a
StartableJob
that isstarted only after the table descriptor in the finalized state has been
published.
Release justification: Bug fix for new feature.
Release note (bug fix): Fixed a bug introduced in beta.3 that could
cause multiple index GC jobs to be created for the same schema change in
rare cases.