Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: wait for one version lease if no job created for a descriptor #95620

Merged

Conversation

chengxiong-ruan
Copy link
Contributor

Previously we only wait for one version in jobs for descriptors performed with schema changes. The problem is that we don't always create jobs for all descriptors modified. We need to wait for one version for these descriptors too. This pr adds logic to wait if there is no jobs created for a descriptor has new version.

Fixes: #90600

Release note: None

@blathers-crl
Copy link

blathers-crl bot commented Jan 20, 2023

It looks like your PR touches production code but doesn't add or edit any test code. Did you consider adding tests to your PR?

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@chengxiong-ruan chengxiong-ruan requested a review from a team January 20, 2023 19:43
@chengxiong-ruan
Copy link
Contributor Author

It looks like your PR touches production code but doesn't add or edit any test code. Did you consider adding tests to your PR?

I promise there'll be tests failing like round trips and some tests relevant to DROP....

Copy link
Contributor

@ajwerner ajwerner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @chengxiong-ruan)


pkg/sql/conn_executor_exec.go line 1079 at r1 (raw file):

			return err
		}
	}

I don't think this is where it belongs. I think it belongs around here:

if err := ex.server.cfg.JobRegistry.Run(
ex.ctxHolder.connCtx, *ex.extraTxnState.jobs,
); err != nil {
handleErr(err)
}

Code quote:

	//If there is any descriptor has new version. We want to make sure there is
	//only one version of the descriptor in all nodes. In schema changer jobs,
	//`WaitForOneVersion` has been called for the descriptors included in jobs. So
	//we just need to do this for descriptors not in jobs.
	var descIDsInJobs catalog.DescriptorIDSet
	for _, jobID := range *ex.extraTxnState.jobs {
		job, err := ex.server.cfg.JobRegistry.LoadJob(ctx, jobID)
		if err != nil {
			return err
		}
		payload := job.Payload()
		for _, descID := range payload.DescriptorIDs {
			descIDsInJobs.Add(descID)
		}
	}
	for _, idVersion := range withNewVersion {
		if descIDsInJobs.Contains(idVersion.ID) {
			continue
		}
		if _, err := WaitToUpdateLeases(ctx, ex.planner.LeaseMgr(), idVersion.ID); err != nil {
			return err
		}
	}

@chengxiong-ruan chengxiong-ruan force-pushed the schema-changes-wait-for-one-version branch from 7e09c3c to 3fb41f8 Compare January 23, 2023 16:29
@chengxiong-ruan
Copy link
Contributor Author

pkg/sql/conn_executor_exec.go line 1079 at r1 (raw file):

Previously, ajwerner wrote…

I don't think this is where it belongs. I think it belongs around here:

if err := ex.server.cfg.JobRegistry.Run(
ex.ctxHolder.connCtx, *ex.extraTxnState.jobs,
); err != nil {
handleErr(err)
}

thanks for pointing that out! Moved it there.

@chengxiong-ruan chengxiong-ruan force-pushed the schema-changes-wait-for-one-version branch 3 times, most recently from 977380b to 829421d Compare January 24, 2023 02:25
@chengxiong-ruan chengxiong-ruan marked this pull request as ready for review January 24, 2023 02:28
@chengxiong-ruan
Copy link
Contributor Author

I didn't come up with a good way of testing this, but given the amount of test failed and fixed, plus I tried this within the CREATE FUNCTION scenario where schema need to converge at one version before transaction returns (to at least make logic tests working). I think it should work.

@chengxiong-ruan chengxiong-ruan force-pushed the schema-changes-wait-for-one-version branch from 829421d to bb49416 Compare January 24, 2023 03:46
return nil
}

func (ex *connExecutor) descIDsInJobs() (catalog.DescriptorIDSet, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know about this -- this is a very expensive thing to do.

I think we can do better. Namely, we buffer jobs we create with schema changes for a while. We can extract the desc IDs from that set and we can also extract a set from the declarative schema changer state if we create one, maybe. I'd like it to be the case that we don't go to the kv-store to decide what to wait on.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I agree. There is sometimes other jobs like query statistics. I think 2 options:
(1) How about we only do this if there is any new version descriptors. This would only affect schema changes with jobs.
(2) Cache job records in the extraTxnState as well in addition to the job ids. In legacy schema changer it seems obvious that how job records are passed around. It's not clear to me how we can plumb the job record out of schema changer yet, but I can take a look.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the declarative schema changer, all you need to know is whether a job was created or not.

@chengxiong-ruan chengxiong-ruan force-pushed the schema-changes-wait-for-one-version branch from bb49416 to 33791b4 Compare January 25, 2023 01:47
@chengxiong-ruan chengxiong-ruan requested a review from a team January 25, 2023 01:47
@chengxiong-ruan chengxiong-ruan force-pushed the schema-changes-wait-for-one-version branch from 33791b4 to 55cf28f Compare January 25, 2023 02:55
@chengxiong-ruan chengxiong-ruan requested a review from a team as a code owner January 25, 2023 02:55
@chengxiong-ruan chengxiong-ruan force-pushed the schema-changes-wait-for-one-version branch from 55cf28f to 6760adb Compare January 25, 2023 03:26
Copy link
Contributor

@ajwerner ajwerner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is close. I want to tighten up the code around waiting. The stuff around jobs code is superficial.

Comment on lines 3785 to 3786
j.uniqueJobsToCreate = make(map[descpb.ID]*jobs.Record)
j.nonUniqueJobsToCreate = nil
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clear the map rather than making a new one. There's a compiler intrinsic, and nobody else has the map.

for id := range j.uniqueJobsToCreate {
    delete(j.uniqueJobsToCreate, id)
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

}

func (j *jobsInfo) releaseJobsToRecreate() {
j.uniqueJobsToCreate = make(map[descpb.ID]*jobs.Record)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here about clearing the map as opposed to allocating a new one

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

// Get descriptor IDs from legacy schema changer jobs.
var descIDsInJobs catalog.DescriptorIDSet
if err := ex.extraTxnState.jobsInfo.forEachJobToCreate(func(jobRecord *jobs.Record) error {
for _, descID := range jobRecord.DescriptorIDs {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this cause problems for other types of jobs than schema changes like gc jobs or changefeeds? can we make sure the job has a relevant type where we know there will be waiting? Maybe rename this function to reference schema change jobs, and describe on this function why those schema change jobs are special?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it'd be good to describe in commentary here the semantics of this field for the different types of schema changes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

jobs like gc and changefeeds are fine. I renamed it to descIDsInSchemaChangeJobs as suggested and add more comments. I also move the iteration on DescriptorIDs into the switch block so that we only use these ids when it's actually a schema change job.

@@ -3687,3 +3759,56 @@ func init() {
return planGistFromCtx(ctx)
})
}

type jobsInfo struct {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can come up with a better name for this structure. txnJobsCollection?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, done.

// queued up for the given ID. The cache remains valid only for the current
// transaction and it is cleared after the transaction is committed.
schemaChangeJobRecords map[descpb.ID]*jobs.Record
jobsInfo *jobsInfo
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like jobs is a good name for this field

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

@@ -3687,3 +3759,56 @@ func init() {
return planGistFromCtx(ctx)
})
}

type jobsInfo struct {
jobsCreated jobsCollection
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: name this created. It's obviously about jobs in this context.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.


type jobsInfo struct {
jobsCreated jobsCollection
uniqueJobsToCreate map[descpb.ID]*jobs.Record
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: name this uniqueToCreate and the other one nonUniqueToCreate and use comments to explain what makes them different.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

@@ -3113,6 +3112,79 @@ func (ex *connExecutor) handleWaitingForDescriptorIDGeneratorMigration(ctx conte
return ex.resetTransactionOnSchemaChangeRetry(ctx)
}

func (ex *connExecutor) waitOneVersionForNewVersionDescriptorsWithoutJobs(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this file is too big already. What do you think about moving all this job related code to a new file conn_executor_jobs.go and moving the jobsCollection and its friends to a file jobs_collection.go?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point. moved them to new files.

Comment on lines 3789 to 3797
func (j *jobsInfo) numJobsToCreate() int {
return len(j.uniqueJobsToCreate) + len(j.nonUniqueJobsToCreate)
}

func (j *jobsInfo) hasJobsToCreate() bool {
return j.numJobsToCreate() > 0
}

func (j *jobsInfo) forEachJobToCreate(fn func(jobRecord *jobs.Record) error) error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: i think you can remove the word jobs from all these methods and improve readability. x.jobs.forEachToCreate, x.jobs.numToCreate, x.jobs.hasAnyToCreate? That's just my taste.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

nonUniqueToCreate []*jobs.Record
}

func newJobsInfo() *txnJobsCollection {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: rename this to match the type

return catalog.DescriptorIDSet{}, err
}

// Get descriptor IDs with declarative schema changer jobs.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this the best way to do this? I think it's fine, but would it be better if we were more explicit about it? For one, I don't know if HasConcurrentSchemaChanges is what you want. I think it works, but it's brute force and relies on the loop above. Could we instead check on the declarative schema changer state and see if it has a job, and, if it does not, go and collect the descriptor IDs which have been modified?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call. My mistake that I didn't realize HasConcurrentSchemaChanges also contains checks of legacy mutation jobs on tables. I added a check to see if there is a declarative schema change job. If there is a job, then see which descriptors are affected, but only check descriptors not included in the legacy jobs above.

"github.com/cockroachdb/cockroach/pkg/sql/catalog/descpb"
)

type txnJobsCollection struct {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: comment what this is used for

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@@ -1007,9 +1004,7 @@ func (ie *InternalExecutor) execInternal(

// ReleaseSchemaChangeJobRecords is to release the schema change job records.
func (ie *InternalExecutor) releaseSchemaChangeJobRecords() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

kill this function altogether. Look at the one place it was called. That thing can just invoke this function.

schemaChangerState *SchemaChangerState
txn *kv.Txn
descCollection *descs.Collection
jobsInfo *txnJobsCollection
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: rename this to jobs

Comment on lines 1369 to 1370
ie.extraTxnState.jobsInfo.reset()
ie.releaseSchemaChangeJobRecords()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This relates to my above comment. IIUC, the reset already makes the second function a no-op. This can now just be: defer ie.extraTxnState.jobs.reset()

// in sql.connExecutor. sql.connExecutor.createJobs() enqueues jobs with these
// records when transaction is committed.
SchemaChangeJobRecords map[descpb.ID]*jobs.Record
JobsInfo *txnJobsCollection
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

call this jobs. it does not need to be exported.

@chengxiong-ruan chengxiong-ruan force-pushed the schema-changes-wait-for-one-version branch 2 times, most recently from 28ad804 to 53b2e83 Compare January 26, 2023 18:17
Copy link
Contributor

@ajwerner ajwerner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM after addressing the comments

"github.com/cockroachdb/errors"
)

func (ex *connExecutor) waitOneVersionForNewVersionDescriptorsWithoutJobs(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Give this a comment on why we want to do it / why it matters

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

return nil
}

func (j *txnJobsCollection) releaseToRecreate() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nothing uses this, remove it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed.

"github.com/cockroachdb/cockroach/pkg/jobs/jobspb"
"github.com/cockroachdb/cockroach/pkg/sql/catalog/descpb"
)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also move jobsCollection to this file

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

@chengxiong-ruan chengxiong-ruan force-pushed the schema-changes-wait-for-one-version branch from 53b2e83 to 48252e8 Compare January 26, 2023 18:38
@chengxiong-ruan chengxiong-ruan requested a review from a team as a code owner January 26, 2023 18:38
@chengxiong-ruan chengxiong-ruan force-pushed the schema-changes-wait-for-one-version branch from 48252e8 to 8731301 Compare January 26, 2023 21:30
Previously we only wait for one version in jobs for descriptors
performed with schema changes. The problem is that we don't always
create jobs for all descriptors modified. We need to wait for one
version for these descriptors too. This pr adds logic to wait if
there is no jobs created for a descriptor has new version.

Fixes: cockroachdb#90600

Release note: None
@chengxiong-ruan chengxiong-ruan force-pushed the schema-changes-wait-for-one-version branch from 8731301 to e0e57b5 Compare January 27, 2023 03:18
@chengxiong-ruan
Copy link
Contributor Author

TFTR!
bors r+

@craig
Copy link
Contributor

craig bot commented Jan 27, 2023

Build succeeded:

@craig craig bot merged commit efb4481 into cockroachdb:master Jan 27, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

sql/schemachanger: schema changes without a job do not wait for descriptors
3 participants