sql: asynchronously drop non-interleaved indexes #30566

Merged
merged 1 commit into from
Oct 9, 2018

@eriktrinh eriktrinh commented Sep 24, 2018

This change drops non-interleaved indexes asynchronously by performing
the deletion of data using an asynchronous schema changer. This is in
preparation to eventually remove index data using ClearRange after the
GC TTL period has passed. The initial schema changer runs through the
state machine but does not perform the deletion of index data. Instead
the mutation is moved to a separate list and has a timestamp attached.
The created asynchronous schema changer uses the timestamp and index's
configured GC TTL value to determine when it should begin execution and
actually truncate the index.

When the async schema changer deletes the index data two things occur:
the job is marked as succeeded and the index zone config is removed.

The job can immediately be marked as succeeded because currently a
separate job is created for each index that is dropped.

Interleaved indexes are unaffected and have their data deleted
immediately.

Related to #20696

Fixes #28859.
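
As a minimal sketch of the timing rule described above (not the code from this PR), the async schema changer can be thought of as waiting until the drop timestamp plus the index's GC TTL has passed. The names `gcDeadline` and `readyForGC`, and the choice of Unix nanoseconds and TTL seconds as units, are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"time"
)

// gcDeadline returns the earliest time (Unix nanoseconds) at which a dropped
// index's data may be truncated: the drop timestamp plus the GC TTL.
// Hypothetical helper; ttlSeconds mirrors a zone config's GC TTL.
func gcDeadline(dropTimeNanos int64, ttlSeconds int32) int64 {
	return dropTimeNanos + int64(ttlSeconds)*time.Second.Nanoseconds()
}

// readyForGC reports whether the deadline has been reached at nowNanos.
func readyForGC(dropTimeNanos int64, ttlSeconds int32, nowNanos int64) bool {
	return nowNanos >= gcDeadline(dropTimeNanos, ttlSeconds)
}

func main() {
	drop := time.Date(2018, 10, 9, 0, 0, 0, 0, time.UTC).UnixNano()
	ttl := int32(25 * 60 * 60) // an example TTL of 25 hours, in seconds
	deadline := time.Unix(0, gcDeadline(drop, ttl)).UTC()
	fmt.Println("truncate no earlier than:", deadline)
	fmt.Println("ready one hour after drop:",
		readyForGC(drop, ttl, drop+time.Hour.Nanoseconds()))
}
```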

@eriktrinh eriktrinh requested review from dt, vivekmenezes and a team September 24, 2018 15:48
@cockroach-teamcity
Member

This change is Reviewable

Contributor

@vivekmenezes vivekmenezes left a comment

This is a great start for a PR! Thanks for putting it together. I have a few comments for you to consider.

I wish there was a way to break up this change.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained


pkg/sql/backfill.go, line 210 at r1 (raw file):

			return err
		}
	}

I forget why we need this code?


pkg/sql/backfill.go, line 296 at r1 (raw file):

				resume, err = td.deleteIndex(
					ctx, &desc, resumeAt, chunkSize, noAutoCommit, false /* traceKV */, true, /* rangeDelete */
				)

why is this always true?


pkg/sql/schema_changer.go, line 361 at r1 (raw file):

				return err
			}
		}

There are other mutations that are not GCMutations. Those also have their jobs marked as succeeded. You want to move this logic to where that happens.


pkg/sql/schema_changer.go, line 531 at r1 (raw file):

	table *sqlbase.TableDescriptor,
) error {
	if inSession || len(table.GCMutations) == 0 || len(sc.dropIndexTimes) == 0 {

Also add len(table.Mutations) > 0 to this condition, because we don't want to run any GC work while there are other mutations waiting.
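
The guard under discussion could be sketched roughly as follows. This is only an illustration: `tableState` and `shouldSkipGC` are hypothetical names, and the real check lives on the schema changer and reads the table descriptor directly.

```go
package main

import "fmt"

// tableState is a hypothetical stand-in for the fields of the table
// descriptor and schema changer that the guard inspects.
type tableState struct {
	numMutations   int // pending non-GC mutations
	numGCMutations int // mutations waiting for their GC deadline
	numDropTimes   int // dropped indexes tracked by the schema changer
}

// shouldSkipGC reports whether index GC work should be skipped for now.
// It mirrors the quoted condition plus the suggested extra clause: do not
// run GC work while other mutations are still waiting.
func shouldSkipGC(inSession bool, t tableState) bool {
	return inSession ||
		t.numMutations > 0 ||
		t.numGCMutations == 0 ||
		t.numDropTimes == 0
}

func main() {
	onlyGC := tableState{numGCMutations: 1, numDropTimes: 1}
	mixed := tableState{numMutations: 1, numGCMutations: 1, numDropTimes: 1}
	fmt.Println("only GC mutations, skip:", shouldSkipGC(false, onlyGC))
	fmt.Println("mixed mutations, skip:", shouldSkipGC(false, mixed))
}
```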


pkg/sql/schema_changer.go, line 607 at r1 (raw file):

			if err := removeIndexZoneConfigs(ctx, txn, sc.execCfg, table.ID, dropped); err != nil {
				return err
			}

I suppose you can remove the zone config within truncateIndexes?


pkg/sql/schema_changer.go, line 967 at r1 (raw file):

	jobSucceeded := true
	return sc.leaseMgr.Publish(ctx, sc.tableID, func(desc *sqlbase.TableDescriptor) error {
		i := 0

reset jobSucceeded = true here because this function can be called more than once
you might as well also reset isRollback to false here too.
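
To illustrate why the reset matters, here is a small sketch with a hypothetical `publish` function standing in for a call that may run its callback more than once. It is not the PR's code; it only shows how a flag captured by a closure can carry a stale value across attempts unless it is reset at the top of the callback.

```go
package main

import "fmt"

// publish is a hypothetical stand-in for a function that may invoke its
// callback more than once, for example on a retry.
func publish(attempts int, update func(attempt int)) {
	for i := 0; i < attempts; i++ {
		update(i)
	}
}

// runWithReset resets the captured flag at the top of each invocation, so a
// value left over from an earlier attempt cannot leak into the final result.
func runWithReset(attempts int) bool {
	jobSucceeded := true
	publish(attempts, func(attempt int) {
		jobSucceeded = true // reset: the closure can run more than once
		if attempt == 0 {
			jobSucceeded = false // pretend the first attempt saw unfinished work
		}
	})
	return jobSucceeded
}

// runWithoutReset omits the reset, so the stale false value survives.
func runWithoutReset(attempts int) bool {
	jobSucceeded := true
	publish(attempts, func(attempt int) {
		if attempt == 0 {
			jobSucceeded = false
		}
	})
	return jobSucceeded
}

func main() {
	fmt.Println("with reset:", runWithReset(2))
	fmt.Println("without reset:", runWithoutReset(2))
}
```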


pkg/sql/schema_changer.go, line 1676 at r1 (raw file):

								indexExecAfter := timeutil.Unix(0, minDeadline)
								if schemaChanger.execAfter.IsZero() || schemaChanger.execAfter.After(indexExecAfter) {
									schemaChanger.execAfter = indexExecAfter

this can be reset below which is okay.


pkg/sql/sqlbase/structured.proto, line 545 at r1 (raw file):

  // The time which the mutation was initiated. This is used to indicate if a mutation
  // should wait for the GC and complete asynchronously.
  optional int64 modification_time = 8 [(gogoproto.nullable) = false];

I don't think we need this, because the transitions through the state machine are rather fast and we can use the time when the mutation is ready to be put in the GC queue as the drop time for the mutation.

@eriktrinh eriktrinh force-pushed the fast-drop-index branch 2 times, most recently from b053d6b to 27ffbcb Compare September 26, 2018 18:08
Author

@eriktrinh eriktrinh left a comment

Thank you for the review. This is a fairly large change and would be hard to break up, unfortunately.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained


pkg/sql/backfill.go, line 210 at r1 (raw file):

Previously, vivekmenezes wrote…

I forget why we need this code?

If the index is part of a table with index subzones and configs, it requires a CCL binary to modify the zone config of the table (see also TestDropIndexWithZoneConfigOSS in drop_test.go). This check is normally done in removeIndexZoneConfigs, which returns an error that prevents the index from being dropped; it is done here again for non-interleaved indexes.


pkg/sql/backfill.go, line 296 at r1 (raw file):

Previously, vivekmenezes wrote…

why is this always true?

Because any non-interleaved index in this code path has already waited out the GC period and should be dropped using ClearRange. It is only false when the DROP INDEX happens in the same transaction as the table creation (in which case the old index deletion is used for non-interleaved indexes).


pkg/sql/schema_changer.go, line 361 at r1 (raw file):

Previously, vivekmenezes wrote…

There are other mutations that are not GCMutations. Those also have their jobs marked as succeeded. You want to move this logic to where that happens.

Done.


pkg/sql/schema_changer.go, line 531 at r1 (raw file):

Previously, vivekmenezes wrote…

Also add len(table.Mutations) > 0 to this condition, because we don't want to run any GC work while there are other mutations waiting.

Done.


pkg/sql/schema_changer.go, line 607 at r1 (raw file):

Previously, vivekmenezes wrote…

I suppose you can remove the zone config within truncateIndexes?

Done.


pkg/sql/schema_changer.go, line 967 at r1 (raw file):

Previously, vivekmenezes wrote…

reset jobSucceeded = true here because this function can be called more than once
you might as well also reset isRollback to false here too.

Done.


pkg/sql/schema_changer.go, line 1676 at r1 (raw file):

Previously, vivekmenezes wrote…

this can be reset below which is okay.

I moved it to be set below so that if the index GC deadline is before the drop table deadline it is not reset.


pkg/sql/sqlbase/structured.proto, line 545 at r1 (raw file):

Previously, vivekmenezes wrote…

I don't think we need this, because the transitions through the state machine are rather fast and we can use the time when the mutation is ready to be put in the GC queue as the drop time for the mutation.

Right. Done.

Contributor

@vivekmenezes vivekmenezes left a comment

Thanks for making the changes.

I think we can split this into three PRs.

  1. A change that creates a SetTableGCImmediately() function that is used by all existing tests
  2. A change that moves all non-interleaved index data deletions to the asynchronous path. What would happen if we also did it for interleaved indexes? Would it break something to have interleaved index data for an index that doesn't exist?
  3. A change that changes the index data deletion under some circumstances to use ClearRange()

Thanks!

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained


pkg/sql/backfill.go, line 210 at r1 (raw file):

Previously, eriktrinh (Erik Trinh) wrote…

If the index is part of a table with index subzones and configs, it requires a CCL binary to modify the zone config of the table (see also TestDropIndexWithZoneConfigOSS in drop_test.go). This check is normally done in removeIndexZoneConfigs, which returns an error that prevents the index from being dropped; it is done here again for non-interleaved indexes.

Hmmm, I'm not convinced this needs to be done here. Why here as opposed to in the DROP INDEX transaction?


pkg/sql/backfill.go, line 297 at r2 (raw file):

		}
		if err := sc.db.Txn(ctx, func(ctx context.Context, txn *client.Txn) error {
			return removeIndexZoneConfigs(ctx, txn, sc.execCfg, sc.tableID, dropped)

was that CCL error returned by this?


pkg/sql/drop_table.go, line 390 at r2 (raw file):

			return errors.Wrapf(err, "failed to mark job %d as as successful", gcm.JobID)
		}
	}

I'd suggest populating a map[JobID]struct{} in the two loops above with the jobs that need to be marked as successful, and then looping over the map to mark the jobs.
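
A minimal sketch of that suggestion, assuming job IDs are plain `int64` values and using a hypothetical `collectJobIDs` helper (the real code would mark each job through the jobs registry rather than print):

```go
package main

import (
	"fmt"
	"sort"
)

// collectJobIDs gathers the job IDs from two lists into a set so that each
// job is marked as successful only once, even if it appears in both lists.
// The function and parameter names are hypothetical.
func collectJobIDs(mutationJobs, gcMutationJobs []int64) []int64 {
	jobs := make(map[int64]struct{})
	for _, id := range mutationJobs {
		jobs[id] = struct{}{}
	}
	for _, id := range gcMutationJobs {
		jobs[id] = struct{}{}
	}
	ids := make([]int64, 0, len(jobs))
	for id := range jobs {
		ids = append(ids, id)
	}
	// Sort for a deterministic order, since map iteration order is random.
	sort.Slice(ids, func(i, j int) bool { return ids[i] < ids[j] })
	return ids
}

func main() {
	for _, id := range collectJobIDs([]int64{3, 1, 3}, []int64{1, 2}) {
		fmt.Printf("marking job %d as successful\n", id)
	}
}
```

An empty struct is used as the map value because only key membership matters here.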


pkg/sql/drop_test.go, line 431 at r2 (raw file):

	params.Knobs = base.TestingKnobs{
		SQLSchemaChanger: &sql.SchemaChangerTestingKnobs{
			AsyncExecQuickly:  true,

Do you really need this?


pkg/sql/drop_test.go, line 506 at r2 (raw file):

		t.Fatalf("table descriptor still contains index after index is dropped")
	}
	tests.CheckKeyCount(t, kvDB, indexSpan, numRows)

Add this comment:

// index data hasn't been deleted.


pkg/sql/drop_test.go, line 530 at r2 (raw file):

		t.Fatal(err)
	}

I'd suggest refactoring the above code out into a separate PR, using it wherever we set the TTL to 0 in other tests, and then calling that function here.


pkg/sql/drop_test.go, line 549 at r2 (raw file):

	tests.CheckKeyCount(t, kvDB, tableDesc.TableSpan(), 2*numRows)
}

excellent test! So it looks like the test below takes care of everything in this test, so we can safely delete it.

Perhaps we can rename TestDropIndexNameReuse to just TestDropIndex


pkg/sql/schema_changer.go, line 976 at r2 (raw file):

				indexDesc != nil &&
				!indexDesc.IsInterleaved() &&
				allowClearRange {

I think we need to have a function called

canUseClearRangeForDropIndex()

and use that everywhere we check for isInterleaved() and the cluster settings.
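
Such a helper might look like the sketch below (spelled here as `canUseClearRangeForDropIndex`). The `indexDescriptor` type is a stripped-down, hypothetical stand-in for the real descriptor, and `allowClearRange` stands in for whatever cluster setting or version check the caller performs.

```go
package main

import "fmt"

// indexDescriptor is a hypothetical, stripped-down stand-in for the real
// index descriptor; it only records whether the index is interleaved.
type indexDescriptor struct {
	interleaved bool
}

// IsInterleaved reports whether the index is interleaved.
func (d *indexDescriptor) IsInterleaved() bool { return d.interleaved }

// canUseClearRangeForDropIndex centralizes the check quoted above: the index
// must exist, must not be interleaved, and ClearRange must be allowed.
func canUseClearRangeForDropIndex(d *indexDescriptor, allowClearRange bool) bool {
	return d != nil && !d.IsInterleaved() && allowClearRange
}

func main() {
	plain := &indexDescriptor{}
	interleaved := &indexDescriptor{interleaved: true}
	fmt.Println("plain index:", canUseClearRangeForDropIndex(plain, true))
	fmt.Println("interleaved index:", canUseClearRangeForDropIndex(interleaved, true))
}
```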


pkg/sql/schema_changer.go, line 987 at r2 (raw file):

					})
			} else {
				desc.MakeMutationComplete(mutation)

We always want to make the mutation complete independent of whether the mutation is moved to the gc mutations list or not

Contributor

@vivekmenezes vivekmenezes left a comment

I also want to say that given that you have worked for only a month on the project this is an incredible PR!

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained

Author

@eriktrinh eriktrinh left a comment

Repurposed this PR to include only the changes that move data deletion for non-interleaved indexes to the async code path.

Moving interleaved indexes to also use the asynchronous path would not break things, but the problem is that the data would stay twice as long as it should (wait for GC TTL + tombstone cleanup). ClearRange also isn't feasible for interleaved indexes yet because the kv structure is different (see: https://github.com/cockroachdb/cockroach/blob/master/pkg/sql/sqlbase/index_encoding.go#L58 and https://github.com/cockroachdb/cockroach/blob/master/docs/RFCS/20160624_sql_interleaved_tables.md)

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained


pkg/sql/backfill.go, line 297 at r2 (raw file):

Previously, vivekmenezes wrote…

was that CCL error returned by this?

Yes, in the synchronous path for interleaved indexes it will error here and the mutations will not complete. It makes sense to move it to the DROP INDEX transaction, but it will be checked a second time when the zone configs are removed either way.


pkg/sql/drop_table.go, line 390 at r2 (raw file):

Previously, vivekmenezes wrote…

I'd suggest populating a map[JobID]struct{} in the two loops above with the jobs that need to be marked as successful, and then looping over the map to mark the jobs.

Done.


pkg/sql/drop_test.go, line 431 at r2 (raw file):

Previously, vivekmenezes wrote…

Do you really need this?

Nope, removed


pkg/sql/drop_test.go, line 506 at r2 (raw file):

Previously, vivekmenezes wrote…

Add this comment:

// index data hasn't been deleted.

Done.


pkg/sql/drop_test.go, line 530 at r2 (raw file):

Previously, vivekmenezes wrote…

I'd suggest refactoring the above code out into a separate PR, using it wherever we set the TTL to 0 in other tests, and then calling that function here.

Done. See pr #30741


pkg/sql/drop_test.go, line 549 at r2 (raw file):

Previously, vivekmenezes wrote…

excellent test! So it looks like the test below takes care of everything in this test, so we can safely delete it.

Perhaps we can rename TestDropIndexNameReuse to just TestDropIndex

Thanks! I've removed the other DropIndex tests and renamed this one.


pkg/sql/schema_changer.go, line 976 at r2 (raw file):

Previously, vivekmenezes wrote…

I think we need to have a function called

canUseClearRangeForDropIndex()

and use that everywhere we check for isInterleaved() and the cluster settings.

Done.


pkg/sql/schema_changer.go, line 987 at r2 (raw file):

Previously, vivekmenezes wrote…

We always want to make the mutation complete independent of whether the mutation is moved to the gc mutations list or not

Done.

@eriktrinh eriktrinh changed the title sql: drop index using ClearRange RPC sql: asynchronously drop non-interleaved indexes Sep 27, 2018
Contributor

@vivekmenezes vivekmenezes left a comment

I've not looked at your latest updates, but what I was suggesting is to move all index data deletion to the async path and set DropTime only if ClearRange is going to be used. On the async path, DropTime can then be used to delay the deletion of data for the use of ClearRange. Is this doable? You certainly need a test to check that running SQL operations over the index data doesn't do anything funky. Thanks!

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained

@eriktrinh eriktrinh force-pushed the fast-drop-index branch 5 times, most recently from 3207af1 to 6321153 Compare October 3, 2018 01:11
Contributor

@vivekmenezes vivekmenezes left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained


pkg/sql/drop_test.go, line 507 at r3 (raw file):

	tests.CheckKeyCount(t, kvDB, newIdxSpan, numRows)
	tests.CheckKeyCount(t, kvDB, indexSpan, 0)
	tests.CheckKeyCount(t, kvDB, tableDesc.TableSpan(), 3*numRows)

great test!


pkg/sql/drop_test.go, line 616 at r3 (raw file):

	}

	tests.CheckKeyCount(t, kvDB, tableSpan, 2*numRows)

Looks like both the changes you made to this test are not needed.


pkg/sql/schema_changer.go, line 575 at r3 (raw file):

		if timeRemaining < 0 {
			return nil
		}

It will be worth following up this PR with another one that abstracts this out into a function and uses it for the DROP TABLE and DROP INDEX use cases. No need to do it here.


pkg/sql/schema_changer.go, line 594 at r3 (raw file):

			for i := 0; i < len(tbl.GCMutations); i++ {
				otherMut := tbl.GCMutations[i]
				if otherMut.IndexID == mutation.IndexID {

Nit: combine the above two lines into one to limit scope of otherMut


pkg/sql/schema_changer.go, line 602 at r3 (raw file):

			if !found {
				return errors.Errorf("could not find expected GC'd mutation")

I think you want to return errDidntUpdateDescriptor here just like you did above. In fact you can remove the one done above.


pkg/sql/schema_changer.go, line 1590 at r3 (raw file):

						}
						if len(sc.dropIndexTimes) > 0 {
							earliestIndexExec := timeutil.Unix(0, 0)

You do not need to initialize this using Unix(0,0) . Just use a var declaration
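
A sketch of the difference, using the standard library `time` package rather than `timeutil` (an assumption made so the example is self-contained). A `var` declaration gives the zero `time.Time`, for which `IsZero()` is true, whereas `time.Unix(0, 0)` is the Unix epoch and is not the zero value. The function names are hypothetical.

```go
package main

import (
	"fmt"
	"time"
)

// earliestNanos returns the earliest of the given deadlines (Unix
// nanoseconds) and false if there are none.
func earliestNanos(deadlines []int64) (int64, bool) {
	// A plain var declaration yields the zero time.Time, for which IsZero()
	// is true, so it works as a "not set yet" sentinel.
	var earliest time.Time
	for _, d := range deadlines {
		t := time.Unix(0, d)
		if earliest.IsZero() || t.Before(earliest) {
			earliest = t
		}
	}
	if earliest.IsZero() {
		return 0, false
	}
	return earliest.UnixNano(), true
}

// epochIsZero shows that time.Unix(0, 0) is the Unix epoch, not the zero
// time.Time, so IsZero() would not treat it as unset.
func epochIsZero() bool {
	return time.Unix(0, 0).IsZero()
}

func main() {
	ns, ok := earliestNanos([]int64{30, 10, 20})
	fmt.Println(ns, ok)
	fmt.Println("epoch is zero value:", epochIsZero())
}
```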


pkg/sql/schema_changer.go, line 1689 at r3 (raw file):

								if minDeadline == 0 || deadline < minDeadline {
									minDeadline = deadline
									schemaChanger.dropIndexTimes = append([]droppedIndex{dropped}, schemaChanger.dropIndexTimes...)

is there a reason you need to prepend the dropped versus append it like you do below?


pkg/sql/schema_changer.go, line 1749 at r3 (raw file):

								// to be GC-ed.
								delete(s.schemaChangers, table.ID)
							}

The schema changer should only be on one list. If there is a non GC mutation needing processing it should stay on the non GC list.

There are three cases here

  1. table contains non gc mutations -> schema change list
  2. table contains only gc mutations -> gc list
  3. table contains a mix of both -> schema change list.
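
The three cases above can be sketched as a single decision function. This is only an illustration under assumed names (`chooseList` and its string labels are hypothetical); the actual schema change manager tracks schema changers in its own data structures.

```go
package main

import "fmt"

// chooseList picks the single list a table's schema changer belongs on,
// following the three cases above. The string labels are illustrative only.
func chooseList(numMutations, numGCMutations int) string {
	switch {
	case numMutations > 0:
		// Cases 1 and 3: any non-GC mutation keeps the schema changer on
		// the schema change list.
		return "schema-change"
	case numGCMutations > 0:
		// Case 2: only GC mutations remain.
		return "gc"
	default:
		// Nothing to do for this table.
		return "none"
	}
}

func main() {
	fmt.Println(chooseList(1, 0)) // non-GC mutations only
	fmt.Println(chooseList(0, 2)) // GC mutations only
	fmt.Println(chooseList(1, 2)) // a mix of both
}
```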

pkg/sql/schema_changer_test.go, line 3502 at r3 (raw file):

	// Verify that the index foo over v is consistent, and that column x has
	// been backfilled properly.

You've made a number of changes to the tests above that look fine, but it's totally worth running
make stress PKG=sql TESTS=TestFooBar for 5 minutes on each of the tests you have changed.

Author

@eriktrinh eriktrinh left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained


pkg/sql/drop_test.go, line 507 at r3 (raw file):

Previously, vivekmenezes wrote…

great test!

Thanks!


pkg/sql/drop_test.go, line 616 at r3 (raw file):

Previously, vivekmenezes wrote…

Looks like both the changes you made to this test are not needed.

Woops, reverted


pkg/sql/schema_changer.go, line 575 at r3 (raw file):

Previously, vivekmenezes wrote…

It will be worth following up this PR with another one that abstracts this out into a function and uses it for the DROP TABLE and DROP INDEX use cases. No need to do it here.

Might be worth it to just keep track of the deadline for the DROP TABLE case


pkg/sql/schema_changer.go, line 594 at r3 (raw file):

Previously, vivekmenezes wrote…

Nit: combine the above two lines into one to limit scope of otherMut

Done.


pkg/sql/schema_changer.go, line 602 at r3 (raw file):

Previously, vivekmenezes wrote…

I think you want to return errDidntUpdateDescriptor here just like you did above. In fact you can remove the one done above.

Done.


pkg/sql/schema_changer.go, line 1590 at r3 (raw file):

Previously, vivekmenezes wrote…

You do not need to initialize this using Unix(0,0) . Just use a var declaration

Done.


pkg/sql/schema_changer.go, line 1689 at r3 (raw file):

Previously, vivekmenezes wrote…

is there a reason you need to prepend the dropped versus append it like you do below?

Right now the list is maintained such that the head of the list is the dropped index with the earliest GC deadline, so that in maybeGCMutations we avoid searching for that index by querying for the zone configs when checking whether the deadline has passed. I've changed this so that the schema manager keeps track of the deadlines as well as the drop times, so we do not need to query for the zone config again.
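
Roughly, storing the deadline alongside the drop time might look like the sketch below. Only the type name `droppedIndex` appears in the quoted code; the fields and the `dueForGC` helper are assumptions made for illustration.

```go
package main

import "fmt"

// droppedIndex records when an index was dropped and when its data may be
// removed. The field names are assumptions.
type droppedIndex struct {
	indexID  uint32
	dropTime int64 // Unix nanoseconds at which the index was dropped
	deadline int64 // dropTime plus the GC TTL, computed once up front
}

// dueForGC returns the IDs of indexes whose stored deadline has passed at
// now, without needing to look up the zone config again.
func dueForGC(indexes []droppedIndex, now int64) []uint32 {
	var due []uint32
	for _, idx := range indexes {
		if idx.deadline <= now {
			due = append(due, idx.indexID)
		}
	}
	return due
}

func main() {
	indexes := []droppedIndex{
		{indexID: 1, dropTime: 0, deadline: 100},
		{indexID: 2, dropTime: 0, deadline: 50},
		{indexID: 3, dropTime: 0, deadline: 200},
	}
	fmt.Println("due at t=100:", dueForGC(indexes, 100))
}
```

With the deadline stored per entry, the order of the list no longer matters, so entries can simply be appended.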


pkg/sql/schema_changer.go, line 1749 at r3 (raw file):

Previously, vivekmenezes wrote…

The schema changer should only be on one list. If there is a non GC mutation needing processing it should stay on the non GC list.

There are three cases here

  1. table contains non gc mutations -> schema change list
  2. table contains only gc mutations -> gc list
  3. table contains a mix of both -> schema change list.

Done, although I think if the table is being dropped we can just put it into the schema change list

Contributor

@vivekmenezes vivekmenezes left a comment

Can you copy the new commit message into the PR description? Thanks!

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained


pkg/sql/schema_changer.go, line 1749 at r3 (raw file):

Previously, eriktrinh (Erik Trinh) wrote…

Done, although I think if the table is being dropped we can just put it into the schema change list

That sounds right


pkg/sql/schema_changer_test.go, line 3502 at r3 (raw file):

Previously, vivekmenezes wrote…

You've made a number of changes to the tests above that look fine, but it's totally worth running
make stress PKG=sql TESTS=TestFooBar for 5 minutes on each of the tests you have changed.

Please do the above before merging

Author

@eriktrinh eriktrinh left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained


pkg/sql/schema_changer_test.go, line 3502 at r3 (raw file):

Previously, vivekmenezes wrote…

Please do the above before merging

Done. Please take another look at the changes made in schema_changer_test.go.

Contributor

@vivekmenezes vivekmenezes left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained


pkg/sql/schema_changer_test.go, line 3502 at r3 (raw file):

Previously, eriktrinh (Erik Trinh) wrote…

Done. Please take another look at the changes made in schema_changer_test.go.

So I do see a number of instances where you're enabling early GC by setting the zone config and enabling quick asynchronous execution of schema changes. Why do you need to do that? In particular, can you just embrace the fact that index data will not be deleted?

I can see situations where the table is still being written to and in those circumstances you will not be able to count the keys correctly. Let's restrict the forced deletion to only those cases.

@eriktrinh eriktrinh force-pushed the fast-drop-index branch 2 times, most recently from 871cc33 to 98a1328 Compare October 8, 2018 19:33
Contributor

@vivekmenezes vivekmenezes left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained


pkg/sql/crdb_internal.go, line 1753 at r4 (raw file):

						if err == sqlbase.ErrIndexGCMutationsList {
							continue
						}

let's add a logictest for this


pkg/sql/schema_changer_test.go, line 1466 at r4 (raw file):

	params, _ := tests.CreateTestServerParams()
	const chunkSize = 200
	var enableAsyncSchemaChanges uint32 = 1

I forget why you had to enable the async schema changer and then disable it?


pkg/sql/schema_changer_test.go, line 1560 at r4 (raw file):

	if e := 1; len(tableDesc.GCMutations) != e {
		t.Fatalf("the table has %d instead of %d GC mutations", len(tableDesc.GCMutations), e)
	}

probably worth checking the values within the GCMutation


pkg/sql/schema_changer_test.go, line 2124 at r4 (raw file):

	// and completes successfully. The drop column part of the change errors
	// during backfilling the second chunk but cannot rollback the drop index.
	const expectedAttempts = 2

Add a comment here that the index truncation is no longer run through a backfill. Perhaps we should rename this to expectedColumnBackfillAttempts.

Author

@eriktrinh eriktrinh left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained


pkg/sql/crdb_internal.go, line 1753 at r4 (raw file):

Previously, vivekmenezes wrote…

let's add a logictest for this

The logictests shouldn't be able to marshal/unmarshal protos, which would be required for a test like this. Instead, I've added a check in partitionccl/drop_test.go to ensure that the zone config is ignored by crdb_internal.zones and doesn't cause an error.


pkg/sql/schema_changer_test.go, line 1466 at r4 (raw file):

Previously, vivekmenezes wrote…

I forget why you had to enable the async schema changer and then disable it?

We enable it to allow the async schema changer to attempt to truncate the index in the rollback, where an error will occur. It is then disabled to prevent the scheduled retry from occurring, so we can perform assertions on the table state. It is then re-enabled to allow the rollback to complete.


pkg/sql/schema_changer_test.go, line 1560 at r4 (raw file):

Previously, vivekmenezes wrote…

probably worth checking the values within the GCMutation

Done.


pkg/sql/schema_changer_test.go, line 2124 at r4 (raw file):

Previously, vivekmenezes wrote…

Add a comment here that the index truncation is no longer run through a backfill. Perhaps we should rename this to expectedColumnBackfillAttempts.

Done.

@eriktrinh
Author

bors r+

This change drops non-interleaved indexes asynchronously by performing
the deletion of data using an asynchronous schema changer. This is in
preparation to eventually remove index data using `ClearRange` after the
GC TTL period has passed. The initial schema changer runs through the
state machine but does not perform the deletion of index data. Instead
the mutation is moved to a separate list and has a timestamp attached.
The created asynchronous schema changer uses the timestamp and index's
configured GC TTL value to determine when it should begin execution and
actually truncate the index.

When the async schema changer deletes the index data two things occur:
the job is marked as succeeded and the index zone config is removed.

The job can immediately be marked as succeeded because currently a
separate job is created for each index that is dropped.

Interleaved indexes are unaffected and have their data deleted
immediately.

Related to cockroachdb#20696

Release note: none
@eriktrinh
Author

bors r-

@craig
Contributor

craig bot commented Oct 9, 2018

Canceled

@eriktrinh
Author

bors r+

craig bot pushed a commit that referenced this pull request Oct 9, 2018
30566: sql: asynchronously drop non-interleaved indexes r=eriktrinh a=eriktrinh

This change drops non-interleaved indexes asynchronously by performing
the deletion of data using an asynchronous schema changer. This is in
preparation to eventually remove index data using `ClearRange` after the
GC TTL period has passed. The initial schema changer runs through the
state machine but does not perform the deletion of index data. Instead
the mutation is moved to a separate list and has a timestamp attached.
The created asynchronous schema changer uses the timestamp and index's
configured GC TTL value to determine when it should begin execution and
actually truncate the index.

When the async schema changer deletes the index data two things occur:
the job is marked as succeeded and the index zone config is removed.

The job can immediately be marked as succeeded because currently a
separate job is created for each index that is dropped.

Interleaved indexes are unaffected and have their data deleted
immediately.

Related to #20696

Fixes #28859.

31020: cdc: Test for falling behind schema TTL r=danhhz a=mrtracy

Add a test that ensures that changefeeds properly exit if they fall far
enough behind that schema information has been lost due to the GC TTL
(that is, a historical row version can no longer be read because the
schema at its timestamp has been garbage collected).

I have also discovered why the sister test (for the table TTL, not the
schema) required a 3 second sleep: the GC queue enforces that replicas
must have an appropriately high "score" before being GCed, even when the
"shouldQueue" process is skipped. To get around this, I have changed
"ManuallyEnqueueSpan" to a more explicit "ManuallyGCSpan", which
directly calls the processing implementation of the gcQueue on the
appropriate replicas. Both that sister test, and the new schema TTL
test, now only require a more predictable 1 second sleep.

Resolves #28644

Release note: None

31152: changefeedccl: fix TestAvroSchema/DECIMAL flake r=mrtracy a=danhhz

The precision is really meant to be in [1,10], but it sure looks like
there's an off by one error in the avro library that makes this test
flake if it picks precision of 1.

Release note: None

31154: kubernetes: Add multiregion channel, add channel to daemonset configs r=a-robinson a=a-robinson

Release note: None

Fixes #31144

Co-authored-by: Erik Trinh <[email protected]>
Co-authored-by: Matt Tracy <[email protected]>
Co-authored-by: Daniel Harrison <[email protected]>
Co-authored-by: Alex Robinson <[email protected]>
@craig
Contributor

craig bot commented Oct 9, 2018

Build succeeded

@craig craig bot merged commit dc79d80 into cockroachdb:master Oct 9, 2018
eriktrinh pushed a commit to eriktrinh/cockroach that referenced this pull request Oct 17, 2018
This change makes the deletion of index data use the ClearRange batch
request. DROP INDEX schema changes in the same transaction as the one
which created the table still use the slower DelRange because
ClearRange cannot be run inside a transaction and will remove write
intents of the index keys which have not been resolved in the
transaction.

This deletion of index data happens once the GC TTL period has passed
and it is safe to remove all the data. See PR cockroachdb#30566, which delays
this data deletion.

The GC threshold is also forwarded for the cleared range to prevent
reads and writes of index data.

Fixes cockroachdb#20696.

Release note (performance improvement): Deletion of index data is faster.
eriktrinh pushed a commit to eriktrinh/cockroach that referenced this pull request Oct 19, 2018
eriktrinh pushed a commit to eriktrinh/cockroach that referenced this pull request Oct 19, 2018
This change makes the deletion of index data use the ClearRange batch
request. DROP INDEX schema changes in the same transaction as the one
which created the table still use the slower DelRange because
ClearRange cannot be run inside a transaction and will remove write
intents of the index keys which have not been resolved in the
transaction.

This deletion of index data happens once the GC TTL period has passed
and it is safe to remove all the data. See PR cockroachdb#30566, which delays
this data deletion.

Fixes cockroachdb#20696.

Release note (performance improvement): Deletion of index data is faster.
craig bot pushed a commit that referenced this pull request Oct 19, 2018
31326: sql: use ClearRange for index truncation r=eriktrinh a=eriktrinh

This change makes the deletion of index data use the ClearRange batch
request. DROP INDEX schema changes in the same transaction as the one
which created the table still use the slower DelRange because
ClearRange cannot be run inside a transaction and will remove write
intents of the index keys which have not been resolved in the
transaction.

This deletion of index data happens once the GC TTL period has passed
and it is safe to remove all the data. See PR #30566, which delays
this data deletion.

Note: See #31563 for a limitation where queries with old timestamps return inconsistent results for data that has been deleted with ClearRange.

Fixes #20696.

Release note (performance improvement): Deletion of index data is faster.

Co-authored-by: Erik Trinh <[email protected]>