Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

backport-2.1: 30934, 31716 #31756

Merged

Conversation

vivekmenezes
Copy link
Contributor

Backport:

Please see individual PRs for details.

/cc @cockroachdb/release

@vivekmenezes vivekmenezes requested review from dt, eriktrinh and a team October 23, 2018 19:00
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@vivekmenezes
Copy link
Contributor Author

@eriktrinh please see your commit which is part of this.
@dt to review the commit that you reviewed.

@petermattis
Copy link
Collaborator

@andreimatei Please give this a gander.

Copy link

@eriktrinh eriktrinh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like an intellij config file got added to the second commit

@vivekmenezes vivekmenezes force-pushed the backport2.1-30934-31716 branch 2 times, most recently from 7fe4588 to a6c1aa0 Compare October 23, 2018 19:29
@vivekmenezes
Copy link
Contributor Author

@eriktrinh thanks for pointing that out.

Copy link
Contributor

@andreimatei andreimatei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained


pkg/sql/schema_accessors.go, line 110 at r2 (raw file):

	avoidCached bool
	// set to true if the transaction has a fixed timestamp.
	fixedTimestamp bool

I don't understand what this flag is. And the comment doesn't help me much. And also you have a TODO to get rid of it :)
Could we get rid of this flag and fall back to always doing an uncached lookup if the cached one says "descriptor not found"?
Or, what is this flag supposed to tell the TableCollection exactly?


pkg/sql/table.go, line 204 at r2 (raw file):

// reconsidering if this is really needed.
//
// TODO(vivek): Allow cached descriptors for AS OF SYSTEM TIME queries.

this TODO should go away


pkg/sql/table.go, line 281 at r2 (raw file):

	origTimestamp := flags.txn.OrigTimestamp()
	table, expiration, err := tc.leaseMgr.AcquireByName(ctx, origTimestamp, dbID, tn.Table())
	if err != nil {

should we only handle ErrDescriptorNotFound here, instead of a blanket catch?


pkg/sql/table.go, line 282 at r2 (raw file):

	table, expiration, err := tc.leaseMgr.AcquireByName(ctx, origTimestamp, dbID, tn.Table())
	if err != nil {
		// AcquireByName() is unable to function correctly on a timestamp

It'd be nice if the limitations on AcquireByName would be documented in more detail; I bet they're subtle and, for one, knowing them would probably help me review this.


pkg/sql/table.go, line 288 at r2 (raw file):

		// to AS OF SYSTEM TIME requests; remove flags.fixedTimestamp.
		if flags.fixedTimestamp {
			flags.avoidCached = true

seems funny that an UncachedPhysicalAccessor cares about an "avoidCached" flag. Does it really? What cache would it apply to?


pkg/sql/table.go, line 290 at r2 (raw file):

			flags.avoidCached = true
			phyAccessor := UncachedPhysicalAccessor{}
			return phyAccessor.GetObjectDesc(tn, flags)

don't we want the error handling code below to run if phyAccessor.GetObjectDesc() returns an error?

Copy link
Contributor Author

@vivekmenezes vivekmenezes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your comments @andreimatei . I appreciate you wanting to make this better but the reason why it's this way is because I wanted to make the change small and predictable specifically for 2.1

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained


pkg/sql/schema_accessors.go, line 110 at r2 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

I don't understand what this flag is. And the comment doesn't help me much. And also you have a TODO to get rid of it :)
Could we get rid of this flag and fall back to always doing an uncached lookup if the cached one says "descriptor not found"?
Or, what is this flag supposed to tell the TableCollection exactly?

Unfortunately I tried making the original change more general and after working on it for a bit hit a wall, so I decided to take the route of introducing the flag, and making the code such that it works for the case we care about for 2.1. For 2.2 I plan to investigate how I can get rid of this flag.


pkg/sql/table.go, line 204 at r2 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

this TODO should go away

True. Will create a change to do that incorporate some of your feedback.


pkg/sql/table.go, line 281 at r2 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

should we only handle ErrDescriptorNotFound here, instead of a blanket catch?

we need to handle

err == errTableDropped || err == sqlbase.ErrDescriptorNotFound


pkg/sql/table.go, line 282 at r2 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

It'd be nice if the limitations on AcquireByName would be documented in more detail; I bet they're subtle and, for one, knowing them would probably help me review this.

I'd originally tried to not use the fixedTimestamp flag and use

if err == errTableDropped || err == sqlbase.ErrDescriptorNotFound {

but that ran into trouble with a test.

I went with the former because this is a patch we're doing to 2.1 and I thought we should make a change that we feel comfortable we understand well.


pkg/sql/table.go, line 288 at r2 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

seems funny that an UncachedPhysicalAccessor cares about an "avoidCached" flag. Does it really? What cache would it apply to?

I don't believe this matters. I've just used the code we've been using above to read from the store.


pkg/sql/table.go, line 290 at r2 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

don't we want the error handling code below to run if phyAccessor.GetObjectDesc() returns an error?

No. This code is specific to the fixedTimestamp case. When we do make it more general it will need to be fit in with the error checks below.

Copy link
Contributor

@andreimatei andreimatei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, as it stands, I don't quite understand this patch. I technically have been added late to this train, so I'll let the original reviewers do their thing.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained

Copy link
Contributor

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have tried to look at this PR but the nuance is much too subtle for me to handle. So I guess I really can't review this.

Also I appreciate this should go as a cherry-pick for Kindred specifically, but I'm strongly 👎 on putting this in the 2.1.0 release -- this needs a few weeks of stress testing before we are happy for it to land in 2.1.1.

Reviewed 1 of 1 files at r1, 9 of 9 files at r2.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained


pkg/sql/table.go, line 281 at r2 (raw file):

Previously, vivekmenezes wrote…

we need to handle

err == errTableDropped || err == sqlbase.ErrDescriptorNotFound

Please add the specific checks then and fail otherwise is possible.
If you can't do that please add log.Warning of the error you're ignoring.
We certainly don't want to drop errors on the floor blindly, this will make troubleshooting super-hard.

@vivekmenezes
Copy link
Contributor Author

vivekmenezes commented Oct 23, 2018 via email

@vivekmenezes
Copy link
Contributor Author

I've created #31776 to address the questions raised here. I'm not in favor of adding commits from that into 2.1

@knz
Copy link
Contributor

knz commented Oct 24, 2018

Thank you.

Just to confirm, do you agree the present PR is not acceptable for inclusion in 2.1.0 either right, and will need thorough testing and verification on the lead up to 2.1.1?

Also I feel like we are rushing this review. This is so nuanced it should be scrutinized with more care.

@vivekmenezes
Copy link
Contributor Author

I think this PR is fine for inclusion with 2.1 because it "explicitly" identifying a code path for AS OF SYSTEM TIME queries using a boolean fixedTimestamp. The code path looks up the cache and when the cache returns a failure it reverts to the old way of reading directly from the store. It's actually quite easy to reason about.

I think it is being held back for pure "aim high" ideological reasons. Sometimes it is better to make an improvement for a customer as long as it's safe. The fixedTimestamp boolean I believe provides the guardrails required to put in such a change late in the development cycle. I ask the focus of this review be correctness.

The new change I've made is riskier because it is more general and applies to both the AS OF SYSTEM TIME path and the general path. It fixes two other outstanding issues in the process :-)
I'm not in favor of backporting it.

@jordanlewis
Copy link
Member

Note that @bdarnell's confirmed that we won't be adding this to 2.1.0.

Erik Trinh and others added 2 commits November 14, 2018 11:32
This change looks for table descriptors using the TableCollection's
uncommitted tables before always going through the physical accessor
when the lease manager's cache is avoided. This avoids the read which
otherwise would have occurred when the uncached physical accessor gets
the descriptor from the store.

Release note (performance improvement): Accessing table descriptors when
performing schema changes after the descriptor has been modified within
the same transaction should be faster.
This change also fixes a problem where TRUNCATE was not
setting the ModificationTime on the descriptor.

This change also fixes a bug where we were not using
Timestamp.Prev() to compute the previous timestamp
when reading the previous version of a descriptor.

This change disables the use of the descriptor cache
when AS OF SYSTEM TIME is for a timestamp before
a DROP or a TRUNCATE on a table because of a bug
in AcquireByName().

fixes cockroachdb#30967

Release note (sql change): Speedup AS OF SYSTEM TIME requests by
letting them use the table descriptor cache.
@vivekmenezes vivekmenezes force-pushed the backport2.1-30934-31716 branch from a6c1aa0 to 6257889 Compare November 14, 2018 16:32
@vivekmenezes vivekmenezes merged commit 2cd7749 into cockroachdb:release-2.1 Nov 14, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants