Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cluster: remove a series of 2.1+ cluster versions #38446

Merged

Conversation

nvanbenschoten
Copy link
Member

I wanted to remove VersionLazyTxnRecord and VersionPushTxnToInclusive,
but while there I removed all of the following:

  • VersionCascadingZoneConfigs
  • VersionLoadSplits
  • VersionExportStorageWorkload
  • VersionLazyTxnRecord
  • VersionSequencedReads
  • VersionCreateStats
  • VersionDirectImport
  • VersionPushTxnToInclusive

@nvanbenschoten nvanbenschoten requested review from tbg and a team June 26, 2019 16:28
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@nvanbenschoten nvanbenschoten force-pushed the nvanbenschoten/removeMigrations branch from 19d3e4b to 0f4bd70 Compare June 26, 2019 16:58
Copy link
Member

@tbg tbg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: mod some comments

Reviewed 3 of 3 files at r1, 3 of 3 files at r2, 5 of 5 files at r3, 5 of 5 files at r4, 6 of 6 files at r5, 4 of 4 files at r6, 3 of 3 files at r7, 8 of 8 files at r8, 3 of 3 files at r9.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @nvanbenschoten)


pkg/storage/replica.go, line 1458 at r8 (raw file):

				DeprecatedNow:   b.Header.Timestamp,
				PushType:        roachpb.PUSH_ABORT,
				InclusivePushTo: true,

Ah, this is subtle.


pkg/storage/batcheval/cmd_get.go, line 78 at r5 (raw file):

	// active which proves that all nodes are running version 19.2.
	//
	// TODO(nvanbenschoten): Remove in 20.1. This serves only as documentation

I don't understand why you can't remove this now. I do understand why VersionInclusivePush to needs to stay around, but this?

@nvanbenschoten nvanbenschoten force-pushed the nvanbenschoten/removeMigrations branch from 0f4bd70 to c6cf7f1 Compare June 26, 2019 21:50
I had it commented that we could also remove `InclusivePushTo` in 19.2,
but I now think that was incorrect. We still need to set it to true to
prevent 19.1 nodes that have yet to hear about the `VersionPushTxnToInclusive`
becoming active from thinking that the Replica sending the PushTxnRequest
is a 2.1 node (which sent exclusive push to timestamps).

Release note: None
@nvanbenschoten nvanbenschoten force-pushed the nvanbenschoten/removeMigrations branch from c6cf7f1 to 7400cc4 Compare June 26, 2019 21:51
Copy link
Member Author

@nvanbenschoten nvanbenschoten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @tbg)


pkg/storage/replica.go, line 1458 at r8 (raw file):

Previously, tbg (Tobias Grieger) wrote…

Ah, this is subtle.

Yeah, it's unfortunate.


pkg/storage/batcheval/cmd_get.go, line 78 at r5 (raw file):

Previously, tbg (Tobias Grieger) wrote…

I don't understand why you can't remove this now. I do understand why VersionInclusivePush to needs to stay around, but this?

I'm leaving this around as documentation to explain that because we checked a cluster version to determine whether we should ignore sequence numbers in 19.1, 19.2 nodes can't blindly expect all nodes in their cluster to respect sequence numbers when reading. It's possible that a 19.2 node joins the cluster before all 19.1 nodes realize that the cluster version has been upgraded to 19.1.X. I reworded the comment slightly.

Copy link
Member

@tbg tbg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 13 of 13 files at r10, 4 of 4 files at r11, 3 of 3 files at r12, 8 of 8 files at r13, 3 of 3 files at r14.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @nvanbenschoten)


pkg/storage/batcheval/cmd_get.go, line 78 at r5 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

I'm leaving this around as documentation to explain that because we checked a cluster version to determine whether we should ignore sequence numbers in 19.1, 19.2 nodes can't blindly expect all nodes in their cluster to respect sequence numbers when reading. It's possible that a 19.2 node joins the cluster before all 19.1 nodes realize that the cluster version has been upgraded to 19.1.X. I reworded the comment slightly.

Feels like I'm missing something still. What confuses me is that I don't understand who you are documenting this for. After removing this and by extension IgnoreSequence, your code works the same; it's using the new behavior now (right?). From your comment, my best guess now is that this documentation is just in an awkward place. In this code, it looks very much like "19.2 nodes are blindly relying on correct sequence number behavior". According to your comment, there should be places where they have to be more careful. Where is that and why is the comment here and not there?

(It doesn't help that MVCCGetOptions.IgnoreIntent doesn't have any documentation, not sure that it's worth fixing now that it's basically obsolete)

Copy link
Member Author

@nvanbenschoten nvanbenschoten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @knz)


pkg/storage/batcheval/cmd_get.go, line 78 at r5 (raw file):

After removing this and by extension IgnoreSequence, your code works the same; it's using the new behavior now (right?)

Yes, this is the same as deleting all of the code. The plan was to do that in 20.1.

Where is that and why is the comment here and not there?

I'm leaving this primarily for myself and @knz because we're going to need to be careful about this when fixing the Halloween problem. The point of the comment is to say that until 20.1, we need to be careful that not all nodes in the cluster will always consider sequence numbers when performing a scan, even though the codebase that we'll be programming against will. Leaving the code in until it is no longer relevant is a way to ensure that we don't completely forget about this migration period.

I don't see where a better place to put it would be. Moving it to Transaction.Sequence is likely to get lost. Leaving the code like this ensures that readers can understand what to expect from replicas before and after the old cluster setting is active.

Copy link
Member

@tbg tbg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale)

@nvanbenschoten
Copy link
Member Author

bors r=tbg

craig bot pushed a commit that referenced this pull request Jun 28, 2019
38446: cluster: remove a series of 2.1+ cluster versions r=tbg a=nvanbenschoten

I wanted to remove `VersionLazyTxnRecord` and `VersionPushTxnToInclusive`,
but while there I removed all of the following:
- VersionCascadingZoneConfigs
- VersionLoadSplits
- VersionExportStorageWorkload
- VersionLazyTxnRecord
- VersionSequencedReads
- VersionCreateStats
- VersionDirectImport
- VersionPushTxnToInclusive

Co-authored-by: Nathan VanBenschoten <[email protected]>
@craig
Copy link
Contributor

craig bot commented Jun 28, 2019

Build succeeded

@craig craig bot merged commit 7400cc4 into cockroachdb:master Jun 28, 2019
@nvanbenschoten nvanbenschoten deleted the nvanbenschoten/removeMigrations branch July 1, 2019 17:00
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this pull request Dec 27, 2019
This completes the migration started in cockroachdb#35297, which was partially
finalized in cockroachdb#38446.

Release note: None
craig bot pushed a commit that referenced this pull request Dec 27, 2019
43227: roachpb: remove PushTxnRequest.InclusivePushTo r=nvanbenschoten a=nvanbenschoten

This completes the migration started in #35297, which was partially finalized in #38446.

Release note: None

43316: storagebase: take RangeDesc by reference in ContainsKey(Range) r=nvanbenschoten a=nvanbenschoten

All callers of this had a *RangeDescriptor, so there's no reason to pass in a large RangeDescriptor struct.

Release note: None

43563: storage/intentresolver: don't capture loop iteration vars in async task r=nvanbenschoten a=nvanbenschoten

It's unclear if we've ever seen issues from this, but I intend to backport the
fix to v19.2, v19.1, and v2.1. I believe the worst thing that could have
happened is that a batch that observed multiple intents or pushed multiple txns
would only end up cleaning up a single one of these. It would then run into some
of these intents again when it tried to re-evaluate, forcing it to push again.
This subverts the parallelism that we were trying to achieve here, but would
never cause a stall.

Release note (bug fix): Ensure that all intents or transactions that a
batch observes are asynchronously cleaned up.

Co-authored-by: Nathan VanBenschoten <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants