Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: fix cleanup of not exhausted pausable portals in some cases #110625

Merged
merged 2 commits into from
Sep 14, 2023

Conversation

yuzefovich
Copy link
Member

execinfra: relax RowSource.ConsumerClosed contract

This commit adjusts RowSource.ConsumerClosed contract so that it's
possible for this method to be called multiple times. Previously, vast
majority of implementations already supported that (and some actually
assumed that), but there were a couple that would result in a panic or
an error if called multiple times - those have been relaxed.

This is needed for the follow-up commit which will introduce an
unconditional call of this method on the "head" processor of the flow
(needed for pausable portals execution model).

Release note: None

sql: fix cleanup of not exhausted pausable portals in some cases

Previously, if we executed a flow (either vectorized or row-based) that
contains a row-by-row processor in "pausable portal" model and didn't
fully exhaust that flow (meaning that it could still produce more rows),
this could result in an error when closing the portal because that
processor might not be closed properly. Vectorized operators aren't
affected by this, but row-by-row processors effectively require
ConsumerClosed method to be called on them, and this might not happen.
In particular, it was assumed that this method would get called by
execinfra.Run loop, but for pausable portal model we needed to adjust
it so that the loop could be exited (and then re-entered) when switching
to another portal; thus, we lost the guarantee that ConsumerClosed is
always called.

This commit fixes this problem by teaching both vectorized and row-based
flows to always call ConsumerClosed on their "head" processor on
Cleanup (which is called when closing the portal). Now, Cleanup
calls ConsumerClosed unconditionally (when the flow might be running
in the pausable portal mode), and this is safe given the interface
adjustment in the previous commit.

Fixes: #110486

Release note (bug fix): Previously, when evaluating some queries (most
likely with a lookup join) via "multiple active portals" execution
mode (preview feature that can be enabled via
multiple_active_portals_enabled session variable) CockroachDB could
encounter an internal error like unexpected 40960 leftover bytes if
that portal wasn't fully consumed. This is now fixed.

This commit adjusts RowSource.ConsumerClosed contract so that it's
possible for this method to be called multiple times. Previously, vast
majority of implementations already supported that (and some actually
assumed that), but there were a couple that would result in a panic or
an error if called multiple times - those have been relaxed.

This is needed for the follow-up commit which will introduce an
unconditional call of this method on the "head" processor of the flow
(needed for pausable portals execution model).

Release note: None
Previously, if we executed a flow (either vectorized or row-based) that
contains a row-by-row processor in "pausable portal" model and didn't
fully exhaust that flow (meaning that it could still produce more rows),
this could result in an error when closing the portal because that
processor might not be closed properly. Vectorized operators aren't
affected by this, but row-by-row processors effectively require
`ConsumerClosed` method to be called on them, and this might not happen.
In particular, it was assumed that this method would get called by
`execinfra.Run` loop, but for pausable portal model we needed to adjust
it so that the loop could be exited (and then re-entered) when switching
to another portal; thus, we lost the guarantee that `ConsumerClosed` is
always called.

This commit fixes this problem by teaching both vectorized and row-based
flows to always call `ConsumerClosed` on their "head" processor on
`Cleanup` (which is called when closing the portal). Now, `Cleanup`
calls `ConsumerClosed` unconditionally (when the flow _might_ be running
in the pausable portal mode), and this is safe given the interface
adjustment in the previous commit.

Release note (bug fix): Previously, when evaluating some queries (most
likely with a lookup join) via "multiple active portals" execution
mode (preview feature that can be enabled via
`multiple_active_portals_enabled` session variable) CockroachDB could
encounter an internal error like `unexpected 40960 leftover bytes` if
that portal wasn't fully consumed. This is now fixed.
@yuzefovich yuzefovich added the backport-23.1.x Flags PRs that need to be backported to 23.1 label Sep 14, 2023
@yuzefovich yuzefovich requested review from a team as code owners September 14, 2023 04:48
@yuzefovich yuzefovich requested review from rhu713 and michae2 and removed request for a team September 14, 2023 04:48
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@yuzefovich yuzefovich removed request for a team and rhu713 September 14, 2023 04:49
Copy link
Collaborator

@ZhouXing19 ZhouXing19 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for the detailed explanation in the commit message! Just left a nit comment

Reviewed 5 of 5 files at r1.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @michae2, @rafiss, and @yuzefovich)


pkg/ccl/backupccl/split_and_scatter_processor.go line 198 at r1 (raw file):

	scatterer splitAndScatterer
	// cancelScatterAndWaitForWorker cancels the scatter goroutine and waits for
	// it to finish. It can be called multiples times.

Curious why we add that it can be called multiple times now -- is this something new, or do we just want to emphasize it? For I can't see changes about cancelScatterAndWaitForWorker.

Copy link
Member Author

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TFTR!

bors r+

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @michae2, @rafiss, and @ZhouXing19)


pkg/ccl/backupccl/split_and_scatter_processor.go line 198 at r1 (raw file):

Previously, ZhouXing19 (Jane Xing) wrote…

Curious why we add that it can be called multiple times now -- is this something new, or do we just want to emphasize it? For I can't see changes about cancelScatterAndWaitForWorker.

It's not new - it's simply highlighting the existing implementation detail which makes it easier to see that splitAndScatterProcessor.ConsumerClosed is such that it can be safely called multiple times. (We already had the similar comment on backupDataProcessor.cancelAndWaitForWorker, so I added this callout in two other places.) While working on the first commit I audited all implementations of ConsumerClosed, and in most we short-circuit if .Closed is true or only do work if InternalClose returns true, but in a few places we don't do that, so I added a bit more commentary.

@craig
Copy link
Contributor

craig bot commented Sep 14, 2023

🕐 Waiting for PR status (GitHub check) to be set, probably by CI. Bors will automatically try to run when all required PR statuses are set.

@yuzefovich
Copy link
Member Author

CLA bot hiccup.

bors r+

@craig
Copy link
Contributor

craig bot commented Sep 14, 2023

Stopped waiting for PR status (GitHub check) without running due to duplicate requests to run. You may check Bors to see that this PR is included in a batch by one of the other requests.

@craig
Copy link
Contributor

craig bot commented Sep 14, 2023

Build succeeded:

@blathers-crl
Copy link

blathers-crl bot commented Sep 14, 2023

Encountered an error creating backports. Some common things that can go wrong:

  1. The backport branch might have already existed.
  2. There was a merge conflict.
  3. The backport branch contained merge commits.

You might need to create your backport manually using the backport tool.


error setting reviewers, but backport branch blathers/backport-release-23.1-110625 is ready: POST https://api.github.com/repos/cockroachdb/cockroach/pulls/110666/requested_reviewers: 422 Reviews may only be requested from collaborators. One or more of the teams you specified is not a collaborator of the cockroachdb/cockroach repository. []

Backport to branch 23.1.x failed. See errors above.


🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@yuzefovich yuzefovich deleted the fix-pausable-portal branch September 14, 2023 18:33
yuzefovich added a commit to yuzefovich/cockroach that referenced this pull request Sep 18, 2023
This commit fixes a minor bug introduced in cockroachdb#110625. In particular, that
PR made so that we now unconditionally call `ConsumerClosed` on the
"head" processor to make sure that resources are properly closed in the
pausable portal model. However, `ConsumerClosed` is only valid to be
called only if `Start` was called, so this commit fixes that.
Additionally, to de-risk cockroachdb#110625 further we now only call that method
for local flows since pausable portals currently disable DistSQL.

Epic: None

Release note: None
craig bot pushed a commit that referenced this pull request Sep 19, 2023
110807: flowinfra: fix recently introduced minor bug r=yuzefovich a=yuzefovich

This commit fixes a minor bug introduced in #110625. In particular, that PR made so that we now unconditionally call `ConsumerClosed` on the "head" processor to make sure that resources are properly closed in the pausable portal model. However, `ConsumerClosed` is only valid to be called only if `Start` was called, so this commit fixes that. Additionally, to de-risk #110625 further we now only call that method for local flows since pausable portals currently disable DistSQL.

Epic: None

Release note: None

110903: storage: rename MaxIntentsPerLockConflictError to MaxConflictsPerLockConflictError r=nvanbenschoten a=nvanbenschoten

Informs #110902.

This commit is a broad rename of "max intents" to "max lock conflicts", to better reflect the fact that locks of any strength can be returned in LockConflictErrors. The semantics of which locks conflict with which operations are unchanged.

Release note: None

110916: storage: fix invariant Value assertion r=itsbilal a=jbowens

Calling Iterator.Value is only permitted on a valid iterator positioned over a point key. Previously the storage package assertions could attempt to read the Value of an iterator positioned only at a range key start bound.

Epic: none
Fix #110771.
Release note: None

Co-authored-by: Yahor Yuzefovich <[email protected]>
Co-authored-by: Nathan VanBenschoten <[email protected]>
Co-authored-by: Jackson Owens <[email protected]>
yuzefovich added a commit that referenced this pull request Sep 19, 2023
This commit fixes a minor bug introduced in #110625. In particular, that
PR made so that we now unconditionally call `ConsumerClosed` on the
"head" processor to make sure that resources are properly closed in the
pausable portal model. However, `ConsumerClosed` is only valid to be
called only if `Start` was called, so this commit fixes that.
Additionally, to de-risk #110625 further we now only call that method
for local flows since pausable portals currently disable DistSQL.

Epic: None

Release note: None
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport-23.1.x Flags PRs that need to be backported to 23.1
Projects
None yet
3 participants