-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
sql: fix cleanup of not exhausted pausable portals in some cases #110625
Conversation
This commit adjusts RowSource.ConsumerClosed contract so that it's possible for this method to be called multiple times. Previously, vast majority of implementations already supported that (and some actually assumed that), but there were a couple that would result in a panic or an error if called multiple times - those have been relaxed. This is needed for the follow-up commit which will introduce an unconditional call of this method on the "head" processor of the flow (needed for pausable portals execution model). Release note: None
Previously, if we executed a flow (either vectorized or row-based) that contains a row-by-row processor in "pausable portal" model and didn't fully exhaust that flow (meaning that it could still produce more rows), this could result in an error when closing the portal because that processor might not be closed properly. Vectorized operators aren't affected by this, but row-by-row processors effectively require `ConsumerClosed` method to be called on them, and this might not happen. In particular, it was assumed that this method would get called by `execinfra.Run` loop, but for pausable portal model we needed to adjust it so that the loop could be exited (and then re-entered) when switching to another portal; thus, we lost the guarantee that `ConsumerClosed` is always called. This commit fixes this problem by teaching both vectorized and row-based flows to always call `ConsumerClosed` on their "head" processor on `Cleanup` (which is called when closing the portal). Now, `Cleanup` calls `ConsumerClosed` unconditionally (when the flow _might_ be running in the pausable portal mode), and this is safe given the interface adjustment in the previous commit. Release note (bug fix): Previously, when evaluating some queries (most likely with a lookup join) via "multiple active portals" execution mode (preview feature that can be enabled via `multiple_active_portals_enabled` session variable) CockroachDB could encounter an internal error like `unexpected 40960 leftover bytes` if that portal wasn't fully consumed. This is now fixed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks for the detailed explanation in the commit message! Just left a nit comment
Reviewed 5 of 5 files at r1.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @michae2, @rafiss, and @yuzefovich)
pkg/ccl/backupccl/split_and_scatter_processor.go
line 198 at r1 (raw file):
scatterer splitAndScatterer // cancelScatterAndWaitForWorker cancels the scatter goroutine and waits for // it to finish. It can be called multiples times.
Curious why we add that it can be called multiple times now -- is this something new, or do we just want to emphasize it? For I can't see changes about cancelScatterAndWaitForWorker
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TFTR!
bors r+
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @michae2, @rafiss, and @ZhouXing19)
pkg/ccl/backupccl/split_and_scatter_processor.go
line 198 at r1 (raw file):
Previously, ZhouXing19 (Jane Xing) wrote…
Curious why we add that it can be called multiple times now -- is this something new, or do we just want to emphasize it? For I can't see changes about
cancelScatterAndWaitForWorker
.
It's not new - it's simply highlighting the existing implementation detail which makes it easier to see that splitAndScatterProcessor.ConsumerClosed
is such that it can be safely called multiple times. (We already had the similar comment on backupDataProcessor.cancelAndWaitForWorker
, so I added this callout in two other places.) While working on the first commit I audited all implementations of ConsumerClosed
, and in most we short-circuit if .Closed
is true
or only do work if InternalClose
returns true
, but in a few places we don't do that, so I added a bit more commentary.
🕐 Waiting for PR status (GitHub check) to be set, probably by CI. Bors will automatically try to run when all required PR statuses are set. |
CLA bot hiccup. bors r+ |
Stopped waiting for PR status (GitHub check) without running due to duplicate requests to run. You may check Bors to see that this PR is included in a batch by one of the other requests. |
Build succeeded: |
Encountered an error creating backports. Some common things that can go wrong:
You might need to create your backport manually using the backport tool. error setting reviewers, but backport branch blathers/backport-release-23.1-110625 is ready: POST https://api.github.com/repos/cockroachdb/cockroach/pulls/110666/requested_reviewers: 422 Reviews may only be requested from collaborators. One or more of the teams you specified is not a collaborator of the cockroachdb/cockroach repository. [] Backport to branch 23.1.x failed. See errors above. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf. |
This commit fixes a minor bug introduced in cockroachdb#110625. In particular, that PR made so that we now unconditionally call `ConsumerClosed` on the "head" processor to make sure that resources are properly closed in the pausable portal model. However, `ConsumerClosed` is only valid to be called only if `Start` was called, so this commit fixes that. Additionally, to de-risk cockroachdb#110625 further we now only call that method for local flows since pausable portals currently disable DistSQL. Epic: None Release note: None
110807: flowinfra: fix recently introduced minor bug r=yuzefovich a=yuzefovich This commit fixes a minor bug introduced in #110625. In particular, that PR made so that we now unconditionally call `ConsumerClosed` on the "head" processor to make sure that resources are properly closed in the pausable portal model. However, `ConsumerClosed` is only valid to be called only if `Start` was called, so this commit fixes that. Additionally, to de-risk #110625 further we now only call that method for local flows since pausable portals currently disable DistSQL. Epic: None Release note: None 110903: storage: rename MaxIntentsPerLockConflictError to MaxConflictsPerLockConflictError r=nvanbenschoten a=nvanbenschoten Informs #110902. This commit is a broad rename of "max intents" to "max lock conflicts", to better reflect the fact that locks of any strength can be returned in LockConflictErrors. The semantics of which locks conflict with which operations are unchanged. Release note: None 110916: storage: fix invariant Value assertion r=itsbilal a=jbowens Calling Iterator.Value is only permitted on a valid iterator positioned over a point key. Previously the storage package assertions could attempt to read the Value of an iterator positioned only at a range key start bound. Epic: none Fix #110771. Release note: None Co-authored-by: Yahor Yuzefovich <[email protected]> Co-authored-by: Nathan VanBenschoten <[email protected]> Co-authored-by: Jackson Owens <[email protected]>
This commit fixes a minor bug introduced in #110625. In particular, that PR made so that we now unconditionally call `ConsumerClosed` on the "head" processor to make sure that resources are properly closed in the pausable portal model. However, `ConsumerClosed` is only valid to be called only if `Start` was called, so this commit fixes that. Additionally, to de-risk #110625 further we now only call that method for local flows since pausable portals currently disable DistSQL. Epic: None Release note: None
execinfra: relax RowSource.ConsumerClosed contract
This commit adjusts RowSource.ConsumerClosed contract so that it's
possible for this method to be called multiple times. Previously, vast
majority of implementations already supported that (and some actually
assumed that), but there were a couple that would result in a panic or
an error if called multiple times - those have been relaxed.
This is needed for the follow-up commit which will introduce an
unconditional call of this method on the "head" processor of the flow
(needed for pausable portals execution model).
Release note: None
sql: fix cleanup of not exhausted pausable portals in some cases
Previously, if we executed a flow (either vectorized or row-based) that
contains a row-by-row processor in "pausable portal" model and didn't
fully exhaust that flow (meaning that it could still produce more rows),
this could result in an error when closing the portal because that
processor might not be closed properly. Vectorized operators aren't
affected by this, but row-by-row processors effectively require
ConsumerClosed
method to be called on them, and this might not happen.In particular, it was assumed that this method would get called by
execinfra.Run
loop, but for pausable portal model we needed to adjustit so that the loop could be exited (and then re-entered) when switching
to another portal; thus, we lost the guarantee that
ConsumerClosed
isalways called.
This commit fixes this problem by teaching both vectorized and row-based
flows to always call
ConsumerClosed
on their "head" processor onCleanup
(which is called when closing the portal). Now,Cleanup
calls
ConsumerClosed
unconditionally (when the flow might be runningin the pausable portal mode), and this is safe given the interface
adjustment in the previous commit.
Fixes: #110486
Release note (bug fix): Previously, when evaluating some queries (most
likely with a lookup join) via "multiple active portals" execution
mode (preview feature that can be enabled via
multiple_active_portals_enabled
session variable) CockroachDB couldencounter an internal error like
unexpected 40960 leftover bytes
ifthat portal wasn't fully consumed. This is now fixed.