Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

streamingccl: fix producer stream to properly propagate error #85867

Merged
merged 1 commit into from
Aug 22, 2022

Conversation

gh-casper
Copy link
Contributor

@gh-casper gh-casper commented Aug 10, 2022

Previously when rangefeed returns internal error, the error
didn't get propagated through the producer stream cursor.

The bug is that the SQL receiver's result writer only sends
error after all ValueGenerators close. However, closing eventStream
(i.e., the producer stream ValueGenerator) relies on shutting
down of streamLoop which is not explicitly shut down during Close.

Release justification: bug fixes and low-risk updates to
new functionality

Release note: None

@gh-casper gh-casper added the A-tenant-streaming Including cluster streaming label Aug 10, 2022
@gh-casper gh-casper added this to the 22.1 milestone Aug 10, 2022
@cockroach-teamcity
Copy link
Member

This change is Reviewable

Copy link
Contributor Author

@gh-casper gh-casper left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi folks. This is a bug I found when I tried to see what happens if the source cluster issues a non-MVCC op like ClearRange, the behavior I observed is that the producer stream just hang there, instead of erroring out to make a subscription to exit.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @miretskiy, @samiskin, and @stevendanna)

@@ -50,6 +50,7 @@ type eventStream struct {
// Fields below initialized when Start called.
rf *rangefeed.RangeFeed // Currently running rangefeed.
streamGroup ctxgroup.Group // Context group controlling stream execution.
closeChan chan struct{} // Channel signaled to close the stream loop.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be nice to keep 1 channel.

If I understood the problem you're describing, then closeChan
could be renamed to doneChan. errCh can be removed. When error happens (maybeSetError), close doneChan, and save the terminal error.

Copy link
Contributor Author

@gh-casper gh-casper left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @miretskiy, @samiskin, and @stevendanna)


pkg/ccl/streamingccl/streamproducer/event_stream.go line 53 at r1 (raw file):

Previously, miretskiy (Yevgeniy Miretskiy) wrote…

I think it would be nice to keep 1 channel.

If I understood the problem you're describing, then closeChan
could be renamed to doneChan. errCh can be removed. When error happens (maybeSetError), close doneChan, and save the terminal error.

I renamed the closeCh to doneCh. Although it sounds nice and simple, removing errCh actually is making it very hard to implement it right because

  1. there are two callsites where we want to close this doneChan: rangefeed internalError callback and the eventStream.Close. But channel cannot be closed twice.
  2. Having a doneChan and err make it hard to satisfy following usage:
  • rangefeed internalErr callback sets err and close doneCh
  • the Next will need to react for doneCh and then reads err and it needs to know other errors occurred in streamLoop.
  • so streamLoop needs to set err when other errors occur, that will cause race condition.

So using another channel doneCh is probably the easiest and cleaner way to do it.

@gh-casper gh-casper requested a review from a team as a code owner August 12, 2022 02:37
@gh-casper gh-casper requested a review from miretskiy August 12, 2022 02:52
Copy link
Collaborator

@stevendanna stevendanna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I imagine this is also why we currently just see the entire stream hang when you don't have rangefeeds enabled on the source side.

Just to make sure I understand your commit message and the code:

The bug is currently that when we get a rangefeed error, we push the error into the errCh. The SQL receive sees this error, but expect to be able to Close() the generator before returning it. Our Close() function would hang forever since nothing was going to end up shutting it down.

Copy link
Contributor Author

@gh-casper gh-casper left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exactly.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @miretskiy and @samiskin)

Previously when rangefeed returns internal error, the error
didn't get propagated through the producer stream cursor.

The bug is that the SQL receiver's result writer only send
error after all ValueGenerator close. However, closing ventStream
(i.e., the producer stream ValueGenerator) relies on shutting
down streamLoop which is not explicitly shut down during Close.

Release justification: bug fixes and low-risk updates to
new functionality

Release note: None
@gh-casper
Copy link
Contributor Author

bors r+

@craig
Copy link
Contributor

craig bot commented Aug 22, 2022

Build failed:

@gh-casper
Copy link
Contributor Author

bors r+

@craig
Copy link
Contributor

craig bot commented Aug 22, 2022

Build succeeded:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-tenant-streaming Including cluster streaming
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants