-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
streamingccl: fix producer stream to properly propagate error #85867
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi folks. This is a bug I found when I tried to see what happens if the source cluster issues a non-MVCC op like ClearRange, the behavior I observed is that the producer stream just hang there, instead of erroring out to make a subscription to exit.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @miretskiy, @samiskin, and @stevendanna)
@@ -50,6 +50,7 @@ type eventStream struct { | |||
// Fields below initialized when Start called. | |||
rf *rangefeed.RangeFeed // Currently running rangefeed. | |||
streamGroup ctxgroup.Group // Context group controlling stream execution. | |||
closeChan chan struct{} // Channel signaled to close the stream loop. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be nice to keep 1 channel.
If I understood the problem you're describing, then closeChan
could be renamed to doneChan. errCh can be removed. When error happens (maybeSetError), close doneChan, and save the terminal error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @miretskiy, @samiskin, and @stevendanna)
pkg/ccl/streamingccl/streamproducer/event_stream.go
line 53 at r1 (raw file):
Previously, miretskiy (Yevgeniy Miretskiy) wrote…
I think it would be nice to keep 1 channel.
If I understood the problem you're describing, then closeChan
could be renamed to doneChan. errCh can be removed. When error happens (maybeSetError), close doneChan, and save the terminal error.
I renamed the closeCh
to doneCh
. Although it sounds nice and simple, removing errCh
actually is making it very hard to implement it right because
- there are two callsites where we want to close this doneChan: rangefeed internalError callback and the eventStream.Close. But channel cannot be closed twice.
- Having a doneChan and err make it hard to satisfy following usage:
- rangefeed internalErr callback sets err and close
doneCh
- the Next will need to react for
doneCh
and then readserr
and it needs to know other errors occurred in streamLoop. - so streamLoop needs to set
err
when other errors occur, that will cause race condition.
So using another channel doneCh
is probably the easiest and cleaner way to do it.
1edb0ac
to
f89820c
Compare
f89820c
to
078a0af
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I imagine this is also why we currently just see the entire stream hang when you don't have rangefeeds enabled on the source side.
Just to make sure I understand your commit message and the code:
The bug is currently that when we get a rangefeed error, we push the error into the errCh. The SQL receive sees this error, but expect to be able to Close() the generator before returning it. Our Close() function would hang forever since nothing was going to end up shutting it down.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Exactly.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @miretskiy and @samiskin)
Previously when rangefeed returns internal error, the error didn't get propagated through the producer stream cursor. The bug is that the SQL receiver's result writer only send error after all ValueGenerator close. However, closing ventStream (i.e., the producer stream ValueGenerator) relies on shutting down streamLoop which is not explicitly shut down during Close. Release justification: bug fixes and low-risk updates to new functionality Release note: None
078a0af
to
c68231c
Compare
bors r+ |
Build failed: |
bors r+ |
Build succeeded: |
Previously when rangefeed returns internal error, the error
didn't get propagated through the producer stream cursor.
The bug is that the SQL receiver's result writer only sends
error after all ValueGenerators close. However, closing eventStream
(i.e., the producer stream ValueGenerator) relies on shutting
down of streamLoop which is not explicitly shut down during Close.
Release justification: bug fixes and low-risk updates to
new functionality
Release note: None