[Service Bus] Bug fix: batching receiver upon a disconnect #13374
if (!state.wasConnectionCloseCalled && state.numReceivers) {
  logger.verbose(
    `[${connectionContext.connection.id}] connection.close() was not called from the sdk and there were ${state.numReceivers} ` +
      `receivers. We should reconnect.`
  );
  await delay(Constants.connectionReconnectDelay);
Removed the delay for all kinds of receivers.
await callOnDetachedOnReceivers(
  connectionContext,
  connectionError || contextError,
  "streaming"
Calling onDetached on streaming receivers after the refreshConnection, since onDetached would recover the streaming receiver and that would only be possible after the connection is refreshed.
Walking through this mentally it seems like it should be fine. I believe the sequence we'll end up with is something like this:
'connection refreshing promise started'
- onDetach called on all receivers
- batchingReceiver.onDetach
- batchingReceiver.closeLink
- batchingReceiver.openLock is now locked
- connection refresh initiates
- connection closed
- new connection created
- refresh connection promise resolved
- (continues) batchingReceiver.closeLink
- batchingReceiver.openLock is now acquired
- batchingReceiver.closeLink completes (against old connection, but that's fine)
- batchingReceiver.openLink happens (against new connection, which is what you intend)
Does that match your analysis?
One correction: the closeLink ends completely, and then the refresh is initiated.
Thanks for sharing this reasoning! Any chance you can put it as a comment as well for our future selves?
Added more comments. :)
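For future readers, the ordering settled on in this thread can be sketched roughly as below. This is only an illustration under stated assumptions: FakeReceiver, FakeContext, makeReceiver, makeContext and onDisconnected are hypothetical stand-ins invented here, not the SDK's real classes, and the real onDetached does far more than record events.

```typescript
// Hypothetical stand-ins for illustration only; not the SDK's actual types.
type ReceiverType = "batching" | "streaming";

interface FakeReceiver {
  receiverType: ReceiverType;
  onDetached(error?: Error): Promise<void>;
}

interface FakeContext {
  events: string[];
  receivers: FakeReceiver[];
  refreshConnection(): Promise<void>;
}

function makeReceiver(name: string, receiverType: ReceiverType, events: string[]): FakeReceiver {
  return {
    receiverType,
    async onDetached(): Promise<void> {
      events.push(`${name}.onDetached start`);
      await Promise.resolve(); // stands in for the async closeLink / recovery work
      events.push(`${name}.onDetached end`);
    }
  };
}

async function callOnDetached(ctx: FakeContext, type: ReceiverType, error?: Error): Promise<void> {
  for (const receiver of ctx.receivers) {
    if (receiver.receiverType === type) {
      await receiver.onDetached(error);
    }
  }
}

async function onDisconnected(ctx: FakeContext, error?: Error): Promise<void> {
  // 1. Batching receivers first: their closeLink finishes completely.
  await callOnDetached(ctx, "batching", error);
  // 2. Only then is the connection refreshed.
  await ctx.refreshConnection();
  // 3. Streaming receivers recover last, against the refreshed connection.
  await callOnDetached(ctx, "streaming", error);
}

function makeContext(): FakeContext {
  const events: string[] = [];
  return {
    events,
    receivers: [
      makeReceiver("streaming", "streaming", events),
      makeReceiver("batching", "batching", events)
    ],
    async refreshConnection(): Promise<void> {
      events.push("connection refreshed");
    }
  };
}
```

Running onDisconnected(makeContext()) in this sketch records the batching detach, then the refresh, then the streaming detach, regardless of the order in which the receivers were registered.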
@@ -93,7 +93,7 @@ export class BatchingReceiver extends MessageReceiver {
      );
    }

-    await this._batchingReceiverLite.close(connectionError);
+    this._batchingReceiverLite.close(connectionError);
Looking at this now it seems 'close' is a bit misnamed - it's more like "cancel" or something of that nature (i.e., it doesn't change the link state at all).
(not something you need to fix in this PR, just curious if you agree)
Hmmm... maybe terminate? (since it refers to ending things rather than canceling or closing)
Logged #13390
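To make the naming point concrete, here is a rough sketch of what a synchronous "terminate" could look like: it ends the in-flight receive by rejecting its promise and touches no link state. PendingReceive, start, terminate and isPending are hypothetical names made up for this illustration; they are not the actual _batchingReceiverLite API.

```typescript
// Hypothetical sketch; not the real BatchingReceiverLite implementation.
class PendingReceive<T> {
  private _reject?: (err: Error) => void;

  // Begins a receive operation that stays pending until it is terminated.
  start(): Promise<T> {
    return new Promise<T>((_resolve, reject) => {
      this._reject = reject;
    });
  }

  get isPending(): boolean {
    return this._reject !== undefined;
  }

  // Synchronous: only ends the pending operation. No link is opened or closed,
  // so there is nothing for a caller to await.
  terminate(err: Error): void {
    if (this._reject) {
      const reject = this._reject;
      this._reject = undefined;
      reject(err);
    }
  }
}
```

Because terminate returns void in this sketch, calling it without await (as the diff above does for close) loses nothing.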
This looks good to me!
Verify the receiverType that's passed
for (const receiverName of Object.keys(connectionContext.messageReceivers)) {
  const receiver = connectionContext.messageReceivers[receiverName];
  if (receiver && receiver.receiverType === receiverType) {
Does this affect session receivers at all? Just want to make sure we're still calling onDetached on all receivers like we used to now that we're checking receiverType as well.
onDetached was (and is) only called for the streaming and batching receivers.
(Session receivers can only be accessed through connectionContext.messageSessions. We never called onDetached for sessions; there is no onDetached method implemented for sessions.)
Previously, onDetached was called for "batching" and "streaming" receivers after the refresh connection.
Now, onDetached is called for "batching" before the refresh connection, unlike the streaming receivers.
I left a TODO to investigate further on the "sessions" part, I'll log an issue for it too.
There is a related issue, #8875, which intends to deal with the work related to disconnects for sessions. That issue should take care of the remaining work for sessions, so I am not logging a new one.
Added a comment with questions - #8875 (comment)
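A small sketch of the receiverType check discussed in this thread, showing why session receivers are unaffected: the loop only ever looks at messageReceivers. ReceiverLike, ContextLike, receiversToDetach and the sample context are hypothetical and exist only for this illustration; the property names messageReceivers, messageSessions and receiverType are the ones mentioned above.

```typescript
// Hypothetical shapes for illustration; the real ConnectionContext is much larger.
type ReceiverType = "batching" | "streaming";

interface ReceiverLike {
  receiverType: ReceiverType;
}

interface ContextLike {
  messageReceivers: { [name: string]: ReceiverLike | undefined };
  messageSessions: { [name: string]: object | undefined };
}

// Returns the names of the receivers that would get onDetached for a given type.
function receiversToDetach(ctx: ContextLike, receiverType: ReceiverType): string[] {
  const names: string[] = [];
  for (const receiverName of Object.keys(ctx.messageReceivers)) {
    const receiver = ctx.messageReceivers[receiverName];
    if (receiver && receiver.receiverType === receiverType) {
      names.push(receiverName);
    }
  }
  return names;
}

const context: ContextLike = {
  messageReceivers: {
    batchA: { receiverType: "batching" },
    streamA: { receiverType: "streaming" },
    closed: undefined // an entry that was already removed is skipped
  },
  messageSessions: {
    sessionA: {} // never visited by the loop above
  }
};
```

In this sketch, asking for "batching" selects only batchA and asking for "streaming" selects only streamA; sessionA is never considered in either case.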
Co-authored-by: chradek <[email protected]>
…ivers after the refresh connection
…b.com/HarshaNalluru/azure-sdk-for-js into harshan/sb/issue/13339-disconnect-fix
/azp run js - servicebus - tests
Azure Pipelines successfully started running 1 pipeline(s).
Bug
- receiveMessages method returned zero messages upon a connection refresh caused by a network disconnect (simulated in the test).
- OperationTimeout error on message settlement after the disconnect.
These two failures made the "disconnect" test occasionally fail for the batching receiver.
Cause
- onDetached on the batchReceivers is called 300ms after the connection refresh, causing the recovered receive link to be closed.
- onDetached is called to close the link, leading to the loss of messages.
Investigated here #13339
Fix
- Call onDetached for the batching receivers before calling the refresh connection
- Call onDetached for the streaming receivers after the refresh connection
Changes in the PR