[BUG] Service Bus Session Processor stop receiving messages, because if the pending attach link is forced to close without any errors, it cannot be recovered. #33313
Reason why link cannot be recovered
Log Analysis
When we try to receive a message from an empty session queue, the link is in the UNINITIALIZED state, waiting for a response from the service.
In the repro, we wait until the timeout and the service responds with a DETACH frame. Because of the timeout error, the link is forced to CLOSED. Because we set the error condition to null at the code level, the link cannot be recovered once it is CLOSED.
Because the link cannot be recovered, after 5 minutes the connection shuts down due to inactivity and cannot be recovered either. (Why the connection cannot be recovered is explained in the issue description.)
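To make this sequence concrete, here is a minimal plain-Java sketch of the two outcomes of a remote close. It is a simplified model, not SDK code: LinkCloseModel, Signals, and remoteClose are hypothetical names, and the error condition is reduced to a nullable string. It assumes, following the analysis in this comment, that a close carrying an error condition surfaces as an error, while a close without one surfaces only as CLOSED followed by completion.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified, hypothetical model of a link's endpoint-state stream.
// None of these types are the SDK's real classes.
final class LinkCloseModel {
    enum State { UNINITIALIZED, ACTIVE, CLOSED }

    /** Signals that a subscriber of the state stream would observe. */
    static final class Signals {
        final List<State> states = new ArrayList<>();
        boolean completed;
        Exception error;
    }

    /**
     * Models a remote close while the attach is still pending.
     * errorCondition is null when the service detaches without a reason.
     */
    static Signals remoteClose(String errorCondition) {
        Signals signals = new Signals();
        signals.states.add(State.UNINITIALIZED); // attach response still pending
        if (errorCondition != null) {
            // Error path: subscribers see a failure they can react to.
            signals.error = new IllegalStateException(errorCondition);
        } else {
            // Clean-close path: CLOSED, then completion, and no error at all.
            signals.states.add(State.CLOSED);
            signals.completed = true;
        }
        return signals;
    }

    public static void main(String[] args) {
        // "amqp:link:detach-forced" is only an example input for the error path.
        Signals withError = remoteClose("amqp:link:detach-forced");
        Signals withoutError = remoteClose(null);
        System.out.println("with error:    error present = " + (withError.error != null));
        System.out.println("without error: completed = " + withoutError.completed
            + ", states = " + withoutError.states);
    }
}
```

In the second case there is no failure signal, so anything downstream that only reacts to errors is never triggered.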
Code Analysis
On the code side, when the link is closed by the remote (in the real scenario, the remote closes because of an OS upgrade), the following handler is called:
//LinkHandler.class
public void onLinkRemoteClose(Event event) {
    handleRemoteLinkClosed("onLinkRemoteClose", event);
}
Then it jumps to its private function:
//LinkHandler.class
private void handleRemoteLinkClosed(final String eventName, final Event event) {
    final Link link = event.getLink();
    // For the repro, we substitute an empty error condition here.
    final ErrorCondition condition = new ErrorCondition();
    ...
    if (condition != null && condition.getCondition() != null) {
        ...
        onError(exception);
    } else {
        // No error condition: the link is closed without raising an error.
        super.close();
    }
}
The super class is:
//Handler.class
public void close() {
    ...
    endpointStates.emitNext(EndpointState.CLOSED, (signalType, emitResult) -> {
        ...
    });
    endpointStates.emitComplete((signalType, emitResult) -> {
        ...
    });
}
Once the next signal of CLOSED is emitted, the state of the created link changes from UNINITIALIZED to CLOSED. We will not pass the CLOSED state as we [...]. But once the complete signal is emitted, it completes the endpoint-state stream subscribed to in:
//ServiceBusSessionManager.class
Mono<ServiceBusReceiveLink> getActiveLink() {
    ...
    return Mono.defer(() -> createSessionReceiveLink()
        .flatMap(link -> link.getEndpointStates()
            .takeUntil(e -> e == AmqpEndpointState.ACTIVE)
            .timeout(operationTimeout)
            .then(Mono.just(link))))
        .retryWhen(Retry.from(retrySignals -> retrySignals.flatMap(signal -> {
            ...
            if (isDisposed.get()) {
                ...
            } else if (failure instanceof TimeoutException) {
                return Mono.delay(SLEEP_DURATION_ON_ACCEPT_SESSION_EXCEPTION);
            } else if (failure instanceof AmqpException
                && ((AmqpException) failure).getErrorCondition() == AmqpErrorCondition.TIMEOUT_ERROR) {
                return Mono.delay(SLEEP_DURATION_ON_ACCEPT_SESSION_EXCEPTION);
            } else {
                return Mono.<Long>error(failure);
            }
        })));
}
The problem here is our [...]
Fixes
At present, the fix is to make sure we don't pass down the CLOSED link. We can check whether the link is disposed and, if so, replace Mono.just(link) with Mono.error() to trigger the retryWhen.
Temp Solution 1:
...
.then(Mono.fromCallable(() -> link.isDisposed() ? null : link))
.switchIfEmpty(Mono.error(() ->
    new TimeoutException("Mock Timeout exception to trigger Mono.delay() condition in retry")))
// We receive the link on the reactor-executor thread, so we need to publish on a non-blocking thread for the retry.
.publishOn(Schedulers.boundedElastic())
...
Temp Solution 2: Add a filter to drop the disposed link, then use switchIfEmpty to turn the empty result into an error.
...
.then(Mono.just(link))
.filter(receiveLink -> !receiveLink.isDisposed())
.switchIfEmpty(Mono.error(() ->
    new TimeoutException("Mock Timeout exception to trigger Mono.delay() condition in retry")))
.publishOn(Schedulers.boundedElastic())
...
Temp Solution 3:
...
.then(Mono.just(link))
.flatMap(receiveLink -> {
    if (receiveLink.isDisposed()) {
        return Mono.error(() ->
            new TimeoutException("Mock Timeout exception to trigger Mono.delay() condition in retry"));
    } else {
        return Mono.just(receiveLink);
    }
})
.publishOn(Schedulers.boundedElastic())
...
Temp Solution 4:
...
.then(Mono.fromCallable(() -> {
    if (link.isDisposed()) {
        throw new TimeoutException("Mock Timeout exception to trigger Mono.delay() condition in retry");
    } else {
        return link;
    }
}))
.publishOn(Schedulers.boundedElastic())
...
Pending questions
Describe the bug
Users encounter an issue where their session processor stops receiving messages after some time.
The log shows that after 08:12:45 the session processor client cannot receive any messages once the connection has shut down:
To Reproduce
Questions
Why was the link forced to close without receiving any error message?
The Service Bus role instance from which the link was awaiting the attach response was restarted due to an OS update at around 8:12. In this case, the session receive link was detached with no error condition.
Why does the link not recover?
This is what we have to analyze and address in this issue.
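One way to picture the failure mode is the plain-Java sketch below. It is a simplified model under stated assumptions, not the SDK implementation: ActiveLinkModel, Link, awaitActiveUnchecked, awaitActiveChecked, and the attempt limit are all hypothetical, and the reactive pipeline is replaced by a loop over recorded states. The assumption it illustrates is that waiting "until ACTIVE" also ends when the state stream simply completes, so a link that closed without an error is handed to the caller as if it were usable unless it is checked first.

```java
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

// Simplified, hypothetical model; these are not the SDK's real types.
final class ActiveLinkModel {
    enum State { UNINITIALIZED, ACTIVE, CLOSED }

    static final class Link {
        final List<State> states; // states emitted before the stream completes
        Link(List<State> states) { this.states = states; }
        boolean isDisposed() { return states.contains(State.CLOSED); }
    }

    /** Unchecked wait: returns the link once the stream ends, ACTIVE or not. */
    static Link awaitActiveUnchecked(Link link) {
        for (State s : link.states) {
            if (s == State.ACTIVE) {
                break; // stop waiting as soon as ACTIVE is seen
            }
        }
        return link; // also reached when the stream completed without ACTIVE
    }

    /** Checked wait: a disposed link becomes a failure the caller can retry on. */
    static Link awaitActiveChecked(Link link) throws TimeoutException {
        Link result = awaitActiveUnchecked(link);
        if (result.isDisposed()) {
            throw new TimeoutException("link closed before becoming active");
        }
        return result;
    }

    /** Retries link creation until a usable link arrives or attempts run out. */
    static Link getActiveLink(Supplier<Link> createLink, int maxAttempts) throws TimeoutException {
        TimeoutException last = new TimeoutException("no attempts made");
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                return awaitActiveChecked(createLink.get());
            } catch (TimeoutException e) {
                last = e; // a real retry would wait before the next attempt
            }
        }
        throw last;
    }

    public static void main(String[] args) throws TimeoutException {
        Link closed = new Link(List.of(State.UNINITIALIZED, State.CLOSED));
        System.out.println("unchecked result disposed: " + awaitActiveUnchecked(closed).isDisposed());

        // First created link closes without error, the second becomes ACTIVE.
        Iterator<Link> created = List.of(
            closed,
            new Link(List.of(State.UNINITIALIZED, State.ACTIVE))).iterator();
        Link usable = getActiveLink(created::next, 3);
        System.out.println("checked result disposed:   " + usable.isDisposed());
    }
}
```

The checked variant reuses TimeoutException only because, in this model, that is the failure type the retry loop already handles; any retryable error type would serve the same purpose.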
Why does the connection not recover?
After the connection is closed, the ServiceBusConnectionProcessor will request one more connection. But the new connection is UNINITIALIZED and there is no receiveMessages() action to make that connection active. The client was not restarted because our old connection check is wrong: the ServiceBusConnectionProcessor is not disposed and there is an UNINITIALIZED connection. It can be fixed by "[Re-Design Amqp Connection Recovery] 'ReactorConnectionCache' replacement for 'AmqpChannelProcessor'" (#33224).