[BUG] Stopping ServiceBusProcessor causes some messages to be left locked #21869
Thank you for your feedback. Tagging and routing to the team member best able to assist.
Hi @vbystricky21,
Hi @JoshLove-msft, I see your point. We also expected the message to be available once we dispose everything, but our tests suggest otherwise. A simple code snippet to reproduce the described issue with Azure.Messaging.ServiceBus 7.1.2:
Output from our tests with lock duration set to 30 seconds:
Hi @vbystricky21. It doesn't look like you're stopping the processor by calling `StopProcessingAsync`. If we alter your snippet to stop the processor before disposing it:

```csharp
static async Task ReceiveMessages(string connectionString, string queueName, TimeSpan processingTime)
{
    Console.WriteLine("Receiving started.");
    var client = new ServiceBusClient(connectionString);
    var processor = client.CreateProcessor(queueName, new ServiceBusProcessorOptions
    {
        AutoCompleteMessages = false,
        ReceiveMode = ServiceBusReceiveMode.PeekLock,
        MaxAutoLockRenewalDuration = TimeSpan.FromMinutes(10),
        PrefetchCount = 0,
        MaxConcurrentCalls = 4
    });
    processor.ProcessErrorAsync += async args =>
    {
        Console.WriteLine($"Error: {args.EntityPath} {args.Exception}");
    };
    processor.ProcessMessageAsync += async args =>
    {
        var tmp = Interlocked.Increment(ref _counter);
        Console.WriteLine($"{tmp} Processing message: {args.Message.SequenceNumber}, queue time: {DateTime.Now - args.Message.EnqueuedTime}");
        await Task.Delay(300);
        await args.CompleteMessageAsync(args.Message);
    };
    await processor.StartProcessingAsync();
    await Task.Delay(processingTime);
    // -- STOP ADDED
    await processor.StopProcessingAsync();
    //--
    await processor.DisposeAsync();
    await client.DisposeAsync();
    Console.WriteLine("Receiving stopped.");
}
```
One thing to note: if you call `processor.DisposeAsync`, it is not necessary to also call `processor.StopProcessingAsync`.
Thanks for the code snippet, I will try to repro.
Really? That's not my understanding; I'll follow up offline. Edit:
I was able to reproduce this. I think what is happening is that there is batching of the credits in the underlying AMQP library, since MaxConcurrentCalls is set to 4. So a single receive call may end up resulting in more messages being requested (I couldn't repro when using MaxConcurrentCalls of 1). However, I'm not sure why closing the link would not cause the message locks to be relinquished. I will have to follow up with the service team about this.

If you need this kind of ordering guarantee, you would need to use sessions. Even without this issue, it is possible your connection could drop, and any messages that were already sent by the service would be locked, and could potentially remain locked when reconnecting. With sessions, this is not possible, as the lock is on the entire session. When using sessions, the session does NOT remain locked for the lock duration when the link is closed, so the issue does seem specific to message lock behavior on the service.
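To illustrate the session-based approach suggested above, here is a minimal sketch using `ServiceBusSessionProcessor`. The handler bodies are placeholders, `connectionString` and `queueName` are assumed to be defined as in the repro snippet, and the option values are assumptions rather than recommendations:

```csharp
using System;
using System.Threading.Tasks;
using Azure.Messaging.ServiceBus;

// Sketch: a session processor locks the whole session rather than individual
// messages, so a dropped link does not strand per-message locks.
var client = new ServiceBusClient(connectionString);
var sessionProcessor = client.CreateSessionProcessor(queueName, new ServiceBusSessionProcessorOptions
{
    AutoCompleteMessages = false,
    MaxConcurrentSessions = 4   // assumed value, mirroring MaxConcurrentCalls above
});
sessionProcessor.ProcessMessageAsync += async args =>
{
    // Messages within one session are delivered in order.
    await args.CompleteMessageAsync(args.Message);
};
sessionProcessor.ProcessErrorAsync += args =>
{
    Console.WriteLine($"Error: {args.EntityPath} {args.Exception}");
    return Task.CompletedTask;
};
await sessionProcessor.StartProcessingAsync();
```

Note this requires the queue to have sessions enabled and every sender to set a session id, which (as discussed below) may not fit every topology.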
We don't need FIFO. In fact, in production code we use multiple producer instances that consume messages from the same queue, so sessions are not an option for us (the code snippet was just to reproduce the bug).
Ah, I see - thank you for tracking that down! I don't think there is value in honoring the `cancellationToken` at that point, so we can probably just fix that. I am following up with the service team to try to figure out why the locks are not released when closing the receiver (as this should happen regardless of this fix).
Actually, that code has been updated in the 7.2 version, but the issue persists. I talked to the service team, and it appears that messages can end up being locked by the service after the client closes the link, but before the service can acknowledge that the link is closed. This was made more likely to occur due to the issue you noted in 7.1.2, and similarly by how we handle cancellation in 7.2.x. It should be made less likely to occur (though still possible) once full cancellation support is added to the AMQP library.
So there is no plan to fix this bug, only to reduce its likelihood? This is quite a show stopper for our production. A workaround would probably be not to honor the cancellation token during message receiving and to wait for a message to be received or for maxWaitTime to elapse. I know this goes against the changes you made in 7.2.x fixing the bug about slow processor stopping, but in our case we prefer a correct shutdown over a fast one. Any chance you could make this configurable somehow?

Off topic: there is a kind of inconsistency in renew-lock task cancellation. During shutdown, you cancel this task immediately even if a message is still being processed in the message handler. In the case of auto-complete, you always try to complete the message, but the lock can already be lost. Is there a real reason not to cancel lock renewal only after all message handling is done? I know this can be solved in our code, but the synchronization between producer shutdown and message handling needed for that seems unnecessary.
Can you clarify why this is a show stopper? The messages will still get delivered, but they are locked for the lock duration and the delivery count is incremented. In order to fully fix it, we would need updates on the service side to allow abandoning messages that are locked by a closed receiver.

In terms of the 7.2.x fix, it is still not correct if we simply wait for the timeout, as the race condition is still possible; it just becomes less likely because you are polling for a much longer time (60s by default) than the window in which this issue can occur.

re: lock renewal - that sounds like a bug and we should fix it.
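Until the renewal bug is fixed, one possible client-side workaround (a sketch, not the SDK's behavior) is to disable automatic renewal and renew the lock manually for exactly as long as the handler runs, so renewal lifetime is tied to the handler rather than to processor shutdown. `HandleAsync` here is a hypothetical application callback, and the 10-second interval is an assumption chosen to be well under a 30-second lock duration:

```csharp
// Assumes MaxAutoLockRenewalDuration = TimeSpan.Zero in ServiceBusProcessorOptions,
// so the SDK's own renewal loop is disabled and this handler owns renewal.
processor.ProcessMessageAsync += async args =>
{
    using var renewCts = new CancellationTokenSource();
    var renewal = Task.Run(async () =>
    {
        try
        {
            while (true)
            {
                await Task.Delay(TimeSpan.FromSeconds(10), renewCts.Token);
                await args.RenewMessageLockAsync(args.Message, renewCts.Token);
            }
        }
        catch (OperationCanceledException)
        {
            // Handler finished; stop renewing.
        }
    });
    try
    {
        await HandleAsync(args.Message);              // hypothetical application work
        await args.CompleteMessageAsync(args.Message);
    }
    finally
    {
        renewCts.Cancel();   // renewal stops only after handling is done
        await renewal;
    }
};
```

This keeps the lock alive for the full lifetime of the handler, at the cost of the application owning the renewal cadence.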
In our case, we use many producers, whose number varies according to the Service Bus queue length. So we typically start/stop some producers every few minutes, which leaves us with several locked messages. Every consumed message represents an on-demand job we process for our customers. These jobs usually take under 1s, so a 30s lock every few minutes degrades our performance.
Is it possible to reuse the same processor by just calling `StopProcessingAsync` and then restarting with `StartProcessingAsync`? This would use the same link, so you wouldn't be locked out of the messages.
Unfortunately no. Stopping a producer in our architecture actually means deleting a pod from Azure Kubernetes Service.
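As a sketch of what orderly pod shutdown could look like on the client side: if the app uses the .NET Generic Host, the SIGTERM Kubernetes sends on pod deletion triggers `IHostedService.StopAsync`, giving the processor a chance to stop before the link is torn down. The class and wiring below are illustrative, not part of the SDK, and this does not eliminate the race discussed above; it only keeps the shutdown sequence orderly:

```csharp
using System.Threading;
using System.Threading.Tasks;
using Azure.Messaging.ServiceBus;
using Microsoft.Extensions.Hosting;

// Illustrative hosted service: StopAsync runs on SIGTERM before the pod exits.
public sealed class ProcessorHostedService : IHostedService
{
    private readonly ServiceBusProcessor _processor;

    public ProcessorHostedService(ServiceBusProcessor processor) => _processor = processor;

    public Task StartAsync(CancellationToken cancellationToken) =>
        _processor.StartProcessingAsync(cancellationToken);

    public async Task StopAsync(CancellationToken cancellationToken)
    {
        // Stop receiving and let in-flight handlers drain before disposing,
        // to minimize the window in which messages remain locked.
        await _processor.StopProcessingAsync(cancellationToken);
        await _processor.DisposeAsync();
    }
}
```

Kubernetes' `terminationGracePeriodSeconds` would need to be long enough to cover `StopProcessingAsync` plus any in-flight handler work.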
Just to clarify things: any change to Azure.Messaging.ServiceBus or the underlying Microsoft.Azure.Amqp can only mitigate the risk, and a real fix would actually require a change to the Service Bus service itself. Did I get that right?
Correct - in order to prevent this from happening completely, we would need service updates. The issue can be mitigated through client-side enhancements to Microsoft.Azure.Amqp to support draining the link credits once the processor is stopped.
This issue has been raised to the service team for future investigation.
Hi @JoshLove-msft, is there any update?
@yvgopal, do you know if there has been any investigation into this issue from the service side? I believe the issue is assigned to you.
You can send a drain flow when you don't want to receive any more messages. That is supported in the service.
As discussed offline with @xinchen10, sending the drain flow would still lead to races where messages can be locked. In order to prevent this from occurring, we would need service support for unlocking messages on link close.
We can't do that. That would break backward compatibility for some customers. We know of some customers who receive using AMQP but complete with the lock token over HTTP.
I see, but it could be an optional setting, right?
There appears to be an issue in the underlying AMQP library that is making the behavior worse. I've submitted a PR. That alone would hopefully make the issue less likely, but there may be something we can do in the Service Bus SDK as well.
I should clarify - I don't think it will be as simple as consuming the updated version of the AMQP library when it is available. We will also need an update in the Service Bus library to make sure we are hitting that code path for cancellation.
Describe the bug
We use ServiceBusProcessor in PeekLock mode. Once the processor is stopped/disposed, it often leaves some messages locked in the queue. This seems to be caused by a TaskCanceledException thrown in AmqpReceiver.ReceiveMessagesAsyncInternal. That gives us no chance to process an already received message, but the message is not abandoned either.
The changes introduced in Azure.Messaging.ServiceBus 7.2.0 beta by @JoshLove-msft don't seem to solve this issue, as the processor is stopped quickly rather than correctly.
Expected behavior
All messages can be processed or abandoned.
Actual behavior (include Exception or Stack Trace)
In-flight messages remain locked.
Environment: