Skip to content
This repository was archived by the owner on Oct 12, 2023. It is now read-only.

20 minute MessageHandler throws MessageLockLostException #645

Closed
brettsam opened this issue Feb 11, 2019 · 15 comments
Closed

20 minute MessageHandler throws MessageLockLostException #645

brettsam opened this issue Feb 11, 2019 · 15 comments

Comments

@brettsam
Copy link
Member

I’m on the WebJobs/Functions team and we’ve got a customer that’s getting a MessageLockLostException when running a 20-minute function. I was able to repro with a console app (code below). Some things to call out:

  • Lowering the delay to 10 or 15 minutes worked.
  • This repros every time I run it.
  • My LockDuration is 5 minutes -- but I’ve tried 30 seconds as well.
  • This repros with both 3.0.2 and 3.3.0 of Microsoft.Azure.ServiceBus.

Is there anything obviously wrong with this configuration below that would cause this?

static void Main(string[] args)
{
    var receiver = new MessageReceiver("{snip}", "myqueue");
    receiver.PrefetchCount = 1;

    receiver.RegisterMessageHandler(DelaySome, new MessageHandlerOptions(LogException)
    {
        AutoComplete = true,
        MaxAutoRenewDuration = TimeSpan.FromMinutes(55)
    });

    Console.ReadLine();
}

private static async Task DelaySome(Message message, CancellationToken cancellationToken)
{
    Console.WriteLine($"[{DateTime.UtcNow}]: ---START MessageHandler---");
    await Task.Delay(20 * 60 * 1000);
    Console.WriteLine($"[{DateTime.UtcNow}]: ---END MessageHandler---");
}

public static Task LogException(ExceptionReceivedEventArgs args)
{
    Console.WriteLine(args.Exception.ToString());
    return Task.CompletedTask;
}

I wired up an EventListener to “Microsoft-Azure-ServiceBus” as well to capture those logs. They’re here.

[2/8/2019 11:13:41 PM] Creating MessageReceiver (Namespace 'brettsam-func.servicebus.windows.net'; Entity 'myqueue'; ReceiveMode 'PeekLock').
[2/8/2019 11:13:41 PM] MessageReceiver (Namespace 'brettsam-func.servicebus.windows.net'; Entity 'myqueue'; ClientId 'MessageReceiver1myqueue' created).
[2/8/2019 11:13:41 PM] MessageReceiver1myqueue: Register OnMessageHandler start: OnMessage Options: AutoComplete: True, AutoRenewLock: True, MaxConcurrentCalls: 1, AutoRenewTimeout: 3300
[2/8/2019 11:13:41 PM] MessageReceiver1myqueue: Register OnMessageHandler done.
[2/8/2019 11:13:41 PM] MessageReceiver1myqueue: ReceiveAsync start. MessageCount = 1
[2/8/2019 11:13:41 PM] Creating/Recreating New Link. ClientId: MessageReceiver1myqueue, IsSessionReceiver: False, SessionId: , IsRequestResponseLink: False, LinkError: .
[2/8/2019 11:13:41 PM] MessageReceiver1myqueue: AmqpReceiveLinkCreate started. IsRequestResponseLink: False,  EntityType: False, EntityPath:
[2/8/2019 11:13:41 PM] AmqpGetOrCreateConnection started.
[2/8/2019 11:13:41 PM] AmqpConnectionCreated: HostName: brettsam-func.servicebus.windows.net, ConnectionInfo: out-connection0, ConnectionState: Opened
[2/8/2019 11:13:41 PM] AmqpGetOrCreateConnection done. EntityPath: myqueue, ConnectionInfo: out-connection0, ConnectionState: Opened
[2/8/2019 11:13:41 PM] AmqpSendAuthenticanToken started. Address: sb://brettsam-func.servicebus.windows.net/myqueue, Audience: sb://brettsam-func.servicebus.windows.net/myqueue, Resource: sb://brettsam-func.servicebus.windows.net/myqueue, Claims: System.String[]
[2/8/2019 11:13:42 PM] AmqpSendAuthenticanToken done.
[2/8/2019 11:13:42 PM] MessageReceiver1myqueue: AmqpReceiveLinkCreate done.
[2/8/2019 11:14:41 PM] MessageReceiver1myqueue: ReceiveAsync done. Received '0' messages
[2/8/2019 11:14:41 PM] MessageReceiver1myqueue: MessageReceiverPump PumpTask done: Available Semaphore Count: 1
[2/8/2019 11:14:41 PM] MessageReceiver1myqueue: ReceiveAsync start. MessageCount = 1
[2/8/2019 11:15:28 PM] MessageReceiver1myqueue: ReceiveAsync done. Received '1' messages
[2/8/2019 11:15:28 PM] MessageReceiver1myqueue: MessageReceiverPump PumpTask Started: Message: SequenceNumber: 3659174697238877, Available Semaphore Count: 0
[2/8/2019 11:15:28 PM] MessageReceiver1myqueue: MessageReceiverPump DispatchTask start: Message: SequenceNumber: 3659174697238877
[2/8/2019 11:15:28 PM] MessageReceiver1myqueue: MessageReceiverPump UserCallback start: Message: SequenceNumber: 3659174697238877
[2/8/2019 11:15:28 PM]: ---START MessageHandler---
[2/8/2019 11:15:28 PM] MessageReceiver1myqueue: MessageReceiverPump RenewMessage start: Message: SequenceNumber: 3659174697238877, RenewAfterTimeInSeconds: 289
[2/8/2019 11:20:18 PM] MessageReceiver1myqueue: RenewLockAsync start. MessageCount = 1, LockToken = f19f4f93-4e3b-4831-a4d7-84ee25f9000d
[2/8/2019 11:20:18 PM] Creating/Recreating New Link. ClientId: MessageReceiver1myqueue, IsSessionReceiver: False, SessionId: , IsRequestResponseLink: True, LinkError: .
[2/8/2019 11:20:18 PM] MessageReceiver1myqueue: AmqpReceiveLinkCreate started. IsRequestResponseLink: True,  EntityType: True, EntityPath:
[2/8/2019 11:20:18 PM] AmqpGetOrCreateConnection started.
[2/8/2019 11:20:18 PM] AmqpGetOrCreateConnection done. EntityPath: myqueue/$management, ConnectionInfo: out-connection0, ConnectionState: Opened
[2/8/2019 11:20:18 PM] AmqpSendAuthenticanToken started. Address: sb://brettsam-func.servicebus.windows.net/myqueue/$management, Audience: sb://brettsam-func.servicebus.windows.net/myqueue/$management, Resource: sb://brettsam-func.servicebus.windows.net/myqueue/$management, Claims: System.String[]
[2/8/2019 11:20:18 PM] AmqpSendAuthenticanToken done.
[2/8/2019 11:20:18 PM] MessageReceiver1myqueue: AmqpReceiveLinkCreate done.
[2/8/2019 11:20:18 PM] MessageReceiver1myqueue: RenewLockAsync done.
[2/8/2019 11:20:18 PM] MessageReceiver1myqueue: MessageReceiverPump RenewMessage done: Message: SequenceNumber: 3659174697238877
[2/8/2019 11:20:18 PM] MessageReceiver1myqueue: MessageReceiverPump RenewMessage start: Message: SequenceNumber: 3659174697238877, RenewAfterTimeInSeconds: 289
[2/8/2019 11:25:08 PM] MessageReceiver1myqueue: RenewLockAsync start. MessageCount = 1, LockToken = f19f4f93-4e3b-4831-a4d7-84ee25f9000d
[2/8/2019 11:25:08 PM] MessageReceiver1myqueue: RenewLockAsync done.
[2/8/2019 11:25:08 PM] MessageReceiver1myqueue: MessageReceiverPump RenewMessage done: Message: SequenceNumber: 3659174697238877
[2/8/2019 11:25:08 PM] MessageReceiver1myqueue: MessageReceiverPump RenewMessage start: Message: SequenceNumber: 3659174697238877, RenewAfterTimeInSeconds: 289
[2/8/2019 11:29:58 PM] MessageReceiver1myqueue: RenewLockAsync start. MessageCount = 1, LockToken = f19f4f93-4e3b-4831-a4d7-84ee25f9000d
[2/8/2019 11:29:58 PM] MessageReceiver1myqueue: RenewLockAsync done.
[2/8/2019 11:29:58 PM] MessageReceiver1myqueue: MessageReceiverPump RenewMessage done: Message: SequenceNumber: 3659174697238877
[2/8/2019 11:29:58 PM] MessageReceiver1myqueue: MessageReceiverPump RenewMessage start: Message: SequenceNumber: 3659174697238877, RenewAfterTimeInSeconds: 289
[2/8/2019 11:34:48 PM] MessageReceiver1myqueue: RenewLockAsync start. MessageCount = 1, LockToken = f19f4f93-4e3b-4831-a4d7-84ee25f9000d
[2/8/2019 11:34:48 PM] MessageReceiver1myqueue: RenewLockAsync done.
[2/8/2019 11:34:48 PM] MessageReceiver1myqueue: MessageReceiverPump RenewMessage done: Message: SequenceNumber: 3659174697238877
[2/8/2019 11:34:48 PM] MessageReceiver1myqueue: MessageReceiverPump RenewMessage start: Message: SequenceNumber: 3659174697238877, RenewAfterTimeInSeconds: 289
[2/8/2019 11:35:28 PM]: ---END MessageHandler---
[2/8/2019 11:35:28 PM] MessageReceiver1myqueue: MessageReceiverPump UserCallback done: Message: SequenceNumber: 3659174697238877
[2/8/2019 11:35:28 PM] MessageReceiver1myqueue: CompleteAsync start. MessageCount = 1, LockTokens = <LockToken>f19f4f93-4e3b-4831-a4d7-84ee25f9000d</LockToken>
[2/8/2019 11:35:28 PM] Creating/Recreating New Link. ClientId: MessageReceiver1myqueue, IsSessionReceiver: False, SessionId: , IsRequestResponseLink: False, LinkError: .
[2/8/2019 11:35:28 PM] MessageReceiver1myqueue: AmqpReceiveLinkCreate started. IsRequestResponseLink: False,  EntityType: False, EntityPath:
[2/8/2019 11:35:28 PM] AmqpGetOrCreateConnection started.
[2/8/2019 11:35:28 PM] AmqpGetOrCreateConnection done. EntityPath: myqueue, ConnectionInfo: out-connection0, ConnectionState: Opened
[2/8/2019 11:35:28 PM] AmqpSendAuthenticanToken started. Address: sb://brettsam-func.servicebus.windows.net/myqueue, Audience: sb://brettsam-func.servicebus.windows.net/myqueue, Resource: sb://brettsam-func.servicebus.windows.net/myqueue, Claims: System.String[]
[2/8/2019 11:35:28 PM] AmqpSendAuthenticanToken done.
[2/8/2019 11:35:28 PM] MessageReceiver1myqueue: AmqpReceiveLinkCreate done.
[2/8/2019 11:35:29 PM] MessageReceiver1myqueue: CompleteAsync Exception: Microsoft.Azure.ServiceBus.MessageLockLostException: The lock supplied is invalid. Either the lock expired, or the message has already been removed from the queue, or was received by a different receiver instance.
   at Microsoft.Azure.ServiceBus.Core.MessageReceiver.DisposeMessagesAsync(IEnumerable`1 lockTokens, Outcome outcome) in C:\source\azure-service-bus-dotnet\src\Microsoft.Azure.ServiceBus\Core\MessageReceiver.cs:line 1429
   at Microsoft.Azure.ServiceBus.RetryPolicy.RunOperation(Func`1 operation, TimeSpan operationTimeout) in C:\source\azure-service-bus-dotnet\src\Microsoft.Azure.ServiceBus\RetryPolicy.cs:line 85
   at Microsoft.Azure.ServiceBus.RetryPolicy.RunOperation(Func`1 operation, TimeSpan operationTimeout) in C:\source\azure-service-bus-dotnet\src\Microsoft.Azure.ServiceBus\RetryPolicy.cs:line 107
   at Microsoft.Azure.ServiceBus.Core.MessageReceiver.CompleteAsync(IEnumerable`1 lockTokens) in C:\source\azure-service-bus-dotnet\src\Microsoft.Azure.ServiceBus\Core\MessageReceiver.cs:line 527.
Microsoft.Azure.ServiceBus.MessageLockLostException: The lock supplied is invalid. Either the lock expired, or the message has already been removed from the queue, or was received by a different receiver instance.
  at Microsoft.Azure.ServiceBus.Core.MessageReceiver.DisposeMessagesAsync(IEnumerable`1 lockTokens, Outcome outcome) in C:\source\azure-service-bus-dotnet\src\Microsoft.Azure.ServiceBus\Core\MessageReceiver.cs:line 1429
   at Microsoft.Azure.ServiceBus.RetryPolicy.RunOperation(Func`1 operation, TimeSpan operationTimeout) in C:\source\azure-service-bus-dotnet\src\Microsoft.Azure.ServiceBus\RetryPolicy.cs:line 85
   at Microsoft.Azure.ServiceBus.RetryPolicy.RunOperation(Func`1 operation, TimeSpan operationTimeout) in C:\source\azure-service-bus-dotnet\src\Microsoft.Azure.ServiceBus\RetryPolicy.cs:line 107
   at Microsoft.Azure.ServiceBus.Core.MessageReceiver.CompleteAsync(IEnumerable`1 lockTokens) in C:\source\azure-service-bus-dotnet\src\Microsoft.Azure.ServiceBus\Core\MessageReceiver.cs:line 527
   at Microsoft.Azure.ServiceBus.MessageReceivePump.CompleteMessageIfNeededAsync(Message message) in C:\source\azure-service-bus-dotnet\src\Microsoft.Azure.ServiceBus\MessageReceivePump.cs:line 216
[2/8/2019 11:35:29 PM] MessageReceiver1myqueue: ReceiveAsync start. MessageCount = 1
[2/8/2019 11:35:29 PM] MessageReceiver1myqueue: MessageReceiverPump DispatchTask done: Message: SequenceNumber: 3659174697238877, Current Semaphore Count: 0
[2/8/2019 11:36:29 PM] MessageReceiver1myqueue: ReceiveAsync done. Received '0' messages
[2/8/2019 11:36:29 PM] MessageReceiver1myqueue: MessageReceiverPump PumpTask done: Available Semaphore Count: 1
[2/8/2019 11:36:29 PM] MessageReceiver1myqueue: ReceiveAsync start. MessageCount = 1
[2/8/2019 11:37:29 PM] MessageReceiver1myqueue: ReceiveAsync done. Received '0' messages
[2/8/2019 11:37:29 PM] MessageReceiver1myqueue: MessageReceiverPump PumpTask done: Available Semaphore Count: 1
[2/8/2019 11:37:29 PM] MessageReceiver1myqueue: ReceiveAsync start. MessageCount = 1
@SeanFeldman
Copy link
Collaborator

@binzywu is there a max lock renewal duration on the server by any chance?

@arun-binary-write
Copy link

I have the similar issue,

I have lock duration set on my topic for 2 seconds, and set MaxAutoRenewDuration=4 seconds.

So I expect the the message lock to be renewed when the message handler takes more than 2 seconds, and I can this happening as i expected.

However every-time when it renews the lock it is setting the new lock token!.

Now, the message handler eventually completes the process and trying to call the CompleteAsync with the lock token it received initially , when this happen I get the messagelocklost exception.

Then I further investigated from service bus explorer, it shows different lock token than I initially received.

Is this how it is expected to behave ? in that case how do I know the current lock token to mark the message as completed. And what is the fix for it ?

@SeanFeldman
Copy link
Collaborator

@techiearun LockDuration of 2 seconds and MaxAutoRenewDuration of 4 seconds sounds a bit suspicious. How long is the message processing? Also, note that MaxAutoRenewDuration means that the maximum total time a message could be locked would be 4 seconds. And that's not guaranteed since auto-renewal is not a guaranteed operation.

@arun-binary-write
Copy link

@SeanFeldman The message usually gets processed under 2 seconds, 4 out of 50 messages is taking more than 2 seconds that is around 3 seconds.. so I had set the MaxAutoRenewDuration to 4 seconds with the hope that message lock will be renewed automatically for additional 2 seconds if the message handler didn't complete the process under 2 seconds.

Is this the wrong usage ?

@SeanFeldman
Copy link
Collaborator

MaxLockDuration can be set up to 5 minutes. MaxAutoRenewDuration is designed to allow an option to extend processing beyond 5 minutes (though I would be careful about it).
I would set it up differently. I would set the MaxLockDuration to up-to 30 seconds and not utilize MaxAutoRenewDuration altogether. That way, you're guaranteed not to lose the lock while processing is taking place. And if it's truly under 30 seconds processing, you won't see LockLostExceptions. And if you do, then investigate what's going on.

@arun-binary-write
Copy link

@brettsam Thanks for that. I do have another processor that is running with following configuration

Message Lock Duration : 5 Minutes
MaxAutoRenewDuration : 301 seconds. ( Based the documentation that states "This value should be greater than the longest message lock duration; for example, the LockDuration Property" )

Expected processing time of a message 3-4 minutes on normal circumstances, can go over 5 minutes at times. So I have set above configurations. Even in this case i get MessageLockLostException when some of message processing time exceeds 5 minutes. I investigated and found that message lock renewed with differnt lock token that causes MessageLockLostException. For me, this behaviour seems to be a bug.

@SeanFeldman
Copy link
Collaborator

MaxAutoRenewDuration of 301 seconds? That's 5 minutes and 1 second of total time. It's that what you want? MaxAutoRenewDuration should be the maximum duration your message should be locked. So if processing can take more than 5 minutes, find the longest duration and use that plus a few seconds.

I investigated and found that message lock renewed with differnt lock token that causes MessageLockLostException. For me, this behaviour seems to be a bug.

Could you link your repro to back this statement?

@Matt-Westphal
Copy link

Back to the original post, what is the best way to deal with AutoComplete = true, MaxRenewDuration = 55 minutes, the handler taking longer than 20 minutes and getting a MessageLockLostException?
I'm experiencing the same thing in an application that we're working on.

@SeanFeldman
Copy link
Collaborator

@Matt-Westphal does it happen for every message that takes 20+ minutes to process?

Personally, I'm in a mindset that processing of 20 minutes should be implemented in a manner where a message is locked for that time. But that's my personal view.

@vipinmathew2013
Copy link

Hello All, does anyone know if the prevoius versions than 3.40 has the same issue? Maybe older versions does not have the issue.

@Matt-Westphal
Copy link

@SeanFeldman yes it was happening for every message. The process is creating multiple VMs in Azure and takes between 20-30 minutes to complete. If it fails we want the message to remain in the queue so that the process can try again. Usually 2nd attempt works.

The code that we had is the same as issue 684. I then modified the code to use AutoComplete = true for the MessageHandler and the MaxAutoRenewDuration to 1 hour. The code was still throwing the MessageLockLostException, but then I found a couple of references "Lock duration" within the properties of the queue. Once I change that from 30 seconds to 5 minutes, the code now completes the message successfully when the process takes 20-30 minutes.
It would be nice to set the lock duration to 20 minutes, but the UI isn't letting me enter anything after 5 minutes.

@vinaysurya
Copy link
Member

Hello All, this is an issue on the Service side and only happens for entities with 'EnablePartitioning' set to true. We are in the process of fixing this and should be deployed to production clusters in our next update. In the meantime, if possible for your scenario, a potential workaround would be to create an entity with 'EnablePartitioning' set to false and you should not see this issue with that. I will update this thread as I have more information about when the fix may be deployed.

@SeanFeldman
Copy link
Collaborator

I will update this thread as I have more information about when the fix may be deployed.

@vinaysurya thank you so much for clarifying on the issue. As this is not an issue with the client, it would be great if the broker issue tracker, https://github.com/Azure/azure-service-bus/issues, would contain an issue with symptoms and workarounds. If I may suggest a format, something among the lines of this issue: Particular/NServiceBus.Transport.AzureServiceBus#51. Then this issue could be closed pointing to the broker issue tracker item.

In the meantime, if possible for your scenario, a potential workaround would be to create an entity with 'EnablePartitioning' set to false and you should not see this issue with that.

Please note that deleting an entity and re-creating is not a viable workaround for those that cannot shut down production system for this change. Also, it negates the advise to keep entities partitioned to improve the HA story on the standard tier. A non-partitioned entity will eventually suffer from short outages when containers are moved or replaced, affecting production system much more than systems with partitioned entities.

@SeanFeldman
Copy link
Collaborator

Broker issue: Azure/azure-service-bus#276

@vinaysurya
Copy link
Member

vinaysurya commented Apr 5, 2019

Good suggestion Sean, created an issue Azure/azure-service-bus#276 for Service.

Agree that it may not be possible to re-create entities for every scenario, we'll try to release fix as soon as we can. But wanted to mention it just in case there were people who could use it.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

6 participants