Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] Timeout stopping Service Bus Processor #17734

Closed
sireza opened this issue Jan 4, 2021 · 22 comments
Closed

[BUG] Timeout stopping Service Bus Processor #17734

sireza opened this issue Jan 4, 2021 · 22 comments
Assignees
Labels
Client This issue points to a problem in the data-plane of the library. customer-reported Issues that are reported by GitHub users external to the Azure organization. needs-team-attention Workflow: This issue needs attention from Azure service team or SDK team question The issue doesn't require a change to the product in order to be resolved. Most issues start as that Service Bus

Comments

@sireza
Copy link

sireza commented Jan 4, 2021

Description
I discovered this issue when using the ServiceBusProcessor with IHostedService, but I was able to replicate the issue using the quickstart snippet

The processor itself is working fine, it was able to receive messages and successfully 'completes' it

Not sure what it is waiting for, as there is no message in the queue. I am relying on this method to function properly to ensure messages in flight gets put back in the queue if the app is shutting down.

Expected behaviour
When StopProcessingAsync is called, the processor should gracefully stop processing any message, then successfully stops the processor before it times out

Actual behaviour (include Exception or Stack Trace)
Calling StopProcessingAsync() freezes the app, then times out after 60 seconds

I added event listener to see what's happening, and here's the output

[Verbose] Azure-Messaging-ServiceBus: Creating a ServiceBusClient (Namespace: 'rezatest02.servicebus.windows.net', Entity name: ''
[Verbose] Azure-Messaging-ServiceBus: A ServiceBusClient has been created (Identifier 'rezatest02.servicebus.windows.net-f5ea72d8-b171-4955-93a6-bdbf746731f7').
[Informational] Azure-Messaging-ServiceBus: testsQueue-9f2ce687-d35d-461c-8890-ae12f5e0392a: StartProcessingAsync start.
[Verbose] Azure-Messaging-ServiceBus: Creating a ServiceBusReceiver (Namespace: 'rezatest02.servicebus.windows.net', Entity name: 'testsQueue'
[Verbose] Azure-Messaging-ServiceBus: A ServiceBusReceiver has been created (Identifier 'testsQueue-30ca4675-6030-4047-8250-2cc6ec50ebe7').
[Informational] Azure-Messaging-ServiceBus: testsQueue-30ca4675-6030-4047-8250-2cc6ec50ebe7: ReceiveBatchAsync start. MessageCount = 1
[Informational] Azure-Messaging-ServiceBus: Creating receive link for Identifier: testsQueue-30ca4675-6030-4047-8250-2cc6ec50ebe7.
[Informational] Azure-Messaging-ServiceBus: testsQueue-9f2ce687-d35d-461c-8890-ae12f5e0392a: StartProcessingAsync done.
Wait for a minute and then press any key to end the processing
[Informational] Azure-Messaging-ServiceBus: Receive link created for Identifier: testsQueue-30ca4675-6030-4047-8250-2cc6ec50ebe7. Session Id: 

[4:14:47 pm]Stopping the receiver...
[Informational] Azure-Messaging-ServiceBus: testsQueue-9f2ce687-d35d-461c-8890-ae12f5e0392a: StopProcessingAsync start.
[Error] Azure-Messaging-ServiceBus: testsQueue-30ca4675-6030-4047-8250-2cc6ec50ebe7: ReceiveBatchAsync Exception: System.Threading.Tasks.TaskCanceledException: A task was canceled.
   at Azure.Messaging.ServiceBus.Core.CancellationTokenExtensions.ThrowIfCancellationRequested[T](CancellationToken instance)
   at Azure.Messaging.ServiceBus.Amqp.AmqpReceiver.ReceiveMessagesAsyncInternal(Int32 maxMessages, Nullable`1 maxWaitTime, TimeSpan timeout, CancellationToken cancellationToken)
   at Azure.Messaging.ServiceBus.Amqp.AmqpReceiver.ReceiveMessagesAsyncInternal(Int32 maxMessages, Nullable`1 maxWaitTime, TimeSpan timeout, CancellationToken cancellationToken)
   at Azure.Messaging.ServiceBus.Amqp.AmqpReceiver.<>c__DisplayClass33_0.<<ReceiveMessagesAsync>b__0>d.MoveNext()
--- End of stack trace from previous location ---
   at Azure.Messaging.ServiceBus.ServiceBusRetryPolicy.RunOperation(Func`2 operation, TransportConnectionScope scope, CancellationToken cancellationToken)
   at Azure.Messaging.ServiceBus.ServiceBusRetryPolicy.RunOperation(Func`2 operation, TransportConnectionScope scope, CancellationToken cancellationToken)
   at Azure.Messaging.ServiceBus.Amqp.AmqpReceiver.ReceiveMessagesAsync(Int32 maxMessages, Nullable`1 maxWaitTime, CancellationToken cancellationToken)
   at Azure.Messaging.ServiceBus.ServiceBusReceiver.ReceiveMessagesAsync(Int32 maxMessages, Nullable`1 maxWaitTime, CancellationToken cancellationToken).
[Informational] Azure-Messaging-ServiceBus: testsQueue-9f2ce687-d35d-461c-8890-ae12f5e0392a: StopProcessingAsync done.
[4:15:35 pm]Stopped receiving messages
[Verbose] Azure-Messaging-ServiceBus: Closing a ServiceBusClient (Identifier 'rezatest02.servicebus.windows.net-f5ea72d8-b171-4955-93a6-bdbf746731f7').
[Verbose] Azure-Messaging-ServiceBus: A ServiceBusClient has been closed (Identifier 'rezatest02.servicebus.windows.net-f5ea72d8-b171-4955-93a6-bdbf746731f7').

To Reproduce
Steps to reproduce the behaviour (include a code snippet, screenshot, or any additional information that might help us reproduce the issue)

  1. Create console app with snippet from https://docs.microsoft.com/en-us/azure/service-bus-messaging/service-bus-dotnet-get-started-with-queues#receive-messages-from-a-queue
  2. Run the app
  3. Press any key to trigger StopProcessingAsync()

Environment:

  • Azure.Messaging.ServiceBus 7.0.0
  • Tried Windows and Linux, both with .net core 3.1
@ghost ghost added needs-triage Workflow: This is a new issue that needs to be triaged to the appropriate team. customer-reported Issues that are reported by GitHub users external to the Azure organization. question The issue doesn't require a change to the product in order to be resolved. Most issues start as that labels Jan 4, 2021
@jsquire jsquire added Client This issue points to a problem in the data-plane of the library. needs-team-attention Workflow: This issue needs attention from Azure service team or SDK team Service Bus labels Jan 4, 2021
@ghost ghost removed the needs-triage Workflow: This is a new issue that needs to be triaged to the appropriate team. label Jan 4, 2021
@jsquire
Copy link
Member

jsquire commented Jan 4, 2021

Thank you for your feedback. Tagging and routing to the team member best able to assist. Please be aware that due to the holidays, responses may be delayed.

@JoshLove-msft
Copy link
Member

Hi @sireza, unfortunatley this is expected behavior. The underlying AMQP library does not support cancellation tokens and is instead based on timeouts. Therefore, when StopProcessingAsync is called, we have to wait for any Receive calls to timeout before shutting down. The timeout can be tweaked by modifying the TryTimeout in the RetryOptions that can be set in the ServiceBusClientOptions.

@sireza
Copy link
Author

sireza commented Jan 4, 2021

@JoshLove-msft that's unfortunate. Any drawback from setting the TryTimeout to something short like 15 secoonds?

@JoshLove-msft
Copy link
Member

There is the potential that other operations might timeout unnecessarily, but they would still get retried. I would suggest trying it out to see if is acceptable. Another option would be for us to expose the MaxWaitTime for the receive operations, so that this can be configured in the processor, as suggested in #17581. If we added this option, you wouldn't need to change the TryTimeout, but instead you would set the MaxWaitTime.

@jsquire jsquire added needs-author-feedback Workflow: More information is needed from author to address the issue. and removed needs-team-attention Workflow: This issue needs attention from Azure service team or SDK team labels Jan 9, 2021
@ghost ghost added the no-recent-activity There has been no recent activity on this issue. label Jan 16, 2021
@ghost
Copy link

ghost commented Jan 16, 2021

Hi, we're sending this friendly reminder because we haven't heard back from you in 7 days. We need more information about this issue to help address it. Please be sure to give us your input. If we don't hear back from you within 14 days of this comment the issue will be automatically closed. Thank you!

@sireza
Copy link
Author

sireza commented Jan 17, 2021

Using TryTimeout of 15 seconds doesn't seems to have any major drawback so far. I hope this gets fixed in future version though, relying on a timeout doesn't seem like an elegant solution. Also, don't remember seeing this behaviour in previous iterations of the library?

@ghost ghost added needs-team-attention Workflow: This issue needs attention from Azure service team or SDK team and removed needs-author-feedback Workflow: More information is needed from author to address the issue. no-recent-activity There has been no recent activity on this issue. labels Jan 17, 2021
@aarondandy
Copy link
Contributor

I made a minimum reproduction of this assuming I had triggered some fringe corner case with my usage. After debugging for a bit and finding the cause I found this issue (wish I did a better search first 😅). Finding that this was expected behavior was a bit shocking and I hope you all change your view on this from being expected behavior to being a limitation.

Regarding the workaround of changing the TryTimeout to 15 seconds, does that mean my services will make 4 times as many AMQP requests compared to the default? The client is using that timeout for long polling right? If so, could that cause any possible issues with placing extra burden on my namespaces?

Assuming ReceivingAmqpLink never has support for cancellation, are there any possibilities to work around that limitation? For example could the AmqpReceiver be placed into a state of "stopping" to quickly return to the caller for Close/Dispose? In a stopping state maybe it could reject any new messages so that the link timeout can happen asynchronously and probably even just allow the process to die much quicker.

In my current context I have IHostedService background workers that close their processor within IHostedService.StopAsync . Combined with how StopAsync is called in sequence, gracefully terminating these services can now take multiple minutes. I guess I can just take a fire and forget approach to calling StopProcessingAsync/CloseAsync for now.

Here is the reproduction I made:

using System;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;
using Azure.Messaging.ServiceBus;

namespace ConsoleApp1
{
    class Program
    {
        static async Task Main(string[] args)
        {
            const string connectionString = @"<CONNECTIONSTRING>";
            var ct = CancellationToken.None;

            await using var client = new ServiceBusClient(connectionString);

            var processorOptions = new ServiceBusProcessorOptions
            {
                AutoCompleteMessages = false,
                MaxConcurrentCalls = 1,
                ReceiveMode = ServiceBusReceiveMode.PeekLock
            };

            var processor = client.CreateProcessor("test", processorOptions);
            processor.ProcessMessageAsync += _ => throw new NotImplementedException();
            processor.ProcessErrorAsync += _ => Task.CompletedTask;

            var stopwatch = Stopwatch.StartNew();
            await processor.StartProcessingAsync(ct);
            stopwatch.Stop();
            Console.WriteLine($"Started {processor.EntityPath}: {stopwatch.Elapsed}");

            // I think this delay is important so execution can get far enough into AmqpReceiver.ReceiveMessagesAsync
            await Task.Delay(TimeSpan.FromSeconds(3));

            Console.WriteLine($"Stopping {processor.EntityPath}...");
            stopwatch = Stopwatch.StartNew();

            // This is where the problem seems to be. When this is called ServiceBusProcessor.RunningTaskTokenSource will
            // have cancel called on it to try to bring the receivers down. Meanwhile in AmqpReceiver.ReceiveMessagesAsyncInternal
            // the call to link.BeginReceiveRemoteMessages does not seem to have any way to be notifed of that cancellation.
            // My guess is that an http timeout is required to cause the cancellation to be noticed which is why it always seems
            // to wait for one minute.
            await processor.StopProcessingAsync(ct);

            stopwatch.Stop();
            Console.WriteLine($"Stopped {processor.EntityPath}: {stopwatch.Elapsed}");

            Console.WriteLine($"Closing {processor.EntityPath}...");
            stopwatch = Stopwatch.StartNew();
            await processor.CloseAsync(ct);
            stopwatch.Stop();
            Console.WriteLine($"Closed {processor.EntityPath}: {stopwatch.Elapsed}");

            // TODO: dispose when https://github.com/Azure/azure-sdk-for-net/pull/18027 is merged
        }
    }
}

@JoshLove-msft
Copy link
Member

JoshLove-msft commented Jan 18, 2021

Regarding the workaround of changing the TryTimeout to 15 seconds, does that mean my services will make 4 times as many AMQP requests compared to the default? The client is using that timeout for long polling right? If so, could that cause any possible issues with placing extra burden on my namespaces?

Yes, if your queues are often empty, this would mean that the processor would end up doing 4x as many receive requests over a given period of time, which would impact the number of operations you are doing against your namespace and could subject you to throttling if you are doing more than 1000 operations per second (https://docs.microsoft.com/en-us/azure/service-bus-messaging/service-bus-throttling#what-are-the-credit-limits). If the queues generally contain messages, then it wouldn't impact the number of requests, since the receive calls would return as soon as a message was retrieved.

Assuming ReceivingAmqpLink never has support for cancellation, are there any possibilities to work around that limitation? For example could the AmqpReceiver be placed into a state of "stopping" to quickly return to the caller for Close/Dispose? In a stopping state maybe it could reject any new messages so that the link timeout can happen asynchronously and probably even just allow the process to die much quicker.

I think this could be possible but it would still require work in the underlying AMQP library which lives here - https://github.com/Azure/azure-amqp. I think we can also consider having special behavior in the Service Bus SDK where we will attempt to shutdown the link early even if a receive call is in process for the specific case of stopping the processor.

@JoshLove-msft
Copy link
Member

Fixed in #19955

@wizofaus
Copy link

wizofaus commented Apr 6, 2021

I just got hit with this going through the basic samples for code using the Azure.Messaging.ServiceBus library, I was certainly convinced it was a bug. I'm not entirely sure I understand what the fix actually is, is there actually a change to the default behaviour of StopProcessingAsync( ) (which seems to take ~60 seconds to timeout)?

@JoshLove-msft
Copy link
Member

JoshLove-msft commented Apr 6, 2021

Yeah there is a fix that will be available in an upcoming beta. The GA fix is still at least a month out.
@wizofaus the reason it takes 60 seconds is that this is the timeout for any receive calls that occuring. The fix is to cancel the receive call when stopping, so that we don't wait that long. We still will wait for any in flight callbacks that are doing work.

@JoshLove-msft
Copy link
Member

Reopening until the GA fix is available.

@JoshLove-msft JoshLove-msft reopened this Apr 6, 2021
@JoshLove-msft
Copy link
Member

JoshLove-msft commented Jun 17, 2021

It turns out the current fix for this has some unintended side effects. We still plan to fix this once support is added for cancellation tokens in the AMQP library, but may need to revert this change for now.

See #21869

@JoshLove-msft
Copy link
Member

The side effects seem to be not caused by the fix to this issue. The fix for this is in 7.2.0 stable release.

@xbaha
Copy link

xbaha commented Jan 7, 2022

The issue is not fixed, I am having the same problem, after :
await processor.StopProcessingAsync(ct);

just put a breakpoint, wait for a minute, then continue, you will always get this error.
using Azure Messaging ServiceBus v.7.5.1

@JoshLove-msft
Copy link
Member

@xbaha what error are you referring to?

@xbaha
Copy link

xbaha commented Jan 7, 2022

@JoshLove-msft the error I got is:
"The operation did not complete within the allocated time 00:00:30 for object tls0."

openapi-sdkautomation bot pushed a commit to AzureSDKAutomation/azure-sdk-for-net that referenced this issue Feb 25, 2022
Add xms-ids for machinelearning, machinelearningcompute, machinelearningexperimentation and machinelearningservices (Azure#17734)

* add xmsids for machinelearning rps

* update inner properties

* address code review comments
openapi-sdkautomation bot pushed a commit to AzureSDKAutomation/azure-sdk-for-net that referenced this issue Feb 25, 2022
Add xms-ids for machinelearning, machinelearningcompute, machinelearningexperimentation and machinelearningservices (Azure#17734)

* add xmsids for machinelearning rps

* update inner properties

* address code review comments
openapi-sdkautomation bot pushed a commit to AzureSDKAutomation/azure-sdk-for-net that referenced this issue Feb 25, 2022
Add xms-ids for machinelearning, machinelearningcompute, machinelearningexperimentation and machinelearningservices (Azure#17734)

* add xmsids for machinelearning rps

* update inner properties

* address code review comments
@sashahdo
Copy link

Hello, I believe I'm still facing this issue despite using Azure.Messaging.ServiceBus 7.7.0 (as well as Azure.Messaging.EventHubs 5.1.0).
In the below sample, StopProcessingAsync never finishes, and the second log never happens. Instead, I get a TaskCanceledException

sample:

log("ServiceBus is stopping.")
await processor.StopProcessingAsync(token).ConfigureAwait(false);
await processor.DisposeAsync().ConfigureAwait(false);
await client.DisposeAsync().ConfigureAwait(false);
log("ServiceBus has stopped.")

@JoshLove-msft
Copy link
Member

Because you are passing a cancellation token to StopProcessingAsync, the stopping will be canceled if the token is canceled. Do you know if the token is canceled?

@sashahdo
Copy link

Given the taskCanceledException, I believe the token is canceled. However, if I run locally without the token being cancelled, it's just stuck at StopProcessingAsync and never gets to the next line. Even after >5 minutes.

So currently, if the task is cancelled, it throws the exception, and if it isn't cancelled, it just runs forever. Either way, the StopProcessingAsync() call never completes.

@JoshLove-msft
Copy link
Member

StopProcessingAsync will wait for any event handlers to finish running. Is it possible that your event handlers are not completing?

@sashahdo
Copy link

This is happening even if the ServiceBus is empty.
I tried testing by just running StopAsync on the first line of the event handler as well as before calling StartProcessingAsync and this still happens. Also, the main problem is that this happens whenever I restart the AppService, even if there are 0 events and nothing is being processed. So on restart we consistently get a taskCanceledException.

@github-actions github-actions bot locked and limited conversation to collaborators Mar 28, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Client This issue points to a problem in the data-plane of the library. customer-reported Issues that are reported by GitHub users external to the Azure organization. needs-team-attention Workflow: This issue needs attention from Azure service team or SDK team question The issue doesn't require a change to the product in order to be resolved. Most issues start as that Service Bus
Projects
None yet
Development

No branches or pull requests

7 participants