
[BUG] ServiceBusSessionProcessor fails to stop in a timely manner #42410

Closed · paul-datatech911 opened this issue Mar 5, 2024 · 11 comments
Assignee: jsquire
Labels: Client · customer-reported · issue-addressed · question · Service Bus

paul-datatech911 commented Mar 5, 2024

Library name and version

Azure.Messaging.ServiceBus 7.17.3

Describe the bug

Calling processor.StopProcessingAsync(), processor.CloseAsync() or processor.DisposeAsync() takes up to 60 seconds.

This has been documented in multiple threads (now closed), and a fix was merged for Event Hubs (#21242).

The issue remains for ServiceBusSessionProcessor.

#17734
#19306
#21869

Expected behavior

Calling processor.StopProcessingAsync(), processor.CloseAsync() or processor.DisposeAsync() should stop the processing of messages and close the underlying transport immediately.

Actual behavior

Calling processor.StopProcessingAsync(), processor.CloseAsync() or processor.DisposeAsync() takes up to the default 60-second polling period of the underlying AMQP transport to return.
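
For what it's worth, the 60-second period appears to correspond to the client's ServiceBusRetryOptions.TryTimeout (which defaults to 60 seconds); here is a sketch of shortening it as a stopgap, assuming that is indeed the polling period involved:

var client = new ServiceBusClient("<CONN_STR>", new ServiceBusClientOptions
{
    // Assumption: shortening TryTimeout bounds how long an idle receive can hold up shutdown.
    RetryOptions = new ServiceBusRetryOptions
    {
        TryTimeout = TimeSpan.FromSeconds(10)
    }
});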

Reproduction Steps

var client = new ServiceBusClient("<CONN_STR>");
var processor = client.CreateSessionProcessor("topic", "subscription");
await processor.StartProcessingAsync();

// Stop processing and unsubscribe from events
await processor.StopProcessingAsync();
// ... unhook events

// Close/Dispose
await processor.DisposeAsync();

Environment

No response

github-actions bot added Client This issue points to a problem in the data-plane of the library. customer-reported Issues that are reported by GitHub users external to the Azure organization. Event Hubs needs-team-triage Workflow: This issue needs the team to triage. question The issue doesn't require a change to the product in order to be resolved. Most issues start as that labels Mar 5, 2024
jsquire self-assigned this Mar 5, 2024
jsquire (Member) commented Mar 5, 2024

Hi @paul-datatech911. Thanks for reaching out and we regret that you're experiencing difficulties. One thing to note is that the processor will wait for your message processing handler to complete before processing stops. If your code chooses not to honor the cancellation token in the EventArgs when signaled, then the processor will remain running without dispatching additional messages until all outstanding handler calls complete. As you did not include your handlers in your repro sample, I can't say whether or not that is the cause of what you're seeing.
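
For reference, here's a minimal sketch of a handler that observes that token (DoWorkAsync is a stand-in for your own processing logic):

    processor.ProcessMessageAsync += async args =>
    {
        try
        {
            // Flow the processor's cancellation token into long-running work so that
            // StopProcessingAsync is not held up waiting for the handler to finish.
            await DoWorkAsync(args.Message, args.CancellationToken);
            await args.CompleteMessageAsync(args.Message, args.CancellationToken);
        }
        catch (OperationCanceledException)
        {
            // Stop was requested; the message lock will lapse and the message will be redelivered.
        }
    };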

I'm unable to reproduce the behavior that you're reporting with the latest release (7.17.4), using a basic set of handlers and the same client pattern.

    await using var client = new ServiceBusClient("<< CONNECTION STRING >>");
    await using var processor = client.CreateSessionProcessor("<< TOPIC >>", "<< SUBSCRIPTION >>");

    processor.ProcessMessageAsync += async args =>
    {
        args.Message.Body.ToString().Dump();
        await args.CompleteMessageAsync(args.Message);
    };
    
    processor.ProcessErrorAsync += args =>
    {
        args.Exception.Dump();
        return Task.CompletedTask;  
    };
    
    await processor.StartProcessingAsync();
    await Task.Delay(TimeSpan.FromSeconds(10));
    
    var stopWatch = Stopwatch.StartNew();
    await processor.StopProcessingAsync();
    stopWatch.Stop();
    
    stopWatch.Elapsed.Dump();  

With that snippet, I'm seeing a number of messages being processed while the delay is active, and then stopping completes in milliseconds:

... SNIP ...
Message-80
Message-81
Message-82
Message-83
Message-84
Message-85
00:00:00.0362204

If you are able to reproduce with the latest version of the library and you're sure that your handler is not blocking shutdown, please unresolve and provide a small, self-contained minimal repro and we'd be happy to take another look.

jsquire added issue-addressed Workflow: The Azure SDK team believes it to be addressed and ready to close. and removed needs-team-triage Workflow: This issue needs the team to triage. labels Mar 5, 2024

github-actions bot commented Mar 5, 2024

Hi @paul-datatech911. Thank you for opening this issue and giving us the opportunity to assist. We believe that this has been addressed. If you feel that further discussion is needed, please add a comment with the text "/unresolve" to remove the "issue-addressed" label and continue the conversation.

paul-datatech911 (Author) commented Mar 5, 2024 via email

paul-datatech911 (Author) commented Mar 5, 2024

/unresolve

Yes, I do realize that CloseAsync() and DisposeAsync() don't both need to be called, but for completeness they are all included.

using Azure.Messaging.ServiceBus;
using Azure.Messaging.ServiceBus.Administration;
using ConsoleDump;
using System.Diagnostics;

var connStr = "<<CONNSTRING>>";
var topic = "testtopic";
var subscription = "testSubscription";

var adminClient = new ServiceBusAdministrationClient(connStr);
if(!(await adminClient.TopicExistsAsync(topic)))
	await adminClient.CreateTopicAsync(topic);
if(!(await adminClient.SubscriptionExistsAsync(topic, subscription)))
	await adminClient.CreateSubscriptionAsync(new CreateSubscriptionOptions(topic, subscription) 
	{		
		RequiresSession = true
	});

await using var client = new ServiceBusClient(connStr);
await using var processor = client.CreateSessionProcessor(topic, subscription, new ServiceBusSessionProcessorOptions()
{
	AutoCompleteMessages = false,
	MaxConcurrentCallsPerSession = 1,
	ReceiveMode = ServiceBusReceiveMode.PeekLock
	//,MaxConcurrentSessions = 16	
});

processor.ProcessMessageAsync += async args =>
{
	args.Message.Body.ToString().Dump();
	await args.CompleteMessageAsync(args.Message);
};

processor.ProcessErrorAsync += args =>
{
	args.Exception.Dump();
	return Task.CompletedTask;
};

await processor.StartProcessingAsync();
await Task.Delay(TimeSpan.FromSeconds(5));

TimeAction("StopProcessingAsync", async  () => await processor.StopProcessingAsync());
TimeAction("CloseAsync",async () =>  await processor.CloseAsync());
TimeAction("DisposeAsync", async () => await processor.DisposeAsync());

Console.Read();

static async Task TimeAction(string name, Func<Task> toTime)
{
	try
	{
		var sw = Stopwatch.StartNew();
		await toTime();
		$"'{name ?? "UNKNOWN"}' took {sw.Elapsed.TotalSeconds}s".Dump();
	}
	catch { }
}
// Default (8) Sessions
'StopProcessingAsync' took 4.4939928s
'CloseAsync' took 0.0029284s
'DisposeAsync' took 0.0007749s

// 16 Sessions
'StopProcessingAsync' took 8.7078234s
'CloseAsync' took 0.0028314s
'DisposeAsync' took 0.0006204s

// 32 Sessions
'StopProcessingAsync' took 14.7775529s
'CloseAsync' took 0.0033005s
'DisposeAsync' took 0.0008415s

// 64 Sessions
'StopProcessingAsync' took 16.6488138s
'CloseAsync' took 0.0031693s
'DisposeAsync' took 0.0006256s

// 128 Sessions
'StopProcessingAsync' took 25.6376603s
'CloseAsync' took 0.0027674s
'DisposeAsync' took 0.0006864s

// 256 Sessions
'StopProcessingAsync' took 29.0412196s
'CloseAsync' took 0.0027741s
'DisposeAsync' took 0.0008701s

// 512 Sessions
'StopProcessingAsync' took 45.67275s
'CloseAsync' took 0.002963s
'DisposeAsync' took 0.0006816s

github-actions bot added needs-team-attention Workflow: This issue needs attention from Azure service team or SDK team and removed issue-addressed Workflow: The Azure SDK team believes it to be addressed and ready to close. labels Mar 5, 2024
jsquire (Member) commented Mar 6, 2024

@paul-datatech911 : The results that you're seeing are normal behavior under high degrees of concurrency. Cancellation in the AMQP transport is best effort and is possible only during points in the protocol where it can be safely performed. When stopping, the processor will immediately request cancellation of all active network operations and then must wait for the transport to complete.

Each network operation is asynchronous, and cancelling or completing it requires the async continuations to be scheduled and run. As your host is unlikely to have 128+ CPU cores available to run that amount of work in parallel, the .NET thread pool has to wait for resources to become available to schedule completion of those network operations and return control to the processor. I'd suggest reading through the "Processor appears to hang or have latency issues when using high concurrency" section of the Troubleshooting Guide for additional context and discussion.
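
As a rough illustration of the thread pool dimension (a sketch only, not specific guidance for your application), raising the minimum worker thread count before starting the processor reduces the time the pool spends injecting threads when hundreds of continuations become runnable at once:

    // The value 128 is an assumption, sized to roughly match MaxConcurrentSessions.
    ThreadPool.GetMinThreads(out var workerThreads, out var completionPortThreads);
    ThreadPool.SetMinThreads(Math.Max(workerThreads, 128), completionPortThreads);

    await processor.StartProcessingAsync();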

jsquire added issue-addressed Workflow: The Azure SDK team believes it to be addressed and ready to close. and removed needs-team-attention Workflow: This issue needs attention from Azure service team or SDK team labels Mar 6, 2024

github-actions bot commented Mar 6, 2024

Hi @paul-datatech911. Thank you for opening this issue and giving us the opportunity to assist. We believe that this has been addressed. If you feel that further discussion is needed, please add a comment with the text "/unresolve" to remove the "issue-addressed" label and continue the conversation.

paul-datatech911 (Author) commented Mar 6, 2024

/unresolve

I read it, thanks for the link - although it doesn't give any good reason why high concurrency should be an issue (in the absence of blocking sync or sync-over-async), it just says that it is.

That just means that there is still some sort of blocking behavior in the transport itself, correct? I mean, it doesn't matter if I have 1 core and need to honor joined cancellation on 1000 threads; it can and should be fast if there are no blocking synchronous calls. Which team/repo would you suggest an improvement request be created in?
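
To illustrate what I mean (a toy sketch, not the Service Bus code path), cancelling a large number of purely asynchronous pending waits completes almost instantly when nothing blocks synchronously:

// Toy sketch: 1000 pending awaits cancelled at once; no Service Bus involved.
using var cts = new CancellationTokenSource();
var pending = Enumerable.Range(0, 1000).Select(async _ =>
{
	try { await Task.Delay(Timeout.Infinite, cts.Token); }
	catch (OperationCanceledException) { }
}).ToArray();

var sw = Stopwatch.StartNew();
cts.Cancel();
await Task.WhenAll(pending);
$"Cancelled 1000 pending awaits in {sw.Elapsed.TotalMilliseconds}ms".Dump();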

In case you're curious, the use case is a pretty simple multi-tenant with the sessionId being the tenantId - doesn't seem exotic.

Also, if I set the concurrency to the number of cores on my system, I don't get the same performance as you do:

'StopProcessingAsync' took 2.9088147s
'CloseAsync' took 0.003914s
'DisposeAsync' took 0.0009842s

Paul

github-actions bot added needs-team-attention Workflow: This issue needs attention from Azure service team or SDK team and removed issue-addressed Workflow: The Azure SDK team believes it to be addressed and ready to close. labels Mar 6, 2024
jsquire (Member) commented Mar 6, 2024

The TSG doesn't go into a deep discussion of concurrency because it is not an issue specific to the Azure SDK - it is a general development concept. The relevant information from the SDK perspective is that each degree of concurrency is managed as an independent background task in the thread pool.

Also, if I set the concurrency to the number of cores on my system, I don't get the same performance as you do:

As mentioned, the performance characteristics here will depend on the host system, its network, and at what point the cancellation is honored by the transport. It is within what would be a reasonable range, and isn't something that indicates an issue with the Service Bus library.

Which team/repo would you suggest an improvement request be created in?

You're welcome to open an issue in the Microsoft.Azure.Amqp repository.

jsquire added issue-addressed Workflow: The Azure SDK team believes it to be addressed and ready to close. and removed needs-team-attention Workflow: This issue needs attention from Azure service team or SDK team labels Mar 6, 2024

github-actions bot commented Mar 6, 2024

Hi @paul-datatech911. Thank you for opening this issue and giving us the opportunity to assist. We believe that this has been addressed. If you feel that further discussion is needed, please add a comment with the text "/unresolve" to remove the "issue-addressed" label and continue the conversation.

paul-datatech911 (Author) commented

Thanks for explaining concurrency as a general development concept...

I think you are correct in hinting at "at which point the cancellation is honored in the transport", and I'll pass the buck over to someone else who may be willing to take a look!

Paul

github-actions bot commented

Hi @paul-datatech911, since you haven’t asked that we /unresolve the issue, we’ll close this out. If you believe further discussion is needed, please add a comment /unresolve to reopen the issue.

github-actions bot locked and limited conversation to collaborators Jun 11, 2024