Ability to cancel receive operations #19955

danielmarbach · 2021-03-30T20:00:23Z

Alternative to #19888

Closes #19306

All SDK Contribution checklist:

This checklist is used to make sure that common guidelines for a pull request are followed.

Please open PR in Draft mode if it is:
- Work in progress or not intended to be merged.
- Encountering multiple pipeline failures and working on fixes.
If an SDK is being regenerated based on a new swagger spec, a link to the pull request containing these swagger spec changes has been included above.
I have read the contribution guidelines.
The pull request does not introduce breaking changes.

General Guidelines and Best Practices

Title of the pull request is clear and informative.
There are a small number of commits, each of which has an informative message. This means that previously merged commits do not appear in the history of the PR. For more information on cleaning up the commits in your PR, see this page.

Testing Guidelines

Pull request includes test coverage for the included changes.

SDK Generation Guidelines

The generate.cmd file for the SDK has been updated with the version of AutoRest, as well as the commitid of your swagger spec or link to the swagger spec, used to generate the code. (Track 2 only)
The *.csproj and AssemblyInfo.cs files have been updated with the new version of the SDK. Please double check nuget.org current release version.

Additional management plane SDK specific contribution checklist:

Note: Only applies to Microsoft.Azure.Management.[RP] or Azure.ResourceManager.[RP]

Include updated management metadata.
Update AzureRP.props to add/remove version info to maintain up to date API versions.

Management plane SDK Troubleshooting

If this is the very first SDK for a service and you are adding new service folders directly under /SDK, please add a new service label and/or contact the assigned reviewer.
If the check fails at the Verify Code Generation step, please ensure:
- Do not modify any code in generated folders.
- Do not selectively include/remove generated files in the PR.
- Do use generate.ps1/cmd to generate this PR instead of calling autorest directly.
  Please pay attention to the @microsoft.csharp version output after running generate.ps1. If it is lower than the current released version (2.3.82), please run it again, as it should pull down the latest version.

Old outstanding PR cleanup

Please note:
If a PR (including drafts) has been open for more than 60 days with no response to our queries or follow-ups, it will be closed to maintain a concise list for our reviewers.

ghost · 2021-03-30T20:00:29Z

Thank you for your contribution @danielmarbach! We will review the pull request and get back to you soon.

danielmarbach · 2021-03-30T20:01:07Z

sdk/servicebus/Azure.Messaging.ServiceBus/src/Amqp/AmqpReceiver.cs

+                                TimeSpan.FromMilliseconds(20),
+                                maxWaitTime ?? timeout,
+                                callback,
+                                (link, receiveMessagesCompletionSource));


This comes with the cost of boxing, yet I still think it is better than the alternative described above.

danielmarbach · 2021-03-30T20:03:04Z

sdk/servicebus/Azure.Messaging.ServiceBus/src/Amqp/AmqpReceiver.cs

+                }, receiveMessagesCompletionSource, useSynchronizationContext: false);
+
+                // in case BeginReceiveRemoteMessages throws exception will be materialized on the synchronous path
+                _ = Task.Factory


At first I had an approach following this pattern:

    var receiveTask = Receive(...);
    var completed = await Task.WhenAny(receiveTask, tcs.Task).ConfigureAwait(false);
    if (completed == tcs.Task)
        await tcs.Task.ConfigureAwait(false);
    await receiveTask.ConfigureAwait(false);

But then I figured that with this approach we always pay the price of the array allocation of WhenAny, plus the additional conditions on the path, including the state machinery. So I ended up always using the TCS to materialize either the result, the exceptions from EndReceiveMessages, or the cancellation.

Let me know if you prefer the WhenAny approach. While it involves more state machinery and allocates the WhenAny array, it wouldn't require us to box the value tuple and might make the code slightly more straightforward to read, at the cost of more allocations.
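
For readers following along, the TCS-plus-Register pattern described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the class and method names (CancellableWaitSketch, WithCancellation) and the Func<Task<T>> stand-in for the Begin/End receive call are assumptions made up for the example. Note that cancelling only abandons the wait; the wrapped operation itself keeps running.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper: not part of the SDK, for illustration only.
public static class CancellableWaitSketch
{
    public static async Task<T> WithCancellation<T>(
        Func<Task<T>> startOperation,          // stand-in for the uncancellable receive
        CancellationToken cancellationToken)
    {
        var completionSource = new TaskCompletionSource<T>(
            TaskCreationOptions.RunContinuationsAsynchronously);

        // The registration completes the TCS as canceled when the token fires.
        // State is passed explicitly so the callback does not capture a closure.
        using (cancellationToken.Register(
            state => ((TaskCompletionSource<T>)state).TrySetCanceled(),
            completionSource,
            useSynchronizationContext: false))
        {
            try
            {
                // Result, fault, or cancellation of the operation is
                // materialized on the same TCS; whichever side wins the
                // race decides the outcome the caller observes.
                _ = startOperation().ContinueWith(
                    (completed, state) =>
                    {
                        var source = (TaskCompletionSource<T>)state;
                        if (completed.IsFaulted)
                            source.TrySetException(completed.Exception.InnerExceptions);
                        else if (completed.IsCanceled)
                            source.TrySetCanceled();
                        else
                            source.TrySetResult(completed.Result);
                    },
                    completionSource,
                    CancellationToken.None,
                    TaskContinuationOptions.ExecuteSynchronously,
                    TaskScheduler.Default);
            }
            catch (Exception startFailure)
            {
                // A synchronous throw while starting is surfaced through the TCS too.
                completionSource.TrySetException(startFailure);
            }

            return await completionSource.Task.ConfigureAwait(false);
        }
    }
}
```

Compared with the WhenAny variant, there is a single awaited task and no "which task completed" branch; the trade-off is the state object handed to Register and ContinueWith.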

danielmarbach · 2021-03-30T20:12:10Z

sdk/servicebus/Azure.Messaging.ServiceBus/tests/Processor/ProcessorLiveTests.cs

+
+                await processor.StartProcessingAsync();
+                await tcs.Task;
+                await Task.Delay(10000); // wait long enough to be hanging in the next receive on the empty queue


@jsquire I tested the other proposed approach and unfortunately those tests passed even when I reverted my cancellation changes in the receive method. They passed because it wasn't guaranteed that the code was hanging in another receive attempt

JoshLove-msft · 2021-03-30T21:13:21Z

sdk/servicebus/Azure.Messaging.ServiceBus/tests/Receiver/ReceiverLiveTests.cs

@@ -99,6 +99,43 @@ await using (var scope = await ServiceBusScope.CreateWithQueue(enablePartitionin
            }
        }

+        [Test]
+        public async Task ReceiveMessagesWhenQueueEmpty()


I think we also want to prove that cancelling won't increment the delivery count for any messages that were already in the Amqp library's local buffer. This may be hard to do, but maybe we can try sending a message just before we cancel?

I can also add these tests in a follow up PR.

JoshLove-msft · 2021-03-30T21:25:14Z

sdk/servicebus/Azure.Messaging.ServiceBus/tests/Processor/ProcessorLiveTests.cs

+                using var cancellationTokenSource = new CancellationTokenSource(TimeSpan.FromSeconds(3));
+
+                var start = DateTime.UtcNow;
+                await processor.StopProcessingAsync(cancellationTokenSource.Token);


Can we also have a test that doesn't pass a token here (or update an existing test to assert the time elapsed)? It should still stop processing pretty quickly (or at least as quick as the user handler takes to complete). We would also want a test that verifies that stopping still allows in-flight user handlers to complete.

I can also add these tests in a follow up PR.

JoshLove-msft · 2021-03-30T21:27:38Z

Addresses #17734

JoshLove-msft · 2021-03-30T21:39:07Z

sdk/servicebus/Azure.Messaging.ServiceBus/src/Amqp/AmqpReceiver.cs

                    receivedMessages.Add(AmqpMessageConverter.AmqpMessageToSBMessage(message));
                    message.Dispose();
                }

                return receivedMessages;
            }
+            catch (OperationCanceledException)


Should we check cancellationToken.IsCancellationRequested? And also possibly restrict the catch to TaskCanceledException?

OperationCanceledException is the base of TaskCanceledException, so I figured the "when" conditions could be removed.
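
As a rough illustration of the point discussed in this thread, the sketch below shows that a catch of OperationCanceledException also covers TaskCanceledException, and one possible way to surface only TaskCanceledException to callers. The names (CancellationNormalizationSketch, AsTaskCanceled) are hypothetical, and this is not claimed to be how the SDK implements its normalization.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical helper: illustrates the exception hierarchy only.
public static class CancellationNormalizationSketch
{
    public static async Task<T> AsTaskCanceled<T>(Func<Task<T>> operation)
    {
        try
        {
            return await operation().ConfigureAwait(false);
        }
        catch (TaskCanceledException)
        {
            // Already the type callers are meant to see; rethrow unchanged.
            throw;
        }
        catch (OperationCanceledException ex)
        {
            // Base-type cancellation (for example, from a completion source)
            // is rewrapped so callers observe a TaskCanceledException.
            throw new TaskCanceledException(ex.Message, ex);
        }
    }
}
```

Because TaskCanceledException derives from OperationCanceledException, a single catch of the base type with no filter would handle both cases; the two clauses here only avoid rewrapping an exception that already has the desired type.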

JoshLove-msft · 2021-03-30T21:39:59Z

/azp run net - servicebus - tests

azure-pipelines · 2021-03-30T21:40:10Z

Azure Pipelines successfully started running 1 pipeline(s).

JoshLove-msft · 2021-03-30T21:42:41Z

This is awesome! Thanks @danielmarbach

jsquire

Late to the party, but LGTM.

jsquire · 2021-03-31T13:51:52Z

sdk/servicebus/Azure.Messaging.ServiceBus/src/Amqp/AmqpReceiver.cs

+                }, receiveMessagesCompletionSource, useSynchronizationContext: false);
+
+                // in case BeginReceiveRemoteMessages throws exception will be materialized on the synchronous path
+                _ = Task.Factory


I like your implementation better; I sketched out the WhenAny approach in the issue, but had no idea that Register was a thing. While this does introduce a bit more density in the FromAsync machinery, it already had some complexity. With your current approach, I find the flow easier to follow than the "which task completed" juggling that WhenAny would require.

jsquire · 2021-03-31T13:54:14Z

sdk/servicebus/Azure.Messaging.ServiceBus/src/Amqp/AmqpReceiver.cs

                    receivedMessages.Add(AmqpMessageConverter.AmqpMessageToSBMessage(message));
                    message.Dispose();
                }

                return receivedMessages;
            }
+            catch (OperationCanceledException)


IIRC, we can't restrict there because the completion source will throw the OperationCanceledException. We try to normalize everything to TaskCanceledException around the SDK so that is what callers see. The Service Bus and Event Hubs troubleshooting guides attribute a specific meaning to OperationCanceled that indicates service behavior and we wanted to avoid confusion.

jsquire · 2021-03-31T13:56:33Z

sdk/servicebus/Azure.Messaging.ServiceBus/src/Amqp/AmqpReceiver.cs

+                                TimeSpan.FromMilliseconds(20),
+                                maxWaitTime ?? timeout,
+                                callback,
+                                (link, receiveMessagesCompletionSource));


JoshLove-msft · 2021-03-31T19:39:41Z

sdk/servicebus/Azure.Messaging.ServiceBus/src/Amqp/AmqpReceiver.cs

-                    },
-                    (link, maxMessages, maxWaitTime, timeout),
-                    default
+                var receiveMessagesCompletionSource =


Unfortunately, I think we may need to revert this. @jsquire pointed out that with this approach we will just be leaving receive operations hanging on the AMQP link, which will cause a backup. I confirmed this with a test that attempts to receive after a previous cancel. I think we will need to either limit the scope of this change to just StopProcessing calls, because in that case it is okay that receive operations are blocked, or better yet, see if we can contribute Cancellation token support to the AMQP library.
/cc @xinchen10

    public async Task CancellingDoesNotBlockSubsequentReceives(bool prefetch)
    {
        await using (var scope = await ServiceBusScope.CreateWithQueue(enablePartitioning: false, enableSession: false))
        {
            await using var client = CreateClient();

            ServiceBusSender sender = client.CreateSender(scope.QueueName);
            var receiver = client.CreateReceiver(scope.QueueName, new ServiceBusReceiverOptions { PrefetchCount = prefetch ? 10 : 0 });

            using var cancellationTokenSource = new CancellationTokenSource(2000);
            var start = DateTime.UtcNow;

            Assert.That(
                async () => await receiver.ReceiveMessageAsync(TimeSpan.FromSeconds(60), cancellationToken: cancellationTokenSource.Token),
                Throws.InstanceOf<TaskCanceledException>());

            await sender.SendMessageAsync(GetMessage());
            var msg = await receiver.ReceiveMessageAsync();
            Assert.AreEqual(1, msg.DeliveryCount);

            var end = DateTime.UtcNow;
            Assert.NotNull(msg);
            Assert.Less(end - start, TimeSpan.FromSeconds(5));
        }
    }

The above test fails on the second receive call as we are blocked on the cancelled receive.

That's why we originally closed the link in the other PR, but that also has other drawbacks. I think even StopProcessing can be problematic because the processor is designed to be restarted, right?

Yes, the processor can be restarted - actually StopProcessing just stops receiving rather than closing any links. Close/Dispose would close links. I really think the best way forward is to try to get this integrated into the AMQP lib, so that we can actually end the operations early instead of ignoring them.

I would vote for option 2.

For 1, I don't think cancelling pending receive calls when totalCredit is 0 is sufficient because there could be concurrent receives occurring on the same link. Even with option 2, we wouldn't be able to correlate receive calls with ReceiveAsyncResults. IMO the cancellation token provides the best user experience.

I really have a hard time understanding all the pushback against CancellationToken. Cooperative cancellation is the de facto standard in .NET for I/O-bound operations. It is present almost everywhere in modern async-enabled APIs, in the runtime as well as across the ecosystem. Even the guidelines for the whole Azure SDK, to which a lot of people have contributed and for which intense user studies have been done, adhere to those principles because this is how this ecosystem works. So why so much pushback?

@danielmarbach the pushback is not against cancellation tokens. It's more about supporting the shutdown scenario in a better way that also makes sense for AMQP (I admit that I am influenced more by other AMQP implementations, especially the Apache Qpid products and their JMS implementation). Your PR to the AMQP library (thank you for that) adds a cancellation token to the receive method only. It gives the impression that the library API is created on an as-needed basis and that this was done just to make the shutdown scenario work. To properly support cancellation tokens, we will also need to look at the other Task-based APIs.

Fair enough. Unfortunately, I'm only a community contributor without corporate backing, so the only thing I could commit to in my precious spare time was exactly that. Your comment puts things in a different light, and it sounds more like the door is open than closed, which is what I had, perhaps wrongly, experienced or, shall I say, read into the conversations. I appreciate you taking the time to clarify that.

If there were some way to openly share plans, ideas, and directions on the repo, including things that could be done, I'm happy to contribute a few things when I have time and when they fit the small area of the AMQP library that I know. For me, it boils down to having this project under some sort of active governance and communication plan, to see where things are heading (or not).

danielmarbach added 6 commits March 30, 2021 20:02

A crude test to start with

30b8937

Provide the ability for the ReceiveMessagesAsyncInternal to be canceled

9be9a9d

Simplify processor catch

0605a2e

Higher try timeout

8518f24

Receiver Test

6048076

SessionTest

3481277

danielmarbach requested review from JoshLove-msft and jsquire as code owners March 30, 2021 20:00

ghost added the Service Bus and customer-reported labels Mar 30, 2021

ghost added the Community Contribution label Mar 30, 2021

Materialize exceptions from EndReceiveMessages

72ff1f7

This was referenced Mar 30, 2021

Stop cancellation spike #19888

Closed

Investigate Force-Closing AMQP Links for Cancellation #19306

Closed

JoshLove-msft reviewed Mar 30, 2021

JoshLove-msft approved these changes Mar 30, 2021

JoshLove-msft merged commit 1235a9c into Azure:master Mar 30, 2021

JoshLove-msft mentioned this pull request Mar 31, 2021

[BUG] Timeout stopping Service Bus Processor #17734

Closed

jsquire reviewed Mar 31, 2021

danielmarbach deleted the cancellation-take2 branch March 31, 2021 14:41

JoshLove-msft reviewed Mar 31, 2021

JoshLove-msft mentioned this pull request Mar 31, 2021

Close link on cancellation #20012

Closed

jsquire mentioned this pull request May 20, 2021

[Event Hubs Client] Processor Stop - Aborts Links #21242

Merged
