Skip to content
This repository has been archived by the owner on Jul 19, 2024. It is now read-only.

Azure Functions ServiceBusTrigger prevent abandon on Exception #183

Closed
Bpflugrad opened this issue Mar 21, 2019 · 17 comments
Closed

Azure Functions ServiceBusTrigger prevent abandon on Exception #183

Bpflugrad opened this issue Mar 21, 2019 · 17 comments
Labels

Comments

@Bpflugrad
Copy link

As a developer I would like to be able to send a Service Bus message to the DeadLetter Queue and throw an Exception without triggering the automatic Abandon behavior when an Exception is thrown. This is because I want to surface errors in Application Insights by considering requests that results in a message being DeadLettered as a failed request. This occurs even when extensions.serviceBus.messageHandlerOptions.autoComplete is false.

Repro steps

Create an ServiceBusTrigger function like below:

        [FunctionName("QueueTest")]
        public static async Task QueueTest(
            [ServiceBusTrigger("testqueue", Connection = "TestQueue")]string myQueueItem, 
            ILogger log,
            MessageReceiver messageReceiver,
            string lockToken
            )
        {
            log.LogInformation($"C# ServiceBus queue trigger function processed message: '{myQueueItem}'");

            await messageReceiver.DeadLetterAsync(lockToken, "DL");

            log.LogInformation($"C# ServiceBus queue trigger function finished processing.");

            throw new Exception("DL");
        }

Add an item to the test queue and observe the behavior.

Expected/Desired behavior

One exception thrown.

[3/21/2019 6:57:41 PM] Executed 'QueueTest' (Failed, Id=a76958d9-cefa-4d85-933e-178917709ef8)
[3/21/2019 6:57:41 PM] System.Private.CoreLib: Exception while executing function: QueueTest. QueueTestFunction: DL.
[3/21/2019 6:57:41 PM] MessageReceiver error (Action=UserCallback, ClientId=MessageReceiver1testqueue, EntityPath=testqueue, Endpoint=sonobello.servicebus.windows.net)
[3/21/2019 6:57:41 PM] System.Private.CoreLib: Exception while executing function: QueueTest. QueueTestFunction: DL.

Actual behavior

Two exceptions are thrown.

[3/21/2019 6:57:41 PM] Executed 'QueueTest' (Failed, Id=a76958d9-cefa-4d85-933e-178917709ef8)
[3/21/2019 6:57:41 PM] System.Private.CoreLib: Exception while executing function: QueueTest. QueueTestFunction: DL.
[3/21/2019 6:57:41 PM] MessageReceiver error (Action=UserCallback, ClientId=MessageReceiver1testqueue, EntityPath=testqueue, Endpoint=sonobello.servicebus.windows.net)
[3/21/2019 6:57:41 PM] System.Private.CoreLib: Exception while executing function: QueueTest. QueueTestFunction: DL.
[3/21/2019 6:57:41 PM] MessageReceiver error (Action=Abandon, ClientId=MessageReceiver1testqueue, EntityPath=testqueue, Endpoint=sonobello.servicebus.windows.net)
[3/21/2019 6:57:41 PM] Microsoft.Azure.ServiceBus: The lock supplied is invalid. Either the lock expired, or the message has already been removed from the queue, or was received by a different receiver instance.

Known workarounds

You can not throw the Exception and DeadLetter the message. However, this will not log as a failure in Application Insights.
You can ignore the second Exception, but it is unwieldy and clutters the logs.

Related Information

Packages:

Microsoft.Azure.WebJobs.Extensions.ServiceBus 3.0.3
Microsoft.NET.Sdk.Functions 1.0.26

Related ticket regarding AutoAbandon #691

@Hurtyto
Copy link

Hurtyto commented Jan 8, 2020

Hello everyone. I am working on exactly the same requirement. Do we have any progress on this?

@Hurtyto
Copy link

Hurtyto commented Jan 8, 2020

Hello @alrod :)
Do you have any information on this?
Thanks.

@mike-suffield-ul
Copy link

mike-suffield-ul commented Jul 16, 2020

Still seeing this issue, has anyone found the solution? I thought adding "autoComplete": true to the host.json would help, but doesn't seem to work for me

{
  "version": "2.0",
  "extensions": {
    "serviceBus": {
      "messageHandlerOptions": {
        "autoComplete": true
      }
    }
  }
}

@alainbryden
Copy link

alainbryden commented Jul 22, 2020

I have the exact same problem. We have two requirements:

  1. The ability to control whether a message is completed, dead-lettered, or abandoned
  2. Have the function show as failed ("Error") in Azure portal Monitor, rather than "Success" when this happens.

Everything I've read has told me the only way to have a function show as "Failed" is to throw an Exception at the end of it (which is kind of awful if you ask me) - forcing us to do the following:

[FunctionName("SomeSubscriberAsync")]
public async Task Run(ILogger logger, [ServiceBusTrigger("%ASubscriptionTopic%", "%ASubscription%", Connection = "AServiceBusConnectionString", IsSessionsEnabled = true)]Message message, MessageReceiver messageReceiver)
{
    ReturnCode returnCode = await ProcessMessage(message, logger);
    await ReleaseMessage(returnCode, message, messageReceiver, logger, "SomeSubscriber");

    // See https://stackoverflow.com/questions/42782379 - the only way notify Azure of an error
    // in the function app run is to raise an unhandled exception (even if it has been handled cleanly)
    if(returnCode != ReturnCode.Success)
        throw new Exception("Function execution failed - its all in the logs.");
}

This was supposed to be the method we use to put the message where we want it to go:

/// <summary>Based on how the message was handled, completes, abandons, or dead-letters it.</summary>
public static async Task ReleaseMessage(ReturnCode result, Message message, MessageReceiver messageReceiver, ILogger logger, string handlerName)
{
    switch (result)
    {
        case ReturnCode.Success:
            await messageReceiver.CompleteAsync(message.SystemProperties.LockToken);
            logger.LogInformation($"{handlerName} has 'completed' the message {message.MessageId} due to successful handling.");
            break;
        case ReturnCode.TemporaryFailure:
            await messageReceiver.AbandonAsync(message.SystemProperties.LockToken);
            logger.LogWarning($"{handlerName} has 'abandoned' the message {message.MessageId} so that it can try again later.");
            break;
        case ReturnCode.PermanentFailure:
            await messageReceiver.DeadLetterAsync(message.SystemProperties.LockToken);
            logger.LogWarning($"{handlerName} has 'dead-lettered' the message {message.MessageId} due to a permanent failure.");
            break;
    }
}

But in the ticket description, some sort of auto-handling kicks in after the exception is thrown (despite my "sessionHandlerOptions": { "autoComplete": false) and I just get a huge error dump about the lock no longer being valid:

[7/22/2020 11:54:45 PM] SomeSubscriber has 'abandoned' the message 3bb4a192b65f4c8181d3dc23e6d76907 so that it can try again later.
[7/22/2020 11:55:12 PM] Executed 'SomeSubscriberAsync' (Failed, Id=cc035116-3384-4dc3-aac3-53afa9f8dc9d)
[7/22/2020 11:55:12 PM] System.Private.CoreLib: Exception while executing function: SomeSubscriberAsync. Functions: Function execution failed - its all in the logs.
[7/22/2020 11:55:12 PM] Message processing error (Action=UserCallback, ClientId=SubscriptionClient3aTopic/aSubscriber, EntityPath=aTopic/Subscriptions/aSubscriber, Endpoint=aservicebus.servicebus.windows.net)
[7/22/2020 11:55:12 PM] System.Private.CoreLib: Exception while executing function: SomeSubscriberAsync. Functions: Function execution failed - its all in the logs.
[7/22/2020 11:55:12 PM] Message processing error (Action=Abandon, ClientId=SubscriptionClient3aTopic/aSubscriber, EntityPath=aTopic/Subscriptions/aSubscriber, Endpoint=aservicebus.servicebus.windows.net)
[7/22/2020 11:55:12 PM] Microsoft.Azure.ServiceBus: The session lock has expired on the MessageSession. Accept a new MessageSession.

I even tried "Wait()ing" onCompleteAsync/DeadLetterAsync before throwing, but that doesn't help.

The only thing that prevents these red-herring errors from capping off every failed execution is to go back to not throwing an error, but that leaves me with no way to complete requirement Azure/azure-webjobs-sdk#2 (which is to give any indication to the naked eye that an error occurred). In an ideal world, these requirements would not be mutually exclusive.

@ghost
Copy link

ghost commented Dec 18, 2020

I'm currently hitting this issue as well (and I'm quite surprised since this issue exists for almost 2 years now!). The docs clearly state that when autoComplete: "false", we are responsible for handling abandon as well. So it's quite odd that the functions framework still abandons it on its own. Is there anyone responsible for fixing this?

@andriibutko
Copy link

Don't forget about that issue with using Startup: #81

@ghost
Copy link

ghost commented Jul 5, 2021

@mathewc I'm sorry for tagging you, but it's been 7 months after my comment, and this issue seems to have been buried under all the other issues.
I'm assuming that this issue hasn't been resolved yet, so I'm currently still catching any exception, abandoning it on my own, and not rethrowing it. This makes the execution looks like it has succeeded in application insights, which is wrong. Sure, I can see it in the exceptions tab, but this makes it hard because exceptions do not necessarily mean the function execution has failed.

I would like to know if it will be fixed, because it will make our monitoring a lot more reliable.

@adeliab
Copy link

adeliab commented Jul 9, 2021

I have the same requirement - need to deadletter the message on failure and show the function invocation as well. and I got the same exceptions :(

@ghost
Copy link

ghost commented Aug 18, 2021

Alright, after seeing the availability of version 4.3.0 with the addition to mark an individual trigger with AutoComplete = false in the attribute, instead of having it for all functions in the function app via the host.json file, I decided to do some more testing. (#138)

Without specifying autoComplete: false in the host.json file, I added AutoComplete = false to a single trigger. The result is that the function doesn't abandon it on its own anymore when an exception is thrown. This is the behavior that I've been looking for.

So, while I have the expected behavior at the moment, I'm still baffled that it's only possible to get this by using the attribute setting.
This SDK still violates the documentation while using the host.json file to disable auto complete:

When set to false, you are responsible for calling MessageReceiver methods to complete, abandon, or deadletter the message. If an exception is thrown (and none of the MessageReceiver methods are called), then the lock remains. Once the lock expires, the message is re-queued with the DeliveryCount incremented and the lock is automatically renewed.
https://docs.microsoft.com/en-us/azure/azure-functions/functions-bindings-service-bus#hostjson-settings

@zamzowd
Copy link

zamzowd commented Aug 19, 2021

Without specifying autoComplete: false in the host.json file, I added AutoComplete = false to a single trigger. The result is that the function doesn't abandon it on its own anymore when an exception is thrown. This is the behavior that I've been looking for.

I am not able to reproduce this locally using and setting AutoComplete = false in the ServiceBusTrigger attribute.

I still get this error message logged if I deadletter the message in the code and then throw an exception: Microsoft.Azure.ServiceBus: The lock supplied is invalid. Either the lock expired, or the message has already been removed from the queue, or was received by a different receiver instance.

I do not get this message logged if I throw an exception without deadlettering, completing, or abandoning the message in my code.

This was both with and without all three autoComplete service bus options set to false in the host.json.

I have these nuget packages:

  • Microsoft.Azure.Functions.Extensions 1.1.0
  • Microsoft.Azure.WebJobs.Extensions.ServiceBus 4.3.0
  • Microsoft.NET.Sdk.Functions

.NET Core 3.1 and "version: "2.0" specified in the host.json.

@zamzowd
Copy link

zamzowd commented Aug 20, 2021

I created Azure/azure-sdk-for-net/pull/23423 for this, but I'm not sure how to complete these two steps listed as "prior to issuing a pull request":

  • All code must have completed any necessary legal signoff for being publicly viewable (Patent review, JSR review, etc.)
  • Code should be fully code reviewed.

(of course, the fix itself ended up being very straightforward)

@ghost
Copy link

ghost commented Aug 20, 2021

That is odd. I have double checked my code, and it seems to be working on my end locally:

[FunctionName("AutoComplete")]
public static async Task AutoComplete(
    [ServiceBusTrigger("autocomplete", AutoComplete = false)] Message message,
    MessageReceiver messageReceiver,
    ILogger logger,
    CancellationToken cancellationToken)
{
    await Task.Delay(TimeSpan.FromSeconds(1)); // Delay it a bit

    var text = Encoding.UTF8.GetString(message.Body);

    logger.LogWarning($"Now: {DateTime.UtcNow}");
    logger.LogWarning($"Locked until: {message.SystemProperties.LockedUntilUtc}");
    logger.LogWarning($"Lock token: {message.SystemProperties.LockToken}");

    switch (text)
    {
        case "abandon":
            logger.LogWarning("Abandon and throw exception");
            await messageReceiver.AbandonAsync(message.SystemProperties.LockToken, new Dictionary<string, object> { ["Hello"] = "Abandoned" });
            throw new Exception("test");
        case "deadletter":
            logger.LogWarning("Deadletter and throw exception");
            await messageReceiver.DeadLetterAsync(message.SystemProperties.LockToken, new Dictionary<string, object> { ["Hello"] = "Deadlettered" });
            throw new Exception("test");
        case "timeout":
            logger.LogWarning("Time out operation");
            await Task.Delay(TimeSpan.FromSeconds(120), cancellationToken);
            break;
        case "exception":
            logger.LogWarning("Just throw exception");
            throw new Exception("test");
        case "nothing":
            logger.LogWarning("Do nothing");
            break;
        default:
            logger.LogWarning("Complete successfully");
            await messageReceiver.CompleteAsync(message.SystemProperties.LockToken);
            break;
    }
}

In my host.json file, the functionTimeout and maxAutoRenewDuration are both set to 1 minute.
My installed packages are: Microsoft.Azure.WebJobs.Extensions.ServiceBus 4.3.0, Microsoft.NET.Sdk.Functions 3.0.13.
I'm also using .NET Core 3.1 with the function version in the host.json set to 2.0.
The max delivery count on the service bus queue is set to 3.

I got the following results with the code above:

Action host.json auto complete Attribute auto complete Result
abandon - - My "test" exception + Exception: invalid lock (retried 3x) + Deadlettered
deadletter - - My "test" exception + Exception: invalid lock + Deadlettered
timeout - - Exception: timeout value exceeded (retried 3x) + Deadlettered
exception - - My "test" exception (retried 3x) + Deadlettered
nothing - - Completed
complete - - Completed + exception: invalid lock
abandon - false My "test" exception (retried 3x) + Deadlettered
deadletter - false My "test" exception + Deadlettered
timeout - false Exception: timeout value exceeded (retried 3x) + Deadlettered
exception - false My "test" exception (retried 3x without lock delay) + Deadlettered
nothing - false Retried 3x (over multiple minutes, due to lock) + Deadlettered
complete - false Completed
abandon false - My "test" exception + Exception: invalid lock (retried 3x) + Deadlettered
deadletter false - My "test" exception + Exception: invalid lock + Deadlettered
timeout false - Exception: timeout value exceeded (retried 3x) + Deadlettered
exception false - My "test" exception (retried 3x without lock delay) + Deadlettered
nothing false - Retried 3x (over multiple minutes, due to lock) + Deadlettered
complete false - Completed

The results from both not changing auto complete, and setting auto complete to false through the attribute, are what I mostly expect*. And we know that the host.json setting should do the same as the attribute, but for completion's sake I've put it here as well.

Can you check as well with the same code, and share your results?


*One thing to note, is that with both host.json and the attribute setting to false, throwing an exception without abandoning or deadlettering it yourself, still abandons it (there's no lock delay). I had expected the lock to act the same as doing "nothing".
Which ironically is also something that contradicts the documentation:

If an exception is thrown (and none of the MessageReceiver methods are called), then the lock remains.

Because the lock clearly doesn't seem to remain in my testing.
It doesn't harm me in any way currently, and one can argue that this is a fallback in case someone forgets calling Abandon, but if the docs say that the lock remains, the code should follow that behavior. (Unless I'm misinterpreting the "remain" part..., because in my mind, this means the message is still locked until it expires)

@zamzowd
Copy link

zamzowd commented Aug 20, 2021

I set up a repository to try this out, using your example: https://github.com/zamzowd/service-bus-sans-autocomplete

On manual deadletter followed by an exception, I still get the message that it attempts to abandon the message:

image

Maybe there's some race condition between no longer writing to a log for a given message and the attempt to abandon the message in MessageReceivePump? When I try to manually abandon followed by an exception, we can see this message gets logged after it already started re-processing the abandoned message:

image

Regardless of whether the log includes the error for attempting to abandon the message, I'm pretty confident it still attempts to abandon the message whether the AutoComplete is set by way of ServiceBusTrigger's attribute or the host.json. There doesn't seem to be any code that would make this part behave differently in those scenarios. As seen in my PR, the AbandonMessageIfNeededAsync needs to check for the AutoComplete the same way CompleteMessageIfNeededAsync does. This seems to be the only time MessageHandlerOptions.AutoComplete is checked. (I'm actually now seeing there's other places SessionHandlerOptions.AutoComplete is checked, so I might have to update my PR to make it not attempt to abandon when autocomplete is false in those scenarios as well.)

@ghost
Copy link

ghost commented Aug 23, 2021

I have no idea why it has a different result... I've tried it on two computers now, and I still get no exception using your full repository as code:

I can at least confirm your scenario where it triggers the next delivery before the exception is logged:

However, I think that's expected, since abandoning is done before throwing the exception. So it's undefined which comes first, logging the exception, or the next triggered delivery.

I'll try running it in Azure and see what the results are.

@ghost
Copy link

ghost commented Aug 23, 2021

I just tried it in Azure, and I get consistent results there as well after repeatedly trying a bunch of times.

Running it through WithAutoCompleteAttribute, this is the log for every run:

For verification, WithoutAutoCompleteAttribute (with a different queue name since I already had one):

@ghost
Copy link

ghost commented Mar 9, 2022

As I'll be removing my work account in favor of my main account, I can contacted at @Archomeda from now on for further updates.

@ghost ghost added the Needs: triage 🔍 label Apr 4, 2022
@ghost ghost assigned sidkri Apr 4, 2022
@alrod alrod transferred this issue from Azure/azure-webjobs-sdk Apr 4, 2022
@alrod alrod added the Track2 label Apr 4, 2022
@alrod
Copy link
Member

alrod commented Apr 4, 2022

@alrod alrod closed this as completed Apr 4, 2022
@ghost ghost removed the Needs: triage 🔍 label Apr 4, 2022
@sidkri sidkri removed their assignment Apr 5, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

No branches or pull requests

10 participants