
Expose current retry count in the context of the function #2595

Closed
jeffhollan opened this issue Sep 30, 2020 · 11 comments

jeffhollan commented Sep 30, 2020

With the new retry feature rolling out, one desired scenario is the ability to dead-letter or otherwise capture something that is on its final retry. For example, if I have an Event Hub-triggered function and define a retry policy of 5, and the 5th retry fails, I want to catch that failure and store it in a dead-letter queue or similar so I can inspect it later. However, today the current retry count isn't surfaced in the context, and using something like a local variable to increment may be tricky when multiple executions could be executing or retrying on the same host.

The proposal is that retry information is passed in on the ExecutionContext:

try
{
    // process the trigger payload
}
catch (Exception ex)
{
    if (context.retries.count >= context.retries.max)
    {
        // final attempt: dead-letter the payload for later inspection
    }
    throw;
}

// @pragnagopa @fabiocav @mathewc

@ghost ghost assigned fabiocav Sep 30, 2020
jeffhollan (Author)

And this would need to be piped into the Java, Python, C#, PowerShell, and JavaScript context objects; I'm not sure whether additional work is needed in the language libraries to support this.

@jeffhollan jeffhollan added this to the Triaged milestone Sep 30, 2020
@pragnagopa pragnagopa assigned pragnagopa and unassigned fabiocav Sep 30, 2020
pragnagopa (Member) commented Sep 30, 2020

I agree that the retry count is useful to include in the execution context. Once this change is pulled into the Functions host, the RPC layer needs to be updated to include this information, and the language workers need to be updated to consume the RPC change.

jeffhollan (Author)

FWIW, I think just the retry count would be good, and retry count plus the defined maximum slightly better; but in the best case we pass in the entire RetryContext, including info on the last exception that was hit.

mathewc (Member) commented Oct 1, 2020

Yes, we might surface RetryContext on ExecutionContext. We'd just have to add MaxRetries to RetryContext - it's not there now. Not sure yet if we should surface that raw type, or have a new type with the members needed. E.g. RetryContext currently has an IFunctionInstance member on it that we might not want to expose here. Need to think about it more.

Another important thing: we need to be sure users understand that there may actually be 2 levels of retries going on. In addition to the retries we're talking about (e.g. retry a single queue message dispatch N times), the queue trigger binding itself has a MaxDequeueCount, which is the second level of retries. This is only an issue for triggers with retries already built in (Queue/Service Bus). So applying this to your example above, you'd need to check both RetryCount and DequeueCount :)
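To illustrate the two-level check (a sketch only; names like retry_count and dequeue_count are placeholders for whatever the final API exposes, not the actual properties), a dead-letter decision for a queue trigger would need to consider both levels before concluding an attempt is truly the last one:

```python
def is_final_attempt(retry_count, max_retry_count,
                     dequeue_count, max_dequeue_count):
    """Return True only when both the host-level retries and the
    trigger's own dequeue-based retries are exhausted."""
    host_retries_exhausted = retry_count >= max_retry_count
    dequeues_exhausted = dequeue_count >= max_dequeue_count
    return host_retries_exhausted and dequeues_exhausted

# Host retries exhausted, but the queue will redeliver: not final.
print(is_final_attempt(5, 5, 2, 5))  # False
# Both levels exhausted: safe to dead-letter manually.
print(is_final_attempt(5, 5, 5, 5))  # True
```

Checking only the host-level retry count here would dead-letter the message several deliveries too early.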

jeffhollan (Author)

Moving comment from @casper-79 here: Azure/azure-functions-java-library#132 (comment)

I have tested the retry functionality today and seen the exponential retry strategy in action. I am also seeing some very strange behaviour, however. As I understand the documentation, the retry strategy is implemented on the function instance itself, rather than storing the delivery state on the queue, and I believe I am seeing side effects of this approach. My experiments center around submitting poison messages that will always fail onto a queue consumed by a Java Azure Function. The function uses a retry strategy defined in host.json, as seen below:

"retry": {
    "strategy": "exponentialBackoff",
    "maxRetryCount": 6,
    "minimumInterval": "00:00:10",
    "maximumInterval": "00:05:00"
}
(1) Processing of poison messages does not always show up in Application Insights or the "monitor" section of Azure Functions. When I use the Azure portal to peek at test messages I can tell DeliveryCount has gone up by 1, but more often than not there is no trace of the failed execution that increased the counter.

(2) Azure Functions instances are short-lived, thereby limiting the useful range of the retry configuration parameters. Can you provide guidance on what will work in practice? I am guessing you will run into problems if you set maximumInterval to 24 hours and maxRetryCount to 30 in my host.json?

(3) What is the recommended approach for dead-lettering? The only solution I can think of is to set maxDeliveryCount=1 on the queue, but this will only work if all retry attempts of your strategy can be performed within the typical lifetime of an instance. Otherwise, I guess the message will be retried forever.

jeffhollan (Author)

(we may need to break out a few of these other issues but replying in bulk here):

One thing worth noting is that the host retry policy layers on top of the trigger source - it doesn't know anything about the trigger source directly. For Event Hubs this is easy, because Event Hubs itself has no retry policy, so the host policy is the only one. Queues, however, are a bit more interesting. Let's imagine you have a Service Bus queue with a Service Bus retry policy of 5 delivery attempts, and you then define a function app retry policy of 5 attempts on top of it. What will happen is:

  1. Pulls the queue message from Service Bus (attempt 1 from Service Bus)
  2. Fails 5 times on the host (Service Bus knows nothing about these 5 attempts; only the Functions host does)
  3. After the 5th time, it abandons the message, so Service Bus now knows it eventually failed, and requeues it
  4. Pulls the queue message from Service Bus (attempt 2 from Service Bus)
  5. And so on, until eventually the message has been attempted 25 times, at which point Service Bus will dead-letter it.
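The multiplication in the steps above can be sketched as a deterministic toy model (not the actual host or Service Bus, just the counting): each Service Bus delivery triggers a full round of host-level retries before the message is abandoned and redelivered.

```python
def total_attempts(host_attempts_per_delivery, max_delivery_count):
    """Count function executions for a message that always fails:
    each Service Bus delivery is attempted host_attempts_per_delivery
    times by the host before the message is abandoned and requeued."""
    attempts = 0
    for _delivery in range(max_delivery_count):       # Service Bus deliveries
        for _retry in range(host_attempts_per_delivery):  # host-level attempts
            attempts += 1                             # function executes, fails
    # after max_delivery_count failed deliveries, Service Bus dead-letters it
    return attempts

print(total_attempts(5, 5))  # 25
```

This is why the two policies multiply rather than add: tuning either one changes the total number of executions before the message finally lands in the dead-letter queue.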

So when you mention in (1) that you are looking at the DeliveryCount: the delivery count on the queue message will be the same between host retries, because it's the same queue message being retried.

You are right on (2) that if you have long delays on something like a timer trigger, it's possible you'll get scaled down. There is some logic in the scale controller that looks at queue length and execution logs; we are investigating it to create clear guidance, but in general I'd say on the Consumption plan you likely don't want a delay longer than about 5 minutes (10 may be safe too; longer than 10 and you are likely at risk of scale-down). I don't believe the number of retry attempts will matter, but I'll try to test this out as well.

For the third point, that's where this issue sits. Given the above, I'd say you likely don't want to set maxDeliveryCount to 1, just in case something happens. We are planning to pass in the retry context, so you could have logic like the original comment in this issue, which could help as well.

casper-79

Thanks @jeffhollan

Your description matches my own initial expectations, but even though I looked for signs of steps 4/5 I have yet to observe this behaviour. It seems failed messages are just stuck on the queue after the first failure in step 3.

When is step 4 supposed to take place? Up until now, retries at the queue level have happened immediately, so this is what I have been looking for.

Best regards,

Casper

alrod (Member) commented Feb 3, 2021

merged #2658

ghost commented Apr 27, 2021

Has this been implemented in C# yet? If so, is there a link to documentation or some other resource for it? I see Node.js, Java, and PowerShell so far. We are trying to implement some final-retry logic but have been unsuccessful through what we've found here and the links within. We currently have a private static retryCount = 0 that we increment on every retry and reset to 0 on the final retry or on success, which I believe makes it somewhat stateful in a REST environment, since it holds the value between retries and calls.
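The hazard with a shared static counter like this can be shown with a toy interleaving (hypothetical names; this just simulates the retries of two messages overlapping on the same host, which is exactly the case the original proposal calls out):

```python
shared_retry_count = 0  # simulates a static field shared by all invocations

def invoke(message_id, log):
    """One (failing) execution that bumps the shared counter."""
    global shared_retry_count
    shared_retry_count += 1
    log.append((message_id, shared_retry_count))

log = []
# Retries of message A and message B interleave on the same host:
for attempt in range(3):
    invoke("A", log)
    invoke("B", log)

# Message A observes counts 1, 3, 5 instead of 1, 2, 3 --
# the shared counter cannot tell the two retry sequences apart.
print([count for msg, count in log if msg == "A"])  # [1, 3, 5]
```

A per-invocation retry count supplied by the host (the subject of this issue) avoids this, because each execution receives its own value instead of reading mutable process-wide state.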

fabiocav (Member)

@LockpickingDev this should be available for C# in-proc. Is that what you're currently using?

ghost commented Apr 29, 2021

All we're doing is taking in a JSON request from an external source and storing it. There's really not much to it. After reading, it sounds like that's not in-proc, though.
