
[SDK-1975] Use exponential backoff rather than rate limit headers #538

Merged: 6 commits from retry into master on Oct 14, 2020

Conversation

@adamjmcgrath (Contributor) commented Sep 29, 2020

Changes

Currently, RetryRestClient uses the X-RateLimit-Reset header to calculate the retry duration, e.g. if you hit a per-minute rate limit, the backoff for 10 retries will be: [60s, 60s, ..., 60s, 60s]

But X-RateLimit-Reset only tells you when the bucket of available requests will be full again - not when there might be a request available in the bucket to use. So 60s will likely be too long to wait, and can cause things like Webtasks to fail.

Instead, we're switching to a generic exponential backoff with jitter, starting at ~1s, e.g. for 10 retries: [1-2s, 2-4s, 4-8s, 8-16s, 16-32s, 32-64s, 64-128s, 128-256s, 256-512s, 512-1024s]

Each delay is calculated by (1 + Math.random()) * 1s * Math.pow(2, attemptNumber) - happy to tweak these numbers if we want.

The SDK already uses retry, so we can leverage that library's default behaviour.
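For illustration, here's a minimal sketch of how the library could be configured to produce the series above - this isn't the actual RetryRestClient wiring, and `requestWithRetry` / `makeRequest` are made-up names:

```js
// Minimal sketch only. With these options, retry's delay formula is
// (1 + Math.random()) * minTimeout * Math.pow(factor, attempt),
// i.e. roughly [1-2s, 2-4s, 4-8s, ...] as described above.
const retry = require('retry');

function requestWithRetry(makeRequest, done) {
  const operation = retry.operation({
    retries: 10, // maximum number of retries
    factor: 2, // double the delay on each attempt
    minTimeout: 1000, // first delay is ~1-2s
    randomize: true // multiply each delay by (1 + Math.random()) for jitter
  });

  operation.attempt(() => {
    makeRequest((err, res) => {
      // Only retry on 429s; operation.retry(err) schedules the next attempt
      // and returns true while retries remain.
      if (err && err.statusCode === 429 && operation.retry(err)) return;
      done(err, res);
    });
  });
}
```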

References

https://auth0team.atlassian.net/browse/ESD-8390
https://auth0.com/docs/policies/rate-limit-policy#handle-rates-limitations-in-code
https://auth0.com/docs/policies/rate-limit-policy/management-api-endpoint-rate-limits
https://github.com/tim-kos/node-retry

Testing

- [x] This change adds unit test coverage
- [ ] This change adds integration test coverage

@joshcanhelp (Contributor) left a comment


Nice and simple, love it!

It might be nice to allow all the retry options to be passed in as an object. Not necessary, but it could make it easier to adjust based on what the time/rate limits are.
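Something like this hypothetical shape, just to illustrate - these option names aren't the SDK's actual config:

```js
// Hypothetical shape only - "pass the retry options in as an object".
const ManagementClient = require('auth0').ManagementClient;

const management = new ManagementClient({
  domain: '{YOUR_TENANT}.auth0.com',
  token: '{YOUR_API_V2_TOKEN}',
  retry: {
    enabled: true, // turn retries on/off
    maxRetries: 10 // cap the number of attempts (names illustrative)
  }
});
```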

One thing that came to mind ... we're removing the reliance on the headers, which means we're ignoring info we have that might make us skip the retry altogether. The Management API limit might be hit within, say, 10 seconds and not have anything available for another 50. So we might make 4 additional attempts when we already know that we'll hit a 429 for 50 more seconds.

It might be that this kind of logic would complicate things and we'd, at best, only save ~4 API calls. Just wanted to bring it up.

src/RetryRestClient.js: two review threads (outdated, resolved)
@adamjmcgrath (Contributor, Author) commented Sep 29, 2020

Thanks @joshcanhelp!

The Management API limit might be hit within, say, 10 seconds and not have anything available for another 50. So we might make 4 additional attempts when we already know that we'll hit a 429 for 50 more seconds.

Looking at that Management API rate limit doc, it says:

for the given bucket, there is a maximum request limit of x per minute, and for each minute that elapses, permissions for y requests are added back. In other words, for each 60/y seconds, one additional request is added to the bucket.

Which tells me that requests are added back to the bucket at a rate of one request every 60/per_minute_rate_limit seconds. So when you say "hit within, say, 10 seconds and not have anything available for another 50", I would expect only an endpoint limited at '1 per minute' to behave like that - and I don't see any endpoints like that (except possibly the signing key rotation one).
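To make that concrete (illustration only, not SDK code):

```js
// With a per-minute limit of y, one request is added back to the bucket
// every 60 / y seconds (the "rolling" refill described in the doc).
function secondsUntilNextRequest(perMinuteLimit) {
  return 60 / perMinuteLimit;
}

secondsUntilNextRequest(50); // 1.2s - comparable to the ~1-2s first backoff
secondsUntilNextRequest(10); // 6s
secondsUntilNextRequest(1);  // 60s - the only case needing a full minute's wait
```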

Although, looking at that doc, I wonder if I should make the minTimeout 500ms rather than 1000ms.

@adamjmcgrath (Contributor, Author)

It might be nice to allow all the retry options to be passed in as an object.

Good idea 👍

@joshcanhelp (Contributor)

@adamjmcgrath - Ah, I see, it's a rolling check ... makes more sense than "sorry, this API won't work for another minute" 😆 thank god I'm not in charge of setting those limits!

In that case, with access to all the vars, this looks great! I'll bump the version in our Rule utility library and update the docs once it's ready.

@jimmyjames (Contributor)

@adamjmcgrath looks like the codecov task is failing due to a decrease in coverage. Is that something that should be addressed in this PR, or do we need to re-evaluate our coverage requirements?

@davidpatrick will be good for you to review this just to ensure there's no risk here that I can't see. It looks good to me 👍

@adamjmcgrath (Contributor, Author)

@adamjmcgrath looks like the codecov task is failing due to a decrease in coverage.

Thanks @jimmyjames - it's the project coverage threshold; we should probably ignore it. I deleted a bunch of code that had 100% coverage, so the overall level has decreased. The only way to make that check green would be to go and find some unrelated uncovered code and write tests for it (which I'm happy to do, just not in this PR).

@davidpatrick will be good for you to review this just to ensure there's no risk here that I can't see. It looks good to me 👍

I'm going to get someone from #iam-core-foundations to take a look as well, I just haven't gotten around to pinging them yet

@adamjmcgrath (Contributor, Author)

@jimmyjames @davidpatrick #iam-core-foundation has no issues, so if one of you could approve, I'll go ahead and merge.

@davidpatrick davidpatrick added this to the v2Next milestone Oct 14, 2020
@adamjmcgrath adamjmcgrath merged commit cf2a1e5 into master Oct 14, 2020
@adamjmcgrath adamjmcgrath deleted the retry branch October 14, 2020 16:08
@davidpatrick davidpatrick mentioned this pull request Oct 22, 2020