Missing handling for HTTP 429 Too Many Requests Response #273

Closed
5 tasks done
jsoref opened this issue Aug 8, 2022 · 27 comments
Labels
🪲 bug Something isn't working

Comments

@jsoref

jsoref commented Aug 8, 2022

Checklist

  • I have looked into the README and have not found a suitable solution or answer.
  • I have looked into the documentation and have not found a suitable solution or answer.
  • I have searched the issues and have not found a suitable solution or answer.
  • I have searched the Auth0 Community forums and have not found a suitable solution or answer.
  • I agree to the terms within the Auth0 Code of Conduct.

Description

A change in 92a9138 acknowledges that there are rate limits for Auth0's APIs (there are also authentication rate limits). However, it is not really a proper fix; it just means that anyone using terraform may suffer from these failures instead of anyone trying to contribute to this project.

Expectation

  1. When terraform-provider-auth0 makes a request, it should look at the rate limit response headers and store their values.
  2. If a response indicates that it's approaching a rate limit, it should slow down.
  3. If a response is a 429, trigger a full backoff. Wait until an appropriate amount of time has elapsed and retry.
  4. It should be possible to revert 92a9138 and tests should pass, albeit more slowly.
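The first three expectations can be sketched as a single pacing decision made after every response. This is a minimal illustration, not the provider's or the go-auth0 SDK's implementation. It assumes the Management API returns `x-ratelimit-remaining` and `x-ratelimit-reset` headers (the latter as a UNIX timestamp in seconds); the function name `seconds_to_wait` and the `low_water_mark` parameter are hypothetical.

```python
from typing import Mapping


def seconds_to_wait(status: int, headers: Mapping[str, str], now: float,
                    low_water_mark: int = 2) -> float:
    """Return how long to pause before sending the next request.

    Assumes `x-ratelimit-reset` is a UNIX timestamp in seconds.
    """
    # Header names are case-insensitive, so normalise them first.
    h = {k.lower(): v for k, v in headers.items()}
    remaining = h.get("x-ratelimit-remaining")
    reset = h.get("x-ratelimit-reset")

    if reset is None:
        # No rate limit information: pause briefly only after a 429.
        return 1.0 if status == 429 else 0.0

    until_reset = max(0.0, float(reset) - now)

    if status == 429:
        # Full backoff: wait until the limit window resets, then retry.
        return until_reset

    if remaining is not None and int(remaining) <= low_water_mark:
        # Approaching the limit: spread the remaining budget over the
        # time left in the window instead of bursting into a 429.
        return until_reset / (int(remaining) + 1)

    return 0.0
```

A caller would sleep for the returned number of seconds before issuing its next request, so a plan with many resources completes more slowly but without exceeding the limit.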

Reproduction

  1. Given a developer (or free) auth0 tenant

  2. Set up a management api

  3. generate terraform json which has the following keys:

      "auth0-yyy-client_id": {
      "auth0-monitoring-client_id": {
      "auth0-monitoring-client_secret": {
      "auth0-xxx_auth0_client_id": {
      "auth0-xxx_auth0_client_id": {
      "auth0-xxx_auth0_client_id": {
      "auth0-xxx_auth0_client_id": {
    "auth0_resource_server": {
    "auth0_user": {
    "auth0_tenant": {
    "auth0_client": {
    "auth0_client_grant": {
    "auth0": {
    
  4. perform a terraform apply

Or somehow create a terraform plan that involves 6-12 tasks, which should result in at least 6 requests in under 1 second, which is apparently in excess of 2 requests per second with a burst of ??

To review, you can go to the auth0 /logs endpoint and search for client_id:... for the client id from 2. Then for each event, open the url in a new tab and copy out the url and timestamp.

I've provided a sample including a bit of data about the 429:

Timestamp Event
2022-08-08T18:43:14.458Z
2022-08-08T18:43:58.628Z
2022-08-08T18:43:58.896Z
2022-08-08T18:43:58.659Z
2022-08-08T18:43:58.436Z
2022-08-08T18:43:58.598Z
2022-08-08T18:43:58.786Z
2022-08-08T18:43:58.886Z api_limit - Global limit has been reached
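
As a rough check of the burst size, the sample timestamps above can be run through a short script. This is a sketch that assumes each log entry corresponds to one Management API request; the helper names are illustrative and not part of any Auth0 tooling.

```python
from datetime import datetime, timedelta

# Timestamps copied from the sample log above.
LOG_TIMESTAMPS = [
    "2022-08-08T18:43:14.458Z",
    "2022-08-08T18:43:58.628Z",
    "2022-08-08T18:43:58.896Z",
    "2022-08-08T18:43:58.659Z",
    "2022-08-08T18:43:58.436Z",
    "2022-08-08T18:43:58.598Z",
    "2022-08-08T18:43:58.786Z",
    "2022-08-08T18:43:58.886Z",
]


def parse(ts: str) -> datetime:
    # Older Python versions do not accept a trailing "Z" in fromisoformat.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def max_requests_in_window(timestamps, window=timedelta(seconds=1)) -> int:
    """Largest number of entries that fall inside any sliding window."""
    times = sorted(parse(t) for t in timestamps)
    best = 0
    start = 0
    for end in range(len(times)):
        # Shrink the window from the left until it spans less than `window`.
        while times[end] - times[start] >= window:
            start += 1
        best = max(best, end - start + 1)
    return best


print(max_requests_in_window(LOG_TIMESTAMPS))
```

For the sample above this prints 7: the seven entries at 18:43:58 all fall within about 460 ms of each other.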

Auth0 Terraform Provider version

v0.14.0_x4

Terraform version

0.12.26

@jsoref added the 🪲 bug label Aug 8, 2022
@sergiught
Contributor

sergiught commented Aug 10, 2022

Hey @jsoref 👋🏻 ,

The terraform provider leverages the go-auth0 management SDK to make the API calls and this one already supports handling rate limiting.

Please check: https://github.com/auth0/go-auth0/blob/b4cb1b75332f3af591a0b68ac8c2bcd30dc4ee03/management/management.go#L212.

You can quickly check that this works by running the full test suite against a real tenant using the make test-acc-e2e command as well, as this will generate a lot of calls and the rate limit will get hit, but nothing will fail and requests will continue after the waiting time.

Have you encountered a specific issue with how rate limit is being handled?

@sergiught added the invalid and question labels and removed the 🪲 bug label Aug 10, 2022
@jsoref
Author

jsoref commented Aug 10, 2022

Yes, it specifically failed miserably. We opened this ticket about the behavior. Hence this ticket with excerpts from the Auth0 logging.

It's reproducible in that each and every time we try to plan+apply for this tenant it fails miserably.

@jsoref
Author

jsoref commented Aug 10, 2022

It's possible that I'm misinterpreting the logging and the only failure is really #266.

But, ideally at no point would this tool ever trigger 429 / the Auth0 global rate limit, but it would instead discover limits and time requests so that they don't exceed the limit.

@sergiught
Contributor

Could you provide additional information @jsoref 🙏🏻 ? Does the terraform command fail? If yes what's the output?

The Auth0 Logs will report a lot of rate limits getting hit but the provider should handle the requests regardless. What happens is that once we hit a 429, we'll wait before hitting the Management API until the limit resets.
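
The "wait until the limit resets" behaviour described here can be sketched roughly as follows. This is an illustration of the general technique only, not the go-auth0 SDK's actual code. It assumes an `x-ratelimit-reset` response header holding a UNIX timestamp in seconds; `call_with_rate_limit_retry` and its parameters are hypothetical names.

```python
import time
from typing import Callable, Mapping, Tuple

# A response is modelled as (status code, headers) to keep the sketch small.
Response = Tuple[int, Mapping[str, str]]


def call_with_rate_limit_retry(
    send: Callable[[], Response],
    max_attempts: int = 5,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.time,
) -> Response:
    """Call `send`, waiting for the rate limit reset after each 429."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    attempt = 0
    while True:
        attempt += 1
        status, headers = send()
        if status != 429 or attempt == max_attempts:
            # Either a non-429 response or we ran out of attempts.
            return status, headers

        # Header names are case-insensitive, so normalise before lookup.
        reset = {k.lower(): v for k, v in headers.items()}.get("x-ratelimit-reset")
        # Wait until the reset time; fall back to 1 second if it is missing.
        delay = max(0.0, float(reset) - clock()) if reset is not None else 1.0
        sleep(delay)
```

With this approach the 429 still appears in the tenant logs, because the request that triggered it was actually sent; only the retry is delayed.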

@jsoref
Author

jsoref commented Aug 10, 2022

The terraform command fails as in #266.

I filed this because the Auth0 support folks told me to investigate this log entry as it was the only information in the Auth0 logs.

As we don't run multiple terraform instances concurrently and we have an app specific to terraform, the terraform provider should be able to avoid triggering 429s in the first place. It could make a request and, ideally, get a success response to its first API call whose headers hint at how many further requests it can make before it would receive a 429. That would avoid confusing poor users like us, who are then forced to chase down the origin of a 429 reported in the Auth0 logs.

@sergiught
Contributor

Appreciate the feedback @jsoref! You're right, and indeed we could avoid those 429s completely by reading the X-Ratelimit-* headers. We'll add this to our backlog to improve the developer experience. However, considering that functionality isn't actually broken and 429s are currently handled by the go-auth0 SDK (it lets them happen and then waits for the reset), I'll be closing this issue.

As for #266 we could continue the conversation over there. Please check #266 (comment).

@sergiught
Contributor

For transparency I've created an internal backlog ticket to track this work: DXCDT-200.

@yourinium

Tried to migrate from an earlier version of the terraform package and I am getting an insane amount of 429 errors...it's super frustrating. No real way to get around it either.

@willvedd
Contributor

willvedd commented Aug 8, 2023

@yourinium Thanks for the feedback. Are you observing these rate-limit errors in your tenant logs or is there a malfunction with the Terraform provider itself? Functionally, the provider and your config should manage properly as the Go SDK handles 429s under the hood. You may see rate limit errors in your tenant logs but they can be safely ignored.

@luislew

luislew commented Aug 11, 2023

We have also been seeing 429s in Terraform Cloud for recent changes, e.g.:

╷
│ Error: 429 Too Many Requests: Global limit has been reached
│ 
│   with module.emails.auth0_email_template.enrollment_email,
│   on ../../modules/emails/main.tf line 60, in resource "auth0_email_template" "enrollment_email":
│   60: resource "auth0_email_template" "enrollment_email" {
│ 
╵
Operation failed: failed running terraform apply (exit 1)

and

╷
│ Error: 429 Too Many Requests: Global limit has been reached
│ 
│   with module.rbac.auth0_role_permissions.idt_ff_offline_reader,
│   on ../../modules/rbac/main.tf line 123, in resource "auth0_role_permissions" "idt_ff_offline_reader":
│  123: resource "auth0_role_permissions" "idt_ff_offline_reader" {
│ 
╵

in two separate runs today.

@luislew

luislew commented Aug 11, 2023

@willvedd ☝🏻 to provide another example of 429 errors not being handled by the Terraform provider / Go SDK

@yourinium

yourinium commented Aug 11, 2023

@willvedd the limits were not just in the tenant but in our pipeline. They all started showing up when we updated the terraform provider from 0.36 to the latest 1.0.0-beta.1 release.
We never had an issue prior to updating.

@luislew

luislew commented Aug 11, 2023

I can confirm that we also recently updated to the 1.0.0-beta.1 release

@willvedd
Contributor

Thanks for the helpful feedback, all. The retry mechanism for Auth0's Go SDK changed between v0 and v1 of the provider. It's clearly not persistent enough to ensure reliability in your workflows. I'm going to re-open and make sure we address very soon!

@willvedd reopened this Aug 14, 2023
@yourinium

Appreciate the follow up! Looking forward to a resolution, especially since I only updated our dev environment and have to do this 3 more times!!! I managed to get it to work in our dev tenant but it took 3-4 hours and lots of tinkering.

@kshvesov

We would appreciate a more robust solution based on rate limit monitoring (https://community.auth0.com/t/how-to-tell-if-you-are-approaching-the-rate-limit/82103) and pauses in terraform, rather than an increase in the number of retries from 3 to 12. A failed execution causes us far more issues, and is much more harmful, than a slower success.

@luislew

luislew commented Aug 16, 2023

@kshvesov I made a similar comment on the linked PR: #779 (comment)

@glehmann

is there a time frame for the resolution of this bug? We've just switched to auth0 and this bug is breaking quite a lot of our CD runs :-(

@doino-gretchenliev

> is there a time frame for the resolution of this bug? We've just switched to auth0 and this bug is breaking quite a lot of our CD runs :-(

Hey. That's also our case. We contacted the support and they said they are working on it. No ETA though.

@yourinium

Same here, makes it a nightmare to push our higher env CD runs... Is there a lower version we can downgrade to? When did that change occur? Or is there a recommendation on how to get around this other than brute force?

@sergiught
Contributor

Hey folks 👋🏻 ,

We sincerely apologize for the delay in our response. We fully empathize with the frustration you must be experiencing due to the rate limit issue, especially when dealing with larger config updates.

As you correctly pointed out, increasing the number of retries will not provide a viable solution; it will only exacerbate the problem.

Regrettably, our current capacity limitations have prevented us from implementing a comprehensive fix at this moment. Nevertheless, we do have some workarounds that should provide immediate relief:

  1. When executing terraform apply, we recommend using the -parallelism=1 option. By default, terraform apply runs with a parallelism setting of 10, which can trigger the rate limit much sooner. While this may extend the operation duration, it will prevent further failures. (example: terraform apply -parallelism=1)
  2. If possible, consider breaking down your tf config into smaller segments for application.

Please rest assured that our team is fully committed to addressing this issue promptly. We are currently regrouping and will prioritize resolving this matter asap ⚡ .

Thank you for your patience and understanding 🙏🏻

Further updates will follow in the next few days.

@yourinium

Thank you for the update and the commitment to getting this fixed!

@sergiught
Contributor

Hey folks 👋🏻 ,

An update from our side: I have a fix for the rate limit issues in #788. While the code change has been approved, I'm still running some more load tests against it to ensure it's the right approach. As soon as we can confirm everything, we'll make it available for you folks in a v1.0.0-beta.2 version.

@sergiught added the 🪲 bug label and removed the invalid and question labels Aug 25, 2023
@sergiught
Contributor

Hey folks 👋🏻

We just released https://registry.terraform.io/providers/auth0/auth0/1.0.0-beta.2 with a fix for the rate limit retry issues. Please give it a go and let us know if you encounter any other issues 🙏🏻 .

Appreciate everyone's patience with the resolution!

@glehmann

@sergiught no problem for us so far with this version. Thanks for the fix!

@sergiught
Contributor

Awesome to hear @glehmann 🙌🏻 thanks for checking! I'll proceed to close this issue down then.

@yourinium

yourinium commented Aug 29, 2023

Same for us!! Thank you for getting this out so quick!! @sergiught

8 participants