Add rate limit for calls to Buildkite's REST API #477
Comments
Hey @rjackson90 👋 Thanks for bringing this up, and that proposal from the Terraform folks is certainly an interesting read! With our provider, the only calls to the REST API that are made are through test suite and pipeline (for The same premise will also apply to the GraphQL API, albeit on complexity point calculation, not RPM as implemented on the REST side, with the majority of API requests on the provider being on the former. That said, we'll see what is possible with our provider! |
Hey @rjackson90! Could you share a stripped out version of your config? Are you using the timeouts {
read = 60m
} |
@mcncl We don't have a timeouts {} block on our Given that we have around 250 pipelines and our rate limit as of 3/31 will be 200, would you think a read timeout of 90s would be sufficient? |
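A back-of-the-envelope way to frame the question above, as a sketch only. It assumes a fixed 60-second window, one REST request per pipeline, and requests that are sent as fast as the limit allows; none of these is confirmed provider behavior, and both helper names are hypothetical.

```python
import math


def windows_needed(requests: int, limit_per_window: int) -> int:
    """Number of rate-limit windows needed to send `requests` calls."""
    if requests <= 0:
        return 0
    return math.ceil(requests / limit_per_window)


def worst_case_wait_seconds(
    requests: int, limit_per_window: int, window_seconds: int = 60
) -> int:
    """Longest time spent waiting for window resets, assuming fixed windows
    and a client that pauses at each exhausted window instead of failing."""
    extra_windows = max(windows_needed(requests, limit_per_window) - 1, 0)
    return extra_windows * window_seconds


if __name__ == "__main__":
    # 250 pipelines against a 200 requests/minute limit (figures from the comment above).
    print(windows_needed(250, 200))
    print(worst_case_wait_seconds(250, 200))
```

Under these assumptions the 250 reads span two windows, with up to 60 seconds spent waiting for the reset. Whether a 90s read timeout covers that depends on how the provider reacts when the limit is hit, which this sketch does not model.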
I just tried upgrading our terraform provider to your latest version, and added a timeout block to my
|
Hey @BoraxTheClean! Apologies for the confusion, the provider "buildkite" {
organization = "buildkite"
timeouts = {
read = "90s"
}
} |
I've fully set this up on my end, I'll let you know if I have any issues after the 3/31 rate limit drops! |
I didn't have issues right after 3/31, but I am now getting rate limit issues. I bumped the timeouts to 600s across the board. My plan of about 440 resources is tripping the rate limit after 7 minutes of clock time.
This makes my project virtually unplannable without targeting specific resources. |
Hey @BoraxTheClean! Yeah, that rate limit will be due to the number of resources being created/planned. The only solution at the moment would be to use the resource "buildkite_cluster" "test" {
name = "Primary Cluster"
description = "Runs the monolith build and deploy"
emoji = "🚀"
color = "#bada55"
}
resource "buildkite_cluster_queue" "default" {
cluster_id = buildkite_cluster.test.id
key = "macos"
}
terraform plan -target=buildkite_cluster.test -target=buildkite_cluster_queue.default
There is an open issue on Terraform to accept globs, but it's 9 years old now so I don't think an alternative solution is going to be available any time soon. |
Hello! I ended up here by following a backlink from hashicorp/terraform#31094, where I was discussing the general problem of rate limits and some specific strategies that the Azure provider might follow.
(Also 👋 Hello Buildkite folks! I used to be a happy customer at my old employer, many years ago, and wrote my own Terraform provider while there. I archived it today after learning that this official one exists!)
The main reason I'm here is that I wanted to try to clarify how I'd interpret the general idea I was discussing in the other issue for Buildkite's API design in particular.
I see that the provider is primarily using the GraphQL API rather than the REST API, and so I assume that it's the GraphQL resource limits that are in play here, rather than the REST API limits. I'm referring to Accessing limit details and understand from this that Buildkite uses a time-window-based rate limit strategy, in which case the only two options for reacting to limit exhaustion would be to either fail with an error -- which I assume is how the provider is currently behaving, based on this issue -- or to sleep for
If this provider were to switch to the second strategy of sleeping until the limit resets, I guess that would mean a worst-case delay of five minutes, which is quite a long time by typical Terraform request standards, but probably still better than an immediate hard failure. Would you agree? If so, I wonder what you think about altering the provider's API client to notice when the response is a rate limit error, and to sleep until the limit resets and then retry. Does that seem feasible?
(This provider also seems like it would benefit a lot from a means for Terraform to batch together provider requests so that the provider can perform fewer total GraphQL queries, which is something I've wanted to enable for a long time but is tricky with Terraform's execution model. However, since the rate limit model for GraphQL is measured in complexity points rather than as a request count I assume batch requests would only potentially help with performance -- performing fewer API round-trips -- and would not help to avoid hitting the rate limits.)
I hope this is helpful! I am of course not trying to compel you to do anything in particular with this issue, but since much of the discussion about rate limits so far was focused on Microsoft Azure I was curious to see what it might look like to handle this for a different API with a different rate limit strategy. |
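The "sleep until the limit resets, then retry" strategy described above could look roughly like the following. This is a minimal sketch, not the provider's code: it assumes a rate-limited request comes back as HTTP 429 with a `RateLimit-Reset` header holding the seconds left in the current window, and both of those should be checked against Buildkite's documentation. `Response` and `send_with_reset_wait` are hypothetical names.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class Response:
    """Minimal stand-in for an HTTP response (hypothetical type for this sketch)."""
    status: int
    headers: Dict[str, str] = field(default_factory=dict)
    body: str = ""


def send_with_reset_wait(
    send: Callable[[], Response],
    sleep: Callable[[float], None] = time.sleep,
    max_attempts: int = 3,
    default_wait: float = 60.0,
) -> Response:
    """Call `send`; on HTTP 429, sleep until the advertised reset and retry."""
    for attempt in range(1, max_attempts + 1):
        response = send()
        if response.status != 429:
            return response
        if attempt == max_attempts:
            break
        # Assumption: RateLimit-Reset holds the seconds left in the current window.
        try:
            wait = float(response.headers.get("RateLimit-Reset", default_wait))
        except ValueError:
            wait = default_wait
        sleep(max(wait, 0.0))
    raise RuntimeError(f"still rate limited after {max_attempts} attempts")
```

Because the wait comes from the server's own reset value, no backoff schedule is needed. The attempt cap is only there so that a misbehaving endpoint cannot stall a plan forever.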
Hello there, we've also started running into this issue when trying to apply changes to the Buildkite pipelines for one of our Terraform monorepos. Unfortunately, using the In line with what @apparentlymart said, the AWS Go SDK (used by the AWS Terraform provider) automatically handles HTTP 429 responses (among other errors) by waiting and retrying with an exponential backoff algorithm. It would be great if the Buildkite provider implemented similar functionality, even if much simpler, as currently the provider is unusable for larger Terraform configurations. |
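For reference, waiting and retrying with exponential backoff generally has the shape below. This is a generic sketch with random jitter, not the AWS Go SDK's actual algorithm; `RateLimited`, `backoff_delay`, and `retry_with_backoff` are hypothetical names, and the base delay and cap are arbitrary.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Raised by the caller's request function on an HTTP 429 (hypothetical)."""


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Delay ceiling for a zero-based attempt: base * 2**attempt, capped."""
    return min(cap, base * (2 ** attempt))


def retry_with_backoff(
    request: Callable[[], T],
    max_retries: int = 5,
    sleep: Callable[[float], None] = time.sleep,
    rng: Optional[random.Random] = None,
) -> T:
    """Retry `request` on RateLimited, sleeping a random time up to the ceiling."""
    rng = rng or random.Random()
    for attempt in range(max_retries + 1):
        try:
            return request()
        except RateLimited:
            if attempt == max_retries:
                raise
            # Jitter: pick uniformly in [0, ceiling] so clients do not retry in lockstep.
            sleep(rng.uniform(0.0, backoff_delay(attempt)))
    raise AssertionError("unreachable")
```

Backoff fits APIs that do not say when to come back. If the response already carries a reset time, sleeping for that duration is simpler.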
I just want to second this. Targeting is a partial solution for us, but having to do that is not compatible with our tooling, so it creates more operational burden. It would be ideal and expected for the provider to handle the rate limits for us. |
@bill-scalapay is this the exponential backoff retry that you're referring to: https://github.com/hashicorp/terraform-provider-aws/tree/main/internal/retry? |
@mcncl The doc I read mentioned that the AWS Go SDK automatically handles retries for certain error codes, but maybe they also implement something in the provider. Here's the doc I read: https://github.com/hashicorp/terraform-provider-aws/blob/main/docs/retries-and-waiters.md#default-aws-go-sdk-retries And here's what looks like the retry code from the AWS Go SDK: EDIT: The |
The AWS provider's handling of this is rather complicated because it has to deal with the huge variety of different strategies across many different AWS services. So while I don't mean to say it isn't an interesting reference, I would hope that something for the Buildkite API could be considerably simpler because there is exactly one well-defined rate limiting model.
In particular, I wouldn't expect exponential backoff to be needed here because the API already indicates how long the client should wait for the rate limit to have been reset.
One potential way to complicate it would be to write the "requested complexity" rules as code and have the provider notice that a particular request is likely to breach the limit and do some clever request prioritization like preferring smaller requests over larger ones, but it's not clear to me that this would have any significant advantage over just trying requests as soon as they become pending, noticing a rate limit error, and then waiting
There is of course some risk here that tightly coupling the provider behavior to the current rate limit model would make it harder to evolve the rate limit model in future, but I trust that you all know better than I do how likely it is that the Buildkite GraphQL API would adopt a different model in future. 😀 |
My organization uses Buildkite quite extensively, and we have hundreds of pipelines. Last November, Buildkite announced a new policy of rate-limiting requests against the REST API to 200 requests/minute, as documented here. Our use of this provider to manage Buildkite resources causes us to exceed this rate limit. We've looked at re-organizing our Terraform configuration to reduce the number of resources affected by any particular plan operation, but there's only so much we can do without arbitrarily carving up our configuration in awkward ways.
I would like to request that the Terraform provider add support for this rate limit. As a user, I want the provider to comply with Buildkite's policies for API usage so that I don't have to worry about accidentally exceeding limits.
In researching this issue, I found a proposal for core Terraform that has an interesting discussion on this kind of problem. In that issue discussion, the participants appear to agree that rate limits are best handled by the providers in accordance with the policies of the underlying service.
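One way a client could stay under a 200 requests/minute limit proactively, instead of reacting to errors, is to throttle its own calls. The sketch below is illustrative and says nothing about how the provider works: `SlidingWindowLimiter` is a hypothetical class, and the 200/60s defaults simply mirror the figure quoted above.

```python
import time
from collections import deque
from typing import Callable, Deque


class SlidingWindowLimiter:
    """Allow at most `limit` calls in any `window` seconds (client-side throttle)."""

    def __init__(
        self,
        limit: int = 200,
        window: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.limit = limit
        self.window = window
        self._clock = clock
        self._sleep = sleep
        self._sent: Deque[float] = deque()  # timestamps of recent requests

    def _evict(self, now: float) -> None:
        """Forget requests that have aged out of the window."""
        while self._sent and now - self._sent[0] >= self.window:
            self._sent.popleft()

    def acquire(self) -> None:
        """Block until another request may be sent, then record it."""
        now = self._clock()
        self._evict(now)
        while len(self._sent) >= self.limit:
            # Wait until the oldest recorded request leaves the window.
            self._sleep(self._sent[0] + self.window - now)
            now = self._clock()
            self._evict(now)
        self._sent.append(now)
```

A caller would invoke `limiter.acquire()` before each API request. Capping calls in any rolling 60 seconds is stricter than a fixed-window limit, so it trades some throughput for never tripping the server-side counter, assuming no other client shares the same quota.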