Lions, tigers, and services being enabled with "precondition failed", oh my! #1565
Just some tiny wording things, otherwise LGTM. Thanks Seth!
@@ -51,7 +52,7 @@ func resourceGoogleProjectServiceCreate(d *schema.ResourceData, meta interface{}
 	srv := d.Get("service").(string)

 	if err = enableService(srv, project, config); err != nil {
-		return fmt.Errorf("Error enabling service: %s", err)
+		return errwrap.Wrapf("Error creating service: {{err}}", err)
This should be left as "enabling"
		return nil
	}); err != nil {
		return nil, errwrap.Wrapf("failed to enable services: {{err}}", err)
and then this one should be "failed to list services"
	} else if len(services) == 1 {
		// Use the singular enable - can't use batch for a single item
		name := fmt.Sprintf("projects/%s/services/%s", pid, services[0])
		op := &serviceusage.EnableServiceRequest{}
nit: can this (and below) be named req instead of op? Op tends to refer to Operation objects
I don't think this works for #1562: it does a bunch of other fixes, but not retry logic on read (I have one that I'll send out that does it once this is in, though).
My bad, I added it because I thought they were related. I removed the linked issue.
This removes the custom logic on pagination and uses the built-in Page function in the SDK to make things a bit simpler. Additionally, I added a field filter to only return service names, which drastically reduces the size of the API call (important for slow connections, given how frequently this function is executed). Also added errwrap to better trace where errors originate.
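For readers unfamiliar with the pattern, the kind of hand-written loop that a built-in page helper replaces can be sketched roughly as follows. This is a standard-library-only model, not the SDK's API: `page`, `listServiceNames`, and the `fetch` callback are hypothetical names, and the sketch assumes each page has already been trimmed to service names of the form `projects/<id>/services/<name>`.

```go
package main

import (
	"fmt"
	"strings"
)

// page is a hypothetical stand-in for one page of a list response that has
// already been reduced to service names by a field filter.
type page struct {
	Names         []string
	NextPageToken string
}

// listServiceNames walks every page returned by fetch and collects the short
// service names. fetch is a hypothetical callback, not part of any SDK.
func listServiceNames(fetch func(pageToken string) (page, error)) ([]string, error) {
	var names []string
	token := ""
	for {
		p, err := fetch(token)
		if err != nil {
			return nil, fmt.Errorf("failed to list services: %w", err)
		}
		for _, full := range p.Names {
			// "projects/123/services/foo.googleapis.com" -> "foo.googleapis.com"
			names = append(names, full[strings.LastIndex(full, "/")+1:])
		}
		if p.NextPageToken == "" {
			return names, nil
		}
		token = p.NextPageToken
	}
}

func main() {
	// A fake two-page listing used only for demonstration.
	fetch := func(token string) (page, error) {
		if token == "" {
			return page{
				Names:         []string{"projects/123/services/compute.googleapis.com"},
				NextPageToken: "page-2",
			}, nil
		}
		return page{Names: []string{"projects/123/services/iam.googleapis.com"}}, nil
	}

	names, err := listServiceNames(fetch)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(names)
}
```

Delegating this loop to the SDK removes the token bookkeeping, which is the simplification the commit message describes.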
This just looked really nasty inline
This commit does three things:
1. It batches services to be enabled 20 at a time. The API fails if you try to enable more than 20 services at once, and this limit is documented in the SDK and API. I learned this the hard way. I think Terraform should "do the right thing" here and batch them in groups of twenty, which is what this does. The batches are tried serially; I think parallelizing them is not worth the complexity tradeoff.
2. It handles the "precondition failed" error that occurs randomly. This just started happening, but it affects at least two APIs consistently, and a rudimentary test showed that it failed 78% of the time (78/100 times in an hour). We should fix this upstream, but that failure rate also necessitates (in my opinion) some mitigation on the Terraform side until a fix is in place at the API level.
3. It uses errwrap on errors for better tracing. It was really difficult to trace exactly which error was being thrown. That's fixed.
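The serial batching described in this commit message can be sketched as below. This is a minimal illustration under stated assumptions, not the code in the PR: `chunk`, `enableInBatches`, and the `enable` callback are hypothetical names, and the sketch does not model the separate single-service path that the diff handles with the singular enable call.

```go
package main

import "fmt"

// maxBatchSize mirrors the 20-service limit described in the commit message.
const maxBatchSize = 20

// chunk splits items into consecutive slices of at most size elements.
func chunk(items []string, size int) [][]string {
	if size <= 0 {
		return nil
	}
	var batches [][]string
	for start := 0; start < len(items); start += size {
		end := start + size
		if end > len(items) {
			end = len(items)
		}
		batches = append(batches, items[start:end])
	}
	return batches
}

// enableInBatches calls enable once per batch, in order, and stops at the
// first failure. enable is a hypothetical stand-in for the real API call.
func enableInBatches(services []string, enable func(batch []string) error) error {
	for i, batch := range chunk(services, maxBatchSize) {
		if err := enable(batch); err != nil {
			return fmt.Errorf("failed to enable batch %d: %w", i, err)
		}
	}
	return nil
}

func main() {
	// 45 made-up service names, so the final batch is a partial one.
	services := make([]string, 45)
	for i := range services {
		services[i] = fmt.Sprintf("service-%d.example.com", i)
	}

	err := enableInBatches(services, func(batch []string) error {
		fmt.Printf("enabling %d services\n", len(batch))
		return nil
	})
	if err != nil {
		fmt.Println("error:", err)
	}
}
```

Running the batches one after another keeps error handling simple: the first failing batch aborts the loop and its index is carried in the wrapped error.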
Okay @danawillow @rosbo updated to fix comments. Let me know if there's anything else (I don't have merge permissions).
LGTM. Thanks Seth!
This PR does a few things:
- It updates the "list enabled services" API call to be more efficient.
- It batches services to be enabled 20 at a time. The API fails if you try to enable more than 20 services at once, and this limit is documented in the SDK and API. I learned this the hard way. I think Terraform should "do the right thing" here and batch them in groups of twenty, which is what this does. The batches are tried serially; I think parallelizing them is not worth the complexity tradeoff.
- It handles the "precondition failed" error that occurs randomly. This just started happening, but it affects at least two APIs consistently, and a rudimentary test showed that it failed 78% of the time (78/100 times in an hour). We should fix this upstream, but that failure rate also necessitates (in my opinion) some mitigation on the Terraform side until a fix is in place at the API level.
- It uses errwrap on errors for better tracing. It was really difficult to trace exactly which error was being thrown. That's fixed.
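One way to mitigate an intermittent "precondition failed" response is a bounded retry that only repeats the call for that specific error. The sketch below uses the standard library only and is an assumption about the general approach, not the PR's implementation: `errPreconditionFailed` and `retryOnPreconditionFailed` are hypothetical names, and the attempt count and delay are arbitrary.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errPreconditionFailed is a hypothetical sentinel standing in for the API's
// "precondition failed" response.
var errPreconditionFailed = errors.New("precondition failed")

// retryOnPreconditionFailed calls fn up to attempts times, sleeping for delay
// between tries. Only errPreconditionFailed is retried; any other error is
// returned immediately.
func retryOnPreconditionFailed(attempts int, delay time.Duration, fn func() error) error {
	if attempts < 1 {
		attempts = 1
	}
	var err error
	for i := 0; i < attempts; i++ {
		err = fn()
		if err == nil {
			return nil
		}
		if !errors.Is(err, errPreconditionFailed) {
			return err
		}
		if i < attempts-1 {
			time.Sleep(delay)
		}
	}
	return fmt.Errorf("giving up after %d attempts: %w", attempts, err)
}

func main() {
	// Simulate a call that fails twice before succeeding.
	calls := 0
	err := retryOnPreconditionFailed(5, 10*time.Millisecond, func() error {
		calls++
		if calls < 3 {
			return errPreconditionFailed
		}
		return nil
	})
	fmt.Println("calls:", calls, "err:", err)
}
```

Retrying only on the known transient error keeps genuine failures, such as permission errors, visible on the first attempt.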
/cc @danawillow @rosbo @paddycarver