Lions, tigers, and services being enabled with "precondition failed", oh my! #1565

Merged: 5 commits, merged May 31, 2018

sethvargo
Contributor

@sethvargo sethvargo commented May 30, 2018

This PR does a few things:

  1. It updates the "list enabled services" API call to be more efficient

  2. It enables services in batches of 20. The API fails if you try to enable more than 20 services in a single request, and this limit is documented in the SDK and API. I learned this the hard way. I think Terraform should "do the right thing" here and split the list into batches of twenty, which is what this does. The batches are sent one after another; parallelizing them is not worth the added complexity, in my opinion.

  3. Handle the "precondition failed" error that occurs randomly. This only just started happening, but it consistently affects at least two APIs, and a rudimentary test showed it failing 78% of the time (78 of 100 attempts over an hour). We should fix this upstream, but that failure rate also warrants (in my opinion) some mitigation on the Terraform side until the API itself is fixed.

  4. Use errwrap on errors for better tracing. It was really difficult to tell exactly which error was being thrown. That's fixed.

/cc @danawillow @rosbo @paddycarver

Contributor

@danawillow danawillow left a comment


Just some tiny wording things, otherwise LGTM. Thanks Seth!

@@ -51,7 +52,7 @@ func resourceGoogleProjectServiceCreate(d *schema.ResourceData, meta interface{}
 	srv := d.Get("service").(string)
 
 	if err = enableService(srv, project, config); err != nil {
-		return fmt.Errorf("Error enabling service: %s", err)
+		return errwrap.Wrapf("Error creating service: {{err}}", err)
Contributor


This should be left as "enabling"


	return nil
}); err != nil {
	return nil, errwrap.Wrapf("failed to enable services: {{err}}", err)
Contributor


and then this one should be "failed to list services"

} else if len(services) == 1 {
	// Use the singular enable - can't use batch for a single item
	name := fmt.Sprintf("projects/%s/services/%s", pid, services[0])
	op := &serviceusage.EnableServiceRequest{}
Contributor


nit: can this (and below) be named req instead of op? Op tends to refer to Operation objects

@danawillow
Contributor

I don't think this fixes #1562. It includes a bunch of other fixes, but not retry logic on read (I have a change that adds that, which I'll send out once this is in).

@rosbo
Contributor

rosbo commented May 30, 2018

My bad, I added it because I thought they were related. I removed the linked issue.

sethvargo added 5 commits May 31, 2018 11:10
This removes the custom logic on pagination and uses the built-in Page function in the SDK to make things a bit simpler. Additionally, I added a field filter to only return service names, which drastically reduces the size of the API call (important for slow connections, given how frequently this function is executed).

Also added errwrap to better trace where errors originate.
This just looked really nasty inline
This commit does three things:

1. It enables services in batches of 20. The API fails if you try to enable more than 20 services in a single request, and this limit is documented in the SDK and API. I learned this the hard way. I think Terraform should "do the right thing" here and split the list into batches of twenty, which is what this does. The batches are sent one after another; parallelizing them is not worth the added complexity, in my opinion.

2. Handle the "precondition failed" error that occurs randomly. This only just started happening, but it consistently affects at least two APIs, and a rudimentary test showed it failing 78% of the time (78 of 100 attempts over an hour). We should fix this upstream, but that failure rate also warrants (in my opinion) some mitigation on the Terraform side until the API itself is fixed.

3. Use errwrap on errors for better tracing. It was really difficult to tell exactly which error was being thrown. That's fixed.
@sethvargo
Contributor Author

Okay @danawillow @rosbo, I've updated this to address your comments. Let me know if there's anything else (I don't have merge permissions).

@rosbo
Contributor

rosbo commented May 31, 2018

LGTM. Thanks Seth!

@rosbo rosbo merged commit 40094ba into hashicorp:master May 31, 2018
@sethvargo sethvargo deleted the sethvargo/services_oh_my branch May 31, 2018 16:58
@ghost

ghost commented Nov 18, 2018

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. If you feel I made an error 🤖 🙉 , please reach out to my human friends 👉 [email protected]. Thanks!

@ghost ghost locked and limited conversation to collaborators Nov 18, 2018