Can't start collector - failed to call webhook - context deadline exceeded #29

Closed
gbajson-area22 opened this issue Mar 22, 2023 · 8 comments · Fixed by #32
Labels
enhancement New feature or request priority: p2

Comments

@gbajson-area22

This happens on a GKE Autopilot cluster, version 1.25.6-gke.1000.

I followed the guide at https://github.com/GoogleCloudPlatform/opentelemetry-operator-sample#prerequisites.

I installed cert-manager and the OpenTelemetry Operator, but I can't start the collector. Here is the error:

```
$ kubectl apply -f collector-config.yaml
Error from server (InternalError): error when creating "collector-config.yaml": Internal error occurred: failed calling webhook "mopentelemetrycollector.kb.io": failed to call webhook: Post "https://opentelemetry-operator-webhook-service.opentelemetry-operator-system.svc:443/mutate-opentelemetry-io-v1alpha1-opentelemetrycollector?timeout=10s": context deadline exceeded
```
@gbajson-area22 gbajson-area22 changed the title Can't install collector Can't install collector - failed to call webhook - context deadline exceeded Mar 22, 2023
@gbajson-area22 gbajson-area22 changed the title Can't install collector - failed to call webhook - context deadline exceeded Can't start collector - failed to call webhook - context deadline exceeded Mar 22, 2023
@damemi damemi added enhancement New feature or request priority: p2 labels Mar 27, 2023
@damemi
Contributor

damemi commented Mar 27, 2023

Hi @gbajson-area22, just a guess, but it sounds like the operator pod may not yet be running and ready to serve the webhooks that process the Collector object. Is this by any chance a fresh Autopilot cluster? If so, new clusters can take some time to scale up before all pods (including the operator pod) are scheduled.
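One way to check this before applying the collector config is a small helper like the sketch below. It is illustrative only: the function name is hypothetical, and the namespace and Service name are read off the webhook URL in the error message. It waits for the operator Deployment(s) to report `Available`, then lists the pods and the endpoints behind the webhook Service; an empty `ENDPOINTS` column means no ready pod is serving the webhook yet.

```shell
# Hypothetical helper (not from the sample repo). Waits for every Deployment
# in the operator namespace to become Available, then lists the pods and the
# endpoints behind the webhook Service. If ENDPOINTS is empty, no ready pod
# is serving the webhook yet and the API server's call will fail.
wait_for_otel_operator() {
  # Namespace taken from the webhook URL in the error message; override with $1.
  local ns="${1:-opentelemetry-operator-system}"
  # Return early (non-zero) if the Deployments never become Available.
  kubectl -n "$ns" wait --for=condition=Available deployment --all --timeout=300s || return
  kubectl -n "$ns" get pods
  kubectl -n "$ns" get endpoints opentelemetry-operator-webhook-service
}
```

Usage: `wait_for_otel_operator && kubectl apply -f collector-config.yaml`. Because the function returns early when `kubectl wait` times out, the `apply` only runs once the operator reports ready.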

@gbajson-area22
Author

gbajson-area22 commented Mar 28, 2023 via email

@damemi
Contributor

damemi commented Apr 4, 2023

@gbajson-area22 thanks for clarifying. It does look like the operator pod starts up fine and is watching for objects to be created.

I found a couple of other similar issues: open-telemetry/opentelemetry-operator#100 and open-telemetry/opentelemetry-operator#1009. Unfortunately, I wasn't able to reproduce this on a couple of new GKE Autopilot clusters.

Could you confirm that you installed cert-manager using the special helm instructions in https://github.com/GoogleCloudPlatform/opentelemetry-operator-sample#prerequisites?

Also, is this a private GKE cluster? If so, you may need to create a firewall rule for the webhook port, as described in this comment: open-telemetry/opentelemetry-operator#100 (comment)
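Background on why this matters: by default, GKE private clusters only let the control plane open TCP connections to nodes on ports 443 and 10250, so the API server's call to a webhook served on another port is dropped and eventually hits the `timeout=10s` in the webhook URL, which surfaces as `context deadline exceeded`. A rule of this kind might look roughly like the sketch below. Assumptions, none of which come from this thread: the helper and rule names are hypothetical, the cluster is regional (as Autopilot clusters are), and the operator's webhook pod listens on port 9443. The actual pod port shows up in `kubectl -n opentelemetry-operator-system get endpoints opentelemetry-operator-webhook-service`.

```shell
# Hypothetical helper: let a private GKE control plane reach the operator's
# webhook port on the nodes. Assumptions to verify for your cluster:
#   - the webhook pod listens on 9443 (pass another port as $4 if not)
#   - the cluster is regional, so --region is the right location flag
#   - no --target-tags is set, so the rule covers every instance in the VPC
allow_webhook_from_control_plane() {
  local cluster="$1" region="$2" network="$3" port="${4:-9443}"
  local master_cidr
  # Control-plane CIDR; empty for clusters that are not private.
  master_cidr="$(gcloud container clusters describe "$cluster" \
    --region "$region" \
    --format='value(privateClusterConfig.masterIpv4CidrBlock)')"
  if [ -z "$master_cidr" ]; then
    echo "no private control-plane CIDR found for $cluster" >&2
    return 1
  fi
  # Ingress rule from the control plane to TCP $port on the nodes.
  gcloud compute firewall-rules create "allow-${cluster}-webhook-${port}" \
    --network "$network" \
    --direction INGRESS \
    --source-ranges "$master_cidr" \
    --allow "tcp:${port}"
}
```

Usage, with placeholder values: `allow_webhook_from_control_plane my-cluster us-central1 my-vpc`. Leaving out `--target-tags` keeps the sketch simple but opens the port on every instance in the VPC to the control-plane range; adding `--target-tags` with the nodes' network tag would narrow it.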

@gbajson-area22
Author

gbajson-area22 commented Apr 4, 2023 via email

@damemi
Contributor

damemi commented Apr 4, 2023

Thanks for the quick reply. Please let me know if that helps, and if so, I'll update our docs to include it.

@damemi
Contributor

damemi commented Apr 10, 2023

Hi @gbajson-area22, just checking in: have you had a chance to try that yet? Did it fix the problem?

@damemi
Contributor

damemi commented Apr 12, 2023

I was able to reproduce this in a private cluster, and the fix linked above worked for me. I opened #32 to document it, which will close this issue.

@gbajson-area22
Author

gbajson-area22 commented Apr 14, 2023 via email
