Can't start collector - failed to call webhook - context deadline exceeded #29
Comments
Hi @gbajson-area22, just a guess, but it sounds like maybe the operator pod isn't yet running/ready to serve the webhooks that process the Collector object. Is this by any chance a fresh autopilot cluster? If so, new clusters can take some time to scale up for all pods to be scheduled (including the operator pod). |
Thanks for your response.
This is not a fresh cluster, and the "opentelemetry-operator-controller-manager" Pod started successfully. There are no errors in the Pod's logs.
According to the logs, `v1alpha1.OpenTelemetryCollector` tried to start.
```
{"level":"info","ts":"2023-03-22T15:15:08.833041327Z","msg":"Starting EventSource","controller":"opentelemetrycollector","controllerGroup":"opentelemetry.io","controllerKind":"OpenTelemetryCollector","source":"kind source: *v1alpha1.OpenTelemetryCollector"}
{"level":"info","ts":"2023-03-22T15:15:08.833121793Z","msg":"Starting EventSource","controller":"opentelemetrycollector","controllerGroup":"opentelemetry.io","controllerKind":"OpenTelemetryCollector","source":"kind source: *v1.ConfigMap"}
{"level":"info","ts":"2023-03-22T15:15:08.833165633Z","msg":"Starting EventSource","controller":"opentelemetrycollector","controllerGroup":"opentelemetry.io","controllerKind":"OpenTelemetryCollector","source":"kind source: *v1.ServiceAccount"}
{"level":"info","ts":"2023-03-22T15:15:08.833180179Z","msg":"Starting EventSource","controller":"opentelemetrycollector","controllerGroup":"opentelemetry.io","controllerKind":"OpenTelemetryCollector","source":"kind source: *v1.Service"}
{"level":"info","ts":"2023-03-22T15:15:08.833188249Z","msg":"Starting EventSource","controller":"opentelemetrycollector","controllerGroup":"opentelemetry.io","controllerKind":"OpenTelemetryCollector","source":"kind source: *v1.Deployment"}
{"level":"info","ts":"2023-03-22T15:15:08.833195283Z","msg":"Starting EventSource","controller":"opentelemetrycollector","controllerGroup":"opentelemetry.io","controllerKind":"OpenTelemetryCollector","source":"kind source: *v1.DaemonSet"}
{"level":"info","ts":"2023-03-22T15:15:08.833201818Z","msg":"Starting EventSource","controller":"opentelemetrycollector","controllerGroup":"opentelemetry.io","controllerKind":"OpenTelemetryCollector","source":"kind source: *v1.StatefulSet"}
{"level":"info","ts":"2023-03-22T15:15:08.833209363Z","msg":"Starting EventSource","controller":"opentelemetrycollector","controllerGroup":"opentelemetry.io","controllerKind":"OpenTelemetryCollector","source":"kind source: *v2.HorizontalPodAutoscaler"}
{"level":"info","ts":"2023-03-22T15:15:08.833214042Z","msg":"Starting Controller","controller":"opentelemetrycollector","controllerGroup":"opentelemetry.io","controllerKind":"OpenTelemetryCollector"}
{"level":"info","ts":"2023-03-22T15:15:08.993565055Z","logger":"instrumentation-upgrade","msg":"no instances to upgrade"}
{"level":"info","ts":"2023-03-22T15:15:08.993882768Z","logger":"collector-upgrade","msg":"no instances to upgrade"}
{"level":"info","ts":"2023-03-22T15:15:09.195660699Z","msg":"Starting workers","controller":"opentelemetrycollector","controllerGroup":"opentelemetry.io","controllerKind":"OpenTelemetryCollector","worker count":1}
```
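For anyone who wants to check a longer operator log the same way, here is a minimal Python sketch that counts entries per log level and lists the watched sources. It assumes one JSON object per line (that is, with the email line wrapping undone); the function name `summarize_operator_logs` is made up for illustration, and the two sample lines are abridged from the excerpt above.

```python
import json


def summarize_operator_logs(lines):
    """Count entries per log level and collect the 'Starting EventSource' sources."""
    levels = {}
    sources = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            # Skip lines that are not complete JSON objects (e.g. wrapped fragments).
            continue
        level = entry.get("level", "unknown")
        levels[level] = levels.get(level, 0) + 1
        if entry.get("msg") == "Starting EventSource":
            sources.append(entry.get("source", ""))
    return levels, sources


# Two lines abridged from the log excerpt above.
sample = [
    '{"level":"info","ts":"2023-03-22T15:15:08.833041327Z","msg":"Starting EventSource",'
    '"controller":"opentelemetrycollector","source":"kind source: *v1alpha1.OpenTelemetryCollector"}',
    '{"level":"info","ts":"2023-03-22T15:15:09.195660699Z","msg":"Starting workers",'
    '"controller":"opentelemetrycollector","worker count":1}',
]

levels, sources = summarize_operator_logs(sample)
print(levels)
print(sources)
```

If `levels` has no `error` key, the operator itself did not log a failure, which matches what is described above and points toward something outside the Pod.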
|
@gbajson-area22 thanks for clarifying. It does look like the operator pod starts up fine and is watching for objects to be created. I found a couple of other issues that are similar to this: open-telemetry/opentelemetry-operator#100 and open-telemetry/opentelemetry-operator#1009. I tried reproducing this on a couple of new GKE Autopilot clusters but unfortunately wasn't able to. Could you confirm that you installed cert-manager using the special Helm instructions in https://github.com/GoogleCloudPlatform/opentelemetry-operator-sample#prerequisites? Also, is this a private GKE cluster? If so, you may need to create a firewall rule for the webhook port, based on this comment: open-telemetry/opentelemetry-operator#100 (comment) |
Hi Mike,
1. Yes, I installed cert-manager using the instructions from
https://github.com/GoogleCloudPlatform/opentelemetry-operator-sample#prerequisites
2. Yes, this is a private GKE cluster. Thanks for this link, I will test it.
|
Thanks for the quick reply. Please let me know if that helps; if so, I'll update our docs to include it. |
Hi @gbajson-area22, just checking in: have you had a chance to try that yet, and did it fix the problem? |
I was able to reproduce this in a private cluster and the fix linked above worked for me. Opened #32 to document this and that will close this issue. |
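For later readers, a rough Python sketch of the kind of firewall rule discussed above. It only builds and prints a `gcloud compute firewall-rules create` command so you can review it before running anything. Everything specific here is an assumption and not taken from this thread: port 9443 as the webhook port, the rule name, the network, the control-plane CIDR, and the node tag are placeholders you would replace with your cluster's values.

```python
import shlex


def firewall_rule_command(rule_name, network, control_plane_cidr, node_tag, port=9443):
    """Build a gcloud argv allowing control-plane traffic to a webhook port on the nodes.

    port=9443 is an assumed webhook port; confirm it against your operator deployment.
    """
    return [
        "gcloud", "compute", "firewall-rules", "create", rule_name,
        "--network", network,
        "--direction", "INGRESS",
        "--allow", f"tcp:{port}",
        "--source-ranges", control_plane_cidr,
        "--target-tags", node_tag,
    ]


# All values below are hypothetical placeholders.
cmd = firewall_rule_command(
    rule_name="allow-otel-operator-webhook",
    network="default",
    control_plane_cidr="172.16.0.0/28",
    node_tag="gke-my-cluster-node",
)
print(shlex.join(cmd))
```

The idea is that in a private cluster the control plane has to reach the operator's webhook on the nodes, and a blocked port would show up as the "context deadline exceeded" timeout from the issue title.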
Thank you!
…On Wed, Apr 12, 2023 at 8:21 PM Mike Dame ***@***.***> wrote:
Closed #29 as completed via #32.
|
This is a problem in GKE Autopilot cluster version `1.25.6-gke.1000`. I followed the guide from https://github.com/GoogleCloudPlatform/opentelemetry-operator-sample#prerequisites.
I installed `cert-manager` and the OpenTelemetry Operator. I can't start the collector; here is the error: