Cannot create Collector: Webhook deadline exceeded #100
Do you have cert-manager installed?
Hi @jpkrohling, yes, it is installed and running.
Here are some logs from cert-manager:
Could you try killing the pod? Is it possible that the operator deployment started before cert-manager was ready? Are you able to consistently reproduce it in minikube? Here's what I see when installing everything from scratch:
$ kubectl apply --validate=false -f https://github.com/jetstack/cert-manager/releases/download/v0.16.1/cert-manager.yaml
customresourcedefinition.apiextensions.k8s.io/certificaterequests.cert-manager.io created
customresourcedefinition.apiextensions.k8s.io/certificates.cert-manager.io created
customresourcedefinition.apiextensions.k8s.io/challenges.acme.cert-manager.io created
customresourcedefinition.apiextensions.k8s.io/clusterissuers.cert-manager.io created
customresourcedefinition.apiextensions.k8s.io/issuers.cert-manager.io created
customresourcedefinition.apiextensions.k8s.io/orders.acme.cert-manager.io created
namespace/cert-manager created
serviceaccount/cert-manager-cainjector created
serviceaccount/cert-manager created
serviceaccount/cert-manager-webhook created
clusterrole.rbac.authorization.k8s.io/cert-manager-cainjector created
clusterrole.rbac.authorization.k8s.io/cert-manager-controller-issuers created
clusterrole.rbac.authorization.k8s.io/cert-manager-controller-clusterissuers created
clusterrole.rbac.authorization.k8s.io/cert-manager-controller-certificates created
clusterrole.rbac.authorization.k8s.io/cert-manager-controller-orders created
clusterrole.rbac.authorization.k8s.io/cert-manager-controller-challenges created
clusterrole.rbac.authorization.k8s.io/cert-manager-controller-ingress-shim created
clusterrole.rbac.authorization.k8s.io/cert-manager-view created
clusterrole.rbac.authorization.k8s.io/cert-manager-edit created
clusterrolebinding.rbac.authorization.k8s.io/cert-manager-cainjector created
clusterrolebinding.rbac.authorization.k8s.io/cert-manager-controller-issuers created
clusterrolebinding.rbac.authorization.k8s.io/cert-manager-controller-clusterissuers created
clusterrolebinding.rbac.authorization.k8s.io/cert-manager-controller-certificates created
clusterrolebinding.rbac.authorization.k8s.io/cert-manager-controller-orders created
clusterrolebinding.rbac.authorization.k8s.io/cert-manager-controller-challenges created
clusterrolebinding.rbac.authorization.k8s.io/cert-manager-controller-ingress-shim created
role.rbac.authorization.k8s.io/cert-manager-cainjector:leaderelection created
role.rbac.authorization.k8s.io/cert-manager:leaderelection created
role.rbac.authorization.k8s.io/cert-manager-webhook:dynamic-serving created
rolebinding.rbac.authorization.k8s.io/cert-manager-cainjector:leaderelection created
rolebinding.rbac.authorization.k8s.io/cert-manager:leaderelection created
rolebinding.rbac.authorization.k8s.io/cert-manager-webhook:dynamic-serving created
service/cert-manager created
service/cert-manager-webhook created
deployment.apps/cert-manager-cainjector created
deployment.apps/cert-manager created
deployment.apps/cert-manager-webhook created
mutatingwebhookconfiguration.admissionregistration.k8s.io/cert-manager-webhook created
validatingwebhookconfiguration.admissionregistration.k8s.io/cert-manager-webhook created
$ kubectl get pods -n cert-manager
cert-manager-cainjector-fc6c787db-lqdlb 1/1 Running 0 27s
$ kubectl apply -f https://github.com/open-telemetry/opentelemetry-operator/releases/latest/download/opentelemetry-operator.yaml
namespace/opentelemetry-operator-system created
customresourcedefinition.apiextensions.k8s.io/opentelemetrycollectors.opentelemetry.io created
role.rbac.authorization.k8s.io/opentelemetry-operator-leader-election-role created
clusterrole.rbac.authorization.k8s.io/opentelemetry-operator-manager-role created
clusterrole.rbac.authorization.k8s.io/opentelemetry-operator-proxy-role created
clusterrole.rbac.authorization.k8s.io/opentelemetry-operator-metrics-reader created
rolebinding.rbac.authorization.k8s.io/opentelemetry-operator-leader-election-rolebinding created
clusterrolebinding.rbac.authorization.k8s.io/opentelemetry-operator-manager-rolebinding created
clusterrolebinding.rbac.authorization.k8s.io/opentelemetry-operator-proxy-rolebinding created
service/opentelemetry-operator-controller-manager-metrics-service created
service/opentelemetry-operator-webhook-service created
deployment.apps/opentelemetry-operator-controller-manager created
certificate.cert-manager.io/opentelemetry-operator-serving-cert created
issuer.cert-manager.io/opentelemetry-operator-selfsigned-issuer created
mutatingwebhookconfiguration.admissionregistration.k8s.io/opentelemetry-operator-mutating-webhook-configuration created
validatingwebhookconfiguration.admissionregistration.k8s.io/opentelemetry-operator-validating-webhook-configuration created
$ kubectl get pods -n opentelemetry-operator-system
NAME READY STATUS RESTARTS AGE
opentelemetry-operator-controller-manager-548b94f546-tmzpf 2/2 Running 0 18s
$ kubectl apply -f config/samples/core_v1alpha1_opentelemetrycollector.yaml
opentelemetrycollector.opentelemetry.io/opentelemetrycollector-sample created
$ kubectl get pods -n default
NAME READY STATUS RESTARTS AGE
opentelemetrycollector-sample-collector-5d9bbb498c-hf2r7 1/1 Running 0 10s
$ kubectl logs deployments/opentelemetrycollector-sample-collector | tail -n 1
{"level":"info","ts":1603351057.2240055,"caller":"service/service.go:252","msg":"Everything is ready. Begin running and processing data."} |
Hi @jpkrohling, I also tried to reproduce it on minikube and everything worked fine. Thank you for your quick input!
I am hitting this as well, but I cannot replicate it in minikube; everything there works fine, but as soon as I try this in our cluster it times out. There is zero logging from the operator so I have no idea what's going on 😭
@krak3n, there's no logging in the operator because the problem probably happens before the operator gets the chance to see the change. Do you have cert-manager installed? Anything suspicious when you check it?
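One check that can help at this point, assuming the default install from the release manifest, is confirming that the webhook service actually has endpoints behind it, since the Kubernetes API server calls that service whenever a Collector is created:

```sh
# The mutating/validating webhook configurations point at this service; it
# should list the operator pod's IP as a ready endpoint.
kubectl -n opentelemetry-operator-system get endpoints opentelemetry-operator-webhook-service
kubectl get mutatingwebhookconfiguration,validatingwebhookconfiguration | grep opentelemetry
```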
Yeah @jpkrohling, I thought as much, sorry for my cries of desperation lol. It is likely an issue with our cluster, permissions or something that's blocking the webhook request. There is nothing in cert-manager; it's all running fine. It's been there for months and we recently updated it to the latest release, so I don't think the issue is there.
And I assume the operator itself is also up and running, right? Are you able to expose the service via, say, a kubectl port-forward?
@jpkrohling yup, opened a port-forward to the service with:
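Presumably something along these lines, assuming the default webhook service created by the operator manifest:

```sh
# Forward local port 9443 to the operator's webhook service
# (service port 443, container port 9443 in the default manifest).
kubectl -n opentelemetry-operator-system port-forward svc/opentelemetry-operator-webhook-service 9443:443
```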
Then made a request:
$ http --verify=no POST https://localhost:9443/mutate-opentelemetry-io-v1alpha1-opentelemetrycollector\?timeout=10s
HTTP/1.1 200 OK
Content-Length: 128
Content-Type: text/plain; charset=utf-8
Date: Tue, 13 Apr 2021 10:06:52 GMT
{
  "response": {
    "allowed": false,
    "status": {
      "code": 400,
      "message": "contentType=, expected application/json",
      "metadata": {}
    },
    "uid": ""
  }
}
And I see the error in the operator.
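That 400 is expected for an empty, non-JSON body, so the webhook is at least reachable through the port-forward. To exercise it with a real payload, a request along these lines can be used (the AdmissionReview below is only an illustrative sketch; the handler may require additional fields):

```sh
# Illustrative only: POST a minimal AdmissionReview with the JSON content type.
curl -k -s -X POST \
  -H "Content-Type: application/json" \
  -d '{
    "apiVersion": "admission.k8s.io/v1",
    "kind": "AdmissionReview",
    "request": {
      "uid": "00000000-0000-0000-0000-000000000000",
      "kind": {"group": "opentelemetry.io", "version": "v1alpha1", "kind": "OpenTelemetryCollector"},
      "resource": {"group": "opentelemetry.io", "version": "v1alpha1", "resource": "opentelemetrycollectors"},
      "operation": "CREATE",
      "object": {
        "apiVersion": "opentelemetry.io/v1alpha1",
        "kind": "OpenTelemetryCollector",
        "metadata": {"name": "test", "namespace": "default"},
        "spec": {"config": ""}
      }
    }
  }' \
  https://localhost:9443/mutate-opentelemetry-io-v1alpha1-opentelemetrycollector
```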
Then either the TLS cert isn't being accepted by the Kubernetes API (the caller), or there's a networking issue. Not sure what I can do at this point to help you, though :-/
Yeah, I am at a loss as well 🤷 The cert seems fine based on the state of the resources.
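For reference, that state can be checked with the names the operator manifest creates:

```sh
# The Issuer and Certificate should both be Ready, and cert-manager should
# have populated the webhook's serving-cert secret in the same namespace.
kubectl -n opentelemetry-operator-system get issuer,certificate
kubectl -n opentelemetry-operator-system describe certificate opentelemetry-operator-serving-cert
```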
And if I do the exact same setup in minikube it's all fine. I'll keep digging.
@jpkrohling so I managed to get it working by changing the
OK, so here is the full set of events from the
So I am guessing the actual issue is the
@CyberHippo did you ever figure out what your issue was?
@krak3n No, I did not. I am now using a custom version of the otel-config.yaml for k8s, which works perfectly fine.
Thanks @CyberHippo, that's a shame, I was really hoping to use sidecar injection, oh well.
It is in my interest to have you both using the operator. Please help me reproduce this issue so that I can fix it :-) If you can't consistently reproduce with minikube, would one of you be able to give me access to your cluster?
@jpkrohling it's a work cluster, though it is just staging, so it might be tricky; I'll run it past the team at stand-up tomorrow. I was gonna try and set up my own GKE cluster and see if I can replicate it there rather than minikube. I'll do another fresh install on the cluster tomorrow morning and give you all the logs/events I can get my hands on.
A long shot, but which version of cert-manager are you using? Have you tried using their latest version?
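A quick way to confirm the installed version, assuming the default deployment name, is to look at the controller image tag:

```sh
# Prints something like quay.io/jetstack/cert-manager-controller:v1.x.y
kubectl -n cert-manager get deployment cert-manager \
  -o jsonpath='{.spec.template.spec.containers[0].image}'
```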
Morning @jpkrohling - we run cert-manager 1.3.0, which I believe is the latest. We've been running cert-manager for a while in the cluster and it is currently handling some Let's Encrypt certs as well. I attempted a fresh install of the operator this morning using the example in the
This is running in GKE, right? I have meetings the whole day today and tomorrow, and I'm off Friday, but I'll try to get to this one early next week.
Hi @jpkrohling, yup, it's in GKE; yeah, no worries. Hit me up whenever you get some time 👍
Hi @jpkrohling, we can't give you direct access to the cluster, but we could jump on a call or something and you could debug through me lol. Not ideal, I know 🤷
@krak3n Is your GKE cluster private? If yes, which ports are open in your GCP firewall rules?
It is private and allows
@CyberHippo do other ports need to be open?
@krak3n I'm not sure, but I suspect that another port needs to be open as well. I encountered a similar issue with an ingress-controller and a private GKE cluster. @jpkrohling Do you know if specific ports need to be open on the master?
Off the top of my head, I can't think of any other ports we might use, but I would check the kubebuilder docs.
There's some information scattered around this issue, so let me consolidate it here:
I am running into a similar issue on OpenShift, but with a different signature in the error message (BTW, I did have cert-manager installed without any issue):
The operator controller pod was initially running fine, but as soon as I ran the above command to create a collector instance, it got into a CrashLoopBackOff situation.
Logs from the controller pod:
As a result of the pod crash, there were no endpoints for the opentelemetry-operator-webhook-service service.
Not sure why the controller pod crashed in the first place.
Update: I did some research; exit code 137 indicates the container was killed for running out of memory. So I modified the resource request and limit for the operator manager container to the values below:
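A sketch of that kind of bump for the manager container in config/manager/manager.yaml, with illustrative numbers only (the values actually used may differ):

```yaml
# Illustrative resources for the manager container; tune to your cluster.
resources:
  limits:
    cpu: 200m
    memory: 256Mi
  requests:
    cpu: 100m
    memory: 128Mi
```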
With this change in place, the operator is no longer crashing and I was able to deploy the simplest collector successfully.
@ruibinghao, would you be able to send in a PR bumping that? Here's the relevant place: opentelemetry-operator/config/manager/manager.yaml, lines 32 to 38, at 463e014.
Nice! I was seeing exactly the same with OpenShift (the
Managed to get some time debugging this today. I set myself up a private GKE cluster and was able to reproduce the context deadline exceeded issue. I can confirm that this is due to the ports not being reachable from the master. I created these firewall rules for the master and everything works as it should:
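For anyone hitting the same thing, a rule along these lines (names and CIDR are placeholders) lets a private cluster's control plane reach the operator's webhook port on the nodes:

```sh
# Placeholder network name and master CIDR: allow the GKE master range to
# reach the webhook container port (9443) on the worker nodes.
gcloud compute firewall-rules create allow-otel-operator-webhook \
  --direction INGRESS \
  --network <CLUSTER_NETWORK> \
  --source-ranges <MASTER_IPV4_CIDR> \
  --allow tcp:9443
```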
Maybe some documentation should be added regarding private clusters to ensure these ports are open. Edit: |
@krak3n, would you be able to document this in the place you'd expect to see this documented?
Sure, will do, assign it to me 😄
Hi, logs from the OpenTelemetry Operator:
I'm closing this, as the original report has been fixed.
Thanks again for all your kind help @jpkrohling
Thank you! This also resolves the issue in GKE Autopilot!
Related issue to document it: #1009
Hi,
I cannot create the simplest OpenTelemetryCollector. I get the following error when I try to create it from STDIN:
The OpenTelemetry Operator Controller Manager is up and running in the namespace opentelemetry-operator-system:
And the logs of both containers (manager and kube-rbac-proxy) do not show any error.
I installed the required resources using:
Is there something I am missing?
Thank you for your help!