
incorrect healthCheck API version causes panic #432

Closed
gwvandesteeg opened this issue Sep 24, 2021 · 1 comment · Fixed by #435
Description of issue
If an incorrect apiVersion is specified in a healthCheck, the controller panics instead of reporting an error. The default validation process does not catch the mistake either.

Expected behaviour
An error message is generated that logs the details of the problem.

Current behaviour

The healthCheck was defined on a Job, but with the wrong apiVersion specified; the problem can be reproduced with any simple Job.

Details
Diff of the patch applied to the Kustomization:

   timeout: 5m
   healthChecks:
     # make sure the neo4j bootstrap is ready
-    - apiVersion: v1
+    - apiVersion: batch/v1
       kind: Job
       name: neo4j-bootstrap
       namespace: default
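
The fix works because Job lives in the batch API group, so its apiVersion is batch/v1; a bare v1 resolves to the core ("") group, where no Job kind exists. The sketch below (stdlib-only Go; parseAPIVersion, validateHealthCheck, and the knownGroups map are all hypothetical illustrations, not the controller's actual code) shows how an apiVersion string splits into group and version, and how a validation step could reject the mismatch up front instead of letting it reach the health check:

```go
package main

import (
	"fmt"
	"strings"
)

// parseAPIVersion splits an apiVersion string into group and version,
// mirroring how Kubernetes interprets "batch/v1" versus a bare "v1".
func parseAPIVersion(apiVersion string) (group, version string) {
	if i := strings.Index(apiVersion, "/"); i >= 0 {
		return apiVersion[:i], apiVersion[i+1:]
	}
	// No slash means the core ("") group, e.g. "v1" for Pod or Service.
	return "", apiVersion
}

// validateHealthCheck is a hypothetical guard: it rejects a healthCheck
// whose apiVersion group does not match the known group for the kind.
func validateHealthCheck(apiVersion, kind string) error {
	// Illustrative subset; a real implementation would consult the
	// cluster's discovery API rather than a hard-coded table.
	knownGroups := map[string]string{
		"Job":        "batch",
		"Deployment": "apps",
		"Pod":        "",
	}
	group, _ := parseAPIVersion(apiVersion)
	if want, ok := knownGroups[kind]; ok && group != want {
		return fmt.Errorf("healthCheck for kind %s: apiVersion %q is in group %q, expected %q",
			kind, apiVersion, group, want)
	}
	return nil
}

func main() {
	// "v1" resolves to the core group, but Job belongs to "batch".
	fmt.Println(validateHealthCheck("v1", "Job"))
	fmt.Println(validateHealthCheck("batch/v1", "Job"))
}
```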

Full working kustomization

---
apiVersion: kustomize.toolkit.fluxcd.io/v1beta1
kind: Kustomization
metadata:
  name: backends-configs
  namespace: flux-system
spec:
  interval: 10m0s
  dependsOn:
    - name: backends
  sourceRef:
    kind: GitRepository
    name: flux-system
  path: ./backends-configs/staging
  prune: true
  validation: client
  timeout: 5m
  healthChecks:
    # make sure the neo4j bootstrap is ready
    - apiVersion: batch/v1
      kind: Job
      name: neo4j-bootstrap-databases
      namespace: default

The panic output

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x48 pc=0x1bca585]

goroutine 358 [running]:
github.com/fluxcd/kustomize-controller/controllers.(*KustomizeHealthCheck).Assess(0xc000b29b78, 0x3b9aca00, 0x0, 0x0)
        /workspace/controllers/kustomization_healthcheck.go:97 +0x4a5
github.com/fluxcd/kustomize-controller/controllers.(*KustomizationReconciler).checkHealth(0xc000446140, 0x2389948, 0xc001344d50, 0xc0004a88e0, 0x1c1c919, 0xd, 0xc000be7320, 0x23, 0xc000c58630, 0x10, ...)
        /workspace/controllers/kustomization_controller.go:744 +0xfd
github.com/fluxcd/kustomize-controller/controllers.(*KustomizationReconciler).reconcile(0xc000446140, 0x2389948, 0xc001344d50, 0x1c1c919, 0xd, 0xc000be7320, 0x23, 0xc000c58630, 0x10, 0x0, ...)
        /workspace/controllers/kustomization_controller.go:385 +0xf7b
github.com/fluxcd/kustomize-controller/controllers.(*KustomizationReconciler).Reconcile(0xc000446140, 0x2389948, 0xc001344d50, 0xc0002fa6a0, 0xb, 0xc0002fa680, 0x10, 0xc001344d00, 0x0, 0x0, ...)
        /workspace/controllers/kustomization_controller.go:233 +0xe58
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler(0xc00021e3c0, 0x23898a0, 0xc000376000, 0x1e1e180, 0xc0005521a0)
        /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:298 +0x30d
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem(0xc00021e3c0, 0x23898a0, 0xc000376000, 0x0)
        /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:253 +0x205
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2(0xc000058bd0, 0xc00021e3c0, 0x23898a0, 0xc000376000)
        /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:214 +0x6b
created by sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2
        /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:210 +0x425
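
The trace points into KustomizeHealthCheck.Assess, and a SIGSEGV of this shape typically means a lookup result was dereferenced without a nil check. The stdlib-only sketch below (all names — statusResult, lookupStatus, assess — are hypothetical stand-ins, not the controller's real code) illustrates the pattern and the defensive guard that turns the panic into the error message the reporter expected:

```go
package main

import (
	"errors"
	"fmt"
)

// statusResult stands in for the per-resource status computed during a
// health check.
type statusResult struct {
	Ready bool
}

// lookupStatus models the resolution step: for an unknown apiVersion it
// returns nil, which is the value that must not be dereferenced blindly.
func lookupStatus(apiVersion, kind string) *statusResult {
	if apiVersion == "batch/v1" && kind == "Job" {
		return &statusResult{Ready: true}
	}
	return nil // unknown apiVersion: no status available
}

// assess guards the dereference; without the nil check, res.Ready on a
// nil res produces "invalid memory address or nil pointer dereference".
func assess(apiVersion, kind string) (bool, error) {
	res := lookupStatus(apiVersion, kind)
	if res == nil {
		return false, errors.New("health check: no status for apiVersion " + apiVersion)
	}
	return res.Ready, nil
}

func main() {
	if _, err := assess("v1", "Job"); err != nil {
		// Surfaced as an error instead of crashing the controller.
		fmt.Println("error:", err)
	}
}
```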

Version information:

$ flux check
► checking prerequisites
✔ kubectl 1.21.0 >=1.18.0-0
✔ Kubernetes 1.20.7-eks-d88609 >=1.16.0-0
► checking controllers
✔ helm-controller: deployment ready
► ghcr.io/fluxcd/helm-controller:v0.11.2
✔ kustomize-controller: deployment ready
► ghcr.io/fluxcd/kustomize-controller:v0.14.1
✔ notification-controller: deployment ready
► ghcr.io/fluxcd/notification-controller:v0.16.0
✔ source-controller: deployment ready
► ghcr.io/fluxcd/source-controller:v0.15.4
✔ all checks passed

Pod listing showing the number of restarts of the kustomize-controller:

$ kubectl get pods -n flux-system
NAME                                       READY   STATUS    RESTARTS   AGE
helm-controller-dc6ffd55b-p5v58            1/1     Running   0          5d19h
kustomize-controller-6c8cfccb59-8qhrn      1/1     Running   798        5d19h
notification-controller-8494bfd747-f9llm   1/1     Running   0          5d19h
source-controller-7445c6755-sz7bf          1/1     Running   0          5d19h
makkes commented Sep 30, 2021

I was able to reproduce it. Thanks @gwvandesteeg for raising this. I will address it asap.

@stefanprodan stefanprodan added area/kstatus Health checking related issues and pull requests bug Something isn't working labels Sep 30, 2021
makkes pushed a commit to makkes/kustomize-controller that referenced this issue Sep 30, 2021