
Error from server: failed to prune fields: failed add back owned items: failed to convert pruned object at version karpenter.sh/v1: #6824

Closed
ManuelMueller1st opened this issue Aug 21, 2024 · 7 comments
Labels: bug (Something isn't working), needs-triage (Issues that need to be triaged)

@ManuelMueller1st
Description

Observed Behavior:
We've migrated from Karpenter 0.37.1 to 1.0.0. Now, whenever I apply a NodePool, the Karpenter pod logs the following error:

http: panic serving 10.250.97.76:40810: runtime error: invalid memory address or nil pointer dereference
goroutine 306281 [running]:
net/http.(*conn).serve.func1()
	net/http/server.go:1903 +0xbe
panic({0x277f360?, 0x4c9f9d0?})
	runtime/panic.go:770 +0x132
sigs.k8s.io/karpenter/pkg/apis/v1.(*NodeClaimTemplate).convertFrom(0xc008c34a10, {0x3476a98, 0xc0160db320}, 0xc0045a7308)
	sigs.k8s.io/[email protected]/pkg/apis/v1/nodepool_conversion.go:181 +0x19e
sigs.k8s.io/karpenter/pkg/apis/v1.(*NodePoolSpec).convertFrom(0xc008c34a10, {0x3476a98, 0xc0160db320}, 0xc0045a7308)
	sigs.k8s.io/[email protected]/pkg/apis/v1/nodepool_conversion.go:145 +0x106
sigs.k8s.io/karpenter/pkg/apis/v1.(*NodePool).ConvertFrom(0xc008c34908, {0x3476a98?, 0xc0160db320?}, {0x3458370?, 0xc0045a7200})
	sigs.k8s.io/[email protected]/pkg/apis/v1/nodepool_conversion.go:121 +0x15d
knative.dev/pkg/webhook/resourcesemantics/conversion.(*reconciler).convert(0xc000c27680, {0x3476a98, 0xc0160db140}, {{0xc000f09200, 0x8d6, 0x900}, {0x0, 0x0}}, {0xc008d21720, 0xf})
	knative.dev/[email protected]/webhook/resourcesemantics/conversion/conversion.go:137 +0x16d2
knative.dev/pkg/webhook/resourcesemantics/conversion.(*reconciler).Convert(0xc000c27680, {0x3476a98?, 0xc0160db0e0?}, 0xc0104f0040)
	knative.dev/[email protected]/webhook/resourcesemantics/conversion/conversion.go:57 +0x1e5
knative.dev/pkg/webhook.New.conversionHandler.func5({0x3467638, 0xc013205a40}, 0xc0119a8c60)
	knative.dev/[email protected]/webhook/conversion.go:66 +0x34a
net/http.HandlerFunc.ServeHTTP(0xc0012f0f80?, {0x3467638?, 0xc013205a40?}, 0x6c371f?)
	net/http/server.go:2171 +0x29
net/http.(*ServeMux).ServeHTTP(0xc0160daf90?, {0x3467638, 0xc013205a40}, 0xc0119a8c60)
	net/http/server.go:2688 +0x1ad
knative.dev/pkg/webhook.(*Webhook).ServeHTTP(0xc0012f0f00, {0x3467638, 0xc013205a40}, 0xc0119a8c60)
	knative.dev/[email protected]/webhook/webhook.go:310 +0xab
knative.dev/pkg/network/handlers.(*Drainer).ServeHTTP(0xc0004f9ea0, {0x3467638, 0xc013205a40}, 0xc0119a8c60)
	knative.dev/[email protected]/network/handlers/drain.go:113 +0x150
net/http.serverHandler.ServeHTTP({0x34521f0?}, {0x3467638?, 0xc013205a40?}, 0x6?)
	net/http/server.go:3142 +0x8e
net/http.(*conn).serve(0xc007fc7ef0, {0x3476a98, 0xc0012fe8d0})
	net/http/server.go:2044 +0x5e8
created by net/http.(*Server).Serve in goroutine 359
	net/http/server.go:3290 +0x4b4
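The trace shows the webhook panicking with a nil pointer dereference inside `NodeClaimTemplate.convertFrom`. As a hedged illustration only (the types and function below are hypothetical and do not reproduce Karpenter's actual conversion code), this is the general class of bug: a conversion routine dereferencing an optional pointer field that the source object never set, where a nil guard would turn the panic into an ordinary conversion error.

```go
package main

import "fmt"

// Hypothetical types for illustration only; they do not match
// Karpenter's real API structs.
type NodeClassRef struct {
	Group string
	Kind  string
	Name  string
}

type NodeClaimTemplateSpec struct {
	// Optional pointer field: nil when the source object never set it.
	NodeClassRef *NodeClassRef
}

// convertFrom guards the dereference of the optional field, so an
// unset field produces a conversion error instead of a webhook panic.
func convertFrom(src *NodeClaimTemplateSpec) (string, error) {
	if src.NodeClassRef == nil {
		return "", fmt.Errorf("cannot convert: nodeClassRef is not set")
	}
	return src.NodeClassRef.Kind + "/" + src.NodeClassRef.Name, nil
}

func main() {
	// A template whose optional field was never populated.
	_, err := convertFrom(&NodeClaimTemplateSpec{})
	fmt.Println("unset field:", err)

	ref, _ := convertFrom(&NodeClaimTemplateSpec{
		NodeClassRef: &NodeClassRef{Kind: "EC2NodeClass", Name: "default"},
	})
	fmt.Println("set field:", ref)
}
```

Without the nil check, the first call in `main` would dereference a nil pointer and panic exactly as in the goroutine dump above.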

kubectl reports the following error:

Error from server: failed to prune fields: failed add back owned items: failed to convert pruned object at version karpenter.sh/v1: conversion webhook for karpenter.sh/v1beta1, Kind=NodePool failed: Post "https://karpenter.karpenter.svc:8443/conversion/karpenter.sh?timeout=30s": EOF

Here is the NodePool I want to apply:

apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default
spec:
  disruption:
    budgets:
      - nodes: 10%
    consolidateAfter: 0s
    consolidationPolicy: WhenEmptyOrUnderutilized
  limits:
    cpu: "180"
    memory: 720Gi
  template:
    metadata:
      labels:
        f3z/env: playground
        f3z/managed-by: karpenter
        f3z/nodegroup: default
        f3z/nodepool: default
    spec:
      expireAfter: 720h
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
      requirements:
        - key: karpenter.k8s.aws/instance-hypervisor
          operator: In
          values:
            - nitro
        - key: kubernetes.io/arch
          operator: In
          values:
            - amd64
        - key: kubernetes.io/os
          operator: In
          values:
            - linux
        - key: karpenter.k8s.aws/instance-generation
          operator: Gt
          values:
            - "2"
        - key: karpenter.sh/capacity-type
          operator: In
          values:
            - spot
            - on-demand
        - key: karpenter.k8s.aws/instance-category
          operator: In
          values:
            - c
            - m
            - r
        - key: karpenter.k8s.aws/instance-size
          operator: NotIn
          values:
            - nano
            - micro
            - small
            - medium

Expected Behavior:

The NodePool is applied without an error.

Reproduction Steps (Please include YAML):

Apply the YAML above with 0.37.1, then reapply it with 1.0.0.

Versions:

  • Chart Version: 1.0.0
  • Kubernetes Version (kubectl version): 1.0.0

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments; they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment
@ManuelMueller1st added the bug and needs-triage labels on Aug 21, 2024
@jmdeal (Contributor) commented Aug 21, 2024

My understanding of the reproduction steps is that I should be able to reproduce this by applying the provided NodePool on 0.37.1, upgrading Karpenter to 1.0.0, and reapplying the same NodePool after the upgrade has completed. I've been unable to replicate this with the provided NodePool. Could you elaborate on the order of events? Specifically, what did you do to upgrade to 1.0.0, and were there any other changes to resources in the cluster as part of that upgrade process?

@ManuelMueller1st (Author) commented Aug 26, 2024

I noticed that the error only occurs if we use kubectl apply --server-side.
We followed the https://karpenter.sh/preview/upgrading/v1-migration/ instructions to upgrade to Karpenter 1.0.0.

@sherifabdlnaby commented
> I noticed that the error only occurs if we use kubectl apply --server-side. We followed the karpenter.sh/preview/upgrading/v1-migration instructions to upgrade to Karpenter 1.0.0.

Using client-side apply mitigated the issue for us. It's not ideal for our GitOps solution, though.

@Ezcyo commented Aug 28, 2024

Hi!
Same setup on our side: we upgraded from 0.37.1 to 1.0.0, and the post-upgrade webhooks passed successfully.
We are trying to apply the following NodePool:

apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  annotations:
    compatibility.karpenter.sh/v1beta1-kubelet-conversion: '{"clusterDNS":["x.x.x.x"]}'
    compatibility.karpenter.sh/v1beta1-nodeclass-reference: '{"kind":"EC2NodeClass","name":"bottlerocket","apiVersion":"karpenter.k8s.aws/v1beta1"}'
  labels:
    kustomize.toolkit.fluxcd.io/name: karpenter-node-pool
    kustomize.toolkit.fluxcd.io/namespace: karpenter
  name: default-ondemand-amd64
spec:
  disruption:
    budgets:
    - nodes: 10%
    consolidateAfter: 0s
    consolidationPolicy: WhenEmptyOrUnderutilized
  limits:
    cpu: "100"
  template:
    spec:
      expireAfter: 720h
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: bottlerocket
      requirements:
      - key: karpenter.sh/capacity-type
        operator: In
        values:
        - on-demand
      - key: kubernetes.io/arch
        operator: In
        values:
        - amd64
      - key: karpenter.k8s.aws/instance-category
        operator: In
        values:
        - c
      - key: karpenter.k8s.aws/instance-family
        operator: In
        values:
        - c5a
        - c6a
      - key: karpenter.k8s.aws/instance-cpu
        operator: In
        values:
        - "4"
        - "8"
        - "16"
      startupTaints:
      - effect: NoExecute
        key: node.cilium.io/agent-not-ready

This results in the following error during apply:

NodePool/arm-ondemand dry-run failed, error: failed to prune fields: failed add back owned items: failed to convert pruned object at version karpenter.sh/v1: conversion webhook for karpenter.sh/v1beta1, Kind=NodePool failed: Post "https://karpenter.karpenter.svc:8443/conversion/karpenter.sh?timeout=30s": EOF

And the following traceback on the karpenter controller:

karpenter-6b4bd4c96c-nb2lf controller {"level":"ERROR","time":"2024-08-28T15:10:58.539Z","logger":"webhook","message":"http: panic serving 172.23.219.89:52172: runtime error: invalid memory address or nil pointer dereference","commit":"5bdf9c3"}

goroutine 34311 [running]:
net/http.(*conn).serve.func1()
	net/http/server.go:1903 +0xb0
panic({0x2225100?, 0x4734a10?})
	runtime/panic.go:770 +0x124
sigs.k8s.io/karpenter/pkg/apis/v1.(*NodeClaimTemplate).convertFrom(0x4005cdd310, {0x2f1bb28, 0x40070d5470}, 0x4001832b08)
	sigs.k8s.io/[email protected]/pkg/apis/v1/nodepool_conversion.go:181 +0x188
sigs.k8s.io/karpenter/pkg/apis/v1.(*NodePoolSpec).convertFrom(0x4005cdd310, {0x2f1bb28, 0x40070d5470}, 0x4001832b08)
	sigs.k8s.io/[email protected]/pkg/apis/v1/nodepool_conversion.go:145 +0xe8
sigs.k8s.io/karpenter/pkg/apis/v1.(*NodePool).ConvertFrom(0x4005cdd208, {0x2f1bb28?, 0x40070d5470?}, {0x2efd390?, 0x4001832a00})
	sigs.k8s.io/[email protected]/pkg/apis/v1/nodepool_conversion.go:121 +0x124
knative.dev/pkg/webhook/resourcesemantics/conversion.(*reconciler).convert(0x40005c8d80, {0x2f1bb28, 0x40070d5320}, {{0x4005e9e6c0, 0x214, 0x240}, {0x0, 0x0}}, {0x40046fa8b0, 0xf})
	knative.dev/[email protected]/webhook/resourcesemantics/conversion/conversion.go:137 +0x119c
knative.dev/pkg/webhook/resourcesemantics/conversion.(*reconciler).Convert(0x40005c8d80, {0x2f1bb28?, 0x40070d52c0?}, 0x40088e69c0)
	knative.dev/[email protected]/webhook/resourcesemantics/conversion/conversion.go:57 +0x174
knative.dev/pkg/webhook.New.conversionHandler.func5({0x2f0c658, 0x40047736c0}, 0x4005e25200)
	knative.dev/[email protected]/webhook/conversion.go:66 +0x24c
net/http.HandlerFunc.ServeHTTP(0x4000d18080?, {0x2f0c658?, 0x40047736c0?}, 0x1d01d10?)
	net/http/server.go:2171 +0x38
net/http.(*ServeMux).ServeHTTP(0x40070d5170?, {0x2f0c658, 0x40047736c0}, 0x4005e25200)
	net/http/server.go:2688 +0x1a4
knative.dev/pkg/webhook.(*Webhook).ServeHTTP(0x4000d18000, {0x2f0c658, 0x40047736c0}, 0x4005e25200)
	knative.dev/[email protected]/webhook/webhook.go:310 +0xc4
knative.dev/pkg/network/handlers.(*Drainer).ServeHTTP(0x40004743f0, {0x2f0c658, 0x40047736c0}, 0x4005e25200)
	knative.dev/[email protected]/network/handlers/drain.go:113 +0x158
net/http.serverHandler.ServeHTTP({0x2ef71d0?}, {0x2f0c658?, 0x40047736c0?}, 0x6?)
	net/http/server.go:3142 +0xbc
net/http.(*conn).serve(0x400a3701b0, {0x2f1bb28, 0x4000a21290})
	net/http/server.go:2044 +0x508
created by net/http.(*Server).Serve in goroutine 364
	net/http/server.go:3290 +0x3f0
We'll take a look at client-side apply, but it would not be ideal for us, for the same reason @sherifabdlnaby mentioned.

@dschaaff (Contributor) commented Sep 6, 2024

Related: #6867

@jyotibhanot18 commented

What should be done?

@engedaam (Contributor) commented

Closing this issue as a duplicate of #6867. Please follow the progress of this issue there.
