
"inconsistent state error adding volume, StorageClass.storage.k8s.io "nfs" not found, please file an issue" #1998

Closed
armenr opened this issue Jun 27, 2022 · 22 comments · Fixed by #2033
Labels: bug (Something isn't working)

Comments

armenr commented Jun 27, 2022

Version

Karpenter: v0.12.0

Kubernetes: v1.22.9-eks-a64ea69

Expected Behavior

Karpenter works as expected (no issues with NFS volumes)

Actual Behavior

Seeing the following in my Karpenter logs:

controller 2022-06-27T06:30:24.907Z ERROR controller.node-state inconsistent state error adding volume, StorageClass.storage.k8s.io "nfs" not found, please file an issue {"commit": "588d4c8", "node": "ip-XXX-XX-XX-XXX.us-west-2.compute.internal"}

Steps to Reproduce the Problem

Resource Specs and Logs

Provisioner:

apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: workers
spec:
  # https://github.com/aws/karpenter/issues/1252#issuecomment-1166894316
  labels:
    vpc.amazonaws.com/has-trunk-attached: "false"
  taints:
  - key: purpose
    effect: NoSchedule
    value: workers
  - key: purpose
    value: workers
    effect: NoExecute
  requirements:
    - key: "purpose"
      operator: In
      values: ["inflate-workers", "workers"]
    - key: "karpenter.sh/capacity-type"
      operator: In
      values: ["spot", "on-demand"]
    - key: "node.kubernetes.io/instance-type"
      operator: In
      values: ["t3.medium", "t3.large", "t3.xlarge", "m5n.xlarge", "m6a.large", "c6a.large"]
    - key: "kubernetes.io/arch"
      operator: In
      values: ["amd64"]
  provider:
    instanceProfile: eks-random-strings-redacted
    securityGroupSelector:
      Name: v2-cert-03-eks-node
    subnetSelector:
      Name: v2-cert-03-private-us-west-2*
  ttlSecondsAfterEmpty: 30

Sample Deployment Template

apiVersion: apps/v1
kind: Deployment
metadata:
  name: some-deployment
  labels:
    app: some-deployment
    app-kubecost: dev
spec:
  revisionHistoryLimit: 1
  replicas:
  selector:
    matchLabels:
      app: some-deployment
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 0
  template:
    metadata:
      annotations:
        cluster-autoscaler.kubernetes.io/safe-to-evict: "true"
      labels:
        app: some-deployment
        app-kubecost: dev
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: purpose
                    operator: In
                    values:
                      - workers
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                labelSelector:
                  matchExpressions:
                    - key: app
                      operator: In
                      values:
                        - some-deployment
                topologyKey: kubernetes.io/hostname
      tolerations:
        - key: purpose
          operator: Equal
          value: workers
          effect: NoSchedule
        - key: purpose
          operator: Equal
          value: workers
          effect: NoExecute
      imagePullSecrets:
        - name: uat-workers-dockercred
      containers:
        - name: worker
          image: "REDACTED.dkr.ecr.us-west-2.amazonaws.com/some_repo:some_immutable_tag"
          imagePullPolicy: IfNotPresent
          env:
            - name: VERBOSE
              value: "3"
          resources:
            requests:
              cpu: "2"
              memory: "2000Mi"
            limits:
              cpu: "2"
              memory: "2000Mi"
          volumeMounts:
            - name: files
              mountPath: /code/.configs.yaml
              subPath: configs.yaml
            - mountPath: "/protected_path/uat"
              name: nfs
      volumes:
        - name: files
          configMap:
            name: config-files
        - name: nfs
          persistentVolumeClaim:
            claimName: nfs


Sample PVC template:

---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nfs
spec:
  volumeName: nfs-{{ .Release.Namespace }}
  storageClassName: nfs
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 10Gi

Sample PV template:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: nfs-{{ .Release.Namespace }}
spec:
  capacity:
    storage: 10Gi
  volumeMode: Filesystem
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  storageClassName: nfs
  mountOptions:
    - nfsvers=4.1
    - rsize=1048576
    - wsize=1048576
    - hard
    - timeo=600
    - retrans=2
    - noresvport
  nfs:
    server: {{ .Values.nfs.server }}
    path: {{ .Values.nfs.path }}
  claimRef:
    name: nfs
    namespace: {{ .Release.Namespace }}
armenr added the bug (Something isn't working) label on Jun 27, 2022

armenr commented Jun 27, 2022

There's an additional problem with Karpenter when it tries to scale up nodes to match unschedulable pods.

The pod I am testing with is sized to match 1 pod against 1 node (mem/cpu combo + allowed instance types are designed this way...).

Karpenter will spin up 9-11 NEW nodes, just to match the 1 unschedulable pod...then will schedule the pod, and remove the extra nodes it added for no reason.

Please see below:
[Screenshot: node-count graph showing the burst of extra nodes]

In this case, Karpenter only needed to spin up TWO new nodes to fit all the pods...but spun up 11 instead.


armenr commented Jun 27, 2022

Reverting to 0.10.1 appears to solve both the StorageClass & "too many new nodes" problems...going to stay on that version until we figure out what I might be doing wrong...


tzneal commented Jun 27, 2022

The extra nodes problem should be solved by #1980. It's caused by the pod self-affinity in your spec. If you remove the pod affinity rule, do you still see the error relating to the storage class?


armenr commented Jun 27, 2022

@tzneal - I'll check and report back shortly.


armenr commented Jun 27, 2022

The NFS issue/storage class thing seems to go away when I remove:

      # affinity:
      #   nodeAffinity:
      #     requiredDuringSchedulingIgnoredDuringExecution:
      #       nodeSelectorTerms:
      #         - matchExpressions:
      #             - key: purpose
      #               operator: In
      #               values:
      #                 - {{ .Values.nodeGroup }}
      #   podAntiAffinity:
      #     preferredDuringSchedulingIgnoredDuringExecution:
      #       - weight: 100
      #         podAffinityTerm:
      #           labelSelector:
      #             matchExpressions:
      #               - key: app
      #                 operator: In
      #                 values:
      #                   - {{ .Values.app }}
      #           topologyKey: kubernetes.io/hostname

The too-many-nodes issue persists, aggressively. My boss is going to kill me when he sees the way node counts were bursting in our account today. $$

I'm going to stay downgraded @ 0.10.x for now...


tzneal commented Jun 27, 2022

Can you paste some of your Karpenter logs? I'm wondering if it's the short TTL and the nodes are being deleted before the persistent volume binds.
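
(For reference, the provisioner above sets ttlSecondsAfterEmpty: 30. A minimal sketch of loosening that value to test this theory; 300 is purely illustrative, not a recommendation:)

apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: workers
spec:
  # ...rest of the spec unchanged from the manifest above...
  ttlSecondsAfterEmpty: 300  # illustrative: keep empty nodes around long
                             # enough for the persistent volume to bind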


pat-s commented Jun 28, 2022

I also saw this in the logs with 0.13.1; what helped was attaching the "arn:aws:iam::aws:policy/service-role/AmazonEBSCSIDriverPolicy" IAM policy.

      # Terraform: additional IAM policies attached to the EKS node role
      iam_role_additional_policies = [
        # Required by Karpenter
        "arn:${local.partition}:iam::aws:policy/AmazonSSMManagedInstanceCore",
        # NFS
        "arn:aws:iam::aws:policy/service-role/AmazonEBSCSIDriverPolicy"
      ]

This was also addressed in #1775 (comment). Maybe a FAQ candidate?


armenr commented Jun 29, 2022

@tzneal - I can try and get you some logs.

@pat-s - I will look into this and see if it helps.

@tzneal - I'm still getting WAYYYYYYY too many new EC2 nodes when I try to scale up a single pod (the single pod is sized to fit a single EC2 of type m6.large...this is intentional). It results in a TON of new nodes.

[Screenshot: console view showing the burst of newly launched EC2 nodes]

^^ The above was triggered when I scaled one of my Deployments from replicas: 1 to replicas: 2.

On Karpenter 0.10.1, this issue does not happen. Please let me know what other info I can provide to help debug. I'd be happy to cooperate!

This is with Karpenter version 0.13.1


tzneal commented Jun 29, 2022

@armenr Can you provide the provisioner and pod spec for the situation where Karpenter launches too many nodes?


armenr commented Jun 29, 2022

@tzneal - I'll get that to you shortly! Thanks again for the strong Bias for Action & Customer Obsession!! :)


ace22b commented Jun 30, 2022

FYI we saw a similar issue when using static PVs with PVC containing spec.storageClassName: "".

Karpenter would keep provisioning new nodes until the first node finally came up, then delete all the extra nodes.

Error was: 2022-06-29T22:56:04.721Z ERROR controller.node-state inconsistent state error adding volume, StorageClass.storage.k8s.io "" not found, please file an issue {"commit": "1f7a67b", "node": "X"} for every new node.

Replacing it with a real StorageClass fixed the issue and node provisioning is working as expected.

karpenter v0.13.1
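
(For reference, a minimal sketch of the static-binding pattern described above: a pre-provisioned PV bound by name from a PVC whose storageClassName is the empty string, so no StorageClass object ever exists for Karpenter to look up. All names here are hypothetical.)

apiVersion: v1
kind: PersistentVolume
metadata:
  name: static-pv            # hypothetical name
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteMany
  storageClassName: ""       # static binding: no StorageClass object exists
  nfs:
    server: nfs.example.com  # hypothetical server
    path: /
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: static-pvc           # hypothetical name
spec:
  volumeName: static-pv      # binds directly to the PV above
  storageClassName: ""       # must match the PV; this is the "" the error quotes
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 10Gi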


tzneal commented Jun 30, 2022

@ace22b Thanks for that report, I believe that's the issue. We pull the CSI driver name off the storage class, but for static volumes we need to pull it from the volume itself.
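
(A sketch of that distinction, paraphrasing the explanation above with hypothetical object names; this is not the actual Karpenter lookup code:)

# Dynamic provisioning: the driver name lives on the StorageClass.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ebs-sc                # hypothetical name
provisioner: ebs.csi.aws.com  # <- where the driver name was being read from
---
# Static volume: the StorageClass named on the PV/PVC may not exist at all,
# so the volume source has to be read from the PV itself.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: static-pv             # hypothetical name
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteMany
  storageClassName: nfs       # informational only; no such StorageClass object
  nfs:                        # <- volume source read from here instead
    server: nfs.example.com   # hypothetical server
    path: /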


tzneal commented Jun 30, 2022

@armenr Do you have an "nfs" storage class?

tzneal added a commit to tzneal/karpenter that referenced this issue on Jul 1, 2022:

For static volumes, we pull the CSI driver name off of the PV after it's bound instead of from the SC named on the PVC. The SC may not even exist in these cases and is informational.

Fixes aws#1998

armenr commented Jul 1, 2022

@tzneal -

We have the following:

PV Manifest

---
# Source: 5k/templates/pv.yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: nfs-uat
spec:
  capacity:
    storage: 10Gi
  volumeMode: Filesystem
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  storageClassName: nfs
  mountOptions:
    - nfsvers=4.1
    - rsize=1048576
    - wsize=1048576
    - hard
    - timeo=600
    - retrans=2
    - noresvport
  nfs:
    server: fs-REDACTED_EFS_ADDRESS.efs.us-west-2.amazonaws.com
    path: /
  claimRef:
    name: nfs
    namespace: uat

PVC Manifest

---
# Source: 5k/templates/pvc.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nfs
spec:
  volumeName: nfs-uat
  storageClassName: nfs
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 10Gi

Karpenter Provisioner

apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: default
spec:
  labels:
    #https://github.com/aws/karpenter/issues/1252#issuecomment-1166894316
    vpc.amazonaws.com/has-trunk-attached: "false"
  requirements:
    - key: "karpenter.sh/capacity-type"
      operator: In
      values: ["spot", "on-demand"]
    - key: "node.kubernetes.io/instance-type"
      operator: In
      values: ["m6a.large", "m6a.xlarge", "c6a.large", "c6a.xlarge", "m6a.2xlarge", "c6a.2xlarge", "c6i.2xlarge", "m5n.large", "m5n.xlarge", "m5n.2xlarge"]
    - key: "kubernetes.io/arch"
      operator: In
      values: ["amd64"]
  provider:
    instanceProfile: REDACTED_PROFILE_STRING
    securityGroupSelector:
      Name: redacted-sec-group-name
    subnetSelector:
      Name: redacted-env-private-us-west-2*
  ttlSecondsAfterEmpty: 30

Deployment Manifest (sample)

apiVersion: apps/v1
kind: Deployment
metadata:
  name: meta
  labels:
    app: meta
    app-kubecost: dev
spec:
  revisionHistoryLimit: 1
  replicas: 1
  selector:
    matchLabels:
      app: meta
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 0
  template:
    metadata:
      annotations:
        cluster-autoscaler.kubernetes.io/safe-to-evict: "true"
      labels:
        app: meta
        app-kubecost: dev
    spec:
      imagePullSecrets:
        - name: uat-dockercred
      initContainers:
        - name: init
          image: "REDACTED_ECR_IMAGE"
          imagePullPolicy: IfNotPresent
          args: ["/code/bin/init.sh"]
          env:
          volumeMounts:
            - name: files
              mountPath: /some/path/.file.yaml
              subPath: file.yaml
      containers:
        - name: web
          image: "REDACTED_ECR_IMAGE"
          imagePullPolicy: IfNotPresent
          ports:
            - containerPort: 80
              name: http
          env:
          volumeMounts:
            - name: files
              mountPath: /some/path/.file.yaml
              subPath: file.yaml
        - name: app
          image: "REDACTED_ECR_IMAGE"
          imagePullPolicy: IfNotPresent
          resources:
            requests:
              cpu: "6"
              memory: "8Gi"
            limits:
              cpu: "6"
              memory: "8Gi"
          volumeMounts:
            - name: files
              mountPath: /some/path/.file.yaml
              subPath: file.yaml
            - mountPath: "/some_path/uat"
              name: nfs
      volumes:
        - name: files
          configMap:
            name: config-files
        - name: nfs
          persistentVolumeClaim:
            claimName: nfs


armenr commented Jul 1, 2022

@tzneal - I'm also confused about something:

In the Karpenter provisioner above, I used to also include t3 instances in sizes L + XL, along with the array of other instance types...but Karpenter keeps provisioning only m5n.L and m5n.XL instances...nothing else, no matter what pod resources I schedule :-\


tzneal commented Jul 1, 2022

@armenr Can you file a separate issue? It's difficult to track multiple items in a single issue.


tzneal commented Jul 1, 2022

@armenr Thanks for the manifests, I've verified that the fix I've implemented solves this problem using those manifests as well.

tzneal added a commit that referenced this issue on Jul 1, 2022:

For static volumes, we pull the CSI driver name off of the PV after it's bound instead of from the SC named on the PVC. The SC may not even exist in these cases and is informational.

Fixes #1998

armenr commented Jul 2, 2022

@tzneal - Thank you so much for the quick turnaround and explanation/verification.

What would be the way to pull down a nightly (or from main/master branch) to test the fix until a release is tagged?


tzneal commented Jul 2, 2022

Yes, this should work for you. 4c35c0f is the current latest commit in main.

export COMMIT="4c35c0fe3cc13f55f7edba361cb2f5e662ac9867"
export CLUSTER_NAME="<INSERT_CLUSTER_NAME>"
export AWS_ACCOUNT_ID="$(aws sts get-caller-identity --query Account --output text)"
export KARPENTER_IAM_ROLE_ARN="arn:aws:iam::${AWS_ACCOUNT_ID}:role/${CLUSTER_NAME}-karpenter"
export CLUSTER_ENDPOINT="$(aws eks describe-cluster --name ${CLUSTER_NAME} --query "cluster.endpoint" --output text)"

helm install karpenter oci://public.ecr.aws/karpenter-snapshots/karpenter --version v0-${COMMIT} --namespace karpenter \
  --set serviceAccount.annotations."eks\.amazonaws\.com/role-arn"=${KARPENTER_IAM_ROLE_ARN} \
  --set clusterName=${CLUSTER_NAME} \
  --set clusterEndpoint=${CLUSTER_ENDPOINT} \
  --set aws.defaultInstanceProfile=KarpenterNodeInstanceProfile-${CLUSTER_NAME} \
  --wait # for the defaulting webhook to install before creating a Provisioner
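
(Usage note: the controller log lines quoted earlier in this thread include a short "commit" field, e.g. {"commit": "588d4c8"}, so after installing a snapshot you can confirm the running build by checking that field against the commit you exported.)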


armenr commented Jul 2, 2022

Thanks so much @tzneal - I'll annoy you with my other ask (the instance sizes, which instance types get provisioned) on a separate issue.

Enjoy the long weekend, friend! Thanks again from a (former) fellow Amazonian!


tzneal commented Jul 9, 2022

Do you know if this is from a PVC or an ephemeral volume? If it's a PVC, can you paste the spec? I tested with what was supplied above and can't reproduce the error. On a static PVC like this I'd expect to see a volumeName, which causes us to ignore the storage class on the PVC and look it up from the volume instead.

tzneal reopened this on Jul 9, 2022
tzneal self-assigned this on Jul 10, 2022

tzneal commented Jul 10, 2022

Re-closing this; there may have been a comment deleted. The fix wasn't in the v0.13.2 release, as that release only resolved an issue related to pricing sync.

tzneal closed this as completed on Jul 10, 2022