
Standard_E2pds_v6: "OptimizeDiskPerformance: Node info is not provided. Error: invalid parameter" #2715

Closed
jkroepke opened this issue Dec 11, 2024 · 10 comments · Fixed by #2718


@jkroepke

What happened: After switching to Standard_E2pds_v6-based node pools, CSI Disk volumes can no longer be mounted. Pods with a PVC get the following error:

OptimizeDiskPerformance: Node info is not provided. Error: invalid parameter

What you expected to happen: Volumes should mount as before.

How to reproduce it:

  • Spin up a node pool based on Standard_E2pds_v6
  • Set up a custom StorageClass:
parameters:
  cachingMode: ReadOnly
  fsType: xfs
  networkAccessPolicy: DenyAll
  perfProfile: Basic
  publicNetworkAccess: Disabled
  skuname: Premium_ZRS
provisioner: disk.csi.azure.com
reclaimPolicy: Retain
volumeBindingMode: WaitForFirstConsumer
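For reproduction, the fragment above assembled into a complete manifest might look like this (the metadata.name is hypothetical; all parameters are taken from the fragment):

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: premium-zrs-xfs   # hypothetical name
provisioner: disk.csi.azure.com
parameters:
  cachingMode: ReadOnly
  fsType: xfs
  networkAccessPolicy: DenyAll
  perfProfile: Basic
  publicNetworkAccess: Disabled
  skuname: Premium_ZRS
reclaimPolicy: Retain
volumeBindingMode: WaitForFirstConsumer
```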

Anything else we need to know?:

My guess is that parameters.perfProfile in the StorageClass is the root cause.

Node:

Name:               aks-opsstack-41465895-vmss000001
Roles:              <none>
Labels:             agentpool=opsstack
                    beta.kubernetes.io/arch=arm64
                    beta.kubernetes.io/instance-type=Standard_E2pds_v6
                    beta.kubernetes.io/os=linux
                    failure-domain.beta.kubernetes.io/region=germanywestcentral
                    failure-domain.beta.kubernetes.io/zone=germanywestcentral-2
                    kubernetes.azure.com/agentpool=opsstack
                    kubernetes.azure.com/cluster=rg-dev-opsstack-aks-germanywestcentral-001
                    kubernetes.azure.com/consolidated-additional-properties=4b4ed7a4-b812-11ef-842b-1a8b7fea7dfa
                    kubernetes.azure.com/enable-apiserver-vnet-integration=true
                    kubernetes.azure.com/kubelet-identity-client-id=24ebfe9f-1282-4938-930d-d4fb2f88651d
                    kubernetes.azure.com/mode=system
                    kubernetes.azure.com/node-image-version=AKSAzureLinux-V2gen2arm64-202411.12.0
                    kubernetes.azure.com/nodepool-type=VirtualMachineScaleSets
                    kubernetes.azure.com/os-sku=AzureLinux
                    kubernetes.azure.com/role=agent
                    kubernetes.io/arch=arm64
                    kubernetes.io/hostname=aks-opsstack-41465895-vmss000001
                    kubernetes.io/os=linux
                    node.kubernetes.io/instance-type=Standard_E2pds_v6
                    topology.disk.csi.azure.com/zone=germanywestcentral-2
                    topology.kubernetes.io/region=germanywestcentral
                    topology.kubernetes.io/zone=germanywestcentral-2
Annotations:        alpha.kubernetes.io/provided-node-ip: 10.0.240.104
                    csi.volume.kubernetes.io/nodeid:
                      {"disk.csi.azure.com":"aks-opsstack-41465895-vmss000001","file.csi.azure.com":"aks-opsstack-41465895-vmss000001"}
                    node.alpha.kubernetes.io/ttl: 0
                    volumes.kubernetes.io/controller-managed-attach-detach: true

Environment:

  • CSI Driver version: mcr.microsoft.com/oss/kubernetes-csi/azuredisk-csi:v1.30.5
  • Kubernetes version (use kubectl version): v1.31.2
  • OS (e.g. from /etc/os-release): NAME="Common Base Linux Mariner" VERSION="2.0.20241029"
  • Kernel (e.g. uname -a): Linux aks-opsstack-41465895-vmss000000 5.15.167.1-2.cm2 #1 SMP Tue Oct 29 03:05:00 UTC 2024 aarch64 aarch64 aarch64 GNU/Linux
  • Install tools:
  • Others: AKS
@andyzhangx (Member)

Thanks. Could you remove perfProfile: Basic from the storage class? Also, for the v6 VM SKU, I think it still won't work even after you remove that parameter, because a udev rule is missing on the node. You need to run the following command to download the udev rules onto the nodes:

kubectl apply -f https://raw.githubusercontent.com/andyzhangx/demo/refs/heads/master/aks/download-v6-disk-rules.yaml

@jkroepke (Author)

jkroepke commented Dec 12, 2024

Hi, thanks!

The perfProfile: Basic setting is baked into some PVs as well. Is there any way to mutate the PVs so I can remove the setting?

And do you know whether there is an Alpine image available at mcr.microsoft.com?
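To see which PVs still carry the setting, one option (a sketch, not an official tool; it assumes you feed it the output of `kubectl get pv -o json`) is:

```python
import json

def pvs_with_perf_profile(pv_list_json: str) -> list[str]:
    """Given `kubectl get pv -o json` output, return the names of PVs
    whose CSI volumeAttributes still contain a perfProfile entry."""
    items = json.loads(pv_list_json).get("items", [])
    return [
        pv["metadata"]["name"]
        for pv in items
        if pv.get("spec", {})
             .get("csi", {})
             .get("volumeAttributes", {})
             .get("perfProfile")
    ]
```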

@andyzhangx (Member)

@jkroepke you should set --enable-perf-optimization=true on the azuredisk container; then you don't need to change the PV settings.
I'm not sure about an Alpine image on mcr.microsoft.com.
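For a self-managed driver deployment, that flag would go into the args of the azuredisk container (a hedged sketch; only the relevant fragment is shown, and the surrounding args such as --endpoint are elided):

```yaml
# Sketch: enabling perf optimization on a self-managed azuredisk-csi deployment.
containers:
  - name: azuredisk
    image: mcr.microsoft.com/oss/kubernetes-csi/azuredisk-csi:v1.30.5
    args:
      - "--enable-perf-optimization=true"
      # ... existing args remain unchanged
```

On AKS the driver is managed, so this is not something users can set there, as noted in the next comment.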

@jkroepke (Author)

Damn, I can't control that setting in AKS.

@jkroepke (Author)

jkroepke commented Dec 12, 2024

Manually applying the udev rule on the node won't recover existing PVs. It's not a production impact for me, but I saw there is a PR to update this for AKS?

And going forward, using VolumeAttributesClasses might be a better option once the feature is GA.

https://kubernetes.io/docs/concepts/storage/volume-attributes-classes/

Then such performance-related fields could be mutated after creation.
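A hedged sketch of what that could look like for this driver (the apiVersion depends on the cluster version while the feature is pre-GA, the name is hypothetical, and whether azuredisk supports mutating perfProfile this way is a separate question):

```yaml
apiVersion: storage.k8s.io/v1beta1   # v1alpha1 on older clusters; pre-GA
kind: VolumeAttributesClass
metadata:
  name: perf-basic                   # hypothetical name
driverName: disk.csi.azure.com
parameters:
  perfProfile: Basic
```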

@andyzhangx (Member)

@jkroepke could you reschedule the pod to another node and make it remount? What is the error message now?

@jkroepke (Author)

The test cluster has v6 SKU nodes exclusively. And the DaemonSet seems to be working fine:

k logs download-v6-disk-rules-sq4ns -n kube-system
downloading 80-azure-disk.rules file to /etc/udev/rules.d/80-azure-disk.rules ...
--2024-12-12 12:19:39--  https://raw.githubusercontent.com/Azure/azure-vm-utils/refs/heads/main/udev/80-azure-disk.rules
Resolving raw.githubusercontent.com... 185.199.110.133, 185.199.111.133, 185.199.109.133, ...
Connecting to raw.githubusercontent.com|185.199.110.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 4194 (4.1K) [text/plain]
Saving to: '/etc/udev/rules.d/80-azure-disk.rules'

     0K ....                                                  100% 30.0M=0s

2024-12-12 12:19:39 (30.0 MB/s) - '/etc/udev/rules.d/80-azure-disk.rules' saved [4194/4194]

80-azure-disk.rules is taking effect now

Then I scaled the replicas to 0 to unmount all PVCs. I also checked in the Azure Portal that the disks were detached from the VM:

[screenshot: Azure Portal showing the disks detached from the VM]

After rescheduling, the disks are attached again:

[screenshot: Azure Portal showing the disks attached to the VM again]

and kubectl describe pod shows the same error:

Warning FailedMount 22s (x8 over 87s) kubelet MountVolume.MountDevice failed for volume "pvc-ab913fad-7053-4ba8-af8c-7a6cdb872a7f" : rpc error: code = Internal desc = failed to optimize device performance for target(/dev/disk/azure/scsi1/lun0) error(OptimizeDiskPerformance: Node info is not provided. Error: invalid parameter)

@andyzhangx (Member)

@jkroepke you are blocked by the first error. Could you remove perfProfile: Basic from the PV?

@jkroepke (Author)

I wish I could, but that setting is immutable:

# Please edit the object below. Lines beginning with a '#' will be ignored,
# and an empty file will abort the edit. If an error occurs while saving this file will be
# reopened with the relevant failures.
#
# persistentvolumes "pvc-f9a78556-f41e-4057-87e1-17eb03010ff8" was not valid:
# * spec.persistentvolumesource: Forbidden: spec.persistentvolumesource is immutable after creation
# ....core.PersistentVolumeSource{
# ....  ... // 19 identical fields
# ....  Local:     nil,
# ....  StorageOS: nil,
# ....  CSI: &core.CSIPersistentVolumeSource{
# ....          ... // 2 identical fields
# ....          ReadOnly: false,
# ....          FSType:   "xfs",
# ....          VolumeAttributes: map[string]string{
# ....                  ... // 4 identical entries
# ....                  "fsType":              "xfs",
# ....                  "networkAccessPolicy": "DenyAll",
# -..                   "perfProfile":         "Basic",
# ....                  "publicNetworkAccess": "Disabled",
# ....                  "requestedsizegib":    "31",
# ....                  ... // 2 identical entries
# ....          },
# ....          ControllerPublishSecretRef: nil,
# ....          NodeStageSecretRef:         nil,
# ....          ... // 3 identical fields
# ....  },
# ....}

@andyzhangx (Member)

@jkroepke unfortunately you need to create a new PV.
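Since the StorageClass uses reclaimPolicy: Retain, one possible workaround (an untested sketch, not an official procedure) is to dump the PV manifest, strip the offending attribute together with the server-set fields that would block re-creation, delete the PV object, and re-create it; the underlying Azure disk survives the deletion because of Retain. The cleanup step could look like:

```python
import copy

def strip_perf_profile(pv: dict) -> dict:
    """Return a copy of a PV manifest (from `kubectl get pv <name> -o json`)
    with perfProfile removed from spec.csi.volumeAttributes, and with the
    server-set fields dropped that would block re-creating the object."""
    pv = copy.deepcopy(pv)
    pv["spec"]["csi"]["volumeAttributes"].pop("perfProfile", None)
    for field in ("resourceVersion", "uid", "creationTimestamp"):
        pv.get("metadata", {}).pop(field, None)
    # claimRef keeps the PV bound to its original PVC, but its uid /
    # resourceVersion point at server state and must be dropped too
    claim_ref = pv["spec"].get("claimRef") or {}
    for field in ("resourceVersion", "uid"):
        claim_ref.pop(field, None)
    return pv
```

The resulting dict would then be serialized back to YAML/JSON and applied after deleting the old PV object.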
