Trouble attaching volume #884
@andyzhangx any thoughts? I'm a bit uncomfortable just force-detaching my PVCs.
One of my PVCs that failed to attach, described with: kubectl describe pvc claim-resX -n res-jhub
@yvan could you check the status of the VM?
kubectl get no
there is no such PVC as:
This seems to refer to:
All my PVCs always have status Bound, even when they are not in use by a user or an app. It never caused an issue like this before; I just started experiencing this this morning.
@yvan, I mean could you go to the Azure portal to check the status of the VM?
There's a problem with the node aks-agentpool-57634498-0: it has status 'Running', yet I actually see no such data disk (kubernetes-dynamic-pvc-0d7740b9-3a43-11e9-93d5-dee1946e6ce9) in the portal. Here are all my PVCs with names of the form 'kubernetes-dynamic-pvc', listed per node:
aks-agentpool-57634498-0
aks-agentpool-57634498-1
aks-agentpool-57634498-2
aks-agentpool-57634498-3
There is one with a VERY similar name on
could you run
I gave it a go; the result: az vm update -g MC_risc-ml_ds-cluster_westeurope -n aks-agentpool-57634498-0
could you help find that
You would get the full resource path of that disk; check whether that disk exists or not.
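As a rough sketch of that check (using the node resource group name from the az vm update command earlier in this thread, so adjust to your own cluster), you could list the managed disks and which VM currently holds each of them:
az disk list -g MC_risc-ml_ds-cluster_westeurope --query "[].{name:name, attachedTo:managedBy}" -o table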
Ok so it exists if I show all namespaces: kubectl get pvc --all-namespaces
But if I check the namespace where it should be I get: kubectl get pvc pvc-0d7740b9-3a43-11e9-93d5-dee1946e6ce9 -n res-jhub
@yvan Your PV has "pvc" in the name, creating some confusion. Contrast
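For illustration, using the names already mentioned in this thread: the claim is a namespaced PVC, while the volume it binds to is a cluster-scoped PV whose generated name happens to be "pvc-" plus the claim's UID, which is why the two are easy to mix up:
kubectl get pvc claim-resX -n res-jhub
kubectl get pv pvc-0d7740b9-3a43-11e9-93d5-dee1946e6ce9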
The names are just generated by jupyterhub. I agree it's mildly annoying. At the end of the day I want to understand why this happened and care a lot less about the names.
@yvan could you run
Ok I think this is what you wanted to locate: kubectl get pv pvc-0d7740b9-3a43-11e9-93d5-dee1946e6ce9 -o yaml -n res-jhub
I checked to see if the diskURI exists and found 2 disks with similar names:
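As a side note, a quick way to pull just the disk URI out of that PV (assuming the in-tree azureDisk volume source that AKS dynamic provisioning used at the time) would be:
kubectl get pv pvc-0d7740b9-3a43-11e9-93d5-dee1946e6ce9 -o jsonpath='{.spec.azureDisk.diskURI}'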
could you check which node that disk is attached to?
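One way to answer that from the Azure side, using the disk and resource-group names from earlier in the thread (the managedBy property is empty when the disk is not attached to any VM):
az disk show -g MC_risc-ml_ds-cluster_westeurope -n kubernetes-dynamic-pvc-0d7740b9-3a43-11e9-93d5-dee1946e6ce9 --query managedBy -o tsv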
1 - The disk is attached to node: Also, mysteriously, a bunch of disk-pressure messages (that definitely had not been popping up over the last 14-22 days) have appeared in my event log for my 4th node: kubectl describe no aks-agentpool-57634498-3
seems related to pulling images. no events for
Small update: in the end I just waited a day and the cluster eventually cleared up the resources. It seems connected to a broader service outage/issue mounting disks on AKS.
A similar issue arose in my cluster: a pod could not start because the PV corresponding to its PVC was bound to another node and the disk (PV) could not be detached. This happened after changing the service principal on all my nodes. It seems very non-deterministic in nature, yet it happens occasionally in aks-engine-generated clusters.
@Gangareddy you may also need to change the service principal on the master node, otherwise the detach-disk operation cannot succeed. In that case, you could manually detach that disk from the agent node, and k8s will automatically attach the disk (PV) to the new node.
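A sketch of that manual detach via the Azure CLI, with placeholders for the names that depend on your cluster:
az vm disk detach -g <node-resource-group> --vm-name <agent-vm-name> -n <disk-name>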
@andyzhangx: I have updated the service principal on the master nodes as well. I thought manually detaching the disk would complicate the self-healing nature of AKS. However, I was able to create additional disks from snapshots of the disks that were stuck to the VM, and I changed my PVs (persistent volumes) to use the newly created disks from those snapshots. But I wonder: is a VM reboot necessary after changing the service principal on the VM?
@Gangareddy please follow this guide to reset the service principal: https://docs.microsoft.com/en-us/azure/aks/update-credentials. On the agent nodes, you only need to restart kubelet:
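On a systemd-based agent node that would typically be something like:
sudo systemctl restart kubelet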
Unable to attach PVCs to a basic K8s deployment in Azure. When is K8s going to be production-ready? This is just sad. This cluster is brand new, created today.
@mcurry-brierley could you provide more details about this issue? e.g.
You can close this, as I have removed all resources and we have decided not to go forward with Azure as a result.
Azure is not an enterprise environment. When I feel it is, I will try again...
@mcurry-brierley I would say you may have happened to hit this VMSS disk-attach issue, which has only shown up in the last two weeks; the VMSS team is hotfixing it. Just set up an Availability Set (non-VMSS) based cluster; I am pretty sure it won't have this issue. Ping me on Slack if you hit such an issue again.
@andyzhangx, we seem to be hit by this in our production environments as well. I'll ping you on Teams (internal) to see what we can do about this problem. In general, I agree with @mcurry-brierley on how annoying this is. We have had way too many issues with VMSS in the last 6 months, and I am really tempted to track their team down and ask them for their SLA and where they have been with respect to it over those 6 months.
@vijaygos @andyzhangx Has there been any progress on this? VMSS is totally unusable with AKS, which obviously means multiple node pools are out of the window. Following a lengthy support call with Andy Wu in your support team, he advised I give up on my previous cluster and create a new one. As a test I've deleted a pod to reschedule it elsewhere, and I'm already seeing the same errors. Surely loads of people must be seeing this; is it ever going to be fixed?
@davestephens Your issue
@vijaygos and I have a point. This is a production product?
We are actually experiencing something similar here. We had a failing instance in a VMSS-based cluster. After deleting the instance, it seems the Kubernetes control plane still sees the disks as attached. Looking in the Azure portal (or using the Azure CLI) the disks are unattached; however, starting up the pod we get the following status:
It's like Kubernetes has cached the information that this disk is attached to a node which is no longer part of the cluster. We are running the latest non-preview version of AKS, 1.14.8.
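One way to see what the control plane still believes is attached to a given node (node name is a placeholder) is to read the attach status recorded on the Node object:
kubectl get node <node-name> -o jsonpath='{.status.volumesAttached}'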
Hi all, I'm not sure if this is the same issue or not... but I performed a k8s version upgrade on our non-prod cluster today. During that, one of the nodes died and caused problems. After restarting that node, the service that runs on that node wouldn't redeploy and is stuck in a perpetual
I'm also experiencing this problem intermittently when making deployments. The following error is given:
We aren't in production yet but are quite hesitant to go further unless this issue is resolved.
The error info is from the k8s volume controller, which means the volume was not unmounted from the previous node. Did the volume attach succeed eventually?
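A simple way to check whether the attach eventually succeeded is to look through the pod's events for FailedAttachVolume or Multi-Attach messages, for example:
kubectl describe pod <pod-name> -n <namespace> | grep -iE 'attach|mount'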
back to this question again, there are two kinds of
I will close this issue. Let me know if you have any questions.
Happened to me as well when deploying several Helm charts. Created a cluster without VMSS and the problem was solved.
Having an issue where I'm getting multi-attach errors when I try to attach a PVC. This issue was already brought up again 6 days ago in #615; I'm just reopening it here as per the instructions on that thread.
What happened:
Pods cannot attach a PVC because it's bound somewhere else (though it should not be).
What I expect to happen:
Pods should be able to bind a PVC.
How to reproduce:
Not sure, as I don't know why PVCs that are not in use would be attached or seen as attached by k8s.
k8s version:
azure region:
west europe
kubectl describe pod hub-7476649468-qfj75 -n res-jhub:
how many disks mounting into one VM in parallel:
The hub pod (whose describe is posted above) mounts the 1Gi hub-db-dir claim. Every user pod that tries to spawn mounts one of claim-res(1-4) and also mounts jupyterhub-shares-res-volume, which is an azurefile.
what vms:
4 nodes/VMs with spec: Standard D16s v3 (16 vCPUs, 64 GB memory)
No disk, CPU, or memory pressure is shown in the node descriptions.
Other Similar Issues:
#477
#615
Not sure if related, but my image puller also seems to be failing because it is looking for a file in the kubelet folder that it expects but cannot find.