The smb volume can't be mounted to the windows pod after reboot #219

Closed
willinwu opened this issue Feb 1, 2021 · 19 comments
Labels
kind/bug Categorizes issue or PR as related to a bug. lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete.

Comments

@willinwu

willinwu commented Feb 1, 2021

I have installed a k8s environment with two linux worker nodes and one windows worker node.

k8s version: 1.20

kubectl get nodes
NAME                              STATUS   ROLES                  AGE   VERSION
docp-k8s-master1                  Ready    control-plane,master   24d   v1.20.1
docp-k8s-worker-11                Ready    worker                 24d   v1.20.1
docp-k8s-worker-21                Ready    worker                 24d   v1.20.1
docp-win191                       Ready    <none>                 23d   v1.20.0

csi-smb-driver version: 0.6.0
kubectl get pod -n kube-system | grep csi

csi-smb-controller-645cd7b449-6dm2d                     3/3     Running   4          21d
csi-smb-controller-645cd7b449-qzr6m                     3/3     Running   7          21d
csi-smb-node-4t5n7                                      3/3     Running   1          21d
csi-smb-node-9j5xm                                      3/3     Running   1          21d
csi-smb-node-kbtb6                                      3/3     Running   1          21d
csi-smb-node-win-mdrv7                                  3/3     Running   3          109m

csi-proxy version: v0.2.2

I0201 02:13:57.776512    6596 main.go:55] Starting CSI-Proxy Server ...
I0201 02:13:57.781666    6596 main.go:56] Version: v0.2.2-0-gffb169f

sc.yaml:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: smb
provisioner: smb.csi.k8s.io
parameters:
  source: "//docp-smb1/smbservice"
  csi.storage.k8s.io/node-stage-secret-name: "smbcreds"
  csi.storage.k8s.io/node-stage-secret-namespace: "default"
  createSubDir: "true"  # optional: create a sub dir for new volume
reclaimPolicy: Retain  # only retain is supported
volumeBindingMode: Immediate
mountOptions:
  - dir_mode=0777
  - file_mode=0777
  - uid=1001
  - gid=1001

deployment.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: busybox-smb-2
  labels:
    app: busybox-2
spec:
  replicas: 1
  template:
    metadata:
      name: busybox-2
      labels:
        app: busybox-2
    spec:
      tolerations:
        - key: "OS"
          operator: "Equal"
          value: "Windows"
          effect: "NoSchedule"
      nodeSelector:
        "kubernetes.io/os": windows
      containers:
        - name: busybox-2
          image: e2eteam/busybox:1.29
          command:
            - "powershell.exe"
            - "-Command"
            - "while (1) { Add-Content -Encoding Ascii C:\\mnt\\smb\\data.txt $(Get-Date -Format u); sleep 1 }"
          volumeMounts:
            - name: smb
              mountPath: "/mnt/smb"
      volumes:
        - name: smb
          persistentVolumeClaim:
            claimName: pvc-smb-test
  selector:
    matchLabels:
      app: busybox-2

After rebooting the Windows node, I got the following error:

MountVolume.MountDevice failed for volume "pvc-1efb71f1-ab8a-4bbf-8db7-84a8e58877b4" : rpc error: code = Internal desc = volume(pvc-1efb71f1-ab8a-4bbf-8db7-84a8e58877b4) mount "//docp-smb1/smbservice" on "\var\lib\kubelet\plugins\kubernetes.io\csi\pv\pvc-1efb71f1-ab8a-4bbf-8db7-84a8e58877b4\globalmount" failed with smb mapping failed with error: rpc error: code = Unknown desc = NewSmbGlobalMapping failed. output: "New-SmbGlobalMapping : Multiple connections to a server or shared resource by the same user, using more than one user \r\nname, are not allowed. Disconnect all previous connections to the server or shared resource and try again. \r\nAt line:1 char:190\r\n+ ... ser, $PWord;New-SmbGlobalMapping -RemotePath $Env:smbremotepath -Cred ...\r\n+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\r\n + CategoryInfo : NotSpecified: (MSFT_SmbGlobalMapping:ROOT/Microsoft/...mbGlobalMapping) [New-SmbGlobalMa \r\n pping], CimException\r\n + FullyQualifiedErrorId : Windows System Error 1219,New-SmbGlobalMapping\r\n \r\n", err: exit status 1
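The key part of this message is `Windows System Error 1219`, which is the Win32 error `ERROR_SESSION_CREDENTIAL_CONFLICT`. When triaging logs like this, or when deciding whether the mapping cleanup discussed later in this thread applies, a caller could check for that marker before treating the failure as a generic mount error. A minimal sketch: `isCredentialConflict` is a hypothetical helper, and matching on the message text is an assumption about the output format, not a stable API.

```go
package main

import (
	"fmt"
	"strings"
)

// win32SessionCredentialConflict is the marker New-SmbGlobalMapping prints for
// Win32 error 1219 (ERROR_SESSION_CREDENTIAL_CONFLICT).
const win32SessionCredentialConflict = "Windows System Error 1219"

// isCredentialConflict reports whether the output of a failed
// New-SmbGlobalMapping call is the "multiple connections ... using more than
// one user name" conflict, as opposed to some other failure.
// Hypothetical helper: it relies on the text of the PowerShell error record.
func isCredentialConflict(output string) bool {
	return strings.Contains(output, win32SessionCredentialConflict)
}

func main() {
	// Trimmed copy of the FullyQualifiedErrorId line from the error above.
	out := "+ FullyQualifiedErrorId : Windows System Error 1219,New-SmbGlobalMapping"
	fmt.Println(isCredentialConflict(out))
	fmt.Println(isCredentialConflict("Access is denied."))
}
```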

@andyzhangx
Member

andyzhangx commented Feb 1, 2021

Could you provide the csi-proxy logs? Thanks.
The relevant csi-proxy code is here; we need to check what went wrong:
https://github.com/kubernetes-csi/csi-proxy/blob/v0.2.2/internal/server/smb/server.go#L43-L72
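The error output shows that csi-proxy runs a PowerShell one-liner that reads the remote path from `$Env:smbremotepath` and passes a credential object built from `$PWord` to `New-SmbGlobalMapping`. As a reading aid for the linked server.go, here is a minimal Go sketch of that pattern, not csi-proxy's actual code: it builds (but does not run) the command, passes secrets through the environment instead of the script text, and makes `-Persistent` an explicit parameter because that flag becomes relevant later in this thread. The function name and the `smbuser`/`smbpassword` variable names are hypothetical; only `smbremotepath` appears in the log.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// buildSmbMappingCmd returns, without running it, a powershell.exe command that
// creates an SMB global mapping for remotePath. The remote path, user name and
// password travel in environment variables so they never appear in the script
// text or the process arguments. Hypothetical helper, not csi-proxy's code.
func buildSmbMappingCmd(remotePath, user, password string, persistent bool) *exec.Cmd {
	persist := "$false"
	if persistent {
		persist = "$true"
	}
	script := "$PWord = ConvertTo-SecureString -String $Env:smbpassword -AsPlainText -Force;" +
		"$Credential = New-Object -TypeName System.Management.Automation.PSCredential -ArgumentList $Env:smbuser, $PWord;" +
		"New-SmbGlobalMapping -RemotePath $Env:smbremotepath -Credential $Credential -Persistent " + persist
	cmd := exec.Command("powershell.exe", "-NoProfile", "-Command", script)
	cmd.Env = append(os.Environ(),
		"smbremotepath="+remotePath,
		"smbuser="+user,         // hypothetical variable name
		"smbpassword="+password, // hypothetical variable name
	)
	return cmd
}

func main() {
	cmd := buildSmbMappingCmd(`\\docp-smb1\smbservice`, "USERNAME", "PASSWORD", false)
	// Only the script is printed; the password is in cmd.Env, not cmd.Args.
	fmt.Println(cmd.Args[len(cmd.Args)-1])
}
```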

@andyzhangx andyzhangx added the kind/bug Categorizes issue or PR as related to a bug. label Feb 1, 2021
@willinwu
Author

willinwu commented Feb 1, 2021

I can only get the log from the csi-proxy console, shown below. Are there any other logs I can provide?

E0201 04:05:06.630924 9220 server.go:69] failed NewSmbGlobalMapping NewSmbGlobalMapping failed. output: "New-SmbGlobalMapping : Multiple connections to a server or shared resource by the same user, using more than one user \r\nname, are not allowed. Disconnect all previous connections to the server or shared resource and try again. \r\nAt line:1 char:190\r\n+ ... ser, $PWord;New-SmbGlobalMapping -RemotePath $Env:smbremotepath -Cred ...\r\n+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\r\n + CategoryInfo : NotSpecified: (MSFT_SmbGlobalMapping:ROOT/Microsoft/...mbGlobalMapping) [New-SmbGlobalMa \r\n pping], CimException\r\n + FullyQualifiedErrorId : Windows System Error 1219,New-SmbGlobalMapping\r\n \r\n", err: exit status 1

@andyzhangx
Member

andyzhangx commented Feb 1, 2021

There should be a log file, e.g. csi-proxy.log.

@willinwu
Author

willinwu commented Feb 2, 2021

It seems there is no other useful information.

Log file created at: 2021/02/01 17:04:05
Running on machine: docp-win191
Binary: Built with gc go1.15.2 for windows/amd64
Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
I0201 17:04:05.905442   14124 main.go:125] Running as a Windows service.
I0201 17:04:05.908391   14124 main.go:135] Windows Service initialized through SCM
I0201 17:04:05.916432   14124 main.go:55] Starting CSI-Proxy Server ...
I0201 17:04:05.916432   14124 main.go:56] Version: v0.2.2-0-gffb169f
E0201 17:05:22.544719   14124 server.go:69] failed NewSmbGlobalMapping NewSmbGlobalMapping failed. output: "New-SmbGlobalMapping : Multiple connections to a server or shared resource by the same user, using more than one user \r\nname, are not allowed. Disconnect all previous connections to the server or shared resource and try again. \r\nAt line:1 char:190\r\n+ ... ser, $PWord;New-SmbGlobalMapping -RemotePath $Env:smbremotepath -Cred ...\r\n+                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\r\n    + CategoryInfo          : NotSpecified: (MSFT_SmbGlobalMapping:ROOT/Microsoft/...mbGlobalMapping) [New-SmbGlobalMa \r\n   pping], CimException\r\n    + FullyQualifiedErrorId : Windows System Error 1219,New-SmbGlobalMapping\r\n \r\n", err: exit status 1
E0201 17:07:31.919044   14124 server.go:69] failed NewSmbGlobalMapping NewSmbGlobalMapping failed. output: "New-SmbGlobalMapping : Multiple connections to a server or shared resource by the same user, using more than one user \r\nname, are not allowed. Disconnect all previous connections to the server or shared resource and try again. \r\nAt line:1 char:190\r\n+ ... ser, $PWord;New-SmbGlobalMapping -RemotePath $Env:smbremotepath -Cred ...\r\n+                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\r\n    + CategoryInfo          : NotSpecified: (MSFT_SmbGlobalMapping:ROOT/Microsoft/...mbGlobalMapping) [New-SmbGlobalMa \r\n   pping], CimException\r\n    + FullyQualifiedErrorId : Windows System Error 1219,New-SmbGlobalMapping\r\n \r\n", err: exit status 1
E0201 17:09:41.261347   14124 server.go:69] failed NewSmbGlobalMapping NewSmbGlobalMapping failed. output: "New-SmbGlobalMapping : Multiple connections to a server or shared resource by the same user, using more than one user \r\nname, are not allowed. Disconnect all previous connections to the server or shared resource and try again. \r\nAt line:1 char:190\r\n+ ... ser, $PWord;New-SmbGlobalMapping -RemotePath $Env:smbremotepath -Cred ...\r\n+                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\r\n    + CategoryInfo          : NotSpecified: (MSFT_SmbGlobalMapping:ROOT/Microsoft/...mbGlobalMapping) [New-SmbGlobalMa \r\n   pping], CimException\r\n    + FullyQualifiedErrorId : Windows System Error 1219,New-SmbGlobalMapping\r\n \r\n", err: exit status 1
E0201 17:11:50.234415   14124 server.go:69] failed NewSmbGlobalMapping NewSmbGlobalMapping failed. output: "New-SmbGlobalMapping : Multiple connections to a server or shared resource by the same user, using more than one user \r\nname, are not allowed. Disconnect all previous connections to the server or shared resource and try again. \r\nAt line:1 char:190\r\n+ ... ser, $PWord;New-SmbGlobalMapping -RemotePath $Env:smbremotepath -Cred ...\r\n+                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\r\n    + CategoryInfo          : NotSpecified: (MSFT_SmbGlobalMapping:ROOT/Microsoft/...mbGlobalMapping) [New-SmbGlobalMa \r\n   pping], CimException\r\n    + FullyQualifiedErrorId : Windows System Error 1219,New-SmbGlobalMapping\r\n \r\n", err: exit status 1

@willinwu
Author

willinwu commented Feb 3, 2021

It seems to work after I changed the StorageClass source from "//docp-smb1/smbservice" to an IP address, like "//10...*/smbservice".
This link helped: https://docs.microsoft.com/en-us/troubleshoot/windows-server/networking/cannot-connect-to-network-share

I'm not sure that's the root cause.

@willinwu willinwu closed this as completed Feb 3, 2021
@willinwu
Author

willinwu commented Feb 3, 2021

Reproduced after another reboot.
Restoring the snapshot or changing the StorageClass (so that the remote share path changes) helps, but not permanently.
Deleting and re-creating the SMB driver does not help.
The log is the same as before.

It seems something is cached on the Windows worker node.
Is there a command I can run on the Windows worker node to recover?

@willinwu willinwu reopened this Feb 3, 2021
@andyzhangx
Member

> After reboot, reproduced.
> Restore the snapshot or change the sc (the remote share path changed) will be useful. But not forever.
> And delete the smb driver and create again is no use.
> The log is just like before.
>
> Seems like something is cached on the windows worker node.
> I just want to know if it can come back by run some command on the windows worker node.

To work around this, run the following commands on the agent node:

  • Get-SmbGlobalMapping to get the SMB mapping
  • Remove-SmbGlobalMapping to remove the existing mapping
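If this workaround has to be scripted (for example from a node-maintenance tool), a minimal Go sketch that builds the two commands is below. `listMappingsCmd`, `removeMappingCmd` and `psSingleQuote` are hypothetical helpers; `-RemotePath` and `-Force` are documented parameters of `Remove-SmbGlobalMapping`. The remote path is passed as a single-quoted PowerShell literal so its backslashes are taken verbatim. Only remove a mapping you have confirmed is stale, since a mapping still used by a running pod backs that pod's mount.

```go
package main

import (
	"fmt"
	"os/exec"
	"strings"
)

// psSingleQuote wraps s in a PowerShell single-quoted string literal; inside
// single quotes the only character that needs escaping is ' itself (as '').
func psSingleQuote(s string) string {
	return "'" + strings.ReplaceAll(s, "'", "''") + "'"
}

// listMappingsCmd returns a command that prints the node's SMB global mappings.
func listMappingsCmd() *exec.Cmd {
	return exec.Command("powershell.exe", "-NoProfile", "-Command", "Get-SmbGlobalMapping")
}

// removeMappingCmd returns a command that removes the global mapping for
// remotePath without prompting for confirmation (-Force).
func removeMappingCmd(remotePath string) *exec.Cmd {
	script := "Remove-SmbGlobalMapping -RemotePath " + psSingleQuote(remotePath) + " -Force"
	return exec.Command("powershell.exe", "-NoProfile", "-Command", script)
}

func main() {
	// Print the command lines instead of executing them.
	fmt.Println(strings.Join(listMappingsCmd().Args, " "))
	fmt.Println(strings.Join(removeMappingCmd(`\\docp-smb1\smbservice`).Args, " "))
}
```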

@Mohkam

Mohkam commented Feb 8, 2021

Would it be safe to remove the SMB global mapping? I am afraid this will cause data loss for pods already using this mount.

@andyzhangx
Member

andyzhangx commented Feb 17, 2021

Update:
This looks more like a Windows bug: after a reboot, mounting a different SMB share reports the following error. I will reach out to the Windows team for help; stay tuned.

Microsoft Windows [Version 10.0.17763.1697]
(c) 2018 Microsoft Corporation. All rights reserved.

azureuser@2892k8s001 C:\Users\azureuser>powershell
Windows PowerShell
Copyright (C) Microsoft Corporation. All rights reserved.

PS C:\Users\azureuser> Get-SmbGlobalMapping

Status Local Path Remote Path
------ ---------- -----------
OK                \\20.xx.xx.56\share\test
OK                \\20.xx.xx.56\share


PS C:\Users\azureuser> $User = "AZURE\USERNAME"
PS C:\Users\azureuser> $PWord = ConvertTo-SecureString -String "PASSWORD" -AsPlainText -Force
PS C:\Users\azureuser> $Credential = New-Object -TypeName System.Management.Automation.PSCredential -ArgumentList $User, $Pword
PS C:\Users\azureuser> New-SmbGlobalMapping -RemotePath \\20.xx.xx.56\share\test\a -Credential $Credential -Persistent $true
New-SmbGlobalMapping : Multiple connections to a server or shared resource by the same user, using more than one user name, are not allowed. Disconnect all
previous connections to the server or shared resource and try again.
At line:1 char:1
+ New-SmbGlobalMapping -RemotePath \\20.xx.xx.56\share\test\a -Credenti ...
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : NotSpecified: (MSFT_SmbGlobalMapping:ROOT/Microsoft/...mbGlobalMapping) [New-SmbGlobalMapping], CimException
    + FullyQualifiedErrorId : Windows System Error 1219,New-SmbGlobalMapping
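In this session the new path `\\20.xx.xx.56\share\test\a` is on the same server as both mappings listed by `Get-SmbGlobalMapping`. A minimal Go sketch that flags such same-server mappings from that table output is below, as a way to find cleanup candidates before remounting. The helpers are hypothetical; the parser assumes the default table layout shown above and remote paths without spaces. Server names are compared as plain strings, so a hostname and its IP address count as different servers, which may be why switching the StorageClass source to an IP appeared to help only until the next reboot.

```go
package main

import (
	"fmt"
	"strings"
)

// uncServer returns the lower-cased server component of a UNC path such as
// \\server\share\dir (forward slashes are accepted too).
func uncServer(p string) string {
	p = strings.TrimLeft(strings.ReplaceAll(p, "/", `\`), `\`)
	if i := strings.Index(p, `\`); i >= 0 {
		p = p[:i]
	}
	return strings.ToLower(p)
}

// parseRemotePaths pulls the Remote Path column out of Get-SmbGlobalMapping's
// default table output by taking the last field of every row that starts
// with a UNC prefix. It assumes remote paths contain no spaces.
func parseRemotePaths(out string) []string {
	var paths []string
	for _, line := range strings.Split(out, "\n") {
		fields := strings.Fields(line) // also strips a trailing \r
		if len(fields) == 0 {
			continue
		}
		if last := fields[len(fields)-1]; strings.HasPrefix(last, `\\`) {
			paths = append(paths, last)
		}
	}
	return paths
}

// conflictingMappings returns the existing mappings on the same server as
// newPath; these are candidates for the error 1219 conflict.
func conflictingMappings(existing []string, newPath string) []string {
	server := uncServer(newPath)
	var conflicts []string
	for _, p := range existing {
		if uncServer(p) == server {
			conflicts = append(conflicts, p)
		}
	}
	return conflicts
}

func main() {
	// Table copied from the Get-SmbGlobalMapping output above.
	out := `Status Local Path Remote Path
------ ---------- -----------
OK                \\20.xx.xx.56\share\test
OK                \\20.xx.xx.56\share`
	existing := parseRemotePaths(out)
	fmt.Println(conflictingMappings(existing, `\\20.xx.xx.56\share\test\a`))
}
```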

@andyzhangx andyzhangx changed the title The smb volume can't be mounted to the windows pod The smb volume can't be mounted to the windows pod after reboot Feb 19, 2021
@andyzhangx
Member

andyzhangx commented Feb 19, 2021

Update:
With help from the Windows team, I found that -Persistent is true by default in the New-SmbGlobalMapping command, and that is why New-SmbGlobalMapping does not work after a reboot. Details:
"
Once the persistent credentials are saved into credman, they are no longer visible to the RDR. So after reboot, it will look like a connection without any explicit credentials even though internally authentication is done using the credentials saved in credman. Now, when you re-do the mapping again with the same explicit credentials, we do a credential check against the already existing mapping. The credential check will fail because the credentials used for the original mapping are not visible anymore and consequently won’t match the incoming credentials.
"

Changing -Persistent to $false could solve this issue, but it introduces another one: the original connection would be invalid after a reboot.

  • Below are the errors when -Persistent is changed to $false in csi-proxy:
  Type     Reason        Age                    From               Message
  ----     ------        ----                   ----               -------
  Normal   Scheduled     9m32s                  default-scheduler  Successfully assigned default/busybox-smb5-0 to 2892k8s001
  Normal   Pulled        9m28s                  kubelet            Container image "e2eteam/busybox:1.29" already present on machine
  Normal   Created       9m28s                  kubelet            Created container busybox-smb
  Normal   Started       9m26s                  kubelet            Started container busybox-smb
  Warning  NodeNotReady  7m6s                   node-controller    Node is not ready
  Warning  FailedMount   5m51s (x8 over 6m55s)  kubelet            MountVolume.MountDevice failed for volume "pvc-ec8292a2-584c-4716-af95-f589cdaa5f66" : kubernetes.io/csi: attacher.MountDevice failed to create newCsiDriverClient: driver name smb.csi.k8s.io not found in the list of registered CSI drivers
  Warning  FailedMount   2m12s                  kubelet            Unable to attach or mount volumes: unmounted volumes=[persistent-storage], unattached volumes=[default-token-9g9cl persistent-storage]: timed out waiting for the condition
  Warning  FailedMount   43s (x3 over 4m47s)    kubelet            MountVolume.MountDevice failed for volume "pvc-ec8292a2-584c-4716-af95-f589cdaa5f66" : kubernetes.io/csi: attacher.MountDevice failed to create dir "\\var\\lib\\kubelet\\plugins\\kubernetes.io\\csi\\pv\\pvc-ec8292a2-584c-4716-af95-f589cdaa5f66\\globalmount":  mkdir \var\lib\kubelet\plugins\kubernetes.io\csi\pv\pvc-ec8292a2-584c-4716-af95-f589cdaa5f66\globalmount: Cannot create a file when that file already exists.
  • Workaround to prevent this error:

Use only the root share of each SMB server in a cluster, and use subPath in the deployment. For example, in this project's example, set \\20.xx.xx.56\share as the source.

I will add a note to the Windows example doc; there is currently no fix for this issue. Thanks.

  • Mitigation if you hit the "Multiple connections to a server or shared resource by the same user" error:

Log on to the Windows node, run Get-SmbGlobalMapping to list all mappings, then run Remove-SmbGlobalMapping -RemotePath xxx to remove the existing mapping. After a while, the pod remount should succeed automatically.
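One way to apply the root-share workaround when existing volumes point at sub-directories is to split each path into the root share (used once as the StorageClass `source`) and a relative `subPath` (used in the pod's `volumeMounts`). A minimal Go sketch, assuming `splitRootShare` as a hypothetical helper (not part of csi-driver-smb) that emits the `//server/share` form used by the sc.yaml earlier in this thread:

```go
package main

import (
	"fmt"
	"strings"
)

// splitRootShare splits a share path such as \\server\share\dir\sub into the
// root share in StorageClass "source" form (//server/share) and a relative
// subPath (dir/sub) that a volumeMount could use.
func splitRootShare(path string) (root, subPath string, err error) {
	p := strings.ReplaceAll(path, `\`, "/")
	if !strings.HasPrefix(p, "//") {
		return "", "", fmt.Errorf("not a UNC path: %q", path)
	}
	var segs []string
	for _, s := range strings.Split(strings.TrimPrefix(p, "//"), "/") {
		if s != "" { // skip empty segments from doubled or trailing separators
			segs = append(segs, s)
		}
	}
	if len(segs) < 2 {
		return "", "", fmt.Errorf("missing share name in %q", path)
	}
	return "//" + segs[0] + "/" + segs[1], strings.Join(segs[2:], "/"), nil
}

func main() {
	root, sub, err := splitRootShare(`\\20.xx.xx.56\share\test\a`)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("source: %s, subPath: %s\n", root, sub)
}
```

Every volume on the same server then shares one `source`, so the node keeps a single global mapping per server and each pod selects its directory through `subPath`.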

@andyzhangx
Member

Added a doc PR here: #231. This issue won't be fixed in the short term.

@andyzhangx andyzhangx added the priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. label Feb 20, 2021
@WangMosquito

Run Remove-SmbGlobalMapping -Force before starting kubelet.

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 31, 2021
@Mohkam

Mohkam commented May 31, 2021

/remove-lifecycle stale

@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jun 30, 2021
@k8s-triage-robot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-contributor-experience at kubernetes/community.
/close

@k8s-ci-robot
Contributor

@k8s-triage-robot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-contributor-experience at kubernetes/community.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@andyzhangx
Member

csi-proxy v1.1.2 fixes this issue with PR: kubernetes-csi/csi-proxy#210

andyzhangx added a commit to andyzhangx/csi-driver-smb that referenced this issue Aug 11, 2023
670bb0ef1 Merge pull request kubernetes-csi#229 from marosset/fix-codespell-errors
35d5e783c Merge pull request kubernetes-csi#219 from yashsingh74/update-registry
63473cc96 Merge pull request kubernetes-csi#231 from coulof/bump-go-version-1.20.5
29a5c76c7 Merge pull request kubernetes-csi#228 from mowangdk/chore/adopt_kubernetes_recommand_labels
8dd28211b Update cloudbuild image with go 1.20.5
1df23dba6 Merge pull request kubernetes-csi#230 from msau42/prow
1f92b7e7c Add ginkgo timeout to e2e tests to help catch any stuck tests
2b8b80ead fixing some codespell errors
c10b67804 Merge pull request kubernetes-csi#227 from coulof/check-sidecar-supported-versions
72984ec0a chore: adopt kubernetes recommand label
b05553510 Header
bd0a10b65 typo
c39d73c33 Add comments
f6491af0e Script to verify EOL sidecar version
4133d1df0 Merge pull request kubernetes-csi#226 from msau42/cloudbuild
8d519d237 Pin buildkit to v0.10.6 to workaround v0.11 bug with docker manifest
6e04a0301 Merge pull request kubernetes-csi#224 from msau42/cloudbuild
26fdfffdd Update cloudbuild image
6613c3980 Merge pull request kubernetes-csi#223 from sunnylovestiramisu/update
0e7ae993d Update k8s image repo url
77e47cce8 Merge pull request kubernetes-csi#222 from xinydev/fix-dep-version
155854b09 Fix dep version mismatch
8f839056a Merge pull request kubernetes-csi#221 from sunnylovestiramisu/go-update
1d3f94dd5 Update go version to 1.20 to match k/k v1.27
901bcb5a9 Update registry k8s.gcr.io -> registry.k8s.io

git-subtree-dir: release-tools
git-subtree-split: 670bb0ef135a53be44643cc34440eff22ad3ac8c
7 participants