Describe the problem you're observing
On my Proxmox NFS-server VM, copying files from a .zfs snapshot hierarchy on an NFS-client VM causes the NFS server's kernel to soft-lock on the umount when I later try to destroy that snapshot, even after exportfs -f. The snapshot destroys successfully after a hard reboot.
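Some background on why a umount is involved at all (this is my understanding of the mechanism, not something taken from the logs): the first access to .zfs/snapshot/<name> automounts the snapshot as a read-only ZFS filesystem, and zfs destroy has to unmount it again before it can proceed. Server-side, the automount shows up like any other mount:

# On the server, after the client has read from the snapshot
# (paths match the test script below; the output line is illustrative)
mount -t zfs
# ...
# pool/test@today on /pool/test/.zfs/snapshot/today type zfs (ro,...)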
Describe how to reproduce the problem
Create a ZFS dataset on an NFS server (within an encapsulating dataset, in my case)
Create a ZFS snapshot of that new dataset called 'snapped'
Copy some dirs and files from the .zfs/snapshot/snapped/ directory on an NFS client connected to that server (the setup sketch after these steps shows the export/mount arrangement assumed)
Try to destroy that snapshot on the server, even after exportfs -f
The kernel soft-locks on attempting to umount the snapshot, and the VM eventually grinds to a halt.
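For completeness, the export/mount arrangement the steps above assume is roughly the following (illustrative names only; my real /etc/exports entry differs):

# Server side: hypothetical /etc/exports entry for the test dataset
#   /pool/test  nfs-client-local(rw,no_root_squash,crossmnt)
exportfs -ra   # (re)export everything in /etc/exports
# Client side: mount the exported dataset at the same path as on the server
mount -t nfs nfs-server-local:/pool/test /pool/test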
I include here a script which reliably triggers this pathological behaviour on my system. I have renamed some of the paths and servers in the paste here for clarity:
#!/bin/bash
# This script crashes a proxmox VM running an NFS server

# The full name of the dataset to create
DATASET=pool/test
# The mountpoint (on my system, it's the same as the dataset name)
MNT=/$DATASET
# The short snapshot name, and the full snapshot name for the dataset
SNAPNAME=today
SNAPSHOT=$DATASET@$SNAPNAME
# The full path to the test directory hierarchy to create
TEST_DIR=$MNT/contents/foo
# The command with which to ssh to the server with the nfs-client mount
NFS_CLIENT_SSH=user@nfs-client-local

echo '** Create the dataset'
zfs create $DATASET

echo '** Create directories and files within'
mkdir -p $TEST_DIR
touch $TEST_DIR/bar $TEST_DIR/baz
chmod 777 $MNT/.

echo '** Make a snapshot'
zfs snapshot $SNAPSHOT

echo '** SSH to the client and ask it to copy data from the snapshot'
ssh $NFS_CLIENT_SSH cp -a $MNT/.zfs/snapshot/$SNAPNAME/contents $MNT/restored

echo '** Destroy the snapshot'
exportfs -f
zfs destroy $SNAPSHOT
# The command never completes, with the umount entering D-state forever
# and the kernel soft-locking
#
# On hard rebooting, the above destroy command DOES work
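If you want to poke at the stalled state before the VM grinds to a halt, something like this (standard tools, run as root on the server; my suggestion rather than part of the reproduction) shows the stuck umount and its kernel stack:

# Confirm the umount is in D-state and see what it is waiting on
ps -eo pid,stat,wchan:30,cmd | grep '[u]mount'
# Dump the kernel stack of the stuck umount (PID via pgrep)
cat /proc/"$(pgrep -f 'umount -t zfs')"/stack
# Ask the kernel to log all blocked (D-state) tasks to dmesg
echo w > /proc/sysrq-trigger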
Include any warning/errors/backtraces from the system logs
The running process-list once my script has stalled trying to umount the snapshot (using my actual path and hostnames, not the simpler examples in the test-script above):
$ ps rauxwww
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 11 0.0 0.0 0 0 ? R 10:19 0:00 [watchdog/1]
root 12 0.0 0.0 0 0 ? R 10:19 0:00 [migration/1]
root 28 0.0 0.0 0 0 ? R 10:19 0:00 [kworker/1:1]
root 3497 0.0 0.1 35320 3300 pts/0 D+ 10:25 0:00 zfs destroy data/shardstor01/home/testcrash@snapped
root 3499 1.6 0.1 19856 2336 ? R 10:25 0:00 umount -t zfs -fn /data/shardstor01/home/testcrash/.zfs/snapshot/snapped
Message from syslogd@tpdev-nfs at Mar 2 10:26:07 ...
kernel:[ 412.048005] BUG: soft lockup - CPU#1 stuck for 22s! [umount:3499]
I have attached a kern.log (broken_kern_log.txt) that has everything from boot to the cycling soft-lock message.
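For anyone wanting to capture the same traces, the soft-lockup reports land in the kernel ring buffer; something along these lines should pull them out, and the sysctl makes the next lockup dump backtraces from every CPU:

# Extract the soft-lockup reports from the kernel log
dmesg -T | grep -A 25 'soft lockup'
# Have the next soft lockup dump backtraces from all CPUs
sysctl -w kernel.softlockup_all_cpu_backtrace=1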