NFS server soft-locks on attempting to umount a snapshot from which client copied dirs #5853

Closed
bokkiedog opened this issue Mar 2, 2017 · 2 comments

bokkiedog commented Mar 2, 2017

System information

Type                  Version/Name
Distribution Name     Debian GNU/Linux
Distribution Version  8 (Jessie)
Linux Kernel          3.16.0-4-amd64 #1 SMP Debian 3.16.39-1+deb8u1 (2017-02-22) x86_64 GNU/Linux
Architecture          x86_64
ZFS Version           v0.6.5.9-2~bpo8+1
SPL Version           v0.6.5.9-1~bpo8+1
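
For reference, the versions above can be collected with something like the following (a rough sketch; assumes the zfs and spl modules are loaded and a Debian-style package manager):

uname -a                         # kernel version and architecture
cat /sys/module/zfs/version      # version of the loaded zfs module
cat /sys/module/spl/version      # version of the loaded spl module
dpkg -l | grep -Ei 'zfs|spl'     # installed package versions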

Describe the problem you're observing

On my Proxmox NFS-server VM, after an NFS-client VM copies files out of a .zfs snapshot hierarchy, attempting to destroy that snapshot soft-locks the NFS server's kernel on the underlying umount, even after exportfs -f. The snapshot destroys successfully after a hard reboot.

Describe how to reproduce the problem

1. Create a ZFS dataset on an NFS server (within an encapsulating dataset, in my case).
2. Create a ZFS snapshot of that new dataset called 'snapped'.
3. On an NFS client connected to that server, copy some dirs and files out of the .zfs/snapshot/snapped/ directory.
4. Try to destroy that snapshot on the server; it hangs even after exportfs -f.

The kernel soft-locks while attempting to umount the snapshot, and the VM eventually grinds to a halt.
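
When it gets into this state, the hang can be confirmed by looking for the umount process stuck in uninterruptible sleep and dumping its kernel stack (a rough sketch; assumes /proc/<pid>/stack and sysrq are available, and <pid> is the PID of the stuck umount):

# Show processes stuck in uninterruptible (D) sleep and what they are waiting on
ps axo pid,stat,wchan:32,cmd | awk 'NR == 1 || $2 ~ /^D/'

# Dump the kernel stack of the stuck umount
cat /proc/<pid>/stack

# Or ask the kernel to log the stacks of all blocked tasks to the system log
echo w > /proc/sysrq-trigger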

I include here a script which reliably triggers this pathological behaviour on my system. I have renamed some of the paths and servers below for clarity:

#!/bin/bash

# This script crashes a proxmox VM running an NFS server

# The full name of the dataset to create
DATASET=pool/test

# The mountpoint (on my system, it's the same as the dataset name)
MNT=/$DATASET

# The short name of the snapshot to take
SNAPNAME=snapped

# The full name of the snapshot for the dataset
SNAPSHOT=$DATASET@$SNAPNAME

# The full path to the test directory hierarchy to create
TEST_DIR=$MNT/contents/foo

# The command with which to ssh to the NFS client that mounts this server
NFS_CLIENT_SSH=user@nfs-client-local

echo '** Create the dataset'
zfs create $DATASET

echo '** Create directories and files within'
mkdir -p $TEST_DIR
touch $TEST_DIR/bar $TEST_DIR/baz
chmod 777 $MNT/.

echo '** Make a snapshot'
zfs snapshot $SNAPSHOT

echo '** SSH to the client and ask it to copy data from the snapshot'
ssh $NFS_CLIENT_SSH cp -a $MNT/.zfs/snapshot/$SNAPNAME/contents $MNT/restored

echo '** Destroy the snapshot'
exportfs -f
zfs destroy $SNAPSHOT

# The command never completes, with the umount entering D-state forever
# and the kernel soft-locking
#
# On hard rebooting, the above destroy command DOES work
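
For what it's worth, the snapshot appears to get automounted on the server as a side effect of the client walking .zfs/snapshot over NFS, which is why the destroy ends up running an umount. A rough way to inspect and try to release that automount before the destroy, assuming the paths from the script above:

# List snapshot automounts currently held on the server
grep '\.zfs/snapshot' /proc/mounts

# See what, if anything, is still using the snapshot mountpoint
fuser -vm /pool/test/.zfs/snapshot/snapped

# Try to unmount it manually before the destroy
umount /pool/test/.zfs/snapshot/snapped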

Include any warning/errors/backtraces from the system logs

The running process list once my script has stalled trying to umount the snapshot (using my actual paths and hostnames, not the simplified names in the test script above):


$ ps rauxwww
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root        11  0.0  0.0      0     0 ?        R    10:19   0:00 [watchdog/1]
root        12  0.0  0.0      0     0 ?        R    10:19   0:00 [migration/1]
root        28  0.0  0.0      0     0 ?        R    10:19   0:00 [kworker/1:1]
root      3497  0.0  0.1  35320  3300 pts/0    D+   10:25   0:00 zfs destroy data/shardstor01/home/testcrash@snapped
root      3499  1.6  0.1  19856  2336 ?        R    10:25   0:00 umount -t zfs -fn /data/shardstor01/home/testcrash/.zfs/snapshot/snapped

Message from syslogd@tpdev-nfs at Mar  2 10:26:07 ...
 kernel:[  412.048005] BUG: soft lockup - CPU#1 stuck for 22s! [umount:3499]

I have attached a kern.log that has everything from boot to the cycling soft-lock message.

broken_kern_log.txt
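
The relevant portion is the repeating stack trace of the stuck umount task; it can be pulled out of the attached log (or from the live ring buffer on the server) with something like:

grep -A 40 'soft lockup' broken_kern_log.txt
dmesg | grep -A 40 'soft lockup'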

bokkiedog (Author) commented:

(BTW, is this related to issue #5810? Should I attempt the patch there, or is this new?)

tuxoko (Contributor) commented Mar 2, 2017

@bokkiedog
It's the same issue; you should try the patch and post further updates on #5810.

tuxoko closed this as completed Mar 2, 2017