
Filesystems appear simultaneously mounted and unmounted - can't export pool, unmount, mount, or see files. #9082

Closed
ttelford opened this issue Jul 26, 2019 · 11 comments
Labels
Status: Stale No recent activity for issue Type: Defect Incorrect behavior (e.g. crash, hang)

Comments

@ttelford

ttelford commented Jul 26, 2019

System information

Type Version/Name
Distribution Name Debian
Distribution Version Unstable ("Sid")
Linux Kernel 4.19.0-5-amd64 #1 SMP Debian 4.19.37-6 (2019-07-18)
Architecture x86_64
ZFS Version 0.8.1-3 (Debian Package)
SPL Version 0.8.1-3 (Debian Package)

Describe the problem you're observing

I have six pools and 66 filesystems. Everything was working fine prior to updating to 0.8.1-3 (Debian packages).

After upgrading, zfs mount -a reports all filesystems as mounted; however, they aren't. I'll try to explain:

The pool is named "burp", and has five filesystems. The pool is for backups; stopping the burp dæmon leaves nothing holding open files/directories:

burp
 |-sluggo
 |-pilot
 |-firefairy
 `-testclient
  • mount -a reports no errors.
  • Visiting the mount location shows no files/directories there -- as if the disk is unmounted.
  • df -h | grep burp shows only burp/pilot as being mounted.
$ df -h | grep burp
burp/pilot	832G	128K	823G	1%	/var/spool/burp/pilot
  • mount and /proc/mounts shows that all of the filesystems are mounted.
$ cat /proc/mounts | grep burp
burp/sluggo /var/spool/burp/sluggo zfs rw,xattr,noacl 0 0
burp/ /var/spool/burp zfs rw,xattr,noacl 0 0
burp/pilot /var/spool/burp/pilot zfs rw,xattr,noacl 0 0
burp/firefairy /var/spool/burp/firefairy zfs rw,xattr,noacl 0 0
burp/testclient /var/spool/burp/testclient zfs rw,xattr,noacl 0 0

After verifying nothing is using the mounts, I tried to export the pool:

$ zpool export burp
umount: /var/spool/burp/testclient: not mounted.
cannot unmount '/var/spool/burp/testclient': umount failed.

Attempting to unmount any of the filesystems fails:

$ zfs umount burp/pilot
cannot unmount '/var/spool/burp/pilot': unmount failed

The mount state appears to be inconsistent: /proc/mounts and the mount command show the filesystems as mounted, while df and even zpool export treat them as not mounted.
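A quick way to see the disagreement side by side is something like the following (purely a diagnostic sketch, using the burp pool above):

# ZFS's view of each dataset's mount state:
zfs list -r -H -o name,mounted,mountpoint burp
# The kernel's view, taken from the mount table:
awk '$3 == "zfs" && $1 ~ /^burp/ {print $1, $2}' /proc/mounts

If the state really is inconsistent, the two listings will disagree about the same datasets.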

When I manually export (and disconnect) two of the other pools, and then reboot, the burp pool mounts and works correctly.

I then attached the two disconnected pools and imported them. The filesystems on the most recently imported pool then seem to have the issue. It appears as though some sort of maximum pool or filesystem limit is being tripped somehow...

Describe how to reproduce the problem

All I have to do is boot the system and mount enough ZFS filesystems. (I'm not sure whether the pool count is significant, or whether the filesystem count alone is the trigger.)

Include any warning/errors/backtraces from the system logs

Unfortunately, there are no log messages in /var/log/syslog and nothing visible in dmesg. I'm happy to help gather logging information if you tell me what to do...
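A sketch of diagnostics that might be worth collecting here, assuming standard tooling is available:

dmesg | grep -iE 'zfs|spl'      # kernel messages from the ZFS/SPL modules
journalctl -k -b | grep -i zfs  # the same via the journal, on systemd systems
zpool events -v                 # ZFS's internal event log
zpool status -v                 # per-pool error status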

@ttelford ttelford changed the title Filesystems are both mounted and unmounted - can't export pool. Filesystems are both mounted and unmounted - can't export pool or read files. Jul 26, 2019
@ttelford ttelford changed the title Filesystems are both mounted and unmounted - can't export pool or read files. Filesystems are both mounted and unmounted - can't export pool or see files. Jul 26, 2019
@ttelford ttelford changed the title Filesystems are both mounted and unmounted - can't export pool or see files. Filesystems appear simultaneously mounted and unmounted - can't export pool, unmount, mount or see files. Jul 26, 2019
@ttelford ttelford changed the title Filesystems appear simultaneously mounted and unmounted - can't export pool, unmount, mount or see files. Filesystems appear simultaneously mounted and unmounted - can't export pool, unmount, mount, or see files. Jul 26, 2019
@kusumi
Member

kusumi commented Jul 29, 2019

This could be a parallel mount bug fixed in master by ab5036d; take a look at
#8450
#8833
#8878

@kusumi
Member

kusumi commented Jul 29, 2019

could be a parallel mount bug fixed in master ab5036d

btw, I could only reproduce this on Linux, but recently found this had been reported in FreeBSD as well.

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=237517
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=237397
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=239243

@behlendorf behlendorf added the Type: Defect Incorrect behavior (e.g. crash, hang) label Jul 30, 2019
@ttelford
Author

I can reproduce it on Linux easily, with Debian's kernel 5.3.0-3-amd64 and Debian's zfs 0.8.2-5 package(s). I get a kernel general protection fault in one of the zfs modules (can't recall which one offhand) any time I run zfs mount -a (or when filesystems are mounted at boot, which I assume also runs some variant of zfs mount -a).

Is there any additional information I can provide?

@behlendorf
Contributor

@ttelford if possible, can you upgrade to the zfs-0.8.3-1 packages provided by Debian and confirm whether this is still an issue?

@ttelford
Author

@behlendorf just updated to 0.8.3, and can confirm the problem still exists. Reliably, over several reboots.

And I still get a general protection fault from the kernel, in the zfs module, every time I run zfs mount -a.

As long as I avoid the -a option to zfs mount, I have no problems.
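If avoiding -a is the workaround, one rough way to approximate zfs mount -a without the parallel mounting is a serial loop along these lines (just a sketch; datasets with a legacy or unset mountpoint should simply report an error and be skipped):

# Mount every mountable-but-unmounted filesystem one at a time, serially,
# instead of letting zfs mount -a mount them in parallel.
zfs list -H -t filesystem -o name,canmount,mounted | \
  awk '$2 == "on" && $3 == "no" {print $1}' | \
  while read -r fs; do zfs mount "$fs"; done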

@ttelford
Author

Addendum: the behavior is now slightly different.

The filesystems show as unmounted in both df and mount. In spite of that, I am unable to export the pool, and if I manually try to zfs umount the filesystems (even though df and mount show them as unmounted), I get an “unable to unmount” message.

@stellarpower

stellarpower commented Jan 19, 2021

I'm also getting this on my system (Manjaro). Happy to get more, or more precise, info when I can; please let me know what would be required:

archie@archie ~> zfs list -r tank/FS #The target filesystem with the issue (I have two at this time)
tank/FS                               37.7M  42.8G     37.7M  /home/archie/FS
archie@archie ~ [2]> mount | grep FS
tank/FS on /home/archie/FS type zfs (rw,nodev,relatime,xattr,posixacl)
archie@archie ~> ls -la /home/archie/FS
ls: cannot access '/home/archie/FS': No such file or directory

#################################################################################

archie@archie ~> sudo zfs mount tank/FS 
cannot mount 'tank/FS': filesystem already mounted
archie@archie ~ [1]> ls -la /home/archie/FS
ls: cannot access '/home/archie/FS': No such file or directory
archie@archie ~ [2]> sudo zfs unmount tank/FS
cannot unmount '/home/archie/FS': unmount failed # I have previously had the message as above - 'filesystem not mounted'
archie@archie ~ [1]> sudo zfs unmount tank/FS
cannot unmount '/home/archie/FS': unmount failed
archie@archie ~ [1]> ls -la /home/archie/FS
ls: cannot access '/home/archie/FS': No such file or directory
archie@archie ~ [2]> sudo zfs mount tank/FS
cannot mount 'tank/FS': filesystem already mounted
archie@archie ~ [1]> sudo zfs unmount tank/FS
cannot unmount '/home/archie/FS': unmount failed

##################################################################################

archie@archie ~> df -h | grep -i FS
archie@archie ~ [0|1]> mount | grep -i FS
tank/FS on /home/archie/FS type zfs (rw,nodev,relatime,xattr,posixacl)
archie@archie ~> cat  /proc/self/mounts | grep -i FS
tank/FS /home/archie/FS zfs rw,nodev,relatime,xattr,posixacl 0 0

#################################################################

archie@archie ~ [2]> uname -a
Linux archie 5.9.16-1-MANJARO #1 SMP PREEMPT Mon Dec 21 22:00:46 UTC 2020 x86_64 GNU/Linux

archie@archie ~> lsmod | egrep -i 'spl|zfs'
spl                   118784  6 zfs,icp,zzstd,znvpair,zcommon,zavl
zfs                  4554752  31

archie@archie ~> pacman -Q | grep zfs
linux59-zfs 2.0.0-6
zfs-utils 2.0.0-2
archie@archie ~> pacman -Q linux
linux59 5.9.16-1

archie@archie ~> zfs --version
zfs-2.0.0-1
zfs-kmod-2.0.0-1

archie@archie ~> modinfo spl
filename:       /lib/modules/5.9.16-1-MANJARO/extramodules/spl.ko.gz
version:        2.0.0-1
license:        GPL
author:         OpenZFS
description:    Solaris Porting Layer
srcversion:     335A04BC269DBC60BA98447
depends:        
retpoline:      Y
name:           spl
vermagic:       5.9.16-1-MANJARO SMP preempt mod_unload modversions 

...

archie@archie ~> modinfo zfs
filename:       /lib/modules/5.9.16-1-MANJARO/extramodules/zfs.ko.gz
version:        2.0.0-1
license:        CDDL
author:         OpenZFS
description:    ZFS
alias:          devname:zfs
alias:          char-major-10-249
srcversion:     06C609BFC1AF1A3CF1BA6BA
depends:        spl,znvpair,icp,zlua,zzstd,zunicode,zcommon,zavl
retpoline:      Y
name:           zfs
vermagic:       5.9.16-1-MANJARO SMP preempt mod_unload modversions 

...

This seems somewhat serious to me as a user: the data is totally inaccessible to me at the moment, and I'd also presume, thinking of it as a state machine, that mounted and unmounted should be the only two states, and mutually exclusive.

I'm also slightly confused about the version numbers while I'm here: my Mint system, previously stuck on the Bionic package base, was on ZoL version 0.7.something, and following the move to a Disco base was then on 0.8, in line with the Debian packages above. However, the output here reports 2.0. Has the project really moved through two major releases since then, or is this caused by a change in upstream versioning (illumos-gate vs. the Linux sources)?

@ttelford
Copy link
Author

ttelford commented Mar 28, 2021

In my case, I found what was causing the issue -- though I'm not sure it isn't still a bug.

My issues were on the burp zpool (as documented above)

I had manually set a mountpoint for the burp pool. The parent/top-level zfs filesystem (i.e. the zpool root) had a trailing slash in the mountpoint. By repairing that (which required unmounting every filesystem in the pool, as the problem was at the top level), everything mounts fine now.

@stellarpower: Can you verify whether or not you have a trailing slash in your mountpoints?

zfs get mountpoint <array> would be helpful here.
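Something along these lines should list any dataset whose mountpoint property ends in a slash (a sketch only, using the tank pool from your output above):

# Print name and mountpoint for every dataset whose mountpoint ends in '/',
# ignoring a root mountpoint of exactly '/'.
zfs get -r -H -o name,value mountpoint tank | awk '$2 ~ /\/$/ && $2 != "/"'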

@stellarpower

I can get that output for you next time I'm booted; I'll make a note. However, for this system the root dataset shouldn't have a mountpoint: I have a boot pool and a root pool, and it's only a few layers in that the datasets are actually mounted at all.

@stale

stale bot commented Mar 31, 2022

This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions.

@stale stale bot added the Status: Stale No recent activity for issue label Mar 31, 2022
@stale stale bot closed this as completed Jul 10, 2022
@kyle0r

kyle0r commented May 7, 2024

The parent/top-level zfs filesystem (i.e. the zpool root) had a trailing slash in the mountpoint.

👆 This trailing slash issue/situation doesn't exist for the scenario I'm about to share. Good to know though!

pveversion
pve-manager/8.1.3/b46aac3b42da5d15 (running kernel: 6.5.11-7-pve)
zfs -V
zfs-2.2.2-pve1
zfs-kmod-2.2.2-pve1

FWIW, I have a potential explanation for a similar scenario, with a result very close to what is being reported in this thread.

Consider the following filesystem datasets:

storeN/data
storeN/data/vm
storeN/data/vm/raw

Consider the following properties

NAME                PROPERTY    VALUE                SOURCE
storeN/data         name        storeN/data          -
storeN/data         mounted     yes                  -
storeN/data         mountpoint  /storeN/data         default
storeN/data/vm      name        storeN/data/vm       -
storeN/data/vm      mounted     yes                  -
storeN/data/vm      mountpoint  /storeN/data/vm      default
storeN/data/vm/raw  name        storeN/data/vm/raw   -
storeN/data/vm/raw  mounted     yes                  -
storeN/data/vm/raw  mountpoint  /storeN/data/vm/raw  default

storeN/data is the encryption root, but I don't think that is relevant beyond explaining why these datasets are not mounted automatically at boot.

After booting I can mount storeN/data/vm/raw without issues. This is totally valid until I wish to mount one of the parent datasets...

If I later wish to mount storeN/data/vm, I can, but I note that the /storeN/data/vm/raw hierarchy is now empty. The same happens if I then mount storeN/data: the child dataset hierarchy appears empty.

I would suppose this is the expected behaviour? In effect I've mounted over the top of the child dataset(s). It does seem to confuse ZFS though, as follows:

Trying to zfs unmount storeN/data/vm or zfs unmount storeN/data does not work, because it fails unmounting storeN/data/vm/raw, and so on. I cannot export the pool either, for the same reason. The only solution I found is a reboot.
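For what it's worth, the shadowed child mount should still be visible in the kernel's mount table even though its path now resolves into the (empty) parent filesystem. A quick way to inspect that state (again just a sketch, using the dataset names above):

findmnt -R /storeN/data         # mount tree under the data hierarchy, including shadowed mounts
grep storeN /proc/self/mounts   # raw kernel mount-table entries for the pool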

I'm not sure if there's anything the developers can do here, but it might help someone in the future who's in the same situation as me and wonders what's going on.

The solution for me is a little loop, something like:

for fs in $(zfs list -r -t filesystem -o name -H store{1..6}/data sas/data); do echo mounting "${fs}"; zfs mount -l "${fs}"; done

This loop walks the <pool>/data hierarchy of 7 pools and mounts the datasets recursively, with the deepest dataset mounted last. With that ordering, the .zfs directory, for example, works as expected at each level of the dataset hierarchy.

Hope that helps someone in the future.
