Healthy Zpool unmountable due to error 1 (archzfs 0.6.5.7_4.5.4_1-1) #4729

Closed
Celmor opened this issue Jun 4, 2016 · 4 comments

Celmor commented Jun 4, 2016

Hello, I have a bit of a problem with ZFS.
I created a mirrored zpool on two LUKS volumes. After exporting it, rebooting and importing it again, it reports:

[arch]$ sudo zpool import -f  Data
filesystem 'Data' can not be mounted due to error 1
cannot mount 'Data': Invalid argument

Status:

[arch]$ sudo zpool status
   pool: Data
     id: 2751875717627853497
  state: ONLINE
 status: The pool was last accessed by another system.
 action: The pool can be imported using its name or numeric identifier and
    the '-f' flag.
   see: http://zfsonlinux.org/msg/ZFS-8000-EY
 config:

    Data        ONLINE
      mirror-0  ONLINE
        Data1   ONLINE
        Data2   ONLINE

While on Manjaro:

[manjaro]$ sudo zpool import -f Data
cannot share 'Data': smb add share failed
[manjaro]$ sudo zpool status
  pool: Data
  state: ONLINE
 status: The pool was last accessed by another system.
  scan: resilvered 0 in 0h0m with 0 errors on Mon May 23 22:03:08 2016
config:

        NAME        STATE     READ WRITE CKSUM
        Data        ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            Data1   ONLINE       0     0     0
            Data2   ONLINE       0     0     0

errors: No known data errors

I have now downgraded ZFS (from the archzfs repo) and the Linux kernel to the LTS versions for 4.4.11, which made the zpool mountable again.

Setup Info:
system specs
Manjaro, Arch
Zpool properties (After rename attempt), Zpool status

Troubleshooting steps I took:

  • checking zpool/zfs properties (e.g. the canmount property)
  • checking zfs-related packages for up-to-date versions
  • zpool scrub
  • zpool upgrade
  • reducing fstab to just mounting root and /boot
  • renaming the pool
  • checking dmesg (no relevant info found)
  • stracing the mount command (problem not found)
  • creating another zpool the same way on different volumes
    • no problems; error not reproducible
  • zdb Data

Chronological steps I took before the zpool became unmountable under Arch (kernel 4.5.*):

  • Creating LUKS-formatted drives
  • Creating a mirrored zpool on both drives (automatically mounted at /Data)
  • Copying data into the zpool
  • Unmounting the zpool
  • Exporting the zpool
  • Removing the LUKS encryption mappings (luksClose)
  • Reboot
  • Opening the encryption mappings (luksOpen)
  • Importing the zpool
  • Error

-> for more details see 'complete log'

Log of commands from the creation of the zpool (and the underlying LUKS volumes) until the mount failure:
complete log incl. strace
zdb Data (very large text file)
Guide I followed (mostly): ZFS on Linux with LUKS encrypted disks | make then make install

As you can see, there are no signs of corruption, errors or any other reported problems, just the inability to mount. Since both "zpool scrub" and "zdb Data" complete successfully (and the latter outputs files), I think there is nothing wrong with the zpool and the data is readable. I could also mount multiple test zpools with the same name, using the same scripts for mounting/unmounting, without issues.
The only other search result for someone getting error 1 was also an Arch user, but he had an EPERM in his mount strace, which resulted in error 1; I don't.

Conclusion
Not mountable:
archzfs/zfs-linux & its dependencies
Mountable:
archzfs/zfs-linux-lts & its dependencies
local/spl-utils 0.6.5.7-1 (manjarozfs) & its dependencies
For now I have to stay on linux-lts to be able to access ZFS, and put up with various other packages and ALSA (sound) no longer working because of the downgrade.

More info in this subreddit post, where I posted this problem earlier, along with suggested solutions that also didn't work.

Celmor commented Oct 21, 2016

Update

With help from ryao in the freenode IRC channel #zfsonlinux, I was able to figure out the cause of the "filesystem 'Data' can not be mounted due to error 1" / "Invalid argument" error.
The nbmand=1 property was set for 'Data', which wasn't supported after Linux kernel version 4.4. My history log didn't show that nbmand had been enabled because I didn't run zpool history with the -i parameter, which makes it show 'internally logged ZFS events in addition to user initiated events'.
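A small filter can make such internal events stand out in the history. The sketch below is a hypothetical helper (`nbmand_events` is not part of ZFS); it assumes internal events carry a `[txg:N]` field right after the timestamp, as in the `zpool history -i` output in the transcript further down, and prints the timestamp, the event source and the new nbmand value.

```shell
#!/bin/sh
# nbmand_events (hypothetical helper): read `zpool history -i` output on stdin
# and print "<timestamp> <internal|user> <value>" for every nbmand change.
# Assumption: internal events have a "[txg:N]" field right after the
# timestamp; user-initiated ones (e.g. "zfs set nbmand=on Data") do not.
nbmand_events() {
    awk '{
        src = ($2 ~ /^\[txg:/) ? "internal" : "user"
        for (i = 1; i <= NF; i++) {
            if ($i ~ /^nbmand=/) {
                print $1, src, substr($i, 8)   # text after "nbmand="
            }
        }
    }'
}

# Usage (needs root and an imported pool):
#   zpool history -i Data | nbmand_events
```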

Possible solutions:

  • use kernel 4.4,
  • wait for a ZFS patch,
  • set nbmand=0,
  • use mount -t zfs Data /Data instead of zfs mount Data, where Data is the dataset name.
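For the nbmand=0 route, the affected datasets can be listed first. A minimal sketch, assuming the tab-separated output of `zfs get -H -r -s local -o name,value nbmand Data` (only locally set values, so children that merely inherit the property are fixed through their parent). The helper name `nbmand_fix_commands` is hypothetical, and it only prints the commands rather than running them.

```shell
#!/bin/sh
# nbmand_fix_commands (hypothetical helper): read "name<TAB>value" lines, as
# printed by `zfs get -H -r -s local -o name,value nbmand <pool>`, and print
# (not run) a `zfs set nbmand=off` command for each dataset whose value is "on".
nbmand_fix_commands() {
    awk -F '\t' '$2 == "on" { printf "zfs set nbmand=off %s\n", $1 }'
}

# Usage (review the output before piping it to sh):
#   zfs get -H -r -s local -o name,value nbmand Data | nbmand_fix_commands
```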

Troubleshooting steps I took:

# zpool import -fa
filesystem 'Data' can not be mounted due to error 1
cannot mount 'Data': Invalid argument
# capsh --print | grep --color cap_sys_admin >/dev/null && echo success
success
# dmesg | grep SELinux || echo not found
not found
# strace -f -o strace.log mount -t zfs -i Data /Data && umount /Data && echo success
success
# mount -o zfsutil -t zfs Data /Data && umount /Data && echo success
# wget -O execsnoop.sh https://raw.githubusercontent.com/brendangregg/perf-tools/master/execsnoop
# chmod +x execsnoop.sh
# zfs mount Data
$ cat execsnoop
...
  8621   7970 zfs mount Data
  8622   8621 /bin/mount -t zfs -o defaults,atime,dev,exec,rw,suid,mand,zfsutil Data /Data
  8623   8622 /sbin/mount.zfs Data /Data
...
# /bin/mount -t zfs -o mand,zfsutil Data /Data
filesystem 'Data' can not be mounted due to error 1
# zfs get all Data | grep mand
Data  nbmand                on                     local
# zpool history -i | grep nbmand
2016-05-21.17:05:25 [txg:31295] set Data (21) nbmand=1
# zfs set nbmand=off Data
# zfs mount Data && echo success
success

As can be seen above, zfs mount Data executes /bin/mount -t zfs -o defaults,atime,dev,exec,rw,suid,mand,zfsutil Data /Data, which fails, whereas an attempt without -o parameters or with just -o zfsutil succeeds. Adding -o mand makes the same command fail, so the mand option (which ZFS calls the nbmand property) caused the mount to fail. Further research showed that this feature doesn't work in any kernel newer than 4.4.*, which is why I couldn't mount datasets in Data with any newer kernel.
Three questions remain open: how this feature got enabled in the first place (it only shows up in the history with -i, so it wasn't user-initiated); why ZFS didn't handle this, for example by disabling the feature (perhaps with a warning) on kernels that don't support it; and why support for the feature was removed.
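The check described above boils down to whether mand appears as a standalone entry in the comma-separated option list. A hypothetical sketch (`has_mand_option` and `strip_mand_option` are illustrative names, not ZFS tools): the first matches mand exactly, so nbmand or nomand are not mistaken for it, and the second removes it so the remaining options can be tried by hand, as was done with -o zfsutil above.

```shell
#!/bin/sh
# has_mand_option: exit 0 if the comma-separated mount options in $1 contain
# the standalone option "mand" (not "nomand" or "nbmand"), 1 otherwise.
has_mand_option() {
    case ",$1," in
        *,mand,*) return 0 ;;
        *) return 1 ;;
    esac
}

# strip_mand_option: print the option list from $1 with every "mand" removed.
strip_mand_option() {
    printf '%s\n' "$1" | tr ',' '\n' | grep -vx 'mand' | paste -s -d ',' -
}

# Usage with the options seen in the execsnoop output:
opts='defaults,atime,dev,exec,rw,suid,mand,zfsutil'
if has_mand_option "$opts"; then
    # prints: mand present; options without it: defaults,atime,dev,exec,rw,suid,zfsutil
    echo "mand present; options without it: $(strip_mand_option "$opts")"
fi
```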

JoKoT3 commented Nov 9, 2016

Thanks for opening an issue and finding the root cause.
I encountered the same problem some months ago on Arch Linux and worked around it by setting the mountpoint to legacy and mounting manually (as you described).

Thanks again for taking the time to document this.

behlendorf added this to the 0.7.0 milestone Nov 9, 2016

welbers commented Apr 29, 2017

You saved my day! Since kernel 4.9 became the latest LTS, I wasn't able to mount my ZFS filesystem and couldn't find the reason.

nbmand=off was the hot-fix solution.

Thx so much

behlendorf added a commit to behlendorf/zfs that referenced this issue Jun 7, 2017
Commit torvalds/linux@9e8925b6 allowed for kernels to be built
without support for mandatory locking (MS_MANDLOCK).  This will
result in 'zfs mount' failing when the nbmand=on property is set
and the kernel is built without CONFIG_MANDATORY_FILE_LOCKING.

Unfortunately we can't detect prior to the mount(2) system call
if the kernel was built with this support.  The best we can do
is check if the mount failed with EPERM and if we passed 'mand'
as a mount option and print a more useful error message.  e.g.

  filesystem 'tank' has the 'nbmand=on' property set, this mount
  option may be disabled in your kernel.  Use 'zfs set nbmand=off'
  to disable this option and try to mount the filesystem again.

Additionally, switch the default error message case to use
strerror() to produce a more human readable message.

Signed-off-by: Brian Behlendorf <[email protected]>
Issue openzfs#4729
behlendorf (Contributor) commented

I had a chance to look into this, and the best we can do is print a more useful error message in this case. Commit torvalds/linux@9e8925b allows kernels to be built without support for mandatory file locking (CONFIG_MANDATORY_FILE_LOCKING), and distributions are now disabling it in their kernels. That leaves two options: rebuild your kernel with this option enabled, or set nbmand=off on your datasets.
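Whether a given kernel was built with mandatory locking can be read from its build configuration. A sketch under assumptions: the config is available as a plain file such as /boot/config-$(uname -r) (some distributions instead expose a gzip-compressed /proc/config.gz, which would need zcat first), and `mandlock_status` is a hypothetical helper. Kernels predating the commit above have no such option at all, so the sketch reports "unknown" for them.

```shell
#!/bin/sh
# mandlock_status (hypothetical helper): print "enabled", "disabled" or
# "unknown" for CONFIG_MANDATORY_FILE_LOCKING in the kernel config file $1.
mandlock_status() {
    cfg=$1
    if [ ! -r "$cfg" ]; then
        echo unknown                      # config file not readable
    elif grep -q '^CONFIG_MANDATORY_FILE_LOCKING=y$' "$cfg"; then
        echo enabled
    elif grep -q '^# CONFIG_MANDATORY_FILE_LOCKING is not set$' "$cfg"; then
        echo disabled                     # nbmand=on mounts are expected to fail
    else
        echo unknown                      # option absent from this config
    fi
}

# Usage:
#   mandlock_status "/boot/config-$(uname -r)"
```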

I've opened #6199 with a small patch to add a useful error message for this case. e.g.

filesystem 'tank' has the 'nbmand=on' property set, this mount
option may be disabled in your kernel.  Use 'zfs set nbmand=off'
to disable this option and try to mount the filesystem again.

tonyhutter pushed a commit to tonyhutter/zfs that referenced this issue Jun 7, 2017
Commit torvalds/linux@9e8925b6 allowed for kernels to be built
without support for mandatory locking (MS_MANDLOCK).  This will
result in 'zfs mount' failing when the nbmand=on property is set
if the kernel is built without CONFIG_MANDATORY_FILE_LOCKING.

Unfortunately we can not reliably detect prior to the mount(2) system
call if the kernel was built with this support.  The best we can do
is check if the mount failed with EPERM and if we passed 'mand'
as a mount option and then print a more useful error message. e.g.

  filesystem 'tank/fs' has the 'nbmand=on' property set, this mount
  option may be disabled in your kernel.  Use 'zfs set nbmand=off'
  to disable this option and try to mount the filesystem again.

Additionally, switch the default error message case to use
strerror() to produce a more human readable message.

Reviewed-by: George Melikov <[email protected]>
Reviewed-by: Giuseppe Di Natale <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes openzfs#4729
Closes openzfs#6199
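The decision the commit message describes can be mirrored in a few lines. This is a hypothetical shell rendition, not the actual ZFS change (which lives in the C mount code); `explain_mount_error` is an illustrative name that takes the dataset, the option string and the errno from mount(2) as arguments, and its fallback text merely stands in for the real strerror() output.

```shell
#!/bin/sh
# explain_mount_error (hypothetical): $1 = dataset, $2 = comma-separated mount
# options, $3 = errno returned by mount(2). Prints the nbmand hint only when
# the mount failed with EPERM (errno 1 on Linux) and "mand" was passed.
explain_mount_error() {
    dataset=$1 opts=$2 err=$3
    case ",$opts," in
        *,mand,*) has_mand=1 ;;
        *) has_mand=0 ;;
    esac
    if [ "$err" -eq 1 ] && [ "$has_mand" -eq 1 ]; then
        printf "filesystem '%s' has the 'nbmand=on' property set, this mount\n" "$dataset"
        printf "option may be disabled in your kernel.  Use 'zfs set nbmand=off'\n"
        printf "to disable this option and try to mount the filesystem again.\n"
    else
        # The real code prints strerror(err) here; a plain errno stands in.
        printf "filesystem '%s' can not be mounted: errno %s\n" "$dataset" "$err"
    fi
}

# Usage:
#   explain_mount_error tank/fs 'defaults,mand,zfsutil' 1
```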
tonyhutter pushed a commit to tonyhutter/zfs that referenced this issue Jun 8, 2017
tonyhutter pushed a commit that referenced this issue Jun 9, 2017
Closes #4729
Closes #6199