Allow mounting datasets more than once #7207

alek-p · 2018-02-21T19:34:13Z

This is a refresh of #7120
I've added a couple extra tests along with the test that is described below.
Let's see what buildbot thinks.

Currently mounting an already mounted zfs dataset results in an
error, whereas it is typically allowed with other filesystems.
This causes some bad interactions with mount namespaces. Take
this sequence for example:

- Create a dataset
- Create a snapshot of the dataset
- Create a clone of the snapshot
- Create a new mount namespace
- Rename the original dataset

The rename results in unmounting and remounting the clone in the
original mount namespace, however the remount fails because the
dataset is still mounted in the new mount namespace. (Note that
this means the mount in the new mount namespace is never being
unmounted, so perhaps the unmount/remount of the clone isn't
actually necessary.)

The problem here is a result of the way mounting is implemented
in the kernel module. Since it is not mounting block devices it
uses mount_nodev() instead of the usual mount_bdev(). However,
mount_nodev() is written for filesystems for which each mount is
a new instance (i.e. a new super block), and zfs should be able
to detect when a mount request can be satisfied using an existing
super block.

Change zpl_mount() to call sget() directly with its own test
callback. Passing the objset_t object as the fs data allows
checking if a superblock already exists for the dataset, and in
that case we just need to return a new reference for the sb's
root dentry.

Signed-off-by: Seth Forshee [email protected]
Closes #5796
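The superblock lookup described in the commit message can be modeled in user space. The sketch below is only an illustration under stated assumptions: `model_sget()`, `model_test_super()`, the fixed-size table, and the trimmed-down `struct super_block` and `objset_t` are hypothetical stand-ins, not the kernel's or the patch's definitions. It shows the one idea that matters: a test callback compares the superblock's fs data with the requested objset, so a second mount of the same dataset reuses the existing superblock instead of creating a new one.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types; not the real definitions. */
typedef struct objset { const char *os_name; } objset_t;

struct super_block {
	objset_t *s_fs_info;	/* dataset backing this superblock */
	int s_active;		/* active references (one per mount here) */
};

#define	MODEL_MAX_SB	8
static struct super_block model_sbs[MODEL_MAX_SB];
static int model_nsbs;

/* Test callback: does this superblock already belong to the dataset? */
static int
model_test_super(struct super_block *sb, void *data)
{
	return (sb->s_fs_info == data);
}

/*
 * Simplified model of sget(): reuse a superblock accepted by the test
 * callback, otherwise set up a new one. Returns NULL when the table is
 * full (the real sget() reports failure with an ERR_PTR value instead).
 */
static struct super_block *
model_sget(int (*test)(struct super_block *, void *), void *data)
{
	struct super_block *sb;

	assert(test != NULL);
	for (int i = 0; i < model_nsbs; i++) {
		if (test(&model_sbs[i], data)) {
			model_sbs[i].s_active++;	/* existing sb: new ref */
			return (&model_sbs[i]);
		}
	}
	if (model_nsbs == MODEL_MAX_SB)
		return (NULL);
	sb = &model_sbs[model_nsbs++];
	sb->s_fs_info = data;
	sb->s_active = 1;
	return (sb);
}
```

In this model, calling `model_sget(model_test_super, &os)` twice with the same `objset_t` returns the same superblock with two references, while a different objset gets a superblock of its own.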


vozhyk- · 2018-02-21T20:17:03Z

tests/zfs-tests/tests/functional/cli_root/zfs_mount/zfs_lazy_umount_remount.ksh

+	return 0
+}
+
+log_assert "Verify recovery from a lazy unmount is possilbe"


Typo: possible.

vozhyk- · 2018-02-21T20:17:21Z

tests/zfs-tests/tests/functional/cli_root/zfs_mount/zfs_lazy_umount_remount.ksh

+
+log_must pkill -P $PPID tail
+
+log_assert "Recovering from a lazy unmount is possilbe"


log_pass?

vozhyk- · 2018-02-21T20:17:35Z

tests/zfs-tests/tests/functional/cli_root/zfs_mount/zfs_lazy_umount_remount.ksh

+
+log_must pkill -P $PPID tail
+
+log_assert "Recovering from a lazy unmount is possilbe"


Typo: possible.

vozhyk- · 2018-02-21T20:27:11Z

config/kernel.m4

@@ -98,7 +98,7 @@ AC_DEFUN([ZFS_AC_CONFIG_KERNEL], [
 	ZFS_AC_KERNEL_TRUNCATE_SETSIZE
 	ZFS_AC_KERNEL_6ARGS_SECURITY_INODE_INIT_SECURITY
 	ZFS_AC_KERNEL_CALLBACK_SECURITY_INODE_INIT_SECURITY
-	ZFS_AC_KERNEL_MOUNT_NODEV
+	ZFS_AC_KERNEL_FST_MOUNT


Fixed. ~~Looks like the .m4 file defining this is missing.~~

vozhyk- · 2018-02-21T20:36:25Z

Is this also going to allow multiple normal mounts of a dataset in a single namespace?

mount -t zfs -o zfsutil pool/ds /mnt/first
mount -t zfs -o zfsutil pool/ds /mnt/second

behlendorf · 2018-02-21T21:59:24Z

include/linux/vfs_compat.h

+#define	SB_SILENT	MS_SILENT
+#endif
+
+#ifndef SB_ACTIVE


nit: should be a tab.

behlendorf · 2018-02-21T22:06:47Z

module/zfs/zpl_super.c

+	if (IS_ERR(s))
+		return (ERR_CAST(s));
+
+	if (!s->s_root) {


nit: s->s_root == NULL.

behlendorf · 2018-02-21T22:31:46Z

tests/zfs-tests/tests/functional/cli_root/zfs_mount/zfs_multi_mount.ksh

+log_onexit cleanup
+
+# 1. Create fs
+TESTFS="$TESTPOOL/multi-mount-test"


Rather than redefine TESTFS, which is already defined in default.cfg, how about using the existing definition and $TESTPOOL/$TESTFS? Or alternatively use TESTDS=$TESTPOOL/multi-mount-test to avoid reusing the variable name, which could otherwise be confusing.

behlendorf · 2018-02-21T22:33:16Z

tests/zfs-tests/tests/functional/cli_root/zfs_mount/zfs_multi_mount.ksh

+
+# 2. Create and hold open file in filesystem
+FILENAME="$MNTPFS/file"
+log_must dd if=/dev/urandom of=$FILENAME bs=128k count=1


mkfile is perhaps a simpler alternative here.

behlendorf · 2018-02-21T22:34:15Z

tests/zfs-tests/tests/functional/cli_root/zfs_mount/zfs_multi_mount.ksh

+TAILPPID=$!
+
+# 3. Lazy umount
+log_must umount -l $MNTPFS


Can you also check that it was removed from the namespace.

behlendorf · 2018-02-21T22:35:09Z

tests/zfs-tests/tests/functional/cli_root/zfs_mount/zfs_multi_mount.ksh

+# 5. Verify multiple mounts of the same dataset are possible
+log_must mkdir $MNTFS2
+log_must mount -t zfs zfsutil $TESTFS $MNTPFS2
+log_must mkdir $MNTFS2


You mean $MNTFS3 here.

behlendorf · 2018-02-21T22:35:37Z

tests/zfs-tests/tests/functional/cli_root/zfs_mount/zfs_multi_mount.ksh

+
+# 5. Verify multiple mounts of the same dataset are possible
+log_must mkdir $MNTFS2
+log_must mount -t zfs zfsutil $TESTFS $MNTPFS2


Should be -o zfsutil.

behlendorf · 2018-02-21T22:36:29Z

> Is this also going to allow multiple normal mounts of a dataset in a single namespace?

@vozhyk- yes.

alek-p · 2018-02-22T00:02:47Z

I think I've addressed all of the review comments now

behlendorf · 2018-02-22T00:21:41Z

module/zfs/zpl_super.c

+
+static struct super_block *
+zpl_mount_impl(struct file_system_type *fs_type, int flags,
+    zfs_mnt_t zm)


nit: this fits on the previous line, and you should pass this as a zfs_mnt_t *.

behlendorf · 2018-02-22T00:28:05Z

module/zfs/zpl_super.c

 }
 #else
 static int
 zpl_get_sb(struct file_system_type *fs_type, int flags,
    const char *osname, void *data, struct vfsmount *mnt)
 {
 	zfs_mnt_t zm = { .mnt_osname = osname, .mnt_data = data };
-
-	return (get_sb_nodev(fs_type, flags, &zm, zpl_fill_super, mnt));
+	struct super_block *sb = zpl_mount_impl(zm);


Should be, zpl_mount_impl(fs_type, flags, &zm).

alek-p · 2018-02-22T01:46:58Z

thanks for taking another look, I've updated the patch with the latest feedback and I've also added the "bind mount, then rename" test from Seth's original PR

behlendorf · 2018-02-22T02:35:47Z

module/zfs/zpl_super.c

+	objset_t *os;
+	int err;
+
+	err = dmu_objset_hold(zm.mnt_osname, FTAG, &os);


s/zm./zm->/

behlendorf · 2018-02-22T02:36:00Z

module/zfs/zpl_super.c

+		return (ERR_CAST(s));
+
+	if (s->s_root == NULL) {
+		err = zpl_fill_super(s, &zm, flags & SB_SILENT ? 1 : 0);


alek-p · 2018-02-23T19:37:15Z

I think I've addressed all the build issues, not sure what's going on w/ the tests though.

behlendorf · 2018-02-23T19:41:03Z

@alek-p my guess is they all panicked while running the zpool_create test group. I'd suggest giving it a spin locally to see if you can reproduce the issue.

sudo ./scripts/zfs.sh
./scripts/zfs-tests.sh -vx -T zpool_create

alek-p · 2018-02-23T22:16:56Z

no panic, but zpool_create_024_pos is locking up the kernel. I'll look into it

behlendorf · 2018-02-27T17:10:34Z

module/zfs/zpl_super.c

 static struct dentry *
 zpl_mount(struct file_system_type *fs_type, int flags,
    const char *osname, void *data)
 {
 	zfs_mnt_t zm = { .mnt_osname = osname, .mnt_data = data };
-
-	return (mount_nodev(fs_type, flags, &zm, zpl_fill_super));
+	struct super_block *sb = zpl_mount_impl(fs_type, flags, &zm);


There needs to be an:

if (IS_ERR(sb)) return (PTR_ERR(sb));

check here to avoid NULL dereferencing sb->s_root in the next line. Otherwise this can happen.

thanks for all the help @behlendorf
I've added something similar here and in zpl_test_super(). I'm now seeing test failures in the cli_root group, looking into those.
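The error check discussed here relies on the kernel convention of encoding a negative errno in a pointer value. The following is a user-space sketch of that convention, not the patch itself: `ERR_PTR()`, `PTR_ERR()` and `IS_ERR()` are re-implemented locally to mirror the kernel helpers, and `model_mount_impl()`, `model_mount()` and the reduced structs are hypothetical. Because the caller in this model returns a `struct dentry *`, the error is passed on as a pointer (the `ERR_CAST` form seen in an earlier hunk) rather than as an integer.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define	MAX_ERRNO	4095

/* User-space copies of the kernel's error-pointer helpers. */
static inline void *
ERR_PTR(long error)
{
	return ((void *)error);
}

static inline long
PTR_ERR(const void *ptr)
{
	return ((long)ptr);
}

static inline int
IS_ERR(const void *ptr)
{
	return ((unsigned long)ptr >= (unsigned long)-MAX_ERRNO);
}

/* Hypothetical, reduced versions of the kernel structures. */
struct dentry { int d_count; };
struct super_block { struct dentry *s_root; };

static struct dentry model_root = { 0 };
static struct super_block model_sb = { &model_root };

/* Stand-in for zpl_mount_impl(): fails with -EBUSY when asked to. */
static struct super_block *
model_mount_impl(int fail)
{
	if (fail)
		return (ERR_PTR(-EBUSY));
	return (&model_sb);
}

/* Stand-in for zpl_mount(): check for an error before using sb. */
static struct dentry *
model_mount(int fail)
{
	struct super_block *sb = model_mount_impl(fail);

	if (IS_ERR(sb))
		return (ERR_PTR(PTR_ERR(sb)));	/* same effect as ERR_CAST */

	assert(sb->s_root != NULL);
	sb->s_root->d_count++;			/* models dget(sb->s_root) */
	return (sb->s_root);
}
```

Without the `IS_ERR()` check, the failing case would dereference a pointer whose value is the encoded `-EBUSY`, which is the kind of crash the comment above warns about.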

alek-p · 2018-03-07T23:16:10Z

zpool_import test group has zpool_import_012_pos and zpool_import_rename_001_pos failing.
zpool_import_errata3 now seems to cause a NULL pointer dereference so will be looking into that first

tcaputi

In general I think this is correct. I am not an expert on the Linux kernel's mounting code, but I understand the fix and this seems reasonable. Just a few things I would like to see cleaned up before it gets merged.

tcaputi · 2018-03-17T07:09:27Z

module/zfs/zpl_super.c

+		}
+		s->s_flags |= SB_ACTIVE;
+	} else if ((flags ^ s->s_flags) & SB_RDONLY) {
+		return (ERR_PTR(-EBUSY));


Leaks locked super?


tcaputi · 2018-03-17T07:12:17Z

tests/zfs-tests/tests/functional/cli_root/zfs_mount/zfs_multi_mount.ksh

+log_must zfs rename $TESTDS $RENAMEFS
+log_must zfs rename $RENAMEFS $TESTDS
+
+log_pass "Multiple mounts are possible"


In sections 5 and 6 I would want a check for the created files as well (just to be sure).

behlendorf

@alek-p in addition to addressing @tcaputi's comment: when you refreshed this you accidentally dropped the previous fix, so it needs to be added back in.

behlendorf · 2018-03-19T20:06:38Z

module/zfs/zpl_super.c

+		}
+		s->s_flags |= SB_ACTIVE;
+	} else if ((flags ^ s->s_flags) & SB_RDONLY) {
+		return (ERR_PTR(-EBUSY));


Good catch, yes we're missing a call to deactivate_locked_super here.
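The leak can be illustrated with a small user-space model. Everything here is an assumption made for the illustration: the `model_*` functions, the `MODEL_SB_RDONLY` value, and the two counters standing in for the superblock's active reference and its `s_umount` lock. The model relies only on the documented contract that `sget()` returns the superblock referenced and locked, and that `deactivate_locked_super()` drops both. For brevity it returns an errno integer instead of an error pointer.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define	MODEL_SB_RDONLY	1UL	/* hypothetical flag value for the model */

struct super_block {
	unsigned long s_flags;
	int s_active;	/* active references */
	int s_locked;	/* stands in for holding s_umount */
};

/* Models an sget() hit: one more reference, returned locked. */
static struct super_block *
model_sget_existing(struct super_block *sb)
{
	sb->s_active++;
	sb->s_locked = 1;
	return (sb);
}

/* Models deactivate_locked_super(): drop the reference and the lock. */
static void
model_deactivate_locked_super(struct super_block *sb)
{
	assert(sb->s_locked && sb->s_active > 0);
	sb->s_active--;
	sb->s_locked = 0;
}

/*
 * Returns 0 on success (superblock left referenced and locked for the
 * caller) or -EBUSY when the requested read-only state conflicts with
 * the existing mount.
 */
static int
model_reuse_super(struct super_block *sb, unsigned long flags)
{
	struct super_block *s = model_sget_existing(sb);

	if ((flags ^ s->s_flags) & MODEL_SB_RDONLY) {
		/* Without this call the reference and lock would leak. */
		model_deactivate_locked_super(s);
		return (-EBUSY);
	}
	return (0);
}
```

In the model, a conflicting request leaves `s_active` and `s_locked` exactly as they were before the call, which is the property the early `-EBUSY` return in the hunk above does not have.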

behlendorf · 2018-03-19T20:08:16Z

tests/zfs-tests/tests/functional/cli_root/zfs_mount/zfs_multi_mount.ksh

+log_must zfs rename $TESTDS $RENAMEFS
+log_must zfs rename $RENAMEFS $TESTDS
+
+log_pass "Multiple mounts are possible"


behlendorf · 2018-03-27T04:26:20Z

@alek-p when you get a chance the only thing holding up this PR is to run down the CentOS 6 zfs_multi_mount test failure.

Test: /usr/share/zfs/zfs-tests/tests/functional/cli_root/zfs_mount/zfs_multi_mount (run as root) [10:00] [KILLED]
21:49:56.41 ASSERTION: Verify multiple mounts into one namespace are possible
21:49:56.45 SUCCESS: zfs create testpool/multi-mount-test
21:49:56.46 SUCCESS: mkfile 128k /testpool/multi-mount-test/file
21:49:56.47 SUCCESS: umount -l /testpool/multi-mount-test
21:49:56.47 tail: cannot watch `/testpool/multi-mount-test/file': No such file or directory

alek-p · 2018-04-02T18:58:59Z

I haven't forgotten about this PR, but I was preempted with other work. I should be able to return to this task shortly.

alek-p · 2018-04-11T21:05:38Z

Thanks for all the help here @behlendorf, I think this one is ready to go now. I had to change the test so that execution order of commands is guaranteed.

behlendorf

Looks good. Thanks for wrapping this up!

behlendorf · 2018-04-11T21:24:53Z

tests/zfs-tests/tests/functional/cli_root/zfs_mount/zfs_multi_mount.ksh

+if [ ! -f $FILENAME ]; then
+	log_fail "Rename failed"
+fi
+log_must zfs rename $RENAMEFS $TESTDS


Actually, one last thing. We need a cleanup function to make sure this new filesystem has been unmounted and destroyed when this test case exits.

behlendorf · 2018-04-12T18:55:48Z

tests/zfs-tests/tests/functional/cli_root/zfs_mount/zfs_multi_mount.ksh

+
+function cleanup
+{
+	datasetexists $TESTDS && datasetdestroy $TESTDS


There is no datasetdestroy function. You can use the destroy_dataset helper function for this. You should also make sure to explicitly unmount the additional mountpoints first.

destroy_dataset "$TESTDS" "-f"

Damn it, I left out log_must on that destroy so didn't notice that function didn't exist... Made updates to cleanup and it's looking better now:

SUCCESS: umount /testpool/multi-mount-test
SUCCESS: umount /testpool/multi-mount-test-second
SUCCESS: umount /testpool/multi-mount-test-third
SUCCESS: zfs destroy -f testpool/multi-mount-test
SUCCESS: destroy_dataset testpool/multi-mount-test -f

Much better! Thanks, just waiting on the test results now.

behlendorf

@tcaputi can you look at this one final time.

tcaputi · 2018-04-12T20:42:11Z

LGTM (to the best of my ability).

codecov · 2018-04-13T01:44:55Z

Codecov Report

Merging #7207 into master will increase coverage by 0.19%.
The diff coverage is 84%.

@@            Coverage Diff             @@
##           master    #7207      +/-   ##
==========================================
+ Coverage    76.2%   76.39%   +0.19%     
==========================================
  Files         330      330              
  Lines      104294   104259      -35     
==========================================
+ Hits        79474    79647     +173     
+ Misses      24820    24612     -208

| Flag | Coverage Δ |
|---|---|
| #kernel | 76.33% <84%> (+0.39%) ⬆️ |
| #user | 65.72% <ø> (+0.31%) ⬆️ |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 7fab636...12bffa9.

Currently mounting an already mounted zfs dataset results in an
error, whereas it is typically allowed with other filesystems.
This causes some bad interactions with mount namespaces. Take
this sequence for example:

- Create a dataset
- Create a snapshot of the dataset
- Create a clone of the snapshot
- Create a new mount namespace
- Rename the original dataset

The rename results in unmounting and remounting the clone in the
original mount namespace, however the remount fails because the
dataset is still mounted in the new mount namespace. (Note that
this means the mount in the new mount namespace is never being
unmounted, so perhaps the unmount/remount of the clone isn't
actually necessary.)

The problem here is a result of the way mounting is implemented
in the kernel module. Since it is not mounting block devices it
uses mount_nodev() instead of the usual mount_bdev(). However,
mount_nodev() is written for filesystems for which each mount is
a new instance (i.e. a new super block), and zfs should be able
to detect when a mount request can be satisfied using an existing
super block.

Change zpl_mount() to call sget() directly with its own test
callback. Passing the objset_t object as the fs data allows
checking if a superblock already exists for the dataset, and in
that case we just need to return a new reference for the sb's
root dentry.

Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Tom Caputi <[email protected]>
Signed-off-by: Alek Pinchuk <[email protected]>
Signed-off-by: Seth Forshee <[email protected]>
Closes #5796
Closes #7207

vozhyk- reviewed Feb 21, 2018

alek-p force-pushed the multimount branch from ab7d601 to 23db45f on February 21, 2018 21:27

behlendorf requested changes Feb 21, 2018

alek-p force-pushed the multimount branch 4 times, most recently from 12843f1 to 93ba799 on February 21, 2018 23:58

behlendorf requested changes Feb 22, 2018

alek-p force-pushed the multimount branch 5 times, most recently from 690a924 to 058ed29 on February 22, 2018 01:41

alek-p force-pushed the multimount branch from 058ed29 to 2d2927b on February 22, 2018 02:06

behlendorf requested changes Feb 22, 2018

alek-p force-pushed the multimount branch from 2d2927b to 745e18f on February 22, 2018 03:47

behlendorf mentioned this pull request Feb 22, 2018: ZFS mount/unmount process stuck #6966 (closed)

alek-p force-pushed the multimount branch from 745e18f to 661c318 on February 22, 2018 18:42

alek-p force-pushed the multimount branch from 661c318 to 0c3587f on February 27, 2018 00:37

behlendorf requested changes Feb 27, 2018

alek-p force-pushed the multimount branch from 0c3587f to 14df557 on February 27, 2018 18:59

behlendorf mentioned this pull request Feb 28, 2018: Allow mounting datasets more than once #7120 (closed)

tcaputi suggested changes Mar 17, 2018

behlendorf requested changes Mar 19, 2018

alek-p force-pushed the multimount branch from a85ac29 to 1a5b486 on March 20, 2018 19:21

alek-p force-pushed the multimount branch 5 times, most recently from 50cf75a to 230be2c on April 10, 2018 01:32

behlendorf approved these changes Apr 11, 2018

behlendorf requested changes Apr 11, 2018

behlendorf mentioned this pull request Apr 11, 2018: zfs destroy: dataset is busy #4715 (closed)

alek-p force-pushed the multimount branch from 230be2c to 86cab04 on April 12, 2018 02:49

behlendorf requested changes Apr 12, 2018

alek-p force-pushed the multimount branch from 86cab04 to 12bffa9 on April 12, 2018 19:24

behlendorf approved these changes Apr 12, 2018

tcaputi approved these changes Apr 12, 2018

behlendorf closed this in 93b43af Apr 13, 2018

alek-p deleted the multimount branch April 13, 2018 18:17

