Fix parallel mount race #8878

Merged · 1 commit · Jul 9, 2019
6 changes: 4 additions & 2 deletions lib/libzfs/libzfs_mount.c
@@ -1336,12 +1336,14 @@ mountpoint_cmp(const void *arga, const void *argb)
 }
 
 /*
- * Return true if path2 is a child of path1.
+ * Return true if path2 is a child of path1 or path2 equals path1 or
+ * path1 is "/" (path2 is always a child of "/").
  */
 static boolean_t
 libzfs_path_contains(const char *path1, const char *path2)
 {
-	return (strstr(path2, path1) == path2 && path2[strlen(path1)] == '/');
+	return (strcmp(path1, path2) == 0 || strcmp(path1, "/") == 0 ||
+	    (strstr(path2, path1) == path2 && path2[strlen(path1)] == '/'));
 }
 
 /*
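For illustration, here is a small standalone sketch (not part of this PR; the helper names are made up for the example) contrasting the old and new predicate on the cases named in the updated comment:

/* cc -o path_contains_demo path_contains_demo.c && ./path_contains_demo */
#include <assert.h>
#include <string.h>

typedef int boolean_t;

/* Old behavior: true only if path2 is a strict child of path1. */
static boolean_t
path_contains_old(const char *path1, const char *path2)
{
        return (strstr(path2, path1) == path2 && path2[strlen(path1)] == '/');
}

/* New behavior: also true when path2 equals path1, or when path1 is "/". */
static boolean_t
path_contains_new(const char *path1, const char *path2)
{
        return (strcmp(path1, path2) == 0 || strcmp(path1, "/") == 0 ||
            (strstr(path2, path1) == path2 && path2[strlen(path1)] == '/'));
}

int
main(void)
{
        /* Unchanged: a real child is contained either way. */
        assert(path_contains_old("/a", "/a/b"));
        assert(path_contains_new("/a", "/a/b"));

        /* Changed: identical mountpoints now count as contained. */
        assert(!path_contains_old("/a", "/a"));
        assert(path_contains_new("/a", "/a"));

        /* Changed: "/" now contains every other path. */
        assert(!path_contains_old("/", "/a"));
        assert(path_contains_new("/", "/a"));

        return (0);
}

The equal-path case corresponds to the duplicate-mountpoint layout exercised by the new test below; the "/" case simply encodes what the new comment states, that every path is a child of "/".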
3 changes: 2 additions & 1 deletion tests/runfiles/linux.run
@@ -182,7 +182,8 @@ tests = ['zfs_mount_001_pos', 'zfs_mount_002_pos', 'zfs_mount_003_pos',
     'zfs_mount_007_pos', 'zfs_mount_008_pos', 'zfs_mount_009_neg',
     'zfs_mount_010_neg', 'zfs_mount_011_neg', 'zfs_mount_012_neg',
     'zfs_mount_all_001_pos', 'zfs_mount_encrypted', 'zfs_mount_remount',
-    'zfs_multi_mount', 'zfs_mount_all_fail', 'zfs_mount_all_mountpoints']
+    'zfs_multi_mount', 'zfs_mount_all_fail', 'zfs_mount_all_mountpoints',
+    'zfs_mount_test_race']
 tags = ['functional', 'cli_root', 'zfs_mount']
 
 [tests/functional/cli_root/zfs_program]
@@ -19,6 +19,7 @@ dist_pkgdata_SCRIPTS = \
 	zfs_mount_all_mountpoints.ksh \
 	zfs_mount_encrypted.ksh \
 	zfs_mount_remount.ksh \
+	zfs_mount_test_race.sh \
 	zfs_multi_mount.ksh
 
 dist_pkgdata_DATA = \
@@ -0,0 +1,116 @@
#!/bin/ksh

#
# This file and its contents are supplied under the terms of the
# Common Development and Distribution License ("CDDL"), version 1.0.
# You may only use this file in accordance with the terms of version
# 1.0 of the CDDL.
#
# A full copy of the text of the CDDL should have accompanied this
# source. A copy of the CDDL is also available via the Internet at
# http://www.illumos.org/license/CDDL.
#

#
# Copyright (c) 2019 by Tomohiro Kusumi. All rights reserved.
#

. $STF_SUITE/include/libtest.shlib
. $STF_SUITE/tests/functional/cli_root/zfs_mount/zfs_mount.cfg

#
# DESCRIPTION:
# Verify parallel mount ordering is consistent.
#
# There was a bug in the initial thread dispatching algorithm which put the
# threads in a race condition that resulted in an undefined mount order. The
# purpose of this test is to verify that `zfs unmount -a` succeeds after
# `zfs mount -a` (not that `zfs mount -a` succeeds; it always does). The
# unmount can fail if the threads raced.
# See github.com/zfsonlinux/zfs/issues/{8450,8833,8878} for details.
#
# STRATEGY:
# 1. Create pools and filesystems.
# 2. Set same mount point for >1 datasets.
# 3. Unmount all datasets.
# 4. Mount all datasets.
# 5. Unmount all datasets (verify this succeeds).
#

verify_runnable "both"

TMPDIR=${TMPDIR:-$TEST_BASE_DIR}
MNTPT=$TMPDIR/zfs_mount_test_race_mntpt
DISK1="$TMPDIR/zfs_mount_test_race_disk1"
DISK2="$TMPDIR/zfs_mount_test_race_disk2"

TESTPOOL1=zfs_mount_test_race_tp1
TESTPOOL2=zfs_mount_test_race_tp2

export __ZFS_POOL_RESTRICT="$TESTPOOL1 $TESTPOOL2"
log_must zfs $unmountall
unset __ZFS_POOL_RESTRICT

function cleanup
{
zpool destroy $TESTPOOL1
zpool destroy $TESTPOOL2
rm -rf $MNTPT
rm -rf /$TESTPOOL1
rm -rf /$TESTPOOL2
rm -f $DISK1
rm -f $DISK2
export __ZFS_POOL_RESTRICT="$TESTPOOL1 $TESTPOOL2"
log_must zfs $mountall
unset __ZFS_POOL_RESTRICT
}
log_onexit cleanup

log_note "Verify parallel mount ordering is consistent"

log_must truncate -s $MINVDEVSIZE $DISK1
log_must truncate -s $MINVDEVSIZE $DISK2

log_must zpool create -f $TESTPOOL1 $DISK1
log_must zpool create -f $TESTPOOL2 $DISK2

log_must zfs create $TESTPOOL1/$TESTFS1
log_must zfs create $TESTPOOL2/$TESTFS2

log_must zfs set mountpoint=none $TESTPOOL1
log_must zfs set mountpoint=$MNTPT $TESTPOOL1/$TESTFS1

# Note that unmount can fail (due to the race condition on `zfs mount -a`) with
# or without `canmount=off`. The race has nothing to do with the canmount
# property, but turn it off for convenience of the mount layout used in this
# test case.
log_must zfs set canmount=off $TESTPOOL2

Review thread on the comment above:

Reviewer:

Can you expand on this comment and explain what you mean? How can an unmount
failure be caused by a race condition in zfs mount -a?

kusumi (Member Author):

> How can an unmount failure be caused by a race condition in zfs mount -a?

That's what has been discussed since the original issue reported in #8833.
When the threads are in a race condition, the mount order is undefined. When a
thread for a mountpoint deeper in the directory tree wins (which is a bug),
that's not what `zfs unmount -a` expects, hence the unmount fails.

sebroy (Jul 2, 2019):

@kusumi Please be patient with me. I'm asking clarifying questions in good
faith. I legitimately did not understand how a race condition in the mount code
could cause unmount to fail, even after having read the issue. Furthermore,
people reading this code in the future won't know to go look at that issue.
Comments should be self-contained and self-explanatory.

This comment is evidently trying to explain why canmount has to be set to off
on $TESTPOOL2, which has a mountpoint of $MNTPT. This has to be done because
you can't mount two filesystems on the same mountpoint. In the test,
$TESTPOOL1/$TESTFS1 has the same mountpoint. Both cannot be mounted; hence, one
of them has to have canmount set to off.

kusumi (Member Author):

> I legitimately did not understand how a race condition in the mount code
> could cause unmount to fail, even after having read the issue.

The unmount assumes the mounts happened in the correct order, so if a mount
point deeper in the directory tree is shadowed by another one due to the
undefined mount order caused by the race among the pthreads, then unmount will
fail, because unmount can't see a mount that is supposed to exist.

(See my first comment and the example I pasted in #8833, as well as my first
comment in this thread. Also see #8450, which is likely the same problem.)

> Furthermore, people reading this code in the future won't know to go look at
> that issue.

I did add a bit more description (basically my previous comment to your
question) in my previous push, but I can't repeat in this test file all the
implementation details explained in this thread and in the commit message
(plus the file already has lots of comments with ascii-art). That's why it
references the GitHub URL.

> This comment is evidently trying to explain why canmount has to be set to off
> on $TESTPOOL2, which has a mountpoint of $MNTPT. [...] Both cannot be
> mounted; hence, one of them has to have canmount set to off.

Right, that's why the test turns it off, for convenience of the mount layout
used in this test case; but whether that property is set or not, the race in
the initial thread dispatching still happens. See #8450, which doesn't use
canmount at all.

Reviewer:

Can you also add a test that does the exact same thing, except sets
canmount=off on TESTPOOL1 instead, just in case the code is accidentally
dependent on the ordering of the filesystems passed in?

kusumi (Member Author):

> just in case the code is accidentally dependent on the ordering of the
> filesystems passed in?

I don't get what "accidentally" is supposed to mean, but the entire point of
this fix for the race in initial thread dispatching has nothing to do with the
canmount property. If canmount=off on a different top-level directory causes a
difference or a problem, that's a separate issue.

Reviewer:

@kusumi From my understanding, the canmount property has to be set to off on
the filesystems with duplicate mount points in order to hit this bug. Can you
show me a case where this bug gets hit and canmount isn't set to off on any of
the filesystems?

Essentially, this has to do with two or more filesystems having the same
mountpoint, all but one of which does not get mounted (canmount=off). If a
thread handling the mounting of a filesystem whose canmount property is set to
off spawns child filesystem mounts before others, then those child mounts will
fail because their parent filesystems will not have been mounted. This updated
fix addresses this issue because the threads with duplicate mountpoints will
no longer be spawned in parallel, and the one legitimate filesystem (the one
that will be mounted) will be guaranteed to be mounted before the threads that
mount its child filesystems get spawned.

I'm not trying to criticize your code or tests here, I'm providing you with
feedback so that we can solidify the code. This is supposed to be constructive.

kusumi (Member Author):

> Can you show me a case where this bug gets hit and canmount isn't set to off
> on any of the filesystems?

This example does fail to unmount without canmount=off. In this test and in
the example in #8833, canmount=off is set on TESTPOOL2 (or p2), so while
TESTPOOL2 (or p2) would fail to mount at the shared mountpoint without
canmount=off, the race condition between the other two mounts (which are
handled by different threads without this fix, but by a single thread with
this fix) still exists.

Also, #8450 doesn't use the canmount property either.
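To make the discussion above concrete, the sketch below is illustrative only
(it is not the PR's actual libzfs dispatch code; the grouping loop, names, and
example paths are assumptions). It shows how a sorted mountpoint list can be
split into groups with the fixed predicate: entries with equal or nested
mountpoints fall into one group, whose first (parent) entry has to be handled
before the rest of the group, while only unrelated groups are candidates for
parallel mounting.

#include <stdio.h>
#include <string.h>

typedef int boolean_t;

/* Fixed predicate from this PR (same logic as libzfs_path_contains()). */
static boolean_t
path_contains(const char *path1, const char *path2)
{
        return (strcmp(path1, path2) == 0 || strcmp(path1, "/") == 0 ||
            (strstr(path2, path1) == path2 && path2[strlen(path1)] == '/'));
}

int
main(void)
{
        /* Mountpoints already sorted the way mountpoint_cmp() would sort them. */
        const char *mntpts[] = {
                "/mnt/shared",          /* e.g. $TESTPOOL2 (canmount=off) */
                "/mnt/shared",          /* e.g. $TESTPOOL1/$TESTFS1 */
                "/mnt/shared/fs2",      /* e.g. $TESTPOOL2/$TESTFS2 */
                "/other",
                "/other/child",
        };
        int n = sizeof (mntpts) / sizeof (mntpts[0]);
        int i = 0;

        while (i < n) {
                int j = i + 1;

                /* Everything contained by mntpts[i] belongs to its group. */
                while (j < n && path_contains(mntpts[i], mntpts[j]))
                        j++;

                /*
                 * One worker handles this whole group, taking mntpts[i]
                 * before any of its equal or deeper mountpoints; only
                 * different groups may be mounted in parallel.
                 */
                printf("group:");
                for (int k = i; k < j; k++)
                        printf(" %s", mntpts[k]);
                printf("\n");
                i = j;
        }
        return (0);
}

Running this prints one group for the three /mnt/shared entries and a separate
group for /other and /other/child, which matches the "duplicate mountpoints are
no longer dispatched in parallel" behavior described above.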

log_must zfs set mountpoint=$MNTPT $TESTPOOL2

# At this point, the layout of datasets in the two pools will look like below.
# Previously, on the next `zfs mount -a`, the pthreads assigned to TESTFS1 and
# TESTFS2 could race, and TESTFS2 usually (actually always) won in ZoL. Note
# that the problem is how two or more threads could initially be assigned to
# the same top level directory, not this specific layout. This layout is just
# an example that can reproduce the race, and is also the layout reported in
# #8833.
#
#    NAME                  MOUNTED  MOUNTPOINT
#    ----------------------------------------------
#    /$TESTPOOL1           no       none
#    /$TESTPOOL1/$TESTFS1  yes      $MNTPT
#    /$TESTPOOL2           no       $MNTPT
#    /$TESTPOOL2/$TESTFS2  yes      $MNTPT/$TESTFS2
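# The resulting layout can be double-checked manually with, for example:
#   zfs list -r -o name,mounted,mountpoint $TESTPOOL1 $TESTPOOL2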

# At this point, both datasets must be mounted.
log_must ismounted $TESTPOOL1/$TESTFS1
log_must ismounted $TESTPOOL2/$TESTFS2
# This unmount always succeeds, because the potential race hasn't happened yet.
log_must zfs unmount -a
# This mount always succeeds, whether the threads raced or not.
log_must zfs mount -a

# Verify the datasets are mounted (TESTFS2 fails if the race broke the mount
# order).
log_must ismounted $TESTPOOL1/$TESTFS1
log_must ismounted $TESTPOOL2/$TESTFS2
# Verify unmount succeeds (this fails if the race broke the mount order).
log_must zfs unmount -a

log_pass "Parallel mount ordering is consistent"