mountall hangs if datasets have duplicate mountpoint properties #5
Here is a simplified test case. It does not matter if the datasets are empty or not. […]
I can reproduce this, but I noticed that the second […]
The bug is caused by […]
This patch avoids the bug for me by eliminating duplicates from the mounts list.

```diff
diff --git a/src/mountall.c b/src/mountall.c
index a23f407..e97d97a 100644
--- a/src/mountall.c
+++ b/src/mountall.c
@@ -783,6 +783,7 @@ parse_zfs_list (void)
 	char *saveptr;
 	char *zfs_name, *zfs_mountpoint, *zfs_canmount, *zfs_optatime, *zfs_optronly;
 	nih_local char *zfs_mntoptions = NULL;
+	Mount * mnt;
 
 	/* If the line didn't fit, then enlarge the buffer and retry. */
 	while ((! strchr (buf, '\n')) && (! feof (zfs_stream))) {
@@ -824,7 +825,14 @@ parse_zfs_list (void)
 			NIH_MUST (nih_strcat (&zfs_mntoptions, NULL, ",noatime"));
 		}
 
-		new_mount (zfs_mountpoint, zfs_name, FALSE, "zfs", zfs_mntoptions);
+		mnt = find_mount (zfs_mountpoint);
+		if (mnt) {
+			update_mount (mnt, zfs_name, FALSE, "zfs",
+			              zfs_mntoptions);
+		} else {
+			new_mount (zfs_mountpoint, zfs_name, FALSE, "zfs",
+			           zfs_mntoptions);
+		}
 	}
 
 	zfs_result = pclose (zfs_stream);
```
@nedbass, perhaps you found an upstream bug here. With ZFS disabled, I get an incomplete result from […]

But a regular […]
Strange, I performed an equivalent test and didn't have a problem. Maybe we tested different versions. Based on my understanding of the bug, this code in parse_fstab should prevent the bug for duplicate mount points in the fstab file. It ensures that only one of the entries exists in the mounts list when mount_policy is called. Do you know if the version you tested contains that logic? […]
Lexical order and fstab order seem to affect the result.
I tried the current releases for Precise and Quantal. I get […]

```diff
--- a/src/mountall.c
+++ b/src/mountall.c
@@ -146,6 +146,7 @@
 };
 
 #define MOUNT_NAME(_mnt) (strcmp ((_mnt)->type, "swap") \
+			  && strcmp ((_mnt)->type, "zfs") \
 			  && strcmp ((_mnt)->mountpoint, "none") \
 			  ? (_mnt)->mountpoint : (_mnt)->device)
@@ -1131,6 +1132,17 @@
 	nih_assert (root != NULL);
 	nih_assert (path != NULL);
 
+	/*
+	 * A path cannot be a root of itself.
+	 *
+	 * Without this test, `mountall` does not preserve `mount -a` behavior for
+	 * fstab entries that share a mount point, and hangs on ZFS datasets and
+	 * other virtual filesystems that are not backed by a real block device.
+	 */
+
+	if (! strcmp (root, path))
+		return FALSE;
+
 	len = strlen (root);
 	if ((! strncmp (path, root, len))
 	    && ((path[len] == '\0') […]
```
It is unclear to me what the original intention of that […]
When mountall is iterating in try_mounts() on a new mount point, change is_parent() to return false when a record is tested against itself, because a mount point cannot be a root of itself.

Mountall does not preserve the behavior of `mount -a` if more than one device shares a mountpoint in the /etc/fstab file. Suppose:

    /var/tmp/alfa  /mnt/charlie ext2 loop 0 0
    /var/tmp/bravo /mnt/charlie ext2 loop 0 0

Both filesystems are mounted on /mnt/charlie by `mount -a`, but bravo is ignored by `mountall`. This seems to be an artifact of the fix for LP #443035 in eca315d. Furthermore, mountall hangs if such mounts are ZFS datasets or other virtual filesystems that are not backed by a real device node.

Closes: #5
When mountall is iterating in try_mounts() on a new mount point, change is_parent() to return false when a record is tested against itself, because a mount point cannot be a root of itself.

Mountall does not preserve the behavior of `mount -a` if more than one device shares a mountpoint in the /etc/fstab file. Suppose:

    /var/tmp/alfa  /mnt/charlie ext2 loop 0 0
    /var/tmp/bravo /mnt/charlie ext2 loop 0 0

Both filesystems are mounted on /mnt/charlie by `mount -a`, but bravo is ignored by `mountall`. This seems to be an artifact of the fix for LP #443035 in eca315d. Furthermore, mountall hangs if such mounts are ZFS datasets or other virtual filesystems that are not backed by a real device node.

Supersedes: solves-dependencies-problem-endless-loop.patch
Closes: #5 (for Quantal)
When mountall is iterating in try_mounts() on a new mount point, change is_parent() to return false when a record is tested against itself, because a mount point cannot be a root of itself.

Mountall does not preserve the behavior of `mount -a` if more than one device shares a mountpoint in the /etc/fstab file. Suppose:

    /var/tmp/alfa  /mnt/charlie ext2 loop 0 0
    /var/tmp/bravo /mnt/charlie ext2 loop 0 0

Both filesystems are mounted on /mnt/charlie by `mount -a`, but bravo is ignored by `mountall`. This seems to be an artifact of the fix for LP #443035 in eca315d. Furthermore, mountall hangs if such mounts are ZFS datasets or other virtual filesystems that are not backed by a real device node.

Supersedes: solves-dependencies-problem-endless-loop.patch
Closes: #5 (for Precise)
@nedbass, I used the concise alternative that you suggested. After checking the code history and pondering the issue, I can't think of a circumstance where the existing […] This is also a complete solution for the hang that @mk01 found earlier, so I dropped the […]
@dajhorn, when do you plan to put the package into prod? I put mountall on hold for updates after it was able to mount my machines; now would be the time to release it again. Also, I was away for a while and didn't have time to follow the grub 1.99 story, which is pending an update as well. Is it safe to auto-update? (I upgraded the pools to version 5000, but kept /boot with no compression, as we discussed last time.)
@mk01, today I will promote the mountall packages from the daily PPA to the stable PPA. ZoL root systems that want to upgrade from Precise to Raring should wait until several weeks after the Raring release. |
@dajhorn, I upgraded today. The endless loop returned, so I went back to 2.36.4-zfs1.
@mk01, okay, unfortunate, but thanks for the update. FYI, my bench configuration had the form:
With ext4 and tmpfs mounts on and under […]
fstab: […]

zfs list (columns: name, mountpoint, canmount): […]
@mk01, I created a skeleton of this filesystem layout in a virtual machine and tried to reproduce the failure. I noticed this: […]

Can you isolate the doubly-mounted root filesystem on real hardware? This could be the corner case. (I didn't think to check for […])
Those doubled / and /boot entries were (at that time) an old boot and rootfs, but were set "noauto".
I accidentally got my workstation into a state where the boot would hang after the "Running init-bottom...done". The cause turned out to be two datasets that had identical mountpoint properties. This seemed to prevent mountall from mounting either dataset and it would never emit the virtual-filesystems event, blocking udev and other services. If I start a shell on tty2 and mount one of the filesystems manually, mountall completes and the boot continues normally.
I need to verify this, but I think the first dataset with the duplicate mountpoint to show up in `zfs list` must be non-empty to reproduce this issue. However, as another test, I created a dataset with a non-empty underlying mountpoint, and this did not cause a boot problem. So the duplicate mountpoint properties seem to cause an issue beyond just the mountpoint being non-empty.

For example, this reproduces the issue: […]

This does not reproduce it: […]
This reproducer is admittedly a configuration error (I introduced it by receiving a replication stream into a test pool) but I thought we should evaluate the impact and how it could be handled better. I could collect more debug data, but I wanted to start by just reporting what happened since @dajhorn may have some insight about it.