CentOS 7.2, zfs-0.6.5 - vdev_id.conf aliasing is broken #4517

Closed
AeonJJohnson opened this issue Apr 13, 2016 · 4 comments

@AeonJJohnson

Greetings,

We have encountered a scenario where a zfs-0.6.5 installation using sas_direct aliasing is dysfunctional. We have also tried sas_switch and direct devlink aliasing in /etc/zfs/vdev_id.conf.

Any zfs operation on /dev/sdX devices works. Any zfs operation executed using /dev/disk/by-vdev aliases fails. Operations using other device aliases under /dev/disk/... (by-path/..., by-id/...) fail as well.

zpool create spaceballs raidz /dev/sda /dev/sdb /dev/sdc /dev/sdd
**works**
zpool create spaceballs raidz Z0 Z1 Z2 Z3
**fails**
cannot create 'spaceballs': one or more devices is currently unavailable
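
For reference, the Z0..Z3 names in the failing command come from /etc/zfs/vdev_id.conf. A minimal sketch of the direct devlink aliasing form we tried (the WWN paths below are placeholders, not our actual topology):

```
# /etc/zfs/vdev_id.conf -- hypothetical example; device link paths are placeholders
alias Z0  /dev/disk/by-id/wwn-0x5000c50012345670
alias Z1  /dev/disk/by-id/wwn-0x5000c50012345671
alias Z2  /dev/disk/by-id/wwn-0x5000c50012345672
alias Z3  /dev/disk/by-id/wwn-0x5000c50012345673
```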

The system in question is running CentOS 7.2 (updated) with kernel-3.10.0-327.13.1.el7.x86_64.

I would say this aligns with issue #4214, so treat this as additional information: I have tried several hardware topologies and alias methods, all with the same failing result.

@AeonJJohnson
Author

Additional information:

I was able to create pools using aliased devnames by manually creating links to the full-path block device names (/dev/sda as opposed to ../../sda). In further testing I was able to get a 'zpool create' to work using standard vdev_id.conf aliasing, where it hadn't worked before, by running 'udevadm trigger' a couple of times in a row. Admittedly this was done out of frustration, and I stumbled into it making things work.
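
For anyone following along, the manual workaround amounted to something like this (a sketch; exact device names vary):

```
# re-run the udev rules and wait for the event queue to drain,
# then confirm the by-vdev aliases actually exist before retrying zpool create
udevadm trigger
udevadm settle
ls -l /dev/disk/by-vdev/
```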

At the time of running a couple successive 'udevadm trigger' routines I was getting messages like this in /var/log/messages:
Apr 12 22:21:15 lustre-oss-00 systemd: Device dev-disk-by\x2dpartlabel-zfs.device appeared twice with different sysfs paths /sys/devices/pci0000:00/0000:00:03.2/0000:05:00.0/host0/port-0:0/expander-0:0/port-0:0:17/end_device-0:0:17/target0:0:17/0:0:17:0/block/sdr/sdr1 and /sys/devices/pci0000:00/0000:00:03.2/0000:05:00.0/host0/port-0:0/expander-0:0/port-0:0:0/end_device-0:0:0/target0:0:0/0:0:0:0/block/sda/sda1
Apr 12 22:21:15 lustre-oss-00 systemd: Device dev-disk-by\x2dpartlabel-zfs.device appeared twice with different sysfs paths /sys/devices/pci0000:00/0000:00:03.2/0000:05:00.0/host0/port-0:0/expander-0:0/port-0:0:17/end_device-0:0:17/target0:0:17/0:0:17:0/block/sdr/sdr1 and /sys/devices/pci0000:00/0000:00:03.2/0000:05:00.0/host0/port-0:0/expander-0:0/port-0:0:6/end_device-0:0:6/target0:0:6/0:0:6:0/block/sdg/sdg1
Apr 12 22:21:15 lustre-oss-00 systemd: Device dev-disk-by\x2dpartlabel-zfs.device appeared twice with different sysfs paths /sys/devices/pci0000:00/0000:00:03.2/0000:05:00.0/host0/port-0:0/expander-0:0/port-0:0:17/end_device-0:0:17/target0:0:17/0:0:17:0/block/sdr/sdr1 and /sys/devices/pci0000:00/0000:00:03.2/0000:05:00.0/host0/port-0:0/expander-0:0/port-0:0:11/end_device-0:0:11/target0:0:11/0:0:11:0/block/sdl/sdl1
Apr 12 22:21:24 lustre-oss-00 systemd: Device dev-disk-by\x2dpartlabel-zfs.device appeared twice with different sysfs paths /sys/devices/pci0000:00/0000:00:03.2/0000:05:00.0/host0/port-0:0/expander-0:0/port-0:0:0/end_device-0:0:0/target0:0:0/0:0:0:0/block/sda/sda1 and /sys/devices/pci0000:00/0000:00:03.2/0000:05:00.0/host0/port-0:0/expander-0:0/port-0:0:11/end_device-0:0:11/target0:0:11/0:0:11:0/block/sdl/sdl1
Apr 12 22:21:24 lustre-oss-00 systemd: Device dev-disk-by\x2dpartlabel-zfs.device appeared twice with different sysfs paths /sys/devices/pci0000:00/0000:00:03.2/0000:05:00.0/host0/port-0:0/expander-0:0/port-0:0:0/end_device-0:0:0/target0:0:0/0:0:0:0/block/sda/sda1 and /sys/devices/pci0000:00/0000:00:03.2/0000:05:00.0/host0/port-0:0/expander-0:0/port-0:0:17/end_device-0:0:17/target0:0:17/0:0:17:0/block/sdr/sdr1
Apr 12 22:21:24 lustre-oss-00 systemd: Device dev-disk-by\x2dpartlabel-zfs.device appeared twice with different sysfs paths /sys/devices/pci0000:00/0000:00:03.2/0000:05:00.0/host0/port-0:0/expander-0:0/port-0:0:0/end_device-0:0:0/target0:0:0/0:0:0:0/block/sda/sda1 and /sys/devices/pci0000:00/0000:00:03.2/0000:05:00.0/host0/port-0:0/expander-0:0/port-0:0:6/end_device-0:0:6/target0:0:6/0:0:6:0/block/sdg/sdg1

This test system is not multipath in either its hardware configuration or its software (no device-mapper-multipath), so I am not clear how a device or device label is being seen twice unless some string-parsing process is broken.

@AeonJJohnson
Author

I think this may be caused by the way systemd scans and enumerates devices, which differs from udev in the RHEL/CentOS 6.x releases. I've run through the process of creating (and failing to create) zpools, and it appears that systemd doesn't create the device aliases that zfs needs during the 'zpool create' process. systemd seems to be constantly recreating device aliases, and when it does so it appears to have a problem with more than one partition device carrying the same partition name: it either treats them as duplicates or fails to create the links in the time required by 'zpool create'.
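
One way to check this is to watch the alias directory while 'zpool create' runs in another shell, to see whether the links disappear and reappear during the create (sketch):

```
# refresh every half second; run 'zpool create ...' in a second terminal
watch -n 0.5 'ls -l /dev/disk/by-vdev/ 2>/dev/null'
```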

Just one 'zpool create' of a four drive raidz generates:
Apr 12 23:52:00 lustre-oss-00 kernel: sd 0:0:13:0: [sdp] 1465130646 4096-byte logical blocks: (6.00 TB/5.45 TiB)
Apr 12 23:52:00 lustre-oss-00 kernel: sdp: sdp1 sdp9
Apr 12 23:52:00 lustre-oss-00 kernel: sd 0:0:13:0: [sdp] 1465130646 4096-byte logical blocks: (6.00 TB/5.45 TiB)
Apr 12 23:52:00 lustre-oss-00 kernel: sdp: sdp1 sdp9
Apr 12 23:52:00 lustre-oss-00 systemd: Device dev-disk-by\x2dpartlabel-zfs.device appeared twice with different sysfs paths /sys/devices/pci0000:00/0000:00:03.2/0000:05:00.0/host0/port-0:0/expander-0:0/port-0:0:1/end_device-0:0:1/target0:0:1/0:0:1:0/block/sdd/sdd1 and /sys/devices/pci0000:00/0000:00:03.2/0000:05:00.0/host0/port-0:0/expander-0:0/port-0:0:13/end_device-0:0:13/target0:0:13/0:0:13:0/block/sdp/sdp1
Apr 12 23:52:00 lustre-oss-00 kernel: sd 0:0:19:0: [sdv] 1465130646 4096-byte logical blocks: (6.00 TB/5.45 TiB)
Apr 12 23:52:00 lustre-oss-00 kernel: sdv: sdv1 sdv9
Apr 12 23:52:00 lustre-oss-00 kernel: sd 0:0:19:0: [sdv] 1465130646 4096-byte logical blocks: (6.00 TB/5.45 TiB)
Apr 12 23:52:00 lustre-oss-00 kernel: sdv: sdv1 sdv9
Apr 12 23:52:00 lustre-oss-00 systemd: Device dev-disk-by\x2dpartlabel-zfs.device appeared twice with different sysfs paths /sys/devices/pci0000:00/0000:00:03.2/0000:05:00.0/host0/port-0:0/expander-0:0/port-0:0:1/end_device-0:0:1/target0:0:1/0:0:1:0/block/sdd/sdd1 and /sys/devices/pci0000:00/0000:00:03.2/0000:05:00.0/host0/port-0:0/expander-0:0/port-0:0:13/end_device-0:0:13/target0:0:13/0:0:13:0/block/sdp/sdp1
Apr 12 23:52:00 lustre-oss-00 systemd: Device dev-disk-by\x2dpartlabel-zfs.device appeared twice with different sysfs paths /sys/devices/pci0000:00/0000:00:03.2/0000:05:00.0/host0/port-0:0/expander-0:0/port-0:0:1/end_device-0:0:1/target0:0:1/0:0:1:0/block/sdd/sdd1 and /sys/devices/pci0000:00/0000:00:03.2/0000:05:00.0/host0/port-0:0/expander-0:0/port-0:0:19/end_device-0:0:19/target0:0:19/0:0:19:0/block/sdv/sdv1
Apr 12 23:52:00 lustre-oss-00 kernel: sd 0:0:3:0: [sdf] 1465130646 4096-byte logical blocks: (6.00 TB/5.45 TiB)
Apr 12 23:52:00 lustre-oss-00 kernel: sdf: sdf1 sdf9
Apr 12 23:52:00 lustre-oss-00 kernel: sd 0:0:3:0: [sdf] 1465130646 4096-byte logical blocks: (6.00 TB/5.45 TiB)
Apr 12 23:52:00 lustre-oss-00 kernel: sdf: sdf1 sdf9
Apr 12 23:52:00 lustre-oss-00 systemd: Device dev-disk-by\x2dpartlabel-zfs.device appeared twice with different sysfs paths /sys/devices/pci0000:00/0000:00:03.2/0000:05:00.0/host0/port-0:0/expander-0:0/port-0:0:1/end_device-0:0:1/target0:0:1/0:0:1:0/block/sdd/sdd1 and /sys/devices/pci0000:00/0000:00:03.2/0000:05:00.0/host0/port-0:0/expander-0:0/port-0:0:13/end_device-0:0:13/target0:0:13/0:0:13:0/block/sdp/sdp1
Apr 12 23:52:00 lustre-oss-00 systemd: Device dev-disk-by\x2dpartlabel-zfs.device appeared twice with different sysfs paths /sys/devices/pci0000:00/0000:00:03.2/0000:05:00.0/host0/port-0:0/expander-0:0/port-0:0:1/end_device-0:0:1/target0:0:1/0:0:1:0/block/sdd/sdd1 and /sys/devices/pci0000:00/0000:00:03.2/0000:05:00.0/host0/port-0:0/expander-0:0/port-0:0:3/end_device-0:0:3/target0:0:3/0:0:3:0/block/sdf/sdf1
Apr 12 23:52:00 lustre-oss-00 systemd: Device dev-disk-by\x2dpartlabel-zfs.device appeared twice with different sysfs paths /sys/devices/pci0000:00/0000:00:03.2/0000:05:00.0/host0/port-0:0/expander-0:0/port-0:0:1/end_device-0:0:1/target0:0:1/0:0:1:0/block/sdd/sdd1 and /sys/devices/pci0000:00/0000:00:03.2/0000:05:00.0/host0/port-0:0/expander-0:0/port-0:0:19/end_device-0:0:19/target0:0:19/0:0:19:0/block/sdv/sdv1
Apr 12 23:52:00 lustre-oss-00 kernel: sd 0:0:9:0: [sdl] 1465130646 4096-byte logical blocks: (6.00 TB/5.45 TiB)
Apr 12 23:52:00 lustre-oss-00 kernel: sdl: sdl1 sdl9
Apr 12 23:52:00 lustre-oss-00 kernel: sd 0:0:9:0: [sdl] 1465130646 4096-byte logical blocks: (6.00 TB/5.45 TiB)
Apr 12 23:52:00 lustre-oss-00 kernel: sdl: sdl1 sdl9
Apr 12 23:52:00 lustre-oss-00 kernel: sd 0:0:9:0: [sdl] 1465130646 4096-byte logical blocks: (6.00 TB/5.45 TiB)
Apr 12 23:52:00 lustre-oss-00 kernel: sdl: sdl1 sdl9
Apr 12 23:52:00 lustre-oss-00 kernel: sd 0:0:9:0: [sdl] 1465130646 4096-byte logical blocks: (6.00 TB/5.45 TiB)
Apr 12 23:52:00 lustre-oss-00 kernel: sdl: sdl1 sdl9
Apr 12 23:52:00 lustre-oss-00 systemd: Device dev-disk-by\x2dpartlabel-zfs.device appeared twice with different sysfs paths /sys/devices/pci0000:00/0000:00:03.2/0000:05:00.0/host0/port-0:0/expander-0:0/port-0:0:1/end_device-0:0:1/target0:0:1/0:0:1:0/block/sdd/sdd1 and /sys/devices/pci0000:00/0000:00:03.2/0000:05:00.0/host0/port-0:0/expander-0:0/port-0:0:19/end_device-0:0:19/target0:0:19/0:0:19:0/block/sdv/sdv1
Apr 12 23:52:00 lustre-oss-00 systemd: Device dev-disk-by\x2dpartlabel-zfs.device appeared twice with different sysfs paths /sys/devices/pci0000:00/0000:00:03.2/0000:05:00.0/host0/port-0:0/expander-0:0/port-0:0:1/end_device-0:0:1/target0:0:1/0:0:1:0/block/sdd/sdd1 and /sys/devices/pci0000:00/0000:00:03.2/0000:05:00.0/host0/port-0:0/expander-0:0/port-0:0:3/end_device-0:0:3/target0:0:3/0:0:3:0/block/sdf/sdf1
Apr 12 23:52:00 lustre-oss-00 systemd: Device dev-disk-by\x2dpartlabel-zfs.device appeared twice with different sysfs paths /sys/devices/pci0000:00/0000:00:03.2/0000:05:00.0/host0/port-0:0/expander-0:0/port-0:0:1/end_device-0:0:1/target0:0:1/0:0:1:0/block/sdd/sdd1 and /sys/devices/pci0000:00/0000:00:03.2/0000:05:00.0/host0/port-0:0/expander-0:0/port-0:0:9/end_device-0:0:9/target0:0:9/0:0:9:0/block/sdl/sdl1
Apr 12 23:52:00 lustre-oss-00 zed: eid=47 class=statechange
Apr 12 23:52:00 lustre-oss-00 zed: eid=48 class=statechange
Apr 12 23:52:00 lustre-oss-00 zed: eid=49 class=statechange
Apr 12 23:52:00 lustre-oss-00 zed: eid=50 class=vdev.unknown pool=megamaid
Apr 12 23:52:01 lustre-oss-00 systemd: Device dev-disk-by\x2dpartlabel-zfs.device appeared twice with different sysfs paths /sys/devices/pci0000:00/0000:00:03.2/0000:05:00.0/host0/port-0:0/expander-0:0/port-0:0:1/end_device-0:0:1/target0:0:1/0:0:1:0/block/sdd/sdd1 and /sys/devices/pci0000:00/0000:00:03.2/0000:05:00.0/host0/port-0:0/expander-0:0/port-0:0:3/end_device-0:0:3/target0:0:3/0:0:3:0/block/sdf/sdf1
Apr 12 23:52:01 lustre-oss-00 systemd: Device dev-disk-by\x2dpartlabel-zfs.device appeared twice with different sysfs paths /sys/devices/pci0000:00/0000:00:03.2/0000:05:00.0/host0/port-0:0/expander-0:0/port-0:0:1/end_device-0:0:1/target0:0:1/0:0:1:0/block/sdd/sdd1 and /sys/devices/pci0000:00/0000:00:03.2/0000:05:00.0/host0/port-0:0/expander-0:0/port-0:0:9/end_device-0:0:9/target0:0:9/0:0:9:0/block/sdl/sdl1
Apr 12 23:52:01 lustre-oss-00 systemd: Device dev-disk-by\x2dpartlabel-zfs.device appeared twice with different sysfs paths /sys/devices/pci0000:00/0000:00:03.2/0000:05:00.0/host0/port-0:0/expander-0:0/port-0:0:1/end_device-0:0:1/target0:0:1/0:0:1:0/block/sdd/sdd1 and /sys/devices/pci0000:00/0000:00:03.2/0000:05:00.0/host0/port-0:0/expander-0:0/port-0:0:9/end_device-0:0:9/target0:0:9/0:0:9:0/block/sdl/sdl1
Apr 12 23:52:01 lustre-oss-00 systemd: Device dev-disk-by\x2dpartlabel-zfs.device appeared twice with different sysfs paths /sys/devices/pci0000:00/0000:00:03.2/0000:05:00.0/host0/port-0:0/expander-0:0/port-0:0:1/end_device-0:0:1/target0:0:1/0:0:1:0/block/sdd/sdd1 and /sys/devices/pci0000:00/0000:00:03.2/0000:05:00.0/host0/port-0:0/expander-0:0/port-0:0:9/end_device-0:0:9/target0:0:9/0:0:9:0/block/sdl/sdl1

If you run blkid on all of those devices, they all have the same partition label for partition 1: PARTLABEL="zfs". It looks like systemd doesn't like non-unique partition labels among the devices and partitions it finds.
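
This is easy to confirm; every member disk's first partition reports the same label (sketch, the device glob may need adjusting for your system):

```
# show the partition label udev uses for the /dev/disk/by-partlabel/ links
blkid -s PARTLABEL -o value /dev/sd?1
lsblk -o NAME,PARTLABEL
```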

A potential resolution would be for the partitioning done by 'zpool create' to write a random, unique string for the partition label instead of "zfs". Maybe the last six digits of the UUID, or something.

Apologies for the rambling; I've been on this a while and it's late.

behlendorf added a commit to behlendorf/zfs that referenced this issue Apr 13, 2016
When partitioning a device a name may be specified for each partition.
Internally zfs doesn't use this partition name for anything so it
has always just been set to "zfs".

However this isn't optimal because udev will create symlinks using
this name in /dev/disk/by-partlabel/.  If the name isn't unique
then all the links cannot be created.

Therefore a random 64-bit value has been added to the partition
label, i.e. "zfs-1234567890abcdef".  Additional information could
be encoded here, but since partitions may be reused that could
result in confusion, it was decided against.

Signed-off-by: Brian Behlendorf <[email protected]>
Issue openzfs#4517
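
The naming scheme described in the commit above is just a fixed prefix plus a random 64-bit hex value; a rough shell illustration of the resulting labels (not the actual ZFS source):

```
# generate a label of the form zfs-<16 hex digits>, e.g. zfs-1234567890abcdef
suffix=$(od -An -N8 -tx8 /dev/urandom | tr -d ' \n')
echo "zfs-${suffix}"
```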
@behlendorf
Contributor

@AeonJJohnson I've opened PR #4523 with a proposed fix for this. It adds a randomly generated unique id to the end of each partition name. Could you verify that it resolves the issue?

@AeonJJohnson
Author

@behlendorf #4523 resolves the systemd-udev errors that were appearing in /var/log/messages:

Apr 12 23:52:01 lustre-oss-00 systemd: Device dev-disk-by\x2dpartlabel-zfs.device appeared twice with different sysfs paths /sys/devices/pci0000:00/0000:00:03.2/0000:05:00.0/host0/port-0:0/expander-0:0/port-0:0:1/end_device-0:0:1/target0:0:1/0:0:1:0/block/sdd/sdd1 and /sys/devices/pci0000:00/0000:00:03.2/0000:05:00.0/host0/port-0:0/expander-0:0/port-0:0:9/end_device-0:0:9/target0:0:9/0:0:9:0/block/sdl/sdl1
Apr 12 23:52:01 lustre-oss-00 systemd: Device dev-disk-by\x2dpartlabel-zfs.device appeared twice with different sysfs paths /sys/devices/pci0000:00/0000:00:03.2/0000:05:00.0/host0/port-0:0/expander-0:0/port-0:0:1/end_device-0:0:1/target0:0:1/0:0:1:0/block/sdd/sdd1 and /sys/devices/pci0000:00/0000:00:03.2/0000:05:00.0/host0/port-0:0/expander-0:0/port-0:0:9/end_device-0:0:9/target0:0:9/0:0:9:0/block/sdl/sdl1

However, the 'zpool create' process still fails. Something else involving systemd-udev and the directories in which it creates dev links is interfering with 'zpool create'. I will continue to dig into it.
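
To narrow down what systemd-udev is doing to the links during the create, watching the udev event stream while re-running the failing command may help (sketch):

```
# in one terminal: print udev events for block devices as rules complete
udevadm monitor --udev --subsystem-match=block
# in another terminal: re-run the failing create, e.g.
# zpool create spaceballs raidz Z0 Z1 Z2 Z3
```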

behlendorf added a commit to behlendorf/zfs that referenced this issue Apr 15, 2016
When partitioning a device a name may be specified for each partition.
Internally zfs doesn't use this partition name for anything so it
has always just been set to "zfs".

However this isn't optimal because udev will create symlinks using
this name in /dev/disk/by-partlabel/.  If the name isn't unique
then all the links cannot be created.

Therefore a random 64-bit value has been added to the partition
label, i.e "zfs-1234567890abcdef".  Additional information could
be encoded here but since partitions may be reused that could
result in confusion and it was decided against.

Signed-off-by: Brian Behlendorf <[email protected]>
Issue openzfs#4517
behlendorf added a commit to behlendorf/zfs that referenced this issue Apr 20, 2016
When partitioning a device a name may be specified for each partition.
Internally zfs doesn't use this partition name for anything so it
has always just been set to "zfs".

However this isn't optimal because udev will create symlinks using
this name in /dev/disk/by-partlabel/.  If the name isn't unique
then all the links cannot be created.

Therefore a random 64-bit value has been added to the partition
label, i.e "zfs-1234567890abcdef".  Additional information could
be encoded here but since partitions may be reused that could
result in confusion and it was decided against.

Signed-off-by: Brian Behlendorf <[email protected]>
Issue openzfs#4517
behlendorf added a commit to behlendorf/zfs that referenced this issue Apr 20, 2016
When ZFS partitions a block device it must wait for udev to create
both a device node and all the device symlinks.  This process takes
a variable length of time and depends on factors such as how many
links must be created, the complexity of the rules, etc.  Complicating
the situation further, it is not uncommon for udev to create and
then remove a link multiple times while processing the rules.

Given the above, the existing scheme of waiting for an expected
partition to appear by name isn't 100% reliable.  At this point
udev may still remove and recreate the link, resulting in the
kernel modules being unable to open the device.

In order to address this the zpool_label_disk_wait() function
has been updated to use libudev.  Until the registered system
device acknowledges that it is fully initialized, the function
will wait.  Once fully initialized, all device links are checked
and allowed to settle for 50ms.  This makes it far more certain
that all the device nodes will exist when the kernel modules
need to open them.

For systems without libudev an alternate zpool_label_disk_wait()
was implemented which includes a settle time.  In addition, the
kernel modules were updated to include retry logic for this
ENOENT case.  Due to the improved checks in the utilities it
is unlikely this logic will be invoked; however, in the rare
event it is needed, it will prevent a failure.

Signed-off-by: Brian Behlendorf <[email protected]>
Issue openzfs#3708
Issue openzfs#4077
Issue openzfs#4144
Issue openzfs#4214
Issue openzfs#4517
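
For rough context, the libudev-based wait described above is approximately what the following accomplishes from the command line (illustration only, not the code path zpool actually uses; the device name is taken from the logs above):

```
# wait for the udev event queue to drain, then inspect the partition's
# udev record and its symlinks
udevadm settle --timeout=30
udevadm info --query=all /dev/sdd1
```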
behlendorf added a commit to behlendorf/zfs that referenced this issue Apr 22, 2016
When partitioning a device a name may be specified for each partition.
Internally zfs doesn't use this partition name for anything so it
has always just been set to "zfs".

However this isn't optimal because udev will create symlinks using
this name in /dev/disk/by-partlabel/.  If the name isn't unique
then all the links cannot be created.

Therefore a random 64-bit value has been added to the partition
label, i.e "zfs-1234567890abcdef".  Additional information could
be encoded here but since partitions may be reused that could
result in confusion and it was decided against.

Signed-off-by: Brian Behlendorf <[email protected]>
Issue openzfs#4517
behlendorf added a commit to behlendorf/zfs that referenced this issue Apr 22, 2016
When ZFS partitions a block device it must wait for udev to create
both a device node and all the device symlinks.  This process takes
a variable length of time and depends on factors such as how many
links must be created, the complexity of the rules, etc.  Complicating
the situation further, it is not uncommon for udev to create and
then remove a link multiple times while processing the rules.

Given the above, the existing scheme of waiting for an expected
partition to appear by name isn't 100% reliable.  At this point
udev may still remove and recreate the link, resulting in the
kernel modules being unable to open the device.

In order to address this the zpool_label_disk_wait() function
has been updated to use libudev.  Until the registered system
device acknowledges that it is fully initialized, the function
will wait.  Once fully initialized, all device links are checked
and allowed to settle for 50ms.  This makes it far more certain
that all the device nodes will exist when the kernel modules
need to open them.

For systems without libudev an alternate zpool_label_disk_wait()
was implemented which includes a settle time.  In addition, the
kernel modules were updated to include retry logic for this
ENOENT case.  Due to the improved checks in the utilities it
is unlikely this logic will be invoked; however, in the rare
event it is needed, it will prevent a failure.

Signed-off-by: Brian Behlendorf <[email protected]>
Issue openzfs#3708
Issue openzfs#4077
Issue openzfs#4144
Issue openzfs#4214
Issue openzfs#4517
behlendorf added a commit to behlendorf/zfs that referenced this issue Apr 25, 2016
When ZFS partitions a block device it must wait for udev to create
both a device node and all the device symlinks.  This process takes
a variable length of time and depends on factors such as how many links
must be created, the complexity of the rules, etc.  Complicating
the situation further it is not uncommon for udev to create and
then remove a link multiple times while processing the udev rules.

Given the above, the existing scheme of waiting for an expected
partition to appear by name isn't 100% reliable.  At this point
udev may still remove and recreate the link, resulting in the
kernel modules being unable to open the device.

In order to address this the zpool_label_disk_wait() function
has been updated to use libudev.  Until the registered system
device acknowledges that it is fully initialized, the function
will wait.  Once fully initialized all device links are checked
and allowed to settle for 50ms.  This makes it far more likely
that all the device nodes will exist when the kernel modules
need to open them.

For systems without libudev an alternate zpool_label_disk_wait()
was updated to include a settle time.  In addition, the kernel
modules were updated to include retry logic for this ENOENT case.
Due to the improved checks in the utilities it is unlikely this
logic will be invoked.  However, in the rare event it is needed,
it will prevent a failure.

Signed-off-by: Brian Behlendorf <[email protected]>
Signed-off-by: Tony Hutter <[email protected]>
Signed-off-by: Richard Laager <[email protected]>
Closes openzfs#4523
Closes openzfs#3708
Closes openzfs#4077
Closes openzfs#4144
Closes openzfs#4214
Closes openzfs#4517
behlendorf added this to the 0.6.5.7 milestone Apr 25, 2016
nedbass pushed a commit to nedbass/zfs that referenced this issue May 6, 2016
When partitioning a device a name may be specified for each partition.
Internally zfs doesn't use this partition name for anything so it
has always just been set to "zfs".

However this isn't optimal because udev will create symlinks using
this name in /dev/disk/by-partlabel/.  If the name isn't unique
then all the links cannot be created.

Therefore a random 64-bit value has been added to the partition
label, i.e "zfs-1234567890abcdef".  Additional information could
be encoded here but since partitions may be reused that might
result in confusion and it was decided against.

Signed-off-by: Brian Behlendorf <[email protected]>
Signed-off-by: Tony Hutter <[email protected]>
Signed-off-by: Richard Laager <[email protected]>
Closes openzfs#4517
nedbass pushed a commit to nedbass/zfs that referenced this issue May 6, 2016
When ZFS partitions a block device it must wait for udev to create
both a device node and all the device symlinks.  This process takes
a variable length of time and depends on factors such as how many links
must be created, the complexity of the rules, etc.  Complicating
the situation further it is not uncommon for udev to create and
then remove a link multiple times while processing the udev rules.

Given the above, the existing scheme of waiting for an expected
partition to appear by name isn't 100% reliable.  At this point
udev may still remove and recreate the link, resulting in the
kernel modules being unable to open the device.

In order to address this the zpool_label_disk_wait() function
has been updated to use libudev.  Until the registered system
device acknowledges that it is fully initialized, the function
will wait.  Once fully initialized all device links are checked
and allowed to settle for 50ms.  This makes it far more likely
that all the device nodes will exist when the kernel modules
need to open them.

For systems without libudev an alternate zpool_label_disk_wait()
was updated to include a settle time.  In addition, the kernel
modules were updated to include retry logic for this ENOENT case.
Due to the improved checks in the utilities it is unlikely this
logic will be invoked.  However, in the rare event it is needed,
it will prevent a failure.

Signed-off-by: Brian Behlendorf <[email protected]>
Signed-off-by: Tony Hutter <[email protected]>
Signed-off-by: Richard Laager <[email protected]>
Closes openzfs#4523
Closes openzfs#3708
Closes openzfs#4077
Closes openzfs#4144
Closes openzfs#4214
Closes openzfs#4517
ryao pushed a commit to ClusterHQ/zfs that referenced this issue Jun 7, 2016
When partitioning a device a name may be specified for each partition.
Internally zfs doesn't use this partition name for anything so it
has always just been set to "zfs".

However this isn't optimal because udev will create symlinks using
this name in /dev/disk/by-partlabel/.  If the name isn't unique
then all the links cannot be created.

Therefore a random 64-bit value has been added to the partition
label, i.e "zfs-1234567890abcdef".  Additional information could
be encoded here but since partitions may be reused that might
result in confusion and it was decided against.

Signed-off-by: Brian Behlendorf <[email protected]>
Signed-off-by: Tony Hutter <[email protected]>
Signed-off-by: Richard Laager <[email protected]>
Closes openzfs#4517
ryao pushed a commit to ClusterHQ/zfs that referenced this issue Jun 7, 2016
When ZFS partitions a block device it must wait for udev to create
both a device node and all the device symlinks.  This process takes
a variable length of time and depends on factors such as how many links
must be created, the complexity of the rules, etc.  Complicating
the situation further it is not uncommon for udev to create and
then remove a link multiple times while processing the udev rules.

Given the above, the existing scheme of waiting for an expected
partition to appear by name isn't 100% reliable.  At this point
udev may still remove and recreate the link, resulting in the
kernel modules being unable to open the device.

In order to address this the zpool_label_disk_wait() function
has been updated to use libudev.  Until the registered system
device acknowledges that it is fully initialized, the function
will wait.  Once fully initialized all device links are checked
and allowed to settle for 50ms.  This makes it far more likely
that all the device nodes will exist when the kernel modules
need to open them.

For systems without libudev an alternate zpool_label_disk_wait()
was updated to include a settle time.  In addition, the kernel
modules were updated to include retry logic for this ENOENT case.
Due to the improved checks in the utilities it is unlikely this
logic will be invoked.  However, in the rare event it is needed,
it will prevent a failure.

Signed-off-by: Brian Behlendorf <[email protected]>
Signed-off-by: Tony Hutter <[email protected]>
Signed-off-by: Richard Laager <[email protected]>
Closes openzfs#4523
Closes openzfs#3708
Closes openzfs#4077
Closes openzfs#4144
Closes openzfs#4214
Closes openzfs#4517