CentOS 7.2, zfs-0.6.5 - vdev_id.conf aliasing has broken #4517
Comments
Additional information: I was able to create pools using aliased devnames by manually creating links to full-path block device names (/dev/sda as opposed to ../../sda). In further testing I was able to get a zpool create to work with standard vdev_id.conf aliasing, where it hadn't worked before, by running 'udevadm trigger' a couple of times in a row; admittedly this was done out of frustration and I stumbled into making it work. While running the successive 'udevadm trigger' commands I was getting messages like this in /var/log/messages: This test system is not multipath in its hardware configuration or software (no device-mapper-multipath), so I am not clear how a device or device label is being seen twice unless some string parsing process is broken.
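For reference, a rough shell sketch of the manual workaround described above. The alias name d0 is hypothetical; the actual alias names come from the local /etc/zfs/vdev_id.conf.

```sh
# Point a by-vdev alias at the full block device path (/dev/sda) instead of
# the relative ../../sda target that udev normally generates.
mkdir -p /dev/disk/by-vdev
ln -sf /dev/sda /dev/disk/by-vdev/d0   # d0 is a placeholder alias name

# Re-run the udev rules; per the report above this sometimes had to be
# repeated before the aliases showed up.
udevadm trigger
udevadm settle
```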
I think this may be caused by the way systemd scans and enumerates devices, which differs from udev in the RHEL/CentOS 6.x releases. I've run through the process of creating (and failing to create) zpools, and it appears that systemd doesn't create the device aliases that zfs needs during the zpool create process. systemd is always recreating device aliases, and when it does it appears to have a problem with more than one partition device having the same partition name: it either sees them as duplicates or simply fails to create the links within the time required by 'zpool create'. Just one 'zpool create' of a four-drive raidz generates: And if you run blkid on all of those devices they all have the same partition label for partition 1, PARTLABEL="zfs". It looks like systemd doesn't like non-unique partition labels on the devices and partitions it finds. A potential resolution would be for 'zpool create' to issue a random, unique string for the partition label instead of "zfs"; maybe the rightmost six digits of the UUID or something. Apologies for the rambling, I've been on this a while and it's late.
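The partition-label collision described above can be confirmed from the shell; a sketch, with hypothetical pool member disks sdb through sde:

```sh
# Every ZFS data partition carries the same GPT partition name, so all four
# report PARTLABEL="zfs".
blkid -s PARTLABEL -o value /dev/sd[b-e]1

# Because the names collide, udev can keep only one link here instead of four.
ls -l /dev/disk/by-partlabel/
```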
When partitioning a device a name may be specified for each partition. Internally zfs doesn't use this partition name for anything, so it has always just been set to "zfs". However, this isn't optimal because udev will create symlinks using this name in /dev/disk/by-partlabel/. If the name isn't unique then all of the links cannot be created. Therefore a random 64-bit value has been added to the partition label, e.g. "zfs-1234567890abcdef". Additional information could be encoded here, but it was decided against because partitions may be reused and that could result in confusion.
Signed-off-by: Brian Behlendorf <[email protected]>
Issue openzfs#4517
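A minimal shell illustration of the naming scheme, not the actual libzfs code: append a random 64-bit hex value so every partition name is unique. The disk /dev/sdb and partition number are placeholders.

```sh
# Generate a random 16-hex-digit (64-bit) suffix, e.g. 1234567890abcdef.
suffix=$(od -An -N8 -tx8 /dev/urandom | tr -d ' ')

# Rename partition 1 on a placeholder disk to the unique label and let udev
# recreate the /dev/disk/by-partlabel link for it.
sgdisk --change-name=1:"zfs-${suffix}" /dev/sdb
udevadm settle
```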
@AeonJJohnson I've opened PR #4523 with a proposed fix for this. It adds a randomly generated unique id to the end of each partition name. Could you verify it resolves the issue?
@behlendorf #4523 resolves the systemd-udev errors appearing in /var/log/messages.
However, the zpool create process still fails. Something else involving systemd-udev, and the directories it creates dev links in, is interfering with zpool create. I will continue to dig into it.
When ZFS partitions a block device it must wait for udev to create both a device node and all of the device symlinks. This process takes a variable length of time and depends on factors such as how many links must be created, the complexity of the rules, etc. Complicating the situation further, it is not uncommon for udev to create and then remove a link multiple times while processing the udev rules. Given the above, the existing scheme of waiting for an expected partition to appear by name isn't 100% reliable; at that point udev may still remove and recreate the link, resulting in the kernel modules being unable to open the device. In order to address this the zpool_label_disk_wait() function has been updated to use libudev. The function waits until the registered system device acknowledges that it is fully initialized. Once fully initialized, all device links are checked and allowed to settle for 50ms. This makes it far more likely that all the device nodes will exist when the kernel modules need to open them. For systems without libudev, an alternate zpool_label_disk_wait() was updated to include a settle time. In addition, the kernel modules were updated to include retry logic for this ENOENT case. Due to the improved checks in the utilities it is unlikely this logic will be invoked; however, in the rare event it is needed, it will prevent a failure.
Signed-off-by: Brian Behlendorf <[email protected]>
Signed-off-by: Tony Hutter <[email protected]>
Signed-off-by: Richard Laager <[email protected]>
Closes openzfs#4523
Closes openzfs#3708
Closes openzfs#4077
Closes openzfs#4144
Closes openzfs#4214
Closes openzfs#4517
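The fix itself lives in the libzfs C code, but the behaviour it guards against can be approximated from the command line; a hedged sketch, using a hypothetical new partition /dev/sdb1:

```sh
# After partitioning, wait for udev to finish processing its event queue so
# the device node and its symlinks stop churning.
udevadm settle --timeout=30

# List the links udev created for the new partition; this is roughly the
# check the libudev-based zpool_label_disk_wait() now performs before the
# kernel modules try to open the device.
udevadm info --query=symlink --name=/dev/sdb1
```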
When partitioning a device a name may be specified for each partition. Internally zfs doesn't use this partition name for anything, so it has always just been set to "zfs". However, this isn't optimal because udev will create symlinks using this name in /dev/disk/by-partlabel/. If the name isn't unique then all of the links cannot be created. Therefore a random 64-bit value has been added to the partition label, e.g. "zfs-1234567890abcdef". Additional information could be encoded here, but it was decided against because partitions may be reused and that might result in confusion.
Signed-off-by: Brian Behlendorf <[email protected]>
Signed-off-by: Tony Hutter <[email protected]>
Signed-off-by: Richard Laager <[email protected]>
Closes openzfs#4517
Greetings,
We have encountered a scenario where a zfs-0.6.5 installation using sas_direct aliasing is dysfunctional. We have also tried sas_switch and direct devlink aliasing in /etc/zfs/vdev_id.conf.
Any zfs operation against /dev/sdX devices works. Any zfs operation executed using /dev/disk/by-vdev aliases fails, and operations using other device aliases under /dev/disk/... (by-path/..., by-id/...) fail as well; see the configuration sketch below.
The system in question is running CentOS 7.2 (updated) with kernel-3.10.0-327.13.1.el7.x86_64.
I would say this aligns with issue #4214, so treat this as additional information: I have tried several hardware topologies and alias methods, all with the same failing result.
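For context, a sketch of the kind of sas_direct configuration described in this report; the PCI slot addresses, channel names, and by-id target below are placeholders rather than the reporter's actual values.

```sh
# /etc/zfs/vdev_id.conf using sas_direct topology (placeholder PCI slots).
cat > /etc/zfs/vdev_id.conf <<'EOF'
multipath     no
topology      sas_direct
phys_per_port 4
#       PCI_SLOT  HBA PORT  CHANNEL NAME
channel 85:00.0   1         A
channel 85:00.0   0         B
EOF

# Alternatively, explicit per-disk aliases (placeholder by-id name):
#   alias d0  /dev/disk/by-id/wwn-0x5000c5002de3b9ca

# Regenerate the /dev/disk/by-vdev links from the configuration.
udevadm trigger
ls -l /dev/disk/by-vdev/
```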