using vdev_id with sas_direct and sas_switch #4214

Closed
abuschmann opened this issue Jan 13, 2016 · 2 comments
Comments

@abuschmann

I have set up ZFS on a Supermicro 2027R-AR24NV running CentOS 7.2.
It has three LSI SAS3008 controllers to connect the 24 disks on the front side.
I have added an LSI SAS2008 to connect an external SAS array, in which only 8 SAS disks are currently populated. (This array is on loan for testing.)

I currently cannot use the names in /dev/disk/by-vdev, but that is apparently a problem with the systemd/CentOS integration. My workaround is to use the /dev/sdX names, which is acceptable for testing.

The issue is "What can be done with vdev_id and vdev_id.conf to enable the use of the two topologies sas_direct and sas_switch in one system?"

Do you have any ideas on this issue?
(Is it relevant enough?)
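
For context, the two topologies are configured quite differently, and vdev_id.conf takes a single topology setting, which is why I don't see how to combine them. Roughly, based on the examples in vdev_id.conf(5), the two modes look like this on their own (the PCI slot, port, and channel names are placeholders, not values from my machine):

# sas_direct: disks attached directly to HBA ports
topology      sas_direct
phys_per_port 4
#       PCI_SLOT  HBA PORT  CHANNEL NAME
channel 04:00.0   0         A

# sas_switch: disks reached through a SAS switch
topology      sas_switch
#       SWITCH PORT  CHANNEL NAME
channel 1            B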

Do you have a design definition for /usr/lib/udev/vdev_id, or a specification of how it can be called and what I have to consider if I try to rework it?
I know it is called as /usr/lib/udev/vdev_id -d device_name from udev.
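For testing I can also run it by hand the same way udev does, e.g. (sda just being an example device name):

/usr/lib/udev/vdev_id -d sda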

I am currently not sure how to differentiate between devices behind SAS switches and directly attached ones.
The old directly attached SATA disks have paths like:

/devices/pci0000:00/0000:00:02.2/0000:04:00.0/host0/port-0:0/end_device-0:0/target0:0:0/0:0:0:0/block/sda

The new ones are:

/devices/pci0000:80/0000:80:02.0/0000:82:00.0/host1/port-1:0/expander-1:0/port-1:0:0/expander-1:1/port-1:1:0/end_device-1:1:0/target1:0:0/1:0:0:0/block/sdi
/devices/pci0000:80/0000:80:02.0/0000:82:00.0/host1/port-1:0/expander-1:0/port-1:0:0/expander-1:1/port-1:1:1/end_device-1:1:1/target1:0:2/1:0:2:0/block/sdj
/devices/pci0000:80/0000:80:02.0/0000:82:00.0/host1/port-1:0/expander-1:0/port-1:0:0/expander-1:1/port-1:1:2/end_device-1:1:2/target1:0:3/1:0:3:0/block/sdk
/devices/pci0000:80/0000:80:02.0/0000:82:00.0/host1/port-1:0/expander-1:0/port-1:0:0/expander-1:1/port-1:1:3/end_device-1:1:3/target1:0:4/1:0:4:0/block/sdl
/devices/pci0000:80/0000:80:02.0/0000:82:00.0/host1/port-1:0/expander-1:0/port-1:0:1/expander-1:2/port-1:2:0/end_device-1:2:0/target1:0:6/1:0:6:0/block/sdm
/devices/pci0000:80/0000:80:02.0/0000:82:00.0/host1/port-1:0/expander-1:0/port-1:0:1/expander-1:2/port-1:2:1/end_device-1:2:1/target1:0:7/1:0:7:0/block/sdn
/devices/pci0000:80/0000:80:02.0/0000:82:00.0/host1/port-1:0/expander-1:0/port-1:0:1/expander-1:2/port-1:2:2/end_device-1:2:2/target1:0:8/1:0:8:0/block/sdo
/devices/pci0000:80/0000:80:02.0/0000:82:00.0/host1/port-1:0/expander-1:0/port-1:0:1/expander-1:2/port-1:2:3/end_device-1:2:3/target1:0:9/1:0:9:0/block/sdp

Are these paths normal, or are there other constructions?
(If the paths are normal, I could switch on the expander part of the path; see the sketch below.)
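
To make the idea concrete, a rough, untested sketch of such a check inside a shell script (sda is only an example device name):

DEVPATH=$(udevadm info -q path -n /dev/sda)   # sysfs path like the ones above
case "$DEVPATH" in
  *expander-*) echo "behind an expander / SAS switch" ;;
  *)           echo "directly attached" ;;
esac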

As an additional note, the pci_ids of some of the existing SAS cards changed after I added an extra Ethernet card. Is that normal behaviour?

From a logical point of view, I have to find out what the pci_id is before doing anything else. Do I?
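
If so, the PCI address at least seems easy to recover from the sysfs path; another rough, untested sketch (sdi is just an example device name):

DEVPATH=$(udevadm info -q path -n /dev/sdi)
# print the path element just before hostN, e.g. 0000:82:00.0 in the paths above
echo "$DEVPATH" | awk -F/ '{ for (i = 2; i <= NF; i++) if ($i ~ /^host[0-9]+$/) print $(i-1) }'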

@stormeporm

I use it to give vdev names matching the disk numbers on the case.
If you have an HBA you could use the automatic version, but I've not tested it. This is a JBOD arrangement on Ubuntu, and I've used the PCI path.
http://zfsonlinux.org/faq.html#HowDoISetupVdevIdConf

Part of my /etc/zfs/vdev_id.conf:

# Generate a mapping so the disk names match the numbers on the casing.
# Numbers starting with an F are in the front of the machine.
# Numbers starting with a B are in the back of the machine.
# Put this config in /etc/zfs/vdev_id.conf

# Make sure to run "udevadm trigger" to update the /dev/disk/by-vdev/ list each time you change this file.

alias F0 pci-0000:01:00.0-sas-phy3-lun-0
alias F1 pci-0000:01:00.0-sas-phy2-lun-0
alias F2 pci-0000:01:00.0-sas-phy1-lun-0
alias F3 pci-0000:01:00.0-sas-phy0-lun-0
alias F4 pci-0000:01:00.0-sas-phy7-lun-0
alias F5 pci-0000:01:00.0-sas-phy6-lun-0
alias F6 pci-0000:01:00.0-sas-phy5-lun-0
alias F7 pci-0000:01:00.0-sas-phy4-lun-0

alias F8 pci-0000:02:00.0-sas-phy3-lun-0
alias F9 pci-0000:02:00.0-sas-phy2-lun-0
alias F10 pci-0000:02:00.0-sas-phy1-lun-0
alias F11 pci-0000:02:00.0-sas-phy0-lun-0
alias F12 pci-0000:02:00.0-sas-phy7-lun-0
alias F13 pci-0000:02:00.0-sas-phy6-lun-0
alias F14 pci-0000:02:00.0-sas-phy5-lun-0
alias F15 pci-0000:02:00.0-sas-phy4-lun-0

@stormeporm

Some more clarification

In my case the PCI path of sdc is
/sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/host0/phy-0:0/port/end_device-0:1/target0:0:1/0:0:1:0/block/sdc
which corresponds to the alias-file name
pci-0000:01:00.0-sas-phy0-lun-0

The PCI path of sdd is
/sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/host0/phy-0:1/port/end_device-0:2/target0:0:2/0:0:2:0/block/sdd
which corresponds to the alias-file name
pci-0000:01:00.0-sas-phy1-lun-0

This is on Ubuntu with an LSI SAS2308.

Without an HBA, only link aliases like the ones above work.
Since I don't have an HBA, I don't know whether the non-multipath, multipath, and switch configurations work.
The nice thing about using the PCI path is that when you swap a drive it will get the same alias.
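
To see which of those names a given sdX currently maps to (and therefore which alias line it will hit), I just look at the existing by-path links, which on this box use the same pci-...-sas-phyN-lun-0 form as above (this may differ on other distros or udev versions):

ls -l /dev/disk/by-path/ | grep -i sas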

behlendorf added a commit to behlendorf/zfs that referenced this issue Apr 20, 2016
When ZFS partitions a block device it must wait for udev to create
both a device node and all the device symlinks.  This process takes
a variable length of time and depends on factors such as how many
links must be created, the complexity of the rules, etc.  Complicating
the situation further, it is not uncommon for udev to create and
then remove a link multiple times while processing the rules.

Given the above, the existing scheme of waiting for an expected
partition to appear by name isn't 100% reliable.  At this point
udev may still remove and recreate the link, resulting in the
kernel modules being unable to open the device.

In order to address this the zpool_label_disk_wait() function
has been updated to use libudev.  Until the registered system
device acknowledges that it is fully initialized the function
will wait.  Once fully initialized, all device links are checked
and allowed to settle for 50ms.  This makes it far more certain
that all the device nodes will exist when the kernel modules
need to open them.

For systems without libudev an alternate zpool_label_disk_wait()
was implemented which includes a settle time.  In addition, the
kernel modules were updated to include retry logic for this
ENOENT case.  Due to the improved checks in the utilities it
is unlikely this logic will be invoked; however, in the rare
event it is needed, it will prevent a failure.

Signed-off-by: Brian Behlendorf <[email protected]>
Issue openzfs#3708
Issue openzfs#4077
Issue openzfs#4144
Issue openzfs#4214
Issue openzfs#4517
nedbass pushed a commit to nedbass/zfs that referenced this issue May 6, 2016
When ZFS partitions a block device it must wait for udev to create
both a device node and all the device symlinks.  This process takes
a variable length of time and depends on factors such as how many
links must be created, the complexity of the rules, etc.  Complicating
the situation further, it is not uncommon for udev to create and
then remove a link multiple times while processing the udev rules.

Given the above, the existing scheme of waiting for an expected
partition to appear by name isn't 100% reliable.  At this point
udev may still remove and recreate the link, resulting in the
kernel modules being unable to open the device.

In order to address this the zpool_label_disk_wait() function
has been updated to use libudev.  Until the registered system
device acknowledges that it is fully initialized the function
will wait.  Once fully initialized, all device links are checked
and allowed to settle for 50ms.  This makes it far more likely
that all the device nodes will exist when the kernel modules
need to open them.

For systems without libudev an alternate zpool_label_disk_wait()
was updated to include a settle time.  In addition, the kernel
modules were updated to include retry logic for this ENOENT case.
Due to the improved checks in the utilities it is unlikely this
logic will be invoked.  However, in the rare event it is needed,
it will prevent a failure.

Signed-off-by: Brian Behlendorf <[email protected]>
Signed-off-by: Tony Hutter <[email protected]>
Signed-off-by: Richard Laager <[email protected]>
Closes openzfs#4523
Closes openzfs#3708
Closes openzfs#4077
Closes openzfs#4144
Closes openzfs#4214
Closes openzfs#4517