-
Notifications
You must be signed in to change notification settings - Fork 434
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Get device/vendor ids of SF from its parent PCI #5964
Conversation
/azp run |
Azure Pipelines successfully started running 1 pipeline(s). |
src/uct/ib/base/ib_device.h
Outdated
@@ -56,6 +56,9 @@ | |||
#define UCT_IB_DEVICE_SYSFS_GID_TYPE_FMT UCT_IB_DEVICE_SYSFS_GID_ATTR_PFX "/types/%d" | |||
#define UCT_IB_DEVICE_SYSFS_GID_NDEV_FMT UCT_IB_DEVICE_SYSFS_GID_ATTR_PFX "/ndevs/%d" | |||
|
|||
#define UCT_IB_PCI_BUS_ID_LEN 7 /* PCI BUS ID */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe use NAME_MAX constant?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yosefe We need the real length of 0000:03 which is 7 to memcpy 0000:03 from 0000:03:00.0.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yosefe Or is it better to use something like strstr(":") to get the length of 0000:03 then we don't need to define monstant here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
better parse it to components
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yosefe Maybe no need to do this because the pci bus id and format is pretty fixed. No need to add more code and logic to parse it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yosefe Fixed.
bot:pipe:retest |
src/uct/ib/base/ib_device.c
Outdated
/* get 0000:03 from 0000:03:00.0*/ | ||
memcpy(pci_bus, bus_id, UCT_IB_PCI_BUS_ID_LEN); | ||
pci_bus[UCT_IB_PCI_BUS_ID_LEN] = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we unite common code with uct_ib_device_get_sys_dev ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yosefe This is not common code with uct_ib_device_get_sys_dev. uct_ib_device_get_sys_dev doesn't need to use 0000:03. It only needs 0000:03:00.0.
src/uct/ib/base/ib_device.c
Outdated
status = ucs_read_file_number(&vendor_id, 1, UCT_IB_DEVICE_SYSFS_PCI_BUS_FMT, | ||
pci_bus, bus_id, "vendor"); | ||
if (status != UCS_OK) { | ||
return status; | ||
} | ||
|
||
/* /sys/class/pci_bus/0000:03/device/0000:03:00.0/device */ | ||
status = ucs_read_file_number(&device_id, 1, UCT_IB_DEVICE_SYSFS_PCI_BUS_FMT, | ||
pci_bus, bus_id, "device"); | ||
if (status != UCS_OK) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we unite common code with 549-552?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yosefe Tried, but seems not simply to do it. The SYSFS path format is much different for both places, the contents and path layers are different. Need much more code to deal with the diffrence.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yosefe Fixed.
src/uct/ib/base/ib_device.c
Outdated
return status; | ||
} | ||
|
||
bus_id = basename(dirname(dirname(dirname(path_buffer)))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
check return values from dirname
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yosefe Fixed.
src/uct/ib/base/ib_device.h
Outdated
@@ -56,6 +56,9 @@ | |||
#define UCT_IB_DEVICE_SYSFS_GID_TYPE_FMT UCT_IB_DEVICE_SYSFS_GID_ATTR_PFX "/types/%d" | |||
#define UCT_IB_DEVICE_SYSFS_GID_NDEV_FMT UCT_IB_DEVICE_SYSFS_GID_ATTR_PFX "/ndevs/%d" | |||
|
|||
#define UCT_IB_PCI_BUS_ID_LEN 7 /* PCI BUS ID */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
better parse it to components
/azp run |
Azure Pipelines successfully started running 1 pipeline(s). |
bot:pipe:retest |
1 similar comment
bot:pipe:retest |
bb4245e
to
abcf45e
Compare
/azp run |
Azure Pipelines successfully started running 1 pipeline(s). |
/azp run |
Azure Pipelines successfully started running 1 pipeline(s). |
/azp run |
Azure Pipelines successfully started running 1 pipeline(s). |
src/uct/ib/base/ib_device.c
Outdated
return UCS_OK; | ||
} | ||
|
||
char* uct_ib_device_get_pcie_bus(char *path_buffer, uint8_t layers) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add function to UCS:
ucs_dirname(chat *dir_path, int num_layers)
pls add doxygen and small gtest for this function
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yosefe Done.
src/uct/ib/base/ib_device.c
Outdated
return basename(path_buffer); | ||
} | ||
|
||
static ucs_status_t uct_ib_device_get_ids_from_path(char *path, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
const char *path
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yosefe Fixed.
src/uct/ib/base/ib_device.c
Outdated
return status; | ||
} | ||
|
||
bus_id = uct_ib_device_get_pcie_bus(path_buffer, 3); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
seems like 'bus_id' here, after stripping 3 directory components, already points to the SF
$ cat $(realpath /sys/class/infiniband/mlx5_0)/../../../vendor
0x15b3
$ cat $(realpath /sys/class/infiniband/mlx5_0)/../../../device
0xa2d6
$ cat $(realpath /sys/class/infiniband/mlx5_0)/../../../current_link_speed
16 GT/s
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yosefe Fixed.
bot:pipe:retest |
src/ucs/sys/string.c
Outdated
@@ -199,6 +200,17 @@ ucs_status_t ucs_str_to_memunits(const char *buf, void *dest) | |||
return UCS_OK; | |||
} | |||
|
|||
char *ucs_dirname(char *path, int num_layers) | |||
{ | |||
while (num_layers--) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
> 0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yosefe Should not have problem when num_layers==0 since "--" is after num_layers. When num_layers==0, won't go into while loop, will only return path.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry, github re-formatted my comment
i've meant:
while (num_layers-- > 0)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yosefe Fixed.
src/uct/ib/base/ib_device.c
Outdated
for (i = 0; i < 2; i++) { | ||
ids_path = ucs_dirname(path_buffer, nums_layers[i]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
call ucs_dirname twice, without loop
also, need to reset path_buffer before 2nd call because ucs_dirname may change it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed. You detected a bug which caused by my fixing on PF but didn't retest on SF. Thanks!
src/uct/ib/base/ib_device.c
Outdated
for (i = 0; i < 2; i++) { | ||
ids_path = ucs_dirname(path_buffer, nums_layers[i]); | ||
if (ids_path != NULL) { | ||
if (uct_ib_device_get_ids_from_path(ids_path, &vendor_id, &device_id) == UCS_OK) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe pass &dev->pci_id.vendor and &dev->pci_id.device?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yosefe Can't pass them directly since the variable types are different with the function parameters. Have changed to another way.
pcie_bus = ucs_dirname(path_buffer, 2); | ||
if (pcie_bus == NULL) { | ||
return UCS_SYS_DEVICE_ID_UNKNOWN; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
space line after
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yosefe Fixed.
/azp run |
Azure Pipelines successfully started running 1 pipeline(s). |
/azp run |
Azure Pipelines successfully started running 1 pipeline(s). |
/* SF: realpath name is of form /sys/devices/.../0000:03:00.0/<UUID>/infiniband/mlx5_0 */ | ||
|
||
status = uct_ib_device_get_path_buffer(dev, path_buffer); | ||
if (status == UCS_OK) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (status != UCS_OK) {
goto not_found;
}
...
not_found:
dev->pci_id.vendor = 0;
...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yosefe Done.
src/uct/ib/base/ib_device.c
Outdated
if (status == UCS_OK) { | ||
/* PF: strip 2 layers. */ | ||
ids_path = ucs_dirname(path_buffer, 2); | ||
if (ids_path != NULL) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (ids_path == NULL) {
goto not_found;
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yosefe Fixed.
src/uct/ib/base/ib_device.c
Outdated
/* PF: strip 2 layers. */ | ||
ids_path = ucs_dirname(path_buffer, 2); | ||
if (ids_path != NULL) { | ||
if (uct_ib_device_get_ids_from_path(ids_path, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
status = uct_ib_device_get_ids_from_path(...)
if (status == UCS_OK) {
....
return;
}
/* SF: strip 3 layers (1 more layer than PF). */
ids_path = ucs_dirname(path_buffer, 1);
...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yosefe Fixed.
src/uct/ib/base/ib_device.c
Outdated
if (status == UCS_OK) { | ||
ucs_debug("SF: %s vendor_id: 0x%x device_id: %d", uct_ib_device_name(dev), | ||
dev->pci_id.vendor, dev->pci_id.device); | ||
return; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pls fix indent
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yosefe Fixed.
src/uct/ib/base/ib_device.h
Outdated
#define UCT_IB_DEVICE_SYSFS_PCI_PFX "/sys/class/pci_bus/%s" | ||
#define UCT_IB_DEVICE_SYSFS_PCI_PATH_FMT UCT_IB_DEVICE_SYSFS_PCI_PFX "/device/%s" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
seems unused now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, according latest implementation they are unused. Removed. Thanks!
src/uct/ib/base/ib_device.h
Outdated
@@ -51,10 +51,13 @@ | |||
#define UCT_IB_DEFAULT_ROCEV2_DSCP 106 /* Default DSCP for RoCE v2 */ | |||
#define UCT_IB_ROCE_UDP_SRC_PORT_BASE 0xC000 | |||
#define UCT_IB_DEVICE_SYSFS_PFX "/sys/class/infiniband/%s" | |||
#define UCT_IB_DEVICE_SYSFS_PATH_FMT UCT_IB_DEVICE_SYSFS_PFX "/device" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unused?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yosefe Removed. Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pls squash
Done. |
/azp run |
Azure Pipelines successfully started running 1 pipeline(s). |
What
Get device/vendor ids of SF from its parent PCI.
Why ?
We can't get device/vendor ids of SF from its sysfs directly, so we get the ids from SF's parent PCI.
How ?
Get the parent PCI of SF, then get the device/vendor ids of parent PCI. The SF has the same device/vendor ids as its parent PCI.