Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Get device/vendor ids of SF from its parent PCI #5964

Merged
merged 1 commit into from
Jan 7, 2021

Conversation

leibin2014
Copy link
Contributor

What

Get device/vendor ids of SF from its parent PCI.

Why ?

We can't get device/vendor ids of SF from its sysfs directly, so we get the ids from SF's parent PCI.

How ?

Get the parent PCI of SF, then get the device/vendor ids of parent PCI. The SF has the same device/vendor ids as its parent PCI.

@leibin2014
Copy link
Contributor Author

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

src/uct/ib/base/ib_device.c Outdated Show resolved Hide resolved
src/uct/ib/base/ib_device.c Show resolved Hide resolved
src/uct/ib/base/ib_device.c Show resolved Hide resolved
src/uct/ib/base/ib_device.c Outdated Show resolved Hide resolved
src/uct/ib/base/ib_device.c Outdated Show resolved Hide resolved
src/uct/ib/base/ib_device.c Outdated Show resolved Hide resolved
src/uct/ib/base/ib_device.c Outdated Show resolved Hide resolved
@@ -56,6 +56,9 @@
#define UCT_IB_DEVICE_SYSFS_GID_TYPE_FMT UCT_IB_DEVICE_SYSFS_GID_ATTR_PFX "/types/%d"
#define UCT_IB_DEVICE_SYSFS_GID_NDEV_FMT UCT_IB_DEVICE_SYSFS_GID_ATTR_PFX "/ndevs/%d"

#define UCT_IB_PCI_BUS_ID_LEN 7 /* PCI BUS ID */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe use NAME_MAX constant?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yosefe We need the real length of 0000:03 which is 7 to memcpy 0000:03 from 0000:03:00.0.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yosefe Or is it better to use something like strstr(":") to get the length of 0000:03 then we don't need to define monstant here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

better parse it to components

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yosefe Maybe no need to do this because the pci bus id and format is pretty fixed. No need to add more code and logic to parse it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yosefe Fixed.

@yosefe
Copy link
Contributor

yosefe commented Dec 4, 2020

bot:pipe:retest

Comment on lines 521 to 523
/* get 0000:03 from 0000:03:00.0*/
memcpy(pci_bus, bus_id, UCT_IB_PCI_BUS_ID_LEN);
pci_bus[UCT_IB_PCI_BUS_ID_LEN] = 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we unite common code with uct_ib_device_get_sys_dev ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yosefe This is not common code with uct_ib_device_get_sys_dev. uct_ib_device_get_sys_dev doesn't need to use 0000:03. It only needs 0000:03:00.0.

Comment on lines 526 to 535
status = ucs_read_file_number(&vendor_id, 1, UCT_IB_DEVICE_SYSFS_PCI_BUS_FMT,
pci_bus, bus_id, "vendor");
if (status != UCS_OK) {
return status;
}

/* /sys/class/pci_bus/0000:03/device/0000:03:00.0/device */
status = ucs_read_file_number(&device_id, 1, UCT_IB_DEVICE_SYSFS_PCI_BUS_FMT,
pci_bus, bus_id, "device");
if (status != UCS_OK) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we unite common code with 549-552?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yosefe Tried, but seems not simply to do it. The SYSFS path format is much different for both places, the contents and path layers are different. Need much more code to deal with the diffrence.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yosefe Fixed.

return status;
}

bus_id = basename(dirname(dirname(dirname(path_buffer))));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

check return values from dirname

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yosefe Fixed.

@@ -56,6 +56,9 @@
#define UCT_IB_DEVICE_SYSFS_GID_TYPE_FMT UCT_IB_DEVICE_SYSFS_GID_ATTR_PFX "/types/%d"
#define UCT_IB_DEVICE_SYSFS_GID_NDEV_FMT UCT_IB_DEVICE_SYSFS_GID_ATTR_PFX "/ndevs/%d"

#define UCT_IB_PCI_BUS_ID_LEN 7 /* PCI BUS ID */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

better parse it to components

@leibin2014
Copy link
Contributor Author

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@leibin2014
Copy link
Contributor Author

bot:pipe:retest

1 similar comment
@leibin2014
Copy link
Contributor Author

bot:pipe:retest

@leibin2014 leibin2014 force-pushed the bf2 branch 2 times, most recently from bb4245e to abcf45e Compare December 26, 2020 14:43
@leibin2014
Copy link
Contributor Author

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@leibin2014
Copy link
Contributor Author

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@leibin2014
Copy link
Contributor Author

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

return UCS_OK;
}

char* uct_ib_device_get_pcie_bus(char *path_buffer, uint8_t layers)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add function to UCS:
ucs_dirname(chat *dir_path, int num_layers)
pls add doxygen and small gtest for this function

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yosefe Done.

return basename(path_buffer);
}

static ucs_status_t uct_ib_device_get_ids_from_path(char *path,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

const char *path

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yosefe Fixed.

return status;
}

bus_id = uct_ib_device_get_pcie_bus(path_buffer, 3);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems like 'bus_id' here, after stripping 3 directory components, already points to the SF

$ cat $(realpath /sys/class/infiniband/mlx5_0)/../../../vendor
0x15b3
$ cat $(realpath /sys/class/infiniband/mlx5_0)/../../../device
0xa2d6
$ cat $(realpath /sys/class/infiniband/mlx5_0)/../../../current_link_speed
16 GT/s

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yosefe Fixed.

@leibin2014
Copy link
Contributor Author

bot:pipe:retest

@@ -199,6 +200,17 @@ ucs_status_t ucs_str_to_memunits(const char *buf, void *dest)
return UCS_OK;
}

char *ucs_dirname(char *path, int num_layers)
{
while (num_layers--) {
Copy link
Contributor

@yosefe yosefe Jan 5, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

> 0

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yosefe Should not have problem when num_layers==0 since "--" is after num_layers. When num_layers==0, won't go into while loop, will only return path.

Copy link
Contributor

@yosefe yosefe Jan 6, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry, github re-formatted my comment
i've meant:
while (num_layers-- > 0)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yosefe Fixed.

Comment on lines 537 to 538
for (i = 0; i < 2; i++) {
ids_path = ucs_dirname(path_buffer, nums_layers[i]);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

call ucs_dirname twice, without loop
also, need to reset path_buffer before 2nd call because ucs_dirname may change it

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. You detected a bug which caused by my fixing on PF but didn't retest on SF. Thanks!

for (i = 0; i < 2; i++) {
ids_path = ucs_dirname(path_buffer, nums_layers[i]);
if (ids_path != NULL) {
if (uct_ib_device_get_ids_from_path(ids_path, &vendor_id, &device_id) == UCS_OK) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe pass &dev->pci_id.vendor and &dev->pci_id.device?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yosefe Can't pass them directly since the variable types are different with the function parameters. Have changed to another way.

pcie_bus = ucs_dirname(path_buffer, 2);
if (pcie_bus == NULL) {
return UCS_SYS_DEVICE_ID_UNKNOWN;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

space line after

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yosefe Fixed.

@leibin2014
Copy link
Contributor Author

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@leibin2014
Copy link
Contributor Author

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

/* SF: realpath name is of form /sys/devices/.../0000:03:00.0/<UUID>/infiniband/mlx5_0 */

status = uct_ib_device_get_path_buffer(dev, path_buffer);
if (status == UCS_OK) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

    if (status != UCS_OK) {
        goto not_found;
    }
    ...

not_found:
    dev->pci_id.vendor = 0;
    ...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yosefe Done.

if (status == UCS_OK) {
/* PF: strip 2 layers. */
ids_path = ucs_dirname(path_buffer, 2);
if (ids_path != NULL) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if (ids_path == NULL) {
    goto not_found;
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yosefe Fixed.

/* PF: strip 2 layers. */
ids_path = ucs_dirname(path_buffer, 2);
if (ids_path != NULL) {
if (uct_ib_device_get_ids_from_path(ids_path,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

status = uct_ib_device_get_ids_from_path(...)
if (status == UCS_OK) {
    ....
    return;
}

/* SF: strip 3 layers (1 more layer than PF). */
ids_path = ucs_dirname(path_buffer, 1);
...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yosefe Fixed.

if (status == UCS_OK) {
ucs_debug("SF: %s vendor_id: 0x%x device_id: %d", uct_ib_device_name(dev),
dev->pci_id.vendor, dev->pci_id.device);
return;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pls fix indent

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yosefe Fixed.

Comment on lines 59 to 60
#define UCT_IB_DEVICE_SYSFS_PCI_PFX "/sys/class/pci_bus/%s"
#define UCT_IB_DEVICE_SYSFS_PCI_PATH_FMT UCT_IB_DEVICE_SYSFS_PCI_PFX "/device/%s"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems unused now?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, according latest implementation they are unused. Removed. Thanks!

@@ -51,10 +51,13 @@
#define UCT_IB_DEFAULT_ROCEV2_DSCP 106 /* Default DSCP for RoCE v2 */
#define UCT_IB_ROCE_UDP_SRC_PORT_BASE 0xC000
#define UCT_IB_DEVICE_SYSFS_PFX "/sys/class/infiniband/%s"
#define UCT_IB_DEVICE_SYSFS_PATH_FMT UCT_IB_DEVICE_SYSFS_PFX "/device"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unused?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yosefe Removed. Thanks!

Copy link
Contributor

@yosefe yosefe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pls squash

@leibin2014
Copy link
Contributor Author

pls squash

Done.

@leibin2014
Copy link
Contributor Author

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@yosefe yosefe merged commit e843d3e into openucx:master Jan 7, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants