DAOS-9247 control,bio: Add PCIe link speed and width to NVMe health s… (#14845)

* DAOS-9247 control,bio: Add PCIe link speed and width to NVMe health stats (#14395)

Add NVMe PCIe link speed and width details to the NVMe health stats
returned by the dmg storage query list-devices --health and dmg storage
scan --nvme-health commands. The PCIe config space is read within the
engine's SPDK process (which works even when the NVMe device is behind
a VMD and bound to a userspace driver) and passed back to the control
plane over dRPC as a byte string. The control plane formats the byte
string and passes it to lspci, which converts it to human-readable
text. That output is then parsed and the relevant entries are converted
into health stat fields printed in dmg output.
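
The link speed and width reported here come from the Link Status register of the PCI Express capability inside the config space. The commit itself delegates decoding to lspci; the block below is only a minimal sketch of decoding the same two fields directly from a raw config-space buffer, to illustrate what the byte string carries. The names `struct link_info`, `decode_link_status` and `speed_name` are hypothetical and are not part of DAOS or SPDK; the offsets and bit fields follow the standard PCI/PCIe register layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PCI_STATUS          0x06 /* 16-bit Status register */
#define PCI_STATUS_CAP_LIST 0x10 /* Status bit 4: capability list present */
#define PCI_CAP_PTR         0x34 /* offset of the first capability */
#define PCI_CAP_ID_EXP      0x10 /* PCI Express capability ID */
#define PCI_EXP_LNKSTA      0x12 /* Link Status, relative to the capability */

/* Hypothetical result type for this sketch (not a DAOS structure). */
struct link_info {
	unsigned int speed_code; /* Link Status bits 3:0 */
	unsigned int width;      /* Link Status bits 9:4, negotiated lane count */
};

/* Config space is little-endian. */
static uint16_t
read_le16(const uint8_t *cfg, size_t off)
{
	return (uint16_t)(cfg[off] | (cfg[off + 1] << 8));
}

/* Conventional speed encoding: 1 = 2.5GT/s, 2 = 5GT/s, 3 = 8GT/s, ... */
static const char *
speed_name(unsigned int code)
{
	static const char *const names[] = {"unknown", "2.5GT/s", "5GT/s", "8GT/s",
					    "16GT/s",  "32GT/s",  "64GT/s"};

	return code < sizeof(names) / sizeof(names[0]) ? names[code] : "unknown";
}

/* Walk the capability list. Returns 0 on success, -1 if no usable PCIe capability. */
static int
decode_link_status(const uint8_t *cfg, size_t len, struct link_info *out)
{
	size_t pos;
	int    guard = 48; /* bound the walk in case the list loops */

	assert(cfg != NULL && out != NULL);

	if (len < 0x40 || !(read_le16(cfg, PCI_STATUS) & PCI_STATUS_CAP_LIST))
		return -1;

	pos = cfg[PCI_CAP_PTR] & 0xfc; /* low two bits are reserved */
	while (pos >= 0x40 && pos + 2 <= len && guard-- > 0) {
		if (cfg[pos] == PCI_CAP_ID_EXP) {
			uint16_t lnksta;

			if (pos + PCI_EXP_LNKSTA + 2 > len)
				return -1; /* buffer too short to hold Link Status */

			lnksta          = read_le16(cfg, pos + PCI_EXP_LNKSTA);
			out->speed_code = lnksta & 0xf;
			out->width      = (lnksta >> 4) & 0x3f;
			return 0;
		}
		pos = cfg[pos + 1] & 0xfc; /* next capability pointer */
	}
	return -1;
}
```

Under these assumptions, a caller holding the config-space bytes could pass the buffer and its length to `decode_link_status` and print `speed_name(info.speed_code)` together with `info.width`. Handing the bytes to lspci instead, as the commit does, avoids maintaining a decoder like this by hand.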

Signed-off-by: Tom Nabarro <[email protected]>
tanabarr authored Aug 20, 2024
1 parent 13307e2 commit b6fc83e
Showing 48 changed files with 2,195 additions and 2,005 deletions.
1 change: 1 addition & 0 deletions ci/unit/required_packages.sh
@@ -26,6 +26,7 @@ pkgs="argobots \
 numactl-devel \
 openmpi$OPENMPI_VER \
 patchelf \
+pciutils-devel \
 pmix \
 protobuf-c \
 spdk-devel \
7 changes: 7 additions & 0 deletions debian/changelog
@@ -1,3 +1,10 @@
+daos (2.6.0-5) unstable; urgency=medium
+  [ Tom Nabarro ]
+  * Add pciutils runtime dep for daos_server lspci call
+  * Add libpci-dev build dep for pciutils CGO bindings
+
+ -- Tom Nabarro <[email protected]>  Thu, 08 Aug 2024 12:00:00 -0000
+
 daos (2.6.0-4) unstable; urgency=medium
   [ Jerome Soumagne ]
   * Bump mercury version to 2.4.0rc4
5 changes: 3 additions & 2 deletions debian/control
@@ -33,7 +33,8 @@ Build-Depends: debhelper (>= 10),
  python3-tabulate,
  liblz4-dev,
  libaio-dev,
- libcapstone-dev
+ libcapstone-dev,
+ libpci-dev
 Standards-Version: 4.1.2
 Homepage: https://docs.daos.io/
 Vcs-Git: https://github.com/daos-stack/daos.git
@@ -171,7 +172,7 @@ Package: daos-server
 Section: net
 Architecture: any
 Multi-Arch: same
-Depends: ${shlibs:Depends}, ${misc:Depends}, openmpi-bin,
+Depends: ${shlibs:Depends}, ${misc:Depends}, openmpi-bin, pciutils,
  ipmctl (>=03.00.00.0468), libfabric (>= 1.15.1-1), spdk-tools (>= 22.01.2)
 Description: The Distributed Asynchronous Object Storage (DAOS) is an open-source
  software-defined object store designed from the ground up for
2 changes: 1 addition & 1 deletion docs/admin/administration.md
@@ -578,7 +578,7 @@ The engine's NVMe config (produced during format) then contains the following
 JSON to apply the criteria:
 ```json
-[tanabarr@wolf-310 ~]$ cat /mnt/daos0/daos_nvme.conf
+cat /mnt/daos0/daos_nvme.conf
 {
 "daos_data": {
 "config": [
23 changes: 16 additions & 7 deletions src/bio/bio_device.c
@@ -382,6 +382,7 @@ struct pci_dev_opts {
 	bool finished;
 	int *socket_id;
 	char **pci_type;
+	char **pci_cfg;
 	int status;
 };

@@ -391,6 +392,7 @@ pci_device_cb(void *ctx, struct spdk_pci_device *pci_device)
 	struct pci_dev_opts *opts = ctx;
 	const char *device_type;
 	int len;
+	int rc;

 	if (opts->status != 0)
 		return;
@@ -422,6 +424,13 @@ pci_device_cb(void *ctx, struct spdk_pci_device *pci_device)
 		opts->status = -DER_NOMEM;
 		return;
 	}
+
+	rc = spdk_pci_device_cfg_read(pci_device, *opts->pci_cfg, NVME_PCI_CFG_SPC_MAX_LEN, 0);
+	if (rc != 0) {
+		D_ERROR("Failed to read config space of device (%s)\n", spdk_strerror(-rc));
+		opts->status = -DER_INVAL;
+		return;
+	}
 }

 static int
@@ -443,6 +452,7 @@ fetch_pci_dev_info(struct nvme_ctrlr_t *w_ctrlr, const char *tr_addr)
 	opts.pci_addr = pci_addr;
 	opts.socket_id = &w_ctrlr->socket_id;
 	opts.pci_type = &w_ctrlr->pci_type;
+	opts.pci_cfg = &w_ctrlr->pci_cfg;

 	spdk_pci_for_each_device(&opts, pci_device_cb);

@@ -485,6 +495,10 @@ alloc_ctrlr_info(uuid_t dev_id, char *dev_name, struct bio_dev_info *b_info)
 	if (b_info->bdi_ctrlr->nss == NULL)
 		return -DER_NOMEM;

+	D_ALLOC(b_info->bdi_ctrlr->pci_cfg, NVME_PCI_CFG_SPC_MAX_LEN);
+	if (b_info->bdi_ctrlr->pci_cfg == NULL)
+		return -DER_NOMEM;
+
 	/* Namespace capacity by direct query of SPDK bdev object */
 	blk_sz = spdk_bdev_get_block_size(bdev);
 	nr_blks = spdk_bdev_get_num_blocks(bdev);
@@ -497,13 +511,8 @@ alloc_ctrlr_info(uuid_t dev_id, char *dev_name, struct bio_dev_info *b_info)
 		return rc;
 	}

-	/* Fetch socket ID and PCI device type by enumerating spdk_pci_device list */
-	rc = fetch_pci_dev_info(b_info->bdi_ctrlr, b_info->bdi_traddr);
-	if (rc != 0) {
-		return rc;
-	}
-
-	return 0;
+	/* Fetch PCI details by enumerating spdk_pci_device list */
+	return fetch_pci_dev_info(b_info->bdi_ctrlr, b_info->bdi_traddr);
 }

 int