-
Notifications
You must be signed in to change notification settings - Fork 304
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DAOS-16838 control: Fix dmg storage query usage with emulated NVMe #15545
Conversation
Ticket title is 'Fix dmg storage query usage with emulated NVMe' |
Test stage Build on Leap 15.5 with Intel-C and TARGET_PREFIX completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15545/1/execution/node/333/log |
Test stage Build RPM on Leap 15.5 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15545/1/execution/node/273/log |
Test stage Build RPM on EL 9 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15545/1/execution/node/398/log |
Test stage Build RPM on EL 8 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15545/1/execution/node/395/log |
Test stage Build DEB on Ubuntu 20.04 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15545/1/execution/node/372/log |
Test stage Build on EL 8 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15545/1/execution/node/521/log |
a3196ea
to
fe52d98
Compare
Test stage Functional Hardware Medium Verbs Provider completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15545/2/execution/node/1463/log |
fe52d98
to
c09dc7b
Compare
3045784
to
b46e32a
Compare
Features: control Required-githooks: true Signed-off-by: Tom Nabarro <[email protected]>
b46e32a
to
77bbb86
Compare
NLT failing on unrelated dfuse valgrind issue |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall looks good. The comment is more of a question/suggestion.
// Mock identifier for emulated NVMe mode where devices have no PCI-address. | ||
addr = fmt.Sprintf("0000:00:0.%d", i) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My only question is whether this could possibly conflict with a real device, since it is used in the seenCtrlrs
map. I wonder if it would be better to use a "name" which could be either a PCI address for the real device, or the file name for emulated nvme.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In DPDK, PCI_ANY_ID
is defined as 0xffff
: https://doc.dpdk.org/api/rte__pci_8h.html#a53aca768a081fcf56089353d805ab77c
That also seems to match the kernel: https://elixir.bootlin.com/linux/v6.12.1/source/include/linux/mod_devicetable.h#L18
Might be better to use that value, as there is zero possibility of conflicting with a real vendor ID that way?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so are you suggesting FFFF:00:0.%d
as the format string?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think so, if the goal is to avoid any conflict with a device that might be found via the topology scan. I can't find an official PCI spec to cite, but from what I found about how linux and DPDK do things, that value is a "can't happen" value, i.e. there's no way that a valid device could show up with that vendor ID.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
aren't we getting confused here between PCI-address and PCI-ID? https://wiki.debian.org/HowToIdentifyADevice/PCI
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the first segment of the PCI address is the domain not the vendor ID afais
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've improved the resilience against collisions by using 0xffff
PCI-domain segment and 0xf
func value (neither would be used in a "real" NVMe device address as domain segment is either 0x10000
or 0x0000
in almost all allocated addresses and BDF func value is 0-7). Reference: https://dottedmag.net/blog/pci-basics/ .
…o-usagequery-fix Signed-off-by: Tom Nabarro <[email protected]>
Features: control Required-githooks: true Signed-off-by: Tom Nabarro <[email protected]>
Test stage Functional Hardware Large completed with status FAILURE. https://build.hpdd.intel.com/job/daos-stack/job/daos/job/PR-15545/8/display/redirect |
Test stage Functional Hardware Medium Verbs Provider completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-15545/8/testReport/ |
Fix a regression which prevents dmg storage query usage from
enumerating devices backed with emulated (AIO file or kdev) NVMe.
Required-githooks: true
Before requesting gatekeeper:
Features:
(orTest-tag*
) commit pragma was used or there is a reason documented that there are no appropriate tags for this PR.Gatekeeper: