-
Notifications
You must be signed in to change notification settings - Fork 304
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DAOS-7485 control: Implement system reint to act on all pools #15551
base: master
Are you sure you want to change the base?
Conversation
Ticket title is 'dmg command to drain and reintegrate nodes from all pools' |
Test stage Build RPM on EL 9 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15551/1/execution/node/357/log |
Test stage Build RPM on EL 8 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15551/1/execution/node/354/log |
Test stage Build RPM on Leap 15.5 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15551/1/execution/node/273/log |
Test stage Build DEB on Ubuntu 20.04 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15551/1/execution/node/304/log |
Test stage Build on EL 8 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15551/1/execution/node/480/log |
Test stage Build on Leap 15.5 with Intel-C and TARGET_PREFIX completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15551/1/execution/node/519/log |
Test stage Build on Leap 15.5 with Intel-C and TARGET_PREFIX completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15551/2/execution/node/375/log |
Test stage Build on EL 8 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15551/2/execution/node/387/log |
Test stage Build RPM on EL 9 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15551/2/execution/node/360/log |
Test stage Build RPM on EL 8 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15551/2/execution/node/369/log |
Test stage Build RPM on Leap 15.5 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15551/2/execution/node/359/log |
Test stage Build DEB on Ubuntu 20.04 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15551/2/execution/node/364/log |
Test stage Unit Test on EL 8.8 completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-15551/5/testReport/ |
Required-githooks: true Signed-off-by: Tom Nabarro <[email protected]>
Required-githooks: true Signed-off-by: Tom Nabarro <[email protected]>
Required-githooks: true Signed-off-by: Tom Nabarro <[email protected]>
Required-githooks: true Signed-off-by: Tom Nabarro <[email protected]>
Required-githooks: true Signed-off-by: Tom Nabarro <[email protected]>
Required-githooks: true Signed-off-by: Tom Nabarro <[email protected]>
Features: control Required-githooks: true Signed-off-by: Tom Nabarro <[email protected]>
Features: control Required-githooks: true Signed-off-by: Tom Nabarro <[email protected]>
45f3e40
to
d96bbfd
Compare
Test stage Unit Test on EL 8.8 completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-15551/6/testReport/ |
Features: control Required-githooks: true Signed-off-by: Tom Nabarro <[email protected]>
…-stack/daos into tanabarr/control-reintpools-pernode Features: control Required-githooks: true Signed-off-by: Tom Nabarro <tom.nabarrointel.com>
Test stage Functional on EL 8.8 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15551/7/execution/node/1211/log |
I suspect the copyright GHA is failing because the workflow is coming from a merge of this PR + master, but the source tree used is just this PR. Something I'll need to consider in the future for new GHA. Anyway, the copyright GHA is not required so can be ignored |
Test stage Functional Hardware Medium completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-15551/8/testReport/ |
…intpools-pernode Features: pool Required-githooks: true Signed-off-by: Tom Nabarro <[email protected]>
Test stage Build on EL 8 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15551/9/execution/node/372/log |
Test stage Build RPM on EL 9 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15551/9/execution/node/364/log |
Test stage Build RPM on EL 8 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15551/9/execution/node/361/log |
Test stage Build RPM on Leap 15.5 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15551/9/execution/node/346/log |
Test stage Build DEB on Ubuntu 20.04 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15551/9/execution/node/367/log |
Test stage Build on Leap 15.5 with Intel-C and TARGET_PREFIX completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15551/9/execution/node/521/log |
Features: pool Required-githooks: true Signed-off-by: Tom Nabarro <[email protected]>
Test stage Functional on EL 8.8 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15551/10/execution/node/1130/log |
…intpools-pernode Signed-off-by: Tom Nabarro <tom.nabarrointel.com>
Features: control Required-githooks: true Signed-off-by: Tom Nabarro <[email protected]>
Features: control Required-githooks: true Signed-off-by: Tom Nabarro <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Functionally, looks OK to me. I have some quibbles with the (re-)naming, though.
src/control/cmd/dmg/pretty/system.go
Outdated
return | ||
} | ||
|
||
func printSysOsaResults(out io.Writer, results []*control.SystemOsaResult) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This helper should have a comment. Also, what is Osa
? Maybe a better name for that whole concept and family of types/functions might be PoolMembershipUpdates
? Ultimately, the functionality is less about the system and more about the changes to pools, right?
@@ -134,7 +134,7 @@ func (m MgmtMethod) String() string { | |||
MethodPoolExclude: "PoolExclude", | |||
MethodPoolDrain: "PoolDrain", | |||
MethodPoolExtend: "PoolExtend", | |||
MethodPoolReintegrate: "PoolReintegrate", | |||
MethodPoolReint: "PoolReint", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why the rename? The other methods generally use full words, and Reint
is less clear than Reintegrate
without context. Plus, it makes this PR much noisier than it would be without the rename.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's just long and sticks out in the C code because of its length, wanted to be consistent. is it a strong enough objection that you want me to revert all the changes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These symbols are for humans, not computers. Truncating a word in an API surface because it formats inconveniently seems strange to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW, I think partly why long names stand out in C code is because of the conventions used in DAOS. E.g. the DAOS C code tends to do
my_return_var = some_long_function_name(some_long_argument_name1,
some_long_argument_name2,
some_long_argument_name3)
Whereas, IMO, for this reason and several others actually, I think this is much more reasonable
my_return_var = some_long_function_name(
some_long_argument_name1,
some_long_argument_name2,
some_long_argument_name3)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We all come from different programming experiences and it's always about striking a balance but I generally find that people coming from C backgrounds are uncomfortable with long variable names which people from other backgrounds may be more comfortable with. So when interacting with C engine code I tend to try to be considerate to the preferences of the relevant code owners. "Reint" is an abbreviation that's been used variously around
the code base and is relatively intuitive and unambiguous. @mjmac are you requesting that I revert the reame?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair point with regard to the C code, but why does the Go code need to be truncated, too? Go norms focus on readability and maintainability. I don't have a problem with the C code conforming to the style in that codebase, but I do think the Go code should be reverted to the more verbose and readable conventions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
src/control/lib/control/system.go
Outdated
@@ -566,41 +566,49 @@ func SystemExclude(ctx context.Context, rpcClient UnaryInvoker, req *SystemExclu | |||
return resp, convertMSResponse(ur, resp) | |||
} | |||
|
|||
// SystemOsaResult describes the result of an OSA operation on a pool's ranks. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As noted elsewhere, this is not a very human-friendly type name. It requires the reader to go figure out what Osa
or OSA
is. That's more of an internal DAOS name, IMO. Better to describe the types and functions in terms of why and how they would be used, e.g. SystemPoolMembershipUpdate
or something along those lines. Yes, it's more verbose, but this is a public API, and I don't think brevity is a virtue in public types and documentation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Membership is an overloaded term IMO but if you insist I can change to SystemPoolMembershipUpdate, is that required?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Put yourself in the shoes of an API user who has not been hacking on DAOS for a couple of years. What the heck is a/an Osa
? Sure, it can be looked up, but the bar for adding to an API surface should be higher than "it compiles and lets me move on to the next bug, go figure it out", right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm re-factoring to reduce duplication as per suggestion from @wangshilong and will rename appropriately
…failures Features: control Required-githooks: true Signed-off-by: Tom Nabarro <[email protected]>
…intpools-pernode Features: control Signed-off-by: Tom Nabarro <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Add dmg system reint command to reintegrate a set of storage nodes or
ranks from all the pools they belong to. Takes --ranks or --rank-hosts in
ranged format.
Required-githooks: true
Before requesting gatekeeper:
Features:
(orTest-tag*
) commit pragma was used or there is a reason documented that there are no appropriate tags for this PR.Gatekeeper: