DAOS-16650 control: dmg system exclude, update group version #15288
With this change, when a DAOS administrator runs dmg system exclude for a given set of engines, the system map version (and with it the CaRT primary group version) is updated. In turn, the daos_engine processes detect the "loss" of the administratively excluded engines sooner, update pool maps, and perform rebuild. This change supports proactively excluding ranks that are expected to be affected by planned maintenance that would cut off connectivity to certain engines.
Features: control ms_membership
Signed-off-by: Kenneth Cain <[email protected]>
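The description above boils down to one rule: excluding ranks advances a single map version, and engines act on membership changes only when they see that version move. A minimal Go sketch of that rule follows, assuming a simplified in-memory model. `memberDB`, `engineView`, `Exclude`, and `Refresh` are hypothetical names for illustration, not the DAOS sysdb or CaRT API, and the engine-side pool map update and rebuild are not modeled.

```go
package main

import "sort"

// Rank identifies an engine in the system (hypothetical simplification).
type Rank uint32

// memberDB is a hypothetical in-memory stand-in for the management service
// membership database; it is NOT the DAOS sysdb API.
type memberDB struct {
	excluded   map[Rank]bool
	mapVersion uint32
}

func newMemberDB(ranks ...Rank) *memberDB {
	db := &memberDB{excluded: make(map[Rank]bool), mapVersion: 1}
	for _, r := range ranks {
		db.excluded[r] = false
	}
	return db
}

// Exclude marks the given ranks excluded. Unknown or already-excluded ranks
// are ignored. The map version is bumped once, and only if at least one
// member actually changed state.
func (db *memberDB) Exclude(ranks ...Rank) {
	changed := false
	for _, r := range ranks {
		if was, ok := db.excluded[r]; ok && !was {
			db.excluded[r] = true
			changed = true
		}
	}
	if changed {
		db.mapVersion++
	}
}

// engineView models an engine's cached copy of the group membership.
type engineView struct {
	version uint32
	lost    map[Rank]bool
}

func newEngineView(db *memberDB) *engineView {
	return &engineView{version: db.mapVersion, lost: make(map[Rank]bool)}
}

// Refresh does nothing unless the database version has advanced past the
// cached one; otherwise it records and returns the newly excluded ranks,
// sorted ascending.
func (v *engineView) Refresh(db *memberDB) []Rank {
	if db.mapVersion <= v.version {
		return nil // version unchanged: no new losses to act on
	}
	var newlyLost []Rank
	for r, ex := range db.excluded {
		if ex && !v.lost[r] {
			v.lost[r] = true
			newlyLost = append(newlyLost, r)
		}
	}
	v.version = db.mapVersion
	sort.Slice(newlyLost, func(i, j int) bool { return newlyLost[i] < newlyLost[j] })
	return newlyLost
}
```

Bumping the version only on a real state change keeps repeated exclude requests for the same ranks from triggering redundant work on the engines.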
Ticket title is 'DAOS 2.6.0 : 128 Node cluster, when user tried to start the few stopped engines, Different engine got crashed.'
Skip-nlt: true
Skip-func-test: true
Signed-off-by: Kenneth Cain <[email protected]>
LGTM
@daos-stack/daos-gatekeeper The control plane implementation change was tested in Jenkins build 1: https://build.hpdd.intel.com/job/daos-stack/job/daos/job/PR-15288/1/pipeline-graph/
A subsequent modification to unit-test-only code was tested in build 2, which skipped functional testing because only test code changed: https://build.hpdd.intel.com/job/daos-stack/job/daos/job/PR-15288/2/pipeline-graph/
LGTM. Comments are minor; no need to bother unless you're repushing anyway.
startMapVer, err := svc.sysdb.CurMapVersion()
if err != nil {
	t.Fatalf("startMapVer CurMapVersion() failed\n")
Minor nit: no newline is needed at the end of the message.
for {
	curMapVer, err := svc.sysdb.CurMapVersion()
	if err != nil {
		t.Fatalf("CurMapVersion() failed\n")
Same comment as above
FYI, skipping stages on the final commit means the PR requires force-landing. In this case, though, I can see that only unit tests were modified.
…#15349)
Signed-off-by: Kenneth Cain <[email protected]>
Before requesting gatekeeper:
Features: (or Test-tag*) commit pragma was used, or there is a reason documented that there are no appropriate tags for this PR.
Gatekeeper: