-
Notifications
You must be signed in to change notification settings - Fork 302
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DAOS-16840 control: Deprecate access_points in server config #15548
Conversation
Ticket title is 'Rename access_points in server configuration' |
614d0c5
to
d16f04d
Compare
Test stage Functional on EL 8.8 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15548/3/execution/node/1211/log |
be6ee34
to
fba7c3b
Compare
Test stage Functional Hardware Large completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-15548/4/testReport/ |
fba7c3b
to
6de1563
Compare
The access_points name in the server configuration is an ongoing source of confusion. Rename it to the more descriptive mgmt_svc_replicas and emit deprecation notices for older configurations. Required-githooks: true Signed-off-by: Michael MacDonald <[email protected]>
Required-githooks: true Signed-off-by: Michael MacDonald <[email protected]>
Required-githooks: true
6de1563
to
d0aeddb
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ftest LGTM but would be good for @phender to review too
Required-githooks: true Signed-off-by: Michael MacDonald <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall looks good. This PR needs to be run through CI with Features: control
though.
@@ -327,7 +327,7 @@ sudo ipcrm -M 0x10242049 | |||
1. When the server configuration is changed, it's necessary to restart the agent. | |||
1. `DER_UNREACH(-1006)`: Check the socket ID consistency between PMem and NVMe. First, determine which socket you're using with `daos_server network scan -p all`. e.g., if the interface you're using in the engine section is eth0, find which NUMA Socket it belongs to. Next, determine the disks you can use with this socket by calling `daos_server nvme scan` or `dmg storage scan`. e.g., if eth0 belongs to NUMA Socket 0, use only the disks with 0 in the Socket ID column. | |||
1. Check the interface used in the server config (`fabric_iface`) also exists in the client and can communicate with the server. | |||
1. Check the access_points of the agent config points to the correct server host. | |||
1. Check the `access_points` of the agent config points to the correct server hosts. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be a good idea to clarify here that access_points
should be a subset of the mgmt_svc_replicas
.
I wonder if it would also be a good time to add some documentation indicating the behavior under various circumstances, i.e. if the MS leader isn't in the list for some reason, or if the nodes in the list aren't replicas at all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we want to put those implementation details in the user documentation. In practice, it really doesn't matter which server addresses go in the agent config. If anything, I'd be inclined to change the guidance to just use a hostlist that covers all of the servers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To expand a bit on this comment, as I've noted elsewhere, longer-term I think we want to move away from exposing a fixed set of server addresses as "special" to the agent. Keeping this somewhat vague now gives us flexibility to change the server side of things without requiring changes to the agent config process.
I've pushed the latest commit with Features: control
. I was under the impression that the test harness was smart enough to automatically test all changed .py files now, but maybe not?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's only smart enough when you modify specific functional tests.
- modify specific functional test -> run that test
- modify ftest/util -> run
pr
- not smart! - modify dfs -> run
dfs
- slightly smart - anything else -> run
pr
- not smart!
The mapping is here
https://github.com/daos-stack/daos/blob/master/src/tests/ftest/tags.yaml
This hasn't gotten much attention since I originally added it (I haven't had time), but we could add something like this e.g. Or whatever you think makes sense for the code you regularly edit.
But keep in mind with this mapping, even if you modify a single line comment, all control
tests will be included. So devs should address this for their feature areas.
# Run "control" for files under "src/control/"
# Run "control" and "security" for files under "src/control/security"
src/control/: control
src/control/security/: security
The way to override this behavior would be to use Test-tag: <specific tests I want to run>
Test-tag
overrides whatever CI auto-determined.
Features
is on top of whatever CI auto-determined.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor cleanup of example configs.
Features: control Required-githooks: true Signed-off-by: Michael MacDonald <[email protected]>
98085a4
to
ae45d06
Compare
Test stage Unit Test on EL 8.8 completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-15548/9/testReport/ |
Features: control Required-githooks: true Signed-off-by: Michael MacDonald <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ftest LGTM - will reserve approval for those familiar with the rest of the code
The access_points name in the server configuration is
an ongoing source of confusion. Rename it to the more
descriptive mgmt_svc_replicas and emit deprecation
notices for older configurations.
Required-githooks: true
Signed-off-by: Michael MacDonald [email protected]