Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DAOS-16840 control: Deprecate access_points in server config #15548

Merged
merged 6 commits into from
Dec 17, 2024

Conversation

mjmac
Copy link
Contributor

@mjmac mjmac commented Dec 2, 2024

The access_points name in the server configuration is
an ongoing source of confusion. Rename it to the more
descriptive mgmt_svc_replicas and emit deprecation
notices for older configurations.

Required-githooks: true

Signed-off-by: Michael MacDonald [email protected]

Copy link

github-actions bot commented Dec 2, 2024

Ticket title is 'Rename access_points in server configuration'
Status is 'Open'
https://daosio.atlassian.net/browse/DAOS-16840

@daosbuild1
Copy link
Collaborator

Test stage Functional on EL 8.8 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15548/3/execution/node/1211/log

@daosbuild1
Copy link
Collaborator

Test stage Functional Hardware Large completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-15548/4/testReport/

mjmac added 3 commits December 5, 2024 15:45
The access_points name in the server configuration is
an ongoing source of confusion. Rename it to the more
descriptive mgmt_svc_replicas and emit deprecation
notices for older configurations.

Required-githooks: true

Signed-off-by: Michael MacDonald <[email protected]>
Required-githooks: true

Signed-off-by: Michael MacDonald <[email protected]>
@mjmac mjmac marked this pull request as ready for review December 6, 2024 14:57
@mjmac mjmac requested review from a team as code owners December 6, 2024 14:57
Copy link
Contributor

@daltonbohning daltonbohning left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ftest LGTM but would be good for @phender to review too

Required-githooks: true

Signed-off-by: Michael MacDonald <[email protected]>
Copy link
Contributor

@kjacque kjacque left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks good. This PR needs to be run through CI with Features: control though.

@@ -327,7 +327,7 @@ sudo ipcrm -M 0x10242049
1. When the server configuration is changed, it's necessary to restart the agent.
1. `DER_UNREACH(-1006)`: Check the socket ID consistency between PMem and NVMe. First, determine which socket you're using with `daos_server network scan -p all`. e.g., if the interface you're using in the engine section is eth0, find which NUMA Socket it belongs to. Next, determine the disks you can use with this socket by calling `daos_server nvme scan` or `dmg storage scan`. e.g., if eth0 belongs to NUMA Socket 0, use only the disks with 0 in the Socket ID column.
1. Check the interface used in the server config (`fabric_iface`) also exists in the client and can communicate with the server.
1. Check the access_points of the agent config points to the correct server host.
1. Check the `access_points` of the agent config points to the correct server hosts.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be a good idea to clarify here that access_points should be a subset of the mgmt_svc_replicas.

I wonder if it would also be a good time to add some documentation indicating the behavior under various circumstances, i.e. if the MS leader isn't in the list for some reason, or if the nodes in the list aren't replicas at all.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we want to put those implementation details in the user documentation. In practice, it really doesn't matter which server addresses go in the agent config. If anything, I'd be inclined to change the guidance to just use a hostlist that covers all of the servers.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To expand a bit on this comment, as I've noted elsewhere, longer-term I think we want to move away from exposing a fixed set of server addresses as "special" to the agent. Keeping this somewhat vague now gives us flexibility to change the server side of things without requiring changes to the agent config process.

I've pushed the latest commit with Features: control. I was under the impression that the test harness was smart enough to automatically test all changed .py files now, but maybe not?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's only smart enough when you modify specific functional tests.

  • modify specific functional test -> run that test
  • modify ftest/util -> run pr - not smart!
  • modify dfs -> run dfs - slightly smart
  • anything else -> run pr - not smart!

The mapping is here
https://github.com/daos-stack/daos/blob/master/src/tests/ftest/tags.yaml

This hasn't gotten much attention since I originally added it (I haven't had time), but we could add something like this e.g. Or whatever you think makes sense for the code you regularly edit.
But keep in mind with this mapping, even if you modify a single line comment, all control tests will be included. So devs should address this for their feature areas.

# Run "control" for files under "src/control/"
# Run "control" and "security" for files under "src/control/security"
src/control/: control
src/control/security/: security

The way to override this behavior would be to use Test-tag: <specific tests I want to run>
Test-tag overrides whatever CI auto-determined.
Features is on top of whatever CI auto-determined.

Copy link
Contributor

@phender phender left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor cleanup of example configs.

src/tests/ftest/util/agent_utils_params.py Show resolved Hide resolved
utils/config/examples/daos_server_mdonssd.yml Outdated Show resolved Hide resolved
Features: control
Required-githooks: true
Signed-off-by: Michael MacDonald <[email protected]>
@daosbuild1
Copy link
Collaborator

Test stage Unit Test on EL 8.8 completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-15548/9/testReport/

Features: control
Required-githooks: true

Signed-off-by: Michael MacDonald <[email protected]>
Copy link
Contributor

@daltonbohning daltonbohning left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ftest LGTM - will reserve approval for those familiar with the rest of the code

@mjmac mjmac merged commit b020347 into master Dec 17, 2024
50 of 51 checks passed
@mjmac mjmac deleted the mjmac/DAOS-16840 branch December 17, 2024 20:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

5 participants