diff --git a/docs/QSG/qemu-vms.md b/docs/QSG/qemu-vms.md index 108d89295d9..3531a82fb3a 100644 --- a/docs/QSG/qemu-vms.md +++ b/docs/QSG/qemu-vms.md @@ -199,10 +199,10 @@ I follow these [steps](https://docs.daos.io/latest/QSG/setup_rhel/) to install b 5. Update config files. -Update the daos-server config file `/etc/daos/daos_server.yml` on daos-server. You may need to update "access\_points", "fabric\_iface" and "bdev\_list". Update "access\_points" accordingly if you name daos-server differently. Check if the network device has the same name as listed under "fabric\_iface". Look in the output of `lspci` for "bdev\_list". The info for our NVMe controller is like *??:??:? Non-Volatile memory controller: Red Hat, Inc. QEMU NVM Express Controller (rev 02)*. Prefix *??:??.?* is the address of the NVMe devices. +Update the daos-server config file `/etc/daos/daos_server.yml` on daos-server. You may need to update "mgmt\_svc\_replicas", "fabric\_iface" and "bdev\_list". Update "mgmt\_svc\_replicas" accordingly if you name daos-server differently. Check if the network device has the same name as listed under "fabric\_iface". Look in the output of `lspci` for "bdev\_list". The info for our NVMe controller is like *??:??:? Non-Volatile memory controller: Red Hat, Inc. QEMU NVM Express Controller (rev 02)*. Prefix *??:??.?* is the address of the NVMe devices. ``` name: daos_server -access_points: +mgmt_svc_replicas: - daos-server port: 10001 diff --git a/docs/QSG/setup_rhel.md b/docs/QSG/setup_rhel.md index 9ed73ce8ec8..dbe855ae762 100644 --- a/docs/QSG/setup_rhel.md +++ b/docs/QSG/setup_rhel.md @@ -273,7 +273,7 @@ Examples are available on [github](https://github.com/daos-stack/daos/tree/maste name: daos_server - access_points: + mgmt_svc_replicas: - server-1 port: 10001 diff --git a/docs/QSG/setup_suse.md b/docs/QSG/setup_suse.md index c05b30945ac..32500dfe4fb 100644 --- a/docs/QSG/setup_suse.md +++ b/docs/QSG/setup_suse.md @@ -292,7 +292,7 @@ Examples are available on [github](https://github.com/daos-stack/daos/tree/maste An example of the daos_server.yml is presented below. Copy the modified server yaml file to all the server nodes at `/etc/daos/daos_server.yml`. name: daos_server - access_points: + mgmt_svc_replicas: - node-4 port: 10001 diff --git a/docs/QSG/tour.md b/docs/QSG/tour.md index e3254ac25cd..78dd290cc03 100644 --- a/docs/QSG/tour.md +++ b/docs/QSG/tour.md @@ -223,7 +223,7 @@ bring-up DAOS servers and clients. ### Run dfuse # Bring up 4 hosts server with appropriate daos_server.yml and - # access-point, reference to DAOS Set-Up + # MS replicas, reference to DAOS Set-Up # After DAOS servers, DAOS admin and client started. $ dmg storage format diff --git a/docs/admin/administration.md b/docs/admin/administration.md index 9859baebd2a..049e98eb81c 100644 --- a/docs/admin/administration.md +++ b/docs/admin/administration.md @@ -825,9 +825,9 @@ device would remain in this state until replaced by a new device. ## System Operations -The DAOS server acting as the access point records details of engines -that join the DAOS system. Once an engine has joined the DAOS system, it is -identified by a unique system "rank". Multiple ranks can reside on the same +The DAOS server acting as the Management Service (MS) leader records details +of engines that join the DAOS system. Once an engine has joined the DAOS system, +it is identified by a unique system "rank". Multiple ranks can reside on the same host machine, accessible via the same network address. 
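The documentation hunks above rename the `access_points` key to `mgmt_svc_replicas`. To show how an old-style config keeps working after the rename, here is a minimal standalone sketch of the deprecated-alias fallback that the `server/config` changes further down in this diff implement. The `serverConfig` struct, the `load` helper and the use of `gopkg.in/yaml.v3` are illustrative stand-ins only; the real loader is `config.Server.Load`.

```go
package main

import (
	"fmt"
	"log"

	"gopkg.in/yaml.v3"
)

// serverConfig is a cut-down stand-in for config.Server, keeping only the
// renamed key and its deprecated alias.
type serverConfig struct {
	MgmtSvcReplicas []string `yaml:"mgmt_svc_replicas"`
	// Deprecated in 2.8; retained so old configs still load.
	AccessPoints []string `yaml:"access_points,omitempty"`
}

// load parses the YAML and, if only the old key is present, copies it into
// the new field while warning the admin.
func load(data []byte) (*serverConfig, error) {
	cfg := new(serverConfig)
	if err := yaml.Unmarshal(data, cfg); err != nil {
		return nil, err
	}
	if len(cfg.AccessPoints) > 0 {
		log.Println("access_points is deprecated; please use mgmt_svc_replicas instead")
		cfg.MgmtSvcReplicas = cfg.AccessPoints
		cfg.AccessPoints = nil
	}
	return cfg, nil
}

func main() {
	oldStyle := []byte("access_points:\n- server-1\n- server-2\n- server-3\n")
	cfg, err := load(oldStyle)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(cfg.MgmtSvcReplicas) // [server-1 server-2 server-3]
}
```

The actual change embeds a `deprecatedParams` struct with `yaml:",inline"` and `omitempty`, and clears the old field after copying it across, as shown in the `src/control/server/config/server.go` hunks below.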
A DAOS system can be shutdown and restarted to perform maintenance and/or @@ -837,14 +837,14 @@ made to the rank's metadata stored on persistent memory. Storage reformat can also be performed after system shutdown. Pools will be removed and storage wiped. -System commands will be handled by a DAOS Server acting as access point and +System commands will be handled by a DAOS Server acting as the MS leader and listening on the address specified in the DMG config file "hostlist" parameter. See [`daos_control.yml`](https://github.com/daos-stack/daos/blob/master/utils/config/daos_control.yml) for details. At least one of the addresses in the hostlist parameters should match one of the -"access point" addresses specified in the server config file +`mgmt_svc_replicas` addresses specified in the server config file [`daos_server.yml`](https://github.com/daos-stack/daos/blob/master/utils/config/daos_server.yml) that is supplied when starting `daos_server` instances. @@ -1028,13 +1028,15 @@ formatted again by running `dmg storage format`. To add a new server to an existing DAOS system, one should install: -- the relevant certificates -- the server yaml file pointing to the access points of the running - DAOS system +- A copy of the relevant certificates from an existing server. All servers must + share the same set of certificates in order to provide services. +- A copy of the server yaml file from an existing server (DAOS server configurations + should be homogeneous) -- the `mgmt_svc_replicas` entry is used by the new server in + order to know which servers should handle its SystemJoin request. The daos\_control.yml file should also be updated to include the new DAOS server. -Then starts the daos\_server via systemd and format the new server via +Then start the daos\_server via systemd and format the new server via dmg as follows: ``` diff --git a/docs/admin/common_tasks.md b/docs/admin/common_tasks.md index 065edd13d11..fef3dbe9e60 100644 --- a/docs/admin/common_tasks.md +++ b/docs/admin/common_tasks.md @@ -9,7 +9,7 @@ This section describes some of the common tasks handled by admins at a high leve 3. Install `daos-server` and `daos-client` RPMs. 4. Generate certificate files. 5. Copy one of the example configs from `utils/config/examples` to -`/etc/daos` and adjust it based on the environment. E.g., `access_points`, +`/etc/daos` and adjust it based on the environment. E.g., `mgmt_svc_replicas`, `class`. 6. Check that the directory where the log files will be created exists. E.g., `control_log_file`, `log_file` field in `engines` section. @@ -38,7 +38,7 @@ to server hosts and `daos-client` to client hosts. 4. Generate certificate files and distribute them to all the hosts. 5. Copy one of the example configs from `utils/config/examples` to `/etc/daos` of one of the server hosts and adjust it based on the environment. -E.g., `access_points`, `class`. +E.g., `mgmt_svc_replicas`, `class`. 6. Check that the directory where the log files will be created exists. E.g., `control_log_file`, `log_file` field in `engines` section. 7. Start `daos_server`. diff --git a/docs/admin/deployment.md b/docs/admin/deployment.md index 0f5c128b1f4..e318397a28d 100644 --- a/docs/admin/deployment.md +++ b/docs/admin/deployment.md @@ -50,7 +50,7 @@ A recommended workflow to get up and running is as follows: server config file (default location at `/etc/daos/daos_server.yml`) has not yet been populated. 
-* Run `dmg config generate -l -a ` across the entire +* Run `dmg config generate -l -r ` across the entire hostset (all the storage servers that are now running the `daos_server` service after RPM install). The command will only generate a config if hardware setups on all the hosts are @@ -285,7 +285,7 @@ Help Options: [generate command options] -l, --helper-log-file= Log file location for debug from daos_server_helper binary - -a, --access-points= Comma separated list of access point addresses + -r, --ms-replicas= Comma separated list of MS replica addresses (default: localhost) -e, --num-engines= Set the number of DAOS Engine sections to be populated in the config file output. If unset then the value will be set to the @@ -331,7 +331,7 @@ Help Options: [generate command options] -l, --host-list= A comma separated list of addresses to connect to - -a, --access-points= Comma separated list of access point addresses + -r, --ms-replicas= Comma separated list of MS replica addresses to host management service (default: localhost) -e, --num-engines= Set the number of DAOS Engine sections to be populated in the config file output. If unset then the value will be set to the @@ -371,8 +371,8 @@ engines: The options that can be supplied to the config generate command are as follows: -- `--access-points` specifies the access points (identified storage servers that will host the -management service for the DAOS system across the cluster). +- `--ms-replicas` specifies the MS replicas (identified storage servers that will host the +Management Service for the DAOS system across the cluster). - `--num-engines` specifies the number of engine sections to populate in the config file output. If not set explicitly on the commandline, default is the number of NUMA nodes detected on the host. @@ -502,7 +502,7 @@ core_dump_filter: 19 name: daos_server socket_dir: /var/run/daos_server provider: ofi+tcp -access_points: +mgmt_svc_replicas: - localhost:10001 fault_cb: "" hyperthreads: false @@ -515,7 +515,7 @@ and runs until the point where a storage format is required, as expected. [user@wolf-226 daos]$ install/bin/daos_server start -i -o ~/configs/tmp.yml DAOS Server config loaded from /home/user/configs/tmp.yml install/bin/daos_server logging to file /tmp/daos_server.log -NOTICE: Configuration includes only one access point. This provides no redundancy in the event of an access point failure. +NOTICE: Configuration includes only one MS replica. This provides no redundancy in the event of a MS replica failure. DAOS Control Server v2.3.101 (pid 1211553) listening on 127.0.0.1:10001 Checking DAOS I/O Engine instance 0 storage ... Checking DAOS I/O Engine instance 1 storage ... @@ -821,8 +821,6 @@ To set the addresses of which DAOS Servers to task, provide either: Where `` represents a slurm-style hostlist string e.g. `foo-1[28-63],bar[256-511]`. -The first entry in the hostlist (after alphabetic then numeric sorting) will be -assumed to be the access point as set in the server configuration file. Local configuration files stored in the user directory will be used in preference to the default location e.g. `~/.daos_control.yml`. @@ -1322,7 +1320,7 @@ as follows to establish 2-tier storage: ```yaml port: 10001 -access_points: ["wolf-71"] # <----- updated +mgmt_svc_replicas: ["wolf-71"] # <----- updated engines: - @@ -1367,10 +1365,10 @@ information, please refer to the [DAOS build documentation][6]. DAOS Control Servers will need to be restarted on all hosts after updates to the server configuration file. 
- Pick an odd number of hosts in the system and set `access_points` to list of that host's - hostname or IP address (don't need to specify port). + Pick an odd number (3-7) of hosts in the system and set the `mgmt_svc_replicas` list to + include the hostnames or IP addresses (don't need to specify port) of those hosts. - This will be the host which bootstraps the DAOS management service (MS). + This will be the set of servers which host the replicated DAOS management service (MS). >The support of the optional providers is not guarantee and can be removed >without further notification. diff --git a/docs/admin/troubleshooting.md b/docs/admin/troubleshooting.md index 79173e782f7..5de7b95412d 100644 --- a/docs/admin/troubleshooting.md +++ b/docs/admin/troubleshooting.md @@ -313,7 +313,7 @@ sudo ipcrm -M 0x10242049 1. Format the SCMs defined in the config file. 1. Generate the config file using `dmg config generate`. The various requirements will be populated without a syntax error. 1. Try starting with `allow_insecure: true`. This will rule out the credential certificate issue. -1. Verify that the `access_points` host is accessible and the port is not used. +1. Verify that the `mgmt_svc_replicas` host is accessible and the port is not used. 1. Check the `provider` entry. See the "Network Scan and Configuration" section of the admin guide for determining the right provider to use. 1. Check `fabric_iface` in `engines`. They should be available and enabled. 1. Check that `socket_dir` is writable by the daos_server. @@ -327,7 +327,7 @@ sudo ipcrm -M 0x10242049 1. When the server configuration is changed, it's necessary to restart the agent. 1. `DER_UNREACH(-1006)`: Check the socket ID consistency between PMem and NVMe. First, determine which socket you're using with `daos_server network scan -p all`. e.g., if the interface you're using in the engine section is eth0, find which NUMA Socket it belongs to. Next, determine the disks you can use with this socket by calling `daos_server nvme scan` or `dmg storage scan`. e.g., if eth0 belongs to NUMA Socket 0, use only the disks with 0 in the Socket ID column. 1. Check the interface used in the server config (`fabric_iface`) also exists in the client and can communicate with the server. -1. Check the access_points of the agent config points to the correct server host. +1. Check the `access_points` of the agent config points to the correct server hosts. 1. Call `daos pool query` and check that the pool exists and has free space. ### Applications run slow @@ -512,7 +512,7 @@ fabric providers. After starting `daos_server`, ranks will be unable to join if their configuration's fabric provider does not match that of the system. The system configuration is determined by the management service -(MS) leader node, which may be arbitrarily chosen from the configured access points. +(MS) leader node, which may be arbitrarily chosen from the configured MS replicas. The error message will include the string: `fabric provider does not match system provider ` diff --git a/src/control/cmd/daos_server/README.md b/src/control/cmd/daos_server/README.md index 86f652faed8..bd8be525790 100644 --- a/src/control/cmd/daos_server/README.md +++ b/src/control/cmd/daos_server/README.md @@ -120,7 +120,7 @@ The control API is responsible for working out which `daos_server` instance is the MS leader and issuing the request, `dmg` uses the control API. 
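The deployment note above recommends an odd number (3-7) of `mgmt_svc_replicas` hosts; that rule is enforced by the config validation changed later in this diff (`FaultConfigBadMgmtSvcReplicas`, `FaultConfigEvenMgmtSvcReplicas`). Below is a minimal standalone sketch of the count check, with plain errors standing in for the real fault codes.

```go
package main

import (
	"errors"
	"fmt"
)

// checkReplicaCount mirrors the count rules applied to mgmt_svc_replicas:
// the list must be non-empty and odd-sized, and a single replica is legal
// but leaves the management service with no redundancy.
func checkReplicaCount(replicas []string) error {
	switch {
	case len(replicas) < 1:
		return errors.New("mgmt_svc_replicas must contain at least one resolvable address")
	case len(replicas)%2 == 0:
		return errors.New("mgmt_svc_replicas must contain an odd number of addresses")
	case len(replicas) == 1:
		fmt.Println("NOTICE: only one MS replica configured; no redundancy if it fails")
	}
	return nil
}

func main() {
	fmt.Println(checkReplicaCount([]string{"server-1", "server-2", "server-3"})) // <nil>
	fmt.Println(checkReplicaCount([]string{"server-1", "server-2"}))             // even-count error
}
```

Duplicate replica addresses are also rejected by the real validation; that check is omitted from this sketch.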
The `dmg` tool requires the hostlist of all hosts in the DAOS system to be specified either on the command line or in the `daos_control.yml` config file. -A list of access point servers is defined in the server's config file. +A list of MS replica servers is defined in the server's config file. ## Functionality diff --git a/src/control/cmd/daos_server/auto.go b/src/control/cmd/daos_server/auto.go index d1c28a3f5d0..b2f7f690b6c 100644 --- a/src/control/cmd/daos_server/auto.go +++ b/src/control/cmd/daos_server/auto.go @@ -122,6 +122,8 @@ func (cmd *configGenCmd) confGen(ctx context.Context, getFabric getFabricFn, get } cmd.Debugf("fetched host storage info on localhost: %+v", hs) + cmd.CheckDeprecated(cmd.Logger) + req := new(control.ConfGenerateReq) if err := convert.Types(cmd, req); err != nil { return nil, err diff --git a/src/control/cmd/daos_server/auto_test.go b/src/control/cmd/daos_server/auto_test.go index 8fb8d85522f..2008c41c30b 100644 --- a/src/control/cmd/daos_server/auto_test.go +++ b/src/control/cmd/daos_server/auto_test.go @@ -32,11 +32,11 @@ import ( func TestDaosServer_Auto_Commands(t *testing.T) { runCmdTests(t, []cmdTest{ { - "Generate with no access point", + "Generate with no MS replica", "config generate", printCommand(t, func() *configGenCmd { cmd := &configGenCmd{} - cmd.AccessPoints = "localhost" + cmd.MgmtSvcReplicas = "localhost" cmd.NetClass = "infiniband" return cmd }()), @@ -44,10 +44,10 @@ func TestDaosServer_Auto_Commands(t *testing.T) { }, { "Generate with defaults", - "config generate -a foo", + "config generate -r foo", printCommand(t, func() *configGenCmd { cmd := &configGenCmd{} - cmd.AccessPoints = "foo" + cmd.MgmtSvcReplicas = "foo" cmd.NetClass = "infiniband" return cmd }()), @@ -55,10 +55,10 @@ func TestDaosServer_Auto_Commands(t *testing.T) { }, { "Generate with no nvme", - "config generate -a foo --scm-only", + "config generate -r foo --scm-only", printCommand(t, func() *configGenCmd { cmd := &configGenCmd{} - cmd.AccessPoints = "foo" + cmd.MgmtSvcReplicas = "foo" cmd.NetClass = "infiniband" cmd.SCMOnly = true return cmd @@ -67,10 +67,10 @@ func TestDaosServer_Auto_Commands(t *testing.T) { }, { "Generate with storage parameters", - "config generate -a foo --num-engines 2", + "config generate -r foo --num-engines 2", printCommand(t, func() *configGenCmd { cmd := &configGenCmd{} - cmd.AccessPoints = "foo" + cmd.MgmtSvcReplicas = "foo" cmd.NetClass = "infiniband" cmd.NrEngines = 2 return cmd @@ -79,10 +79,10 @@ func TestDaosServer_Auto_Commands(t *testing.T) { }, { "Generate with short option storage parameters", - "config generate -a foo -e 2 -s", + "config generate -r foo -e 2 -s", printCommand(t, func() *configGenCmd { cmd := &configGenCmd{} - cmd.AccessPoints = "foo" + cmd.MgmtSvcReplicas = "foo" cmd.NetClass = "infiniband" cmd.NrEngines = 2 cmd.SCMOnly = true @@ -92,10 +92,10 @@ func TestDaosServer_Auto_Commands(t *testing.T) { }, { "Generate with ethernet network device class", - "config generate -a foo --net-class ethernet", + "config generate -r foo --net-class ethernet", printCommand(t, func() *configGenCmd { cmd := &configGenCmd{} - cmd.AccessPoints = "foo" + cmd.MgmtSvcReplicas = "foo" cmd.NetClass = "ethernet" return cmd }()), @@ -103,10 +103,10 @@ func TestDaosServer_Auto_Commands(t *testing.T) { }, { "Generate with infiniband network device class", - "config generate -a foo --net-class infiniband", + "config generate -r foo --net-class infiniband", printCommand(t, func() *configGenCmd { cmd := &configGenCmd{} - cmd.AccessPoints = "foo" + 
cmd.MgmtSvcReplicas = "foo" cmd.NetClass = "infiniband" return cmd }()), @@ -114,22 +114,22 @@ func TestDaosServer_Auto_Commands(t *testing.T) { }, { "Generate with deprecated network device class", - "config generate -a foo --net-class best-available", + "config generate -r foo --net-class best-available", "", errors.New("Invalid value"), }, { "Generate with unsupported network device class", - "config generate -a foo --net-class loopback", + "config generate -r foo --net-class loopback", "", errors.New("Invalid value"), }, { "Generate tmpfs non-MD-on-SSD config", - "config generate -a foo --use-tmpfs-scm", + "config generate -r foo --use-tmpfs-scm", printCommand(t, func() *configGenCmd { cmd := &configGenCmd{} - cmd.AccessPoints = "foo" + cmd.MgmtSvcReplicas = "foo" cmd.NetClass = "infiniband" cmd.UseTmpfsSCM = true return cmd @@ -138,10 +138,10 @@ func TestDaosServer_Auto_Commands(t *testing.T) { }, { "Generate MD-on-SSD config", - "config generate -a foo --use-tmpfs-scm --control-metadata-path /opt/daos_md", + "config generate -r foo --use-tmpfs-scm --control-metadata-path /opt/daos_md", printCommand(t, func() *configGenCmd { cmd := &configGenCmd{} - cmd.AccessPoints = "foo" + cmd.MgmtSvcReplicas = "foo" cmd.NetClass = "infiniband" cmd.UseTmpfsSCM = true cmd.ExtMetadataPath = "/opt/daos_md" @@ -169,7 +169,7 @@ func TestDaosServer_Auto_confGenCmd_Convert(t *testing.T) { cmd.NrEngines = 1 cmd.NetProvider = "ofi+tcp" cmd.SCMOnly = true - cmd.AccessPoints = "foo,bar" + cmd.MgmtSvcReplicas = "foo,bar" cmd.NetClass = "infiniband" cmd.UseTmpfsSCM = true cmd.ExtMetadataPath = "/opt/daos_md" @@ -184,7 +184,7 @@ func TestDaosServer_Auto_confGenCmd_Convert(t *testing.T) { NrEngines: 1, NetProvider: "ofi+tcp", SCMOnly: true, - AccessPoints: []string{"foo", "bar"}, + MgmtSvcReplicas: []string{"foo", "bar"}, NetClass: hardware.Infiniband, UseTmpfsSCM: true, ExtMetadataPath: "/opt/daos_md", @@ -273,7 +273,7 @@ func TestDaosServer_Auto_confGen(t *testing.T) { } for name, tc := range map[string]struct { - accessPoints string + msReplicas string nrEngines int scmOnly bool netClass string @@ -338,16 +338,16 @@ func TestDaosServer_Auto_confGen(t *testing.T) { hf: defHostFabric, hs: defHostStorage, expCfg: control.MockServerCfg("ofi+psm2", exmplEngineCfgs). - WithAccessPoints("localhost:10001"). + WithMgmtSvcReplicas("localhost:10001"). WithControlLogFile("/tmp/daos_server.log"), }, - "access points set": { - accessPoints: "moon-111,mars-115,jupiter-119", - hf: defHostFabric, - hs: defHostStorage, + "MS replicas set": { + msReplicas: "moon-111,mars-115,jupiter-119", + hf: defHostFabric, + hs: defHostStorage, expCfg: control.MockServerCfg("ofi+psm2", exmplEngineCfgs). - WithAccessPoints("localhost:10001"). - WithAccessPoints("moon-111:10001", "mars-115:10001", "jupiter-119:10001"). + WithMgmtSvcReplicas("localhost:10001"). + WithMgmtSvcReplicas("moon-111:10001", "mars-115:10001", "jupiter-119:10001"). WithControlLogFile("/tmp/daos_server.log"), }, "unmet min nr ssds": { @@ -391,7 +391,7 @@ func TestDaosServer_Auto_confGen(t *testing.T) { }, }, expCfg: control.MockServerCfg("ofi+psm2", tmpfsEngineCfgs). - WithAccessPoints("localhost:10001"). + WithMgmtSvcReplicas("localhost:10001"). WithControlLogFile("/tmp/daos_server.log"), }, "dcpm scm; control_metadata path set": { @@ -418,7 +418,7 @@ func TestDaosServer_Auto_confGen(t *testing.T) { }, }, expCfg: control.MockServerCfg("ofi+psm2", mdOnSSDEngineCfgs). - WithAccessPoints("localhost:10001"). + WithMgmtSvcReplicas("localhost:10001"). 
WithControlLogFile("/tmp/daos_server.log"). WithControlMetadata(controlMetadata), }, @@ -510,7 +510,7 @@ func TestDaosServer_Auto_confGen(t *testing.T) { WithBdevDeviceList("0000:97:00.5", "0000:e2:00.5"), ), }). - WithAccessPoints("localhost:10001"). + WithMgmtSvcReplicas("localhost:10001"). WithControlLogFile("/tmp/daos_server.log"), }, } { @@ -522,12 +522,12 @@ func TestDaosServer_Auto_confGen(t *testing.T) { if tc.netClass == "" { tc.netClass = "infiniband" } - if tc.accessPoints == "" { - tc.accessPoints = "localhost" + if tc.msReplicas == "" { + tc.msReplicas = "localhost" } cmd := &configGenCmd{} - cmd.AccessPoints = tc.accessPoints + cmd.MgmtSvcReplicas = tc.msReplicas cmd.NrEngines = tc.nrEngines cmd.SCMOnly = tc.scmOnly cmd.NetClass = tc.netClass @@ -573,7 +573,10 @@ func TestDaosServer_Auto_confGen(t *testing.T) { } return x.Equals(y) }), - cmpopts.IgnoreUnexported(security.CertificateConfig{}), + cmpopts.IgnoreUnexported( + security.CertificateConfig{}, + config.Server{}, + ), } if diff := cmp.Diff(tc.expCfg, gotCfg, cmpOpts...); diff != "" { diff --git a/src/control/cmd/daos_server/config.go b/src/control/cmd/daos_server/config.go index 197e6040458..d58a7abd9ca 100644 --- a/src/control/cmd/daos_server/config.go +++ b/src/control/cmd/daos_server/config.go @@ -11,11 +11,12 @@ import ( "path" "github.com/daos-stack/daos/src/control/build" + "github.com/daos-stack/daos/src/control/logging" "github.com/daos-stack/daos/src/control/server/config" ) type cfgLoader interface { - loadConfig() error + loadConfig(logging.Logger) error configPath() string configOptional() bool } @@ -43,7 +44,7 @@ func (c *cfgCmd) configPath() string { return c.config.Path } -func (c *cfgCmd) loadConfig() error { +func (c *cfgCmd) loadConfig(log logging.Logger) error { if c.IgnoreConfig { c.config = nil return nil @@ -75,7 +76,7 @@ func (c *cfgCmd) loadConfig() error { return err } - return c.config.Load() + return c.config.Load(log) } func (c *cfgCmd) configOptional() bool { diff --git a/src/control/cmd/daos_server/main.go b/src/control/cmd/daos_server/main.go index cc721658711..55d613d9673 100644 --- a/src/control/cmd/daos_server/main.go +++ b/src/control/cmd/daos_server/main.go @@ -168,7 +168,7 @@ func parseOpts(args []string, opts *mainOpts, log *logging.LeveledLogger) error optCfgCmd.setOptional() } - if err := cfgCmd.loadConfig(); err != nil { + if err := cfgCmd.loadConfig(log); err != nil { return errors.Wrapf(err, "failed to load config from %s", cfgCmd.configPath()) } else if cfgCmd.configPath() != "" { diff --git a/src/control/cmd/daos_server/start_test.go b/src/control/cmd/daos_server/start_test.go index 60e79f11387..4c8f419384a 100644 --- a/src/control/cmd/daos_server/start_test.go +++ b/src/control/cmd/daos_server/start_test.go @@ -261,6 +261,7 @@ func TestStartOptions(t *testing.T) { cmpOpts := []cmp.Option{ cmpopts.IgnoreUnexported( security.CertificateConfig{}, + config.Server{}, ), cmpopts.SortSlices(func(a, b string) bool { return a < b }), } diff --git a/src/control/cmd/dmg/auto.go b/src/control/cmd/dmg/auto.go index 3e97a528cc8..d2003ea97ed 100644 --- a/src/control/cmd/dmg/auto.go +++ b/src/control/cmd/dmg/auto.go @@ -1,5 +1,5 @@ // -// (C) Copyright 2020-2023 Intel Corporation. +// (C) Copyright 2020-2024 Intel Corporation. 
// // SPDX-License-Identifier: BSD-2-Clause-Patent // @@ -54,6 +54,8 @@ func (cmd *configGenCmd) confGen(ctx context.Context) (*config.Server, error) { hl = []string{"localhost"} } + cmd.CheckDeprecated(cmd.Logger) + req := control.ConfGenerateRemoteReq{ ConfGenerateReq: control.ConfGenerateReq{}, Client: cmd.ctlInvoker, diff --git a/src/control/cmd/dmg/auto_test.go b/src/control/cmd/dmg/auto_test.go index 540c721eff8..800fbed2a65 100644 --- a/src/control/cmd/dmg/auto_test.go +++ b/src/control/cmd/dmg/auto_test.go @@ -77,14 +77,14 @@ func TestAuto_ConfigCommands(t *testing.T) { runConfGenCmdTests(t, []cmdTest{ { - "Generate with no access point", + "Generate with no MS replica", "config generate", printCGRReq(t, func() control.ConfGenerateRemoteReq { req := control.ConfGenerateRemoteReq{ HostList: []string{"localhost:10001"}, } req.ConfGenerateReq.NetClass = hardware.Infiniband - req.ConfGenerateReq.AccessPoints = []string{"localhost"} + req.ConfGenerateReq.MgmtSvcReplicas = []string{"localhost"} return req }()), nil, @@ -100,7 +100,7 @@ func TestAuto_ConfigCommands(t *testing.T) { }, } req.ConfGenerateReq.NetClass = hardware.Infiniband - req.ConfGenerateReq.AccessPoints = []string{"foo"} + req.ConfGenerateReq.MgmtSvcReplicas = []string{"foo"} return req }()), nil, @@ -113,7 +113,7 @@ func TestAuto_ConfigCommands(t *testing.T) { HostList: []string{"localhost:10001"}, } req.ConfGenerateReq.NetClass = hardware.Infiniband - req.ConfGenerateReq.AccessPoints = []string{"foo"} + req.ConfGenerateReq.MgmtSvcReplicas = []string{"foo"} return req }()), nil, @@ -126,7 +126,7 @@ func TestAuto_ConfigCommands(t *testing.T) { HostList: []string{"localhost:10001"}, } req.ConfGenerateReq.NetClass = hardware.Infiniband - req.ConfGenerateReq.AccessPoints = []string{"foo"} + req.ConfGenerateReq.MgmtSvcReplicas = []string{"foo"} req.ConfGenerateReq.SCMOnly = true return req }()), @@ -140,7 +140,7 @@ func TestAuto_ConfigCommands(t *testing.T) { HostList: []string{"localhost:10001"}, } req.ConfGenerateReq.NetClass = hardware.Infiniband - req.ConfGenerateReq.AccessPoints = []string{"foo"} + req.ConfGenerateReq.MgmtSvcReplicas = []string{"foo"} req.ConfGenerateReq.NrEngines = 2 return req }()), @@ -154,7 +154,7 @@ func TestAuto_ConfigCommands(t *testing.T) { HostList: []string{"localhost:10001"}, } req.ConfGenerateReq.NetClass = hardware.Infiniband - req.ConfGenerateReq.AccessPoints = []string{"foo"} + req.ConfGenerateReq.MgmtSvcReplicas = []string{"foo"} req.ConfGenerateReq.NrEngines = 2 req.ConfGenerateReq.SCMOnly = true return req @@ -169,7 +169,7 @@ func TestAuto_ConfigCommands(t *testing.T) { HostList: []string{"localhost:10001"}, } req.ConfGenerateReq.NetClass = hardware.Ether - req.ConfGenerateReq.AccessPoints = []string{"foo"} + req.ConfGenerateReq.MgmtSvcReplicas = []string{"foo"} return req }()), nil, @@ -182,7 +182,7 @@ func TestAuto_ConfigCommands(t *testing.T) { HostList: []string{"localhost:10001"}, } req.ConfGenerateReq.NetClass = hardware.Infiniband - req.ConfGenerateReq.AccessPoints = []string{"foo"} + req.ConfGenerateReq.MgmtSvcReplicas = []string{"foo"} return req }()), nil, @@ -207,7 +207,7 @@ func TestAuto_ConfigCommands(t *testing.T) { HostList: []string{"localhost:10001"}, } req.ConfGenerateReq.NetClass = hardware.Infiniband - req.ConfGenerateReq.AccessPoints = []string{"foo"} + req.ConfGenerateReq.MgmtSvcReplicas = []string{"foo"} req.ConfGenerateReq.FabricPorts = []int{12345, 13345} return req }()), @@ -221,7 +221,7 @@ func TestAuto_ConfigCommands(t *testing.T) { HostList: 
[]string{"localhost:10001"}, } req.ConfGenerateReq.NetClass = hardware.Infiniband - req.ConfGenerateReq.AccessPoints = []string{"foo"} + req.ConfGenerateReq.MgmtSvcReplicas = []string{"foo"} req.ConfGenerateReq.UseTmpfsSCM = true return req }()), @@ -235,7 +235,7 @@ func TestAuto_ConfigCommands(t *testing.T) { HostList: []string{"localhost:10001"}, } req.ConfGenerateReq.NetClass = hardware.Infiniband - req.ConfGenerateReq.AccessPoints = []string{"foo"} + req.ConfGenerateReq.MgmtSvcReplicas = []string{"foo"} req.ConfGenerateReq.UseTmpfsSCM = true req.ConfGenerateReq.ExtMetadataPath = "/opt/daos" return req @@ -256,7 +256,7 @@ func TestAuto_confGenCmd_Convert(t *testing.T) { cmd.NrEngines = 1 cmd.NetProvider = "ofi+tcp" cmd.SCMOnly = true - cmd.AccessPoints = "foo,bar" + cmd.MgmtSvcReplicas = "foo,bar" cmd.NetClass = "infiniband" cmd.UseTmpfsSCM = true cmd.ExtMetadataPath = "/opt/daos_md" @@ -271,7 +271,7 @@ func TestAuto_confGenCmd_Convert(t *testing.T) { NrEngines: 1, NetProvider: "ofi+tcp", SCMOnly: true, - AccessPoints: []string{"foo", "bar"}, + MgmtSvcReplicas: []string{"foo", "bar"}, NetClass: hardware.Infiniband, UseTmpfsSCM: true, ExtMetadataPath: "/opt/daos_md", @@ -338,7 +338,7 @@ func TestAuto_confGen(t *testing.T) { for name, tc := range map[string]struct { hostlist []string - accessPoints string + msReplicas string nrEngines int scmOnly bool netClass string @@ -376,17 +376,17 @@ func TestAuto_confGen(t *testing.T) { {storHostResp}, }, expCfg: control.MockServerCfg("ofi+psm2", exmplEngineCfgs). - WithAccessPoints("localhost:10001"). + WithMgmtSvcReplicas("localhost:10001"). WithControlLogFile("/tmp/daos_server.log"), }, - "dcpm scm; access points set": { - accessPoints: "moon-111,mars-115,jupiter-119", + "dcpm scm; MS replicas set": { + msReplicas: "moon-111,mars-115,jupiter-119", hostResponsesSet: [][]*control.HostResponse{ {netHostResp}, {storHostResp}, }, expCfg: control.MockServerCfg("ofi+psm2", exmplEngineCfgs). - WithAccessPoints("moon-111:10001", "mars-115:10001", "jupiter-119:10001"). + WithMgmtSvcReplicas("moon-111:10001", "mars-115:10001", "jupiter-119:10001"). WithControlLogFile("/tmp/daos_server.log"), }, "dcpm scm; unmet min nr ssds": { @@ -466,11 +466,11 @@ func TestAuto_confGen(t *testing.T) { if tc.netClass == "" { tc.netClass = "infiniband" } - if tc.accessPoints == "" { - tc.accessPoints = "localhost" + if tc.msReplicas == "" { + tc.msReplicas = "localhost" } cmd := &configGenCmd{} - cmd.AccessPoints = tc.accessPoints + cmd.MgmtSvcReplicas = tc.msReplicas cmd.NrEngines = tc.nrEngines cmd.SCMOnly = tc.scmOnly cmd.NetClass = tc.netClass @@ -520,7 +520,10 @@ func TestAuto_confGen(t *testing.T) { } return x.Equals(y) }), - cmpopts.IgnoreUnexported(security.CertificateConfig{}), + cmpopts.IgnoreUnexported( + security.CertificateConfig{}, + config.Server{}, + ), } if diff := cmp.Diff(tc.expCfg, gotCfg, cmpOpts...); diff != "" { @@ -596,7 +599,7 @@ core_dump_filter: 19 name: daos_server socket_dir: /var/run/daos_server provider: ofi+verbs -access_points: +mgmt_svc_replicas: - hostX:10002 fault_cb: "" hyperthreads: false @@ -606,7 +609,7 @@ hyperthreads: false typicalAutoGenOutCfg := config.DefaultServer(). WithControlLogFile(defaultControlLogFile). WithFabricProvider("ofi+verbs"). - WithAccessPoints("hostX:10002"). + WithMgmtSvcReplicas("hostX:10002"). WithDisableVMD(false). WithEngines( engine.MockConfig(). 
diff --git a/src/control/common/cmdutil/auto.go b/src/control/common/cmdutil/auto.go index 1081942e2c0..26bd56eba49 100644 --- a/src/control/common/cmdutil/auto.go +++ b/src/control/common/cmdutil/auto.go @@ -1,7 +1,14 @@ package cmdutil +import "github.com/daos-stack/daos/src/control/logging" + +type deprecatedParams struct { + AccessPoints string `short:"a" long:"access-points" description:"DEPRECATED; use ms-replicas instead" json:",omitempty"` // deprecated in 2.8 +} + type ConfGenCmd struct { - AccessPoints string `default:"localhost" short:"a" long:"access-points" description:"Comma separated list of access point addresses to host management service"` + deprecatedParams + MgmtSvcReplicas string `default:"localhost" short:"r" long:"ms-replicas" description:"Comma separated list of MS replica addresses to host management service"` NrEngines int `short:"e" long:"num-engines" description:"Set the number of DAOS Engine sections to be populated in the config file output. If unset then the value will be set to the number of NUMA nodes on storage hosts in the DAOS system."` SCMOnly bool `short:"s" long:"scm-only" description:"Create a SCM-only config without NVMe SSDs."` NetClass string `default:"infiniband" short:"c" long:"net-class" description:"Set the network device class to be used" choice:"ethernet" choice:"infiniband"` @@ -10,3 +17,12 @@ type ConfGenCmd struct { ExtMetadataPath string `short:"m" long:"control-metadata-path" description:"External storage path to store control metadata. Set this to a persistent location and specify --use-tmpfs-scm to create an MD-on-SSD config"` FabricPorts string `short:"f" long:"fabric-ports" description:"Allow custom fabric interface ports to be specified for each engine config section. Comma separated port numbers, one per engine"` } + +// CheckDeprecated will check for deprecated parameters and update as needed. +func (cmd *ConfGenCmd) CheckDeprecated(log logging.Logger) { + if cmd.AccessPoints != "" { + log.Notice("access-points is deprecated; please use ms-replicas instead") + cmd.MgmtSvcReplicas = cmd.AccessPoints + cmd.AccessPoints = "" + } +} diff --git a/src/control/fault/code/codes.go b/src/control/fault/code/codes.go index 4d045f7cfce..bbbb5d76cdf 100644 --- a/src/control/fault/code/codes.go +++ b/src/control/fault/code/codes.go @@ -165,8 +165,8 @@ const ( ServerNoConfigPath ServerConfigBadControlPort ServerConfigBadTelemetryPort - ServerConfigBadAccessPoints - ServerConfigEvenAccessPoints + ServerConfigBadMgmtSvcReplicas + ServerConfigEvenMgmtSvcReplicas ServerConfigBadProvider ServerConfigNoEngines ServerConfigDuplicateFabric diff --git a/src/control/lib/control/auto.go b/src/control/lib/control/auto.go index 0f2a94c404c..a5e981cb6a9 100644 --- a/src/control/lib/control/auto.go +++ b/src/control/lib/control/auto.go @@ -63,7 +63,7 @@ type ( // Generate a config without NVMe. SCMOnly bool `json:"SCMOnly"` // Hosts to run the management service. - AccessPoints []string `json:"-"` + MgmtSvcReplicas []string `json:"-"` // Ports to use for fabric comms (one needed per engine). FabricPorts []int `json:"-"` // Generate config with a tmpfs RAM-disk SCM. 
@@ -101,9 +101,9 @@ type ( func (cgr *ConfGenerateReq) UnmarshalJSON(data []byte) error { type Alias ConfGenerateReq aux := &struct { - AccessPoints string - FabricPorts string - NetClass string + MgmtSvcReplicas string + FabricPorts string + NetClass string *Alias }{ Alias: (*Alias)(cgr), @@ -113,7 +113,7 @@ func (cgr *ConfGenerateReq) UnmarshalJSON(data []byte) error { return err } - cgr.AccessPoints = strings.Split(aux.AccessPoints, ",") + cgr.MgmtSvcReplicas = strings.Split(aux.MgmtSvcReplicas, ",") fabricPorts := strings.Split(aux.FabricPorts, ",") for _, s := range fabricPorts { if s == "" { @@ -212,8 +212,8 @@ func ConfGenerateRemote(ctx context.Context, req ConfGenerateRemoteReq) (*ConfGe return nil, errors.New("no hosts specified") } - if len(req.AccessPoints) == 0 { - return nil, errors.New("no access points specified") + if len(req.MgmtSvcReplicas) == 0 { + return nil, errors.New("no MS replicas specified") } ns, err := getNetworkSet(ctx, req) @@ -1190,24 +1190,24 @@ func getThreadCounts(log logging.Logger, nodeSet []int, coresPerEngine int, numa return &tc, nil } -// check that all access points either have no port specified or have the same port number. -func checkAccessPointPorts(log logging.Logger, aps []string) (int, error) { - if len(aps) == 0 { - return 0, errors.New("no access points") +// check that all MS replicas either have no port specified or have the same port number. +func checkReplicaPorts(log logging.Logger, replicas []string) (int, error) { + if len(replicas) == 0 { + return 0, errors.New("no MS replicas") } port := -1 - for _, ap := range aps { - apPort, err := config.GetAccessPointPort(log, ap) + for _, ap := range replicas { + apPort, err := config.GetMSReplicaPort(log, ap) if err != nil { - return 0, errors.Wrapf(err, "access point %q", ap) + return 0, errors.Wrapf(err, "MS replica %q", ap) } if port == -1 { port = apPort continue } if apPort != port { - return 0, errors.New("access point port numbers do not match") + return 0, errors.New("MS replica port numbers do not match") } } @@ -1228,7 +1228,7 @@ func genServerConfig(req ConfGenerateReq, ecs []*engine.Config, tc *threadCounts } cfg := config.DefaultServer(). - WithAccessPoints(req.AccessPoints...). + WithMgmtSvcReplicas(req.MgmtSvcReplicas...). WithFabricProvider(ecs[0].Fabric.Provider). WithEngines(ecs...). WithControlLogFile(defaultControlLogFile) @@ -1247,12 +1247,12 @@ func genServerConfig(req ConfGenerateReq, ecs []*engine.Config, tc *threadCounts } } - portNum, err := checkAccessPointPorts(req.Log, cfg.AccessPoints) + portNum, err := checkReplicaPorts(req.Log, cfg.MgmtSvcReplicas) if err != nil { return nil, err } if portNum != 0 { - // Custom access point port number specified so set server port to the same. + // Custom MS replica port number specified so set server port to the same. 
cfg.WithControlPort(portNum) } diff --git a/src/control/lib/control/auto_test.go b/src/control/lib/control/auto_test.go index 724bed72180..41dd4a248c3 100644 --- a/src/control/lib/control/auto_test.go +++ b/src/control/lib/control/auto_test.go @@ -1621,7 +1621,7 @@ func TestControl_AutoConfig_genServerConfig(t *testing.T) { } for name, tc := range map[string]struct { - accessPoints []string // list of access point host/ip addresses + msReplicas []string // list of MS replica host/ip addresses extMetadataPath string ecs []*engine.Config threadCounts *threadCounts // numa to cpu mappings @@ -1638,46 +1638,46 @@ func TestControl_AutoConfig_genServerConfig(t *testing.T) { }, expErr: errors.New("provider not specified"), }, - "no access points": { - accessPoints: []string{}, + "no MS replicas": { + msReplicas: []string{}, threadCounts: &threadCounts{16, 0}, ecs: []*engine.Config{exmplEngineCfg0}, - expErr: errors.New("no access points"), + expErr: errors.New("no MS replicas"), }, - "access points without the same port": { - accessPoints: []string{"bob:1", "joe:2"}, + "MS replicas without the same port": { + msReplicas: []string{"bob:1", "joe:2"}, threadCounts: &threadCounts{16, 0}, ecs: []*engine.Config{exmplEngineCfg0}, expErr: errors.New("numbers do not match"), }, - "access points some with port specified": { - accessPoints: []string{"bob:1", "joe"}, + "MS replicas some with port specified": { + msReplicas: []string{"bob:1", "joe"}, threadCounts: &threadCounts{16, 0}, ecs: []*engine.Config{exmplEngineCfg0}, expErr: errors.New("numbers do not match"), }, "single engine config; default port number": { - accessPoints: []string{"hostX"}, + msReplicas: []string{"hostX"}, threadCounts: &threadCounts{16, 0}, ecs: []*engine.Config{exmplEngineCfg0}, expCfg: MockServerCfg(exmplEngineCfg0.Fabric.Provider, []*engine.Config{ exmplEngineCfg0.WithHelperStreamCount(0), }). - WithAccessPoints("hostX:10001"), // Default applied. + WithMgmtSvcReplicas("hostX:10001"), // Default applied. }, "single engine config; default port number specified": { - accessPoints: []string{"hostX:10001"}, + msReplicas: []string{"hostX:10001"}, threadCounts: &threadCounts{16, 0}, ecs: []*engine.Config{exmplEngineCfg0}, expCfg: MockServerCfg(exmplEngineCfg0.Fabric.Provider, []*engine.Config{ exmplEngineCfg0.WithHelperStreamCount(0), }). - WithAccessPoints("hostX:10001"), // ControlPort remains at 10001. + WithMgmtSvcReplicas("hostX:10001"), // ControlPort remains at 10001. }, - "dual engine config; custom access point port number": { - accessPoints: []string{"hostX:10002"}, + "dual engine config; custom MS replica port number": { + msReplicas: []string{"hostX:10002"}, threadCounts: &threadCounts{16, 0}, ecs: []*engine.Config{ exmplEngineCfg0, @@ -1688,11 +1688,11 @@ func TestControl_AutoConfig_genServerConfig(t *testing.T) { exmplEngineCfg0.WithHelperStreamCount(0), exmplEngineCfg1.WithHelperStreamCount(0), }). - WithAccessPoints("hostX:10002"). + WithMgmtSvcReplicas("hostX:10002"). WithControlPort(10002), // ControlPort updated to AP port. 
}, - "bad accesspoint port": { - accessPoints: []string{"hostX:-10001"}, + "bad MS replica port": { + msReplicas: []string{"hostX:-10001"}, threadCounts: &threadCounts{16, 0}, ecs: []*engine.Config{ exmplEngineCfg0, @@ -1709,7 +1709,7 @@ func TestControl_AutoConfig_genServerConfig(t *testing.T) { expErr: errors.New("multiple bdev tiers"), }, "dual engine tmpfs; high mem": { - accessPoints: []string{"hostX:10002", "hostY:10002", "hostZ:10002"}, + msReplicas: []string{"hostX:10002", "hostY:10002", "hostZ:10002"}, extMetadataPath: metadataMountPath, threadCounts: &threadCounts{16, 0}, ecs: []*engine.Config{ @@ -1737,7 +1737,7 @@ func TestControl_AutoConfig_genServerConfig(t *testing.T) { storage.BdevOutConfName), ), }). - WithAccessPoints("hostX:10002", "hostY:10002", "hostZ:10002"). + WithMgmtSvcReplicas("hostX:10002", "hostY:10002", "hostZ:10002"). WithControlPort(10002). // ControlPort updated to AP port. WithControlMetadata(controlMetadata), }, @@ -1746,8 +1746,8 @@ func TestControl_AutoConfig_genServerConfig(t *testing.T) { log, buf := logging.NewTestLogger(t.Name()) defer test.ShowBufferOnFailure(t, buf) - if tc.accessPoints == nil { - tc.accessPoints = []string{"localhost"} // Matches default in mock config. + if tc.msReplicas == nil { + tc.msReplicas = []string{"localhost"} // Matches default in mock config. } if tc.threadCounts == nil { tc.threadCounts = &threadCounts{} @@ -1755,7 +1755,7 @@ func TestControl_AutoConfig_genServerConfig(t *testing.T) { req := ConfGenerateReq{ Log: log, - AccessPoints: tc.accessPoints, + MgmtSvcReplicas: tc.msReplicas, ExtMetadataPath: tc.extMetadataPath, } @@ -1772,7 +1772,10 @@ func TestControl_AutoConfig_genServerConfig(t *testing.T) { } return x.Equals(y) }), - cmpopts.IgnoreUnexported(security.CertificateConfig{}), + cmpopts.IgnoreUnexported( + security.CertificateConfig{}, + config.Server{}, + ), } cmpOpts = append(cmpOpts, defResCmpOpts()...) diff --git a/src/control/lib/control/event.go b/src/control/lib/control/event.go index 2a9896873c3..d316a4e7cb1 100644 --- a/src/control/lib/control/event.go +++ b/src/control/lib/control/event.go @@ -1,5 +1,5 @@ // -// (C) Copyright 2021-2022 Intel Corporation. +// (C) Copyright 2021-2024 Intel Corporation. // // SPDX-License-Identifier: BSD-2-Clause-Patent // @@ -68,11 +68,11 @@ func eventNotify(ctx context.Context, rpcClient UnaryInvoker, seq uint64, evt *e } // EventForwarder implements the events.Handler interface, increments sequence -// number for each event forwarded and distributes requests to MS access points. +// number for each event forwarded and distributes requests to MS replicas. type EventForwarder struct { - seq <-chan uint64 - client UnaryInvoker - accessPts []string + seq <-chan uint64 + client UnaryInvoker + msReplicas []string } // OnEvent implements the events.Handler interface. 
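As context for the `EventForwarder` hunks in this file, here is a simplified standalone sketch of the forwarding pattern: a background goroutine hands out increasing sequence numbers, and each forwardable event consumes one before being sent to the configured MS replicas. Names are shortened and the RPC call is replaced by a print; the real code goes through `eventNotify` and a `UnaryInvoker`.

```go
package main

import "fmt"

// forwarder is a stand-in for EventForwarder: a sequence-number channel fed
// by a background goroutine, plus the list of MS replicas to notify.
type forwarder struct {
	seq        <-chan uint64
	msReplicas []string
}

func newForwarder(replicas []string) *forwarder {
	seqCh := make(chan uint64)
	go func(ch chan<- uint64) {
		for i := uint64(1); ; i++ {
			ch <- i
		}
	}(seqCh)
	return &forwarder{seq: seqCh, msReplicas: replicas}
}

// onEvent skips forwarding when no MS replicas are configured, otherwise it
// consumes the next sequence number and "sends" the event.
func (f *forwarder) onEvent(event string) {
	if len(f.msReplicas) == 0 {
		fmt.Println("skip event forwarding, missing MS replicas")
		return
	}
	fmt.Printf("forwarding %q (seq %d) to %v\n", event, <-f.seq, f.msReplicas)
}

func main() {
	f := newForwarder([]string{"192.168.1.1"})
	f.onEvent("engine_died")
	f.onEvent("engine_died")
}
```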
@@ -81,21 +81,21 @@ func (ef *EventForwarder) OnEvent(ctx context.Context, evt *events.RASEvent) { case evt == nil: ef.client.Debug("skip event forwarding, nil event") return - case len(ef.accessPts) == 0: - ef.client.Debug("skip event forwarding, missing access points") + case len(ef.msReplicas) == 0: + ef.client.Debug("skip event forwarding, missing MS replicas") return case !evt.ShouldForward(): ef.client.Debugf("forwarding disabled for %s event", evt.ID) return } - if err := eventNotify(ctx, ef.client, <-ef.seq, evt, ef.accessPts); err != nil { + if err := eventNotify(ctx, ef.client, <-ef.seq, evt, ef.msReplicas); err != nil { ef.client.Debugf("failed to forward event to MS: %s", err) } } // NewEventForwarder returns an initialized EventForwarder. -func NewEventForwarder(rpcClient UnaryInvoker, accessPts []string) *EventForwarder { +func NewEventForwarder(rpcClient UnaryInvoker, replicas []string) *EventForwarder { seqCh := make(chan uint64) go func(ch chan<- uint64) { for i := uint64(1); ; i++ { @@ -104,9 +104,9 @@ func NewEventForwarder(rpcClient UnaryInvoker, accessPts []string) *EventForward }(seqCh) return &EventForwarder{ - seq: seqCh, - client: rpcClient, - accessPts: accessPts, + seq: seqCh, + client: rpcClient, + msReplicas: replicas, } } diff --git a/src/control/lib/control/event_test.go b/src/control/lib/control/event_test.go index c8cd47d6fee..73b60ae1873 100644 --- a/src/control/lib/control/event_test.go +++ b/src/control/lib/control/event_test.go @@ -1,5 +1,5 @@ // -// (C) Copyright 2021-2022 Intel Corporation. +// (C) Copyright 2021-2024 Intel Corporation. // // SPDX-License-Identifier: BSD-2-Clause-Patent // @@ -89,7 +89,7 @@ func TestControl_EventForwarder_OnEvent(t *testing.T) { rasEventEngineDiedFwdable := mockEvtEngineDied(t).WithForwardable(true) for name, tc := range map[string]struct { - aps []string + replicas []string event *events.RASEvent nilClient bool expInvokeCount int @@ -97,17 +97,17 @@ func TestControl_EventForwarder_OnEvent(t *testing.T) { "nil event": { event: nil, }, - "missing access points": { + "missing MS replicas": { event: rasEventEngineDiedFwdable, }, "successful forward": { event: rasEventEngineDiedFwdable, - aps: []string{"192.168.1.1"}, + replicas: []string{"192.168.1.1"}, expInvokeCount: 2, }, "skip non-forwardable event": { - event: rasEventEngineDied, - aps: []string{"192.168.1.1"}, + event: rasEventEngineDied, + replicas: []string{"192.168.1.1"}, }, } { t.Run(name, func(t *testing.T) { @@ -126,7 +126,7 @@ func TestControl_EventForwarder_OnEvent(t *testing.T) { callCount++ // call at least once } - ef := NewEventForwarder(mi, tc.aps) + ef := NewEventForwarder(mi, tc.replicas) for i := 0; i < callCount; i++ { ef.OnEvent(test.Context(t), tc.event) } diff --git a/src/control/lib/support/log.go b/src/control/lib/support/log.go index 02bf0abdb54..f6446159d75 100644 --- a/src/control/lib/support/log.go +++ b/src/control/lib/support/log.go @@ -464,7 +464,7 @@ func rsyncLog(log logging.Logger, opts ...CollectLogsParams) error { if cfgPath != "" { serverConfig := config.DefaultServer() serverConfig.SetPath(cfgPath) - if err := serverConfig.Load(); err == nil { + if err := serverConfig.Load(log); err == nil { if serverConfig.SupportConfig.FileTransferExec != "" { return customCopy(log, opts[0], serverConfig.SupportConfig.FileTransferExec) } @@ -682,7 +682,7 @@ func copyServerConfig(log logging.Logger, opts ...CollectLogsParams) error { serverConfig := config.DefaultServer() serverConfig.SetPath(cfgPath) - serverConfig.Load() + 
serverConfig.Load(log) // Create the individual folder on each server targetConfig, err := createHostLogFolder(DaosServerConfig, log, opts...) if err != nil { @@ -862,7 +862,7 @@ func collectServerLog(log logging.Logger, opts ...CollectLogsParams) error { } serverConfig := config.DefaultServer() serverConfig.SetPath(cfgPath) - serverConfig.Load() + serverConfig.Load(log) switch opts[0].LogCmd { case "EngineLog": @@ -928,7 +928,7 @@ func collectDaosMetrics(daosNodeLocation string, log logging.Logger, opts ...Col } serverConfig := config.DefaultServer() serverConfig.SetPath(cfgPath) - serverConfig.Load() + serverConfig.Load(log) for i := range serverConfig.Engines { engineId := fmt.Sprintf("%d", i) diff --git a/src/control/server/config/faults.go b/src/control/server/config/faults.go index b5128ffcb20..c2f2063357a 100644 --- a/src/control/server/config/faults.go +++ b/src/control/server/config/faults.go @@ -41,15 +41,15 @@ var ( "invalid telemetry port in configuration", "specify a positive non-zero network port in configuration ('telemetry_port' parameter) and restart the control server", ) - FaultConfigBadAccessPoints = serverConfigFault( - code.ServerConfigBadAccessPoints, - "invalid list of access points in configuration", - "'access_points' must contain resolvable addresses; fix the configuration and restart the control server", - ) - FaultConfigEvenAccessPoints = serverConfigFault( - code.ServerConfigEvenAccessPoints, - "non-odd number of access points in configuration", - "'access_points' must contain an odd number (e.g. 1, 3, 5, etc.) of addresses; fix the configuration and restart the control server", + FaultConfigBadMgmtSvcReplicas = serverConfigFault( + code.ServerConfigBadMgmtSvcReplicas, + "invalid list of MS replicas in configuration", + "'mgmt_svc_replicas' must contain resolvable addresses; fix the configuration and restart the control server", + ) + FaultConfigEvenMgmtSvcReplicas = serverConfigFault( + code.ServerConfigEvenMgmtSvcReplicas, + "non-odd number of MS replicas in configuration", + "'mgmt_svc_replicas' must contain an odd number (e.g. 1, 3, 5, etc.) of addresses; fix the configuration and restart the control server", ) FaultConfigNoProvider = serverConfigFault( code.ServerConfigBadProvider, diff --git a/src/control/server/config/server.go b/src/control/server/config/server.go index ec94784d7d5..4b4c5bd529c 100644 --- a/src/control/server/config/server.go +++ b/src/control/server/config/server.go @@ -41,6 +41,10 @@ type SupportConfig struct { FileTransferExec string `yaml:"file_transfer_exec,omitempty"` } +type deprecatedParams struct { + AccessPoints []string `yaml:"access_points,omitempty"` // deprecated in 2.8 +} + // Server describes configuration options for DAOS control plane. // See utils/config/daos_server.yml for parameter descriptions. type Server struct { @@ -72,7 +76,7 @@ type Server struct { Fabric engine.FabricConfig `yaml:",inline"` Modules string `yaml:"-"` - AccessPoints []string `yaml:"access_points"` + MgmtSvcReplicas []string `yaml:"mgmt_svc_replicas"` Metadata storage.ControlMetadata `yaml:"control_metadata,omitempty"` @@ -84,6 +88,8 @@ type Server struct { // Behavior flags AutoFormat bool `yaml:"-"` + + deprecatedParams `yaml:",inline"` } // WithCoreDumpFilter sets the core dump filter written to /proc/self/coredump_filter. @@ -200,9 +206,9 @@ func (cfg *Server) WithEngines(engineList ...*engine.Config) *Server { return cfg } -// WithAccessPoints sets the access point list. 
-func (cfg *Server) WithAccessPoints(aps ...string) *Server { - cfg.AccessPoints = aps +// WithMgmtSvcReplicas sets the MS replicas list. +func (cfg *Server) WithMgmtSvcReplicas(reps ...string) *Server { + cfg.MgmtSvcReplicas = reps return cfg } @@ -324,7 +330,7 @@ func DefaultServer() *Server { return &Server{ SystemName: build.DefaultSystemName, SocketDir: defaultRuntimeDir, - AccessPoints: []string{fmt.Sprintf("localhost:%d", build.DefaultControlPort)}, + MgmtSvcReplicas: []string{fmt.Sprintf("localhost:%d", build.DefaultControlPort)}, ControlPort: build.DefaultControlPort, TransportConfig: security.DefaultServerTransportConfig(), Hyperthreads: false, @@ -338,7 +344,7 @@ func DefaultServer() *Server { } // Load reads the serialized configuration from disk and validates file syntax. -func (cfg *Server) Load() error { +func (cfg *Server) Load(log logging.Logger) error { if cfg.Path == "" { return FaultConfigNoPath } @@ -372,6 +378,12 @@ func (cfg *Server) Load() error { cfg.ClientEnvVars = common.MergeKeyValues(cfg.ClientEnvVars, []string{cfg.Fabric.GetAuthKeyEnv()}) } + if len(cfg.deprecatedParams.AccessPoints) > 0 { + log.Notice("access_points is deprecated; please use mgmt_svc_replicas instead") + cfg.MgmtSvcReplicas = cfg.deprecatedParams.AccessPoints + cfg.deprecatedParams.AccessPoints = nil + } + return nil } @@ -413,22 +425,22 @@ func (cfg *Server) SaveActiveConfig(log logging.Logger) { log.Debugf("active config saved to %s (read-only)", activeConfig) } -// GetAccessPointPort returns port number suffixed to AP address after its validation or 0 if no +// GetMSReplicaPort returns port number suffixed to replicas address after its validation or 0 if no // port number specified. Error returned if validation fails. -func GetAccessPointPort(log logging.Logger, addr string) (int, error) { +func GetMSReplicaPort(log logging.Logger, addr string) (int, error) { if !common.HasPort(addr) { return 0, nil } _, port, err := net.SplitHostPort(addr) if err != nil { - log.Errorf("invalid access point %q: %s", addr, err) - return 0, FaultConfigBadAccessPoints + log.Errorf("invalid MS replica %q: %s", addr, err) + return 0, FaultConfigBadMgmtSvcReplicas } portNum, err := strconv.Atoi(port) if err != nil { - log.Errorf("invalid access point port: %s", err) + log.Errorf("invalid MS replica port: %s", err) return 0, FaultConfigBadControlPort } if portNum <= 0 { @@ -436,17 +448,17 @@ func GetAccessPointPort(log logging.Logger, addr string) (int, error) { if portNum < 0 { m = "negative" } - log.Errorf("access point port cannot be %s", m) + log.Errorf("MS replica port cannot be %s", m) return 0, FaultConfigBadControlPort } return portNum, nil } -// getAccessPointAddrWithPort appends default port number to address if custom port is not +// getReplicaAddrWithPort appends default port number to address if custom port is not // specified, otherwise custom specified port is validated. -func getAccessPointAddrWithPort(log logging.Logger, addr string, portDefault int) (string, error) { - portNum, err := GetAccessPointPort(log, addr) +func getReplicaAddrWithPort(log logging.Logger, addr string, portDefault int) (string, error) { + portNum, err := GetMSReplicaPort(log, addr) if err != nil { return "", err } @@ -454,9 +466,9 @@ func getAccessPointAddrWithPort(log logging.Logger, addr string, portDefault int return fmt.Sprintf("%s:%d", addr, portDefault), nil } - // Warn if access point port differs from config control port. + // Warn if MS replica port differs from config control port. 
if portDefault != portNum { - log.Debugf("access point %q port differs from default port %q", + log.Debugf("ms replica %q port differs from default port %q", addr, portDefault) } @@ -653,20 +665,20 @@ func (cfg *Server) Validate(log logging.Logger) (err error) { log.Debugf("vfio=%v hotplug=%v vmd=%v requested in config", !cfg.DisableVFIO, cfg.EnableHotplug, !(*cfg.DisableVMD)) - // Update access point addresses with control port if port is not supplied. - newAPs := make([]string, 0, len(cfg.AccessPoints)) - for _, ap := range cfg.AccessPoints { - newAP, err := getAccessPointAddrWithPort(log, ap, cfg.ControlPort) + // Update MS replica addresses with control port if port is not supplied. + newReps := make([]string, 0, len(cfg.MgmtSvcReplicas)) + for _, rep := range cfg.MgmtSvcReplicas { + newAP, err := getReplicaAddrWithPort(log, rep, cfg.ControlPort) if err != nil { return err } - newAPs = append(newAPs, newAP) + newReps = append(newReps, newAP) } - if common.StringSliceHasDuplicates(newAPs) { - log.Error("duplicate access points addresses") - return FaultConfigBadAccessPoints + if common.StringSliceHasDuplicates(newReps) { + log.Error("duplicate MS replica addresses") + return FaultConfigBadMgmtSvcReplicas } - cfg.AccessPoints = newAPs + cfg.MgmtSvcReplicas = newReps if cfg.Metadata.DevicePath != "" && cfg.Metadata.Path == "" { return FaultConfigControlMetadataNoPath @@ -686,13 +698,13 @@ func (cfg *Server) Validate(log logging.Logger) (err error) { } switch { - case len(cfg.AccessPoints) < 1: - return FaultConfigBadAccessPoints - case len(cfg.AccessPoints)%2 == 0: - return FaultConfigEvenAccessPoints - case len(cfg.AccessPoints) == 1: - log.Noticef("Configuration includes only one access point. This provides no redundancy " + - "in the event of an access point failure.") + case len(cfg.MgmtSvcReplicas) < 1: + return FaultConfigBadMgmtSvcReplicas + case len(cfg.MgmtSvcReplicas)%2 == 0: + return FaultConfigEvenMgmtSvcReplicas + case len(cfg.MgmtSvcReplicas) == 1: + log.Noticef("Configuration includes only one MS replica. This provides no redundancy " + + "in the event of a MS replica failure.") } switch { diff --git a/src/control/server/config/server_test.go b/src/control/server/config/server_test.go index 6961a0f4190..034d21a7b5d 100644 --- a/src/control/server/config/server_test.go +++ b/src/control/server/config/server_test.go @@ -44,6 +44,7 @@ var ( cmpopts.SortSlices(func(x, y string) bool { return x < y }), cmpopts.IgnoreUnexported( security.CertificateConfig{}, + Server{}, ), cmpopts.IgnoreFields(Server{}, "Path"), cmp.Comparer(func(x, y *storage.BdevDeviceList) bool { @@ -55,10 +56,10 @@ var ( } ) -func baseCfg(t *testing.T, testFile string) *Server { +func baseCfg(t *testing.T, log logging.Logger, testFile string) *Server { t.Helper() - config, err := mockConfigFromFile(t, testFile) + config, err := mockConfigFromFile(t, log, testFile) if err != nil { t.Fatalf("failed to load %s: %s", testFile, err) } @@ -125,12 +126,12 @@ func uncommentServerConfig(t *testing.T, outFile string) { // mockConfigFromFile returns a populated server config file from the // file at the given path. 
-func mockConfigFromFile(t *testing.T, path string) (*Server, error) { +func mockConfigFromFile(t *testing.T, log logging.Logger, path string) (*Server, error) { t.Helper() c := DefaultServer() c.Path = path - return c, c.Load() + return c, c.Load(log) } func TestServerConfig_MarshalUnmarshal(t *testing.T) { @@ -163,7 +164,7 @@ func TestServerConfig_MarshalUnmarshal(t *testing.T) { configA := DefaultServer() configA.Path = tt.inPath - err := configA.Load() + err := configA.Load(log) if err == nil { err = configA.Validate(log) } @@ -194,7 +195,7 @@ func TestServerConfig_MarshalUnmarshal(t *testing.T) { t.Fatal(err) } - err = configB.Load() + err = configB.Load(log) if err == nil { err = configB.Validate(log) } @@ -220,10 +221,13 @@ func TestServerConfig_Constructed(t *testing.T) { testDir, cleanup := test.CreateTestDir(t) defer cleanup() + log, buf := logging.NewTestLogger(t.Name()) + defer test.ShowBufferOnFailure(t, buf) + // First, load a config based on the server config with all options uncommented. testFile := filepath.Join(testDir, sConfigUncomment) uncommentServerConfig(t, testFile) - defaultCfg, err := mockConfigFromFile(t, testFile) + defaultCfg, err := mockConfigFromFile(t, log, testFile) if err != nil { t.Fatalf("failed to load %s: %s", testFile, err) } @@ -251,7 +255,7 @@ func TestServerConfig_Constructed(t *testing.T) { WithSocketDir("./.daos/daos_server"). WithFabricProvider("ofi+verbs;ofi_rxm"). WithCrtTimeout(30). - WithAccessPoints("hostname1"). + WithMgmtSvcReplicas("hostname1", "hostname2", "hostname3"). WithFaultCb("./.daos/fd_callback"). WithFaultPath("/vcdu0/rack1/hostname"). WithClientEnvVars([]string{"foo=bar"}). @@ -407,7 +411,7 @@ func TestServerConfig_MDonSSD_Constructed(t *testing.T) { log, buf := logging.NewTestLogger(t.Name()) defer test.ShowBufferOnFailure(t, buf) - mdOnSSDCfg, err := mockConfigFromFile(t, mdOnSSDExample) + mdOnSSDCfg, err := mockConfigFromFile(t, log, mdOnSSDExample) if err != nil { t.Fatalf("failed to load %s: %s", mdOnSSDExample, err) } @@ -419,7 +423,7 @@ func TestServerConfig_MDonSSD_Constructed(t *testing.T) { WithControlLogFile("/tmp/daos_server.log"). WithTelemetryPort(9191). WithFabricProvider("ofi+tcp"). - WithAccessPoints("example") + WithMgmtSvcReplicas("example1", "example2", "example3") constructed.Engines = []*engine.Config{ engine.MockConfig(). @@ -471,6 +475,9 @@ func TestServerConfig_Validation(t *testing.T) { testDir, cleanup := test.CreateTestDir(t) defer cleanup() + log, buf := logging.NewTestLogger(t.Name()) + defer test.ShowBufferOnFailure(t, buf) + // First, load a config based on the server config with all options uncommented. 
testFile := filepath.Join(testDir, sConfigUncomment) uncommentServerConfig(t, testFile) @@ -503,90 +510,90 @@ func TestServerConfig_Validation(t *testing.T) { }, expErr: FaultConfigNoProvider, }, - "no access point": { + "no MS replica": { extraConfig: func(c *Server) *Server { - return c.WithAccessPoints() + return c.WithMgmtSvcReplicas() }, - expErr: FaultConfigBadAccessPoints, + expErr: FaultConfigBadMgmtSvcReplicas, }, - "single access point": { + "single MS replica": { extraConfig: func(c *Server) *Server { - return c.WithAccessPoints("1.2.3.4:1234") + return c.WithMgmtSvcReplicas("1.2.3.4:1234") }, }, - "multiple access points (even)": { + "multiple MS replicas (even)": { extraConfig: func(c *Server) *Server { - return c.WithAccessPoints("1.2.3.4:1234", "5.6.7.8:5678") + return c.WithMgmtSvcReplicas("1.2.3.4:1234", "5.6.7.8:5678") }, - expErr: FaultConfigEvenAccessPoints, + expErr: FaultConfigEvenMgmtSvcReplicas, }, - "multiple access points (odd)": { + "multiple MS replicas (odd)": { extraConfig: func(c *Server) *Server { - return c.WithAccessPoints("1.2.3.4:1234", "5.6.7.8:5678", "1.5.3.8:6247") + return c.WithMgmtSvcReplicas("1.2.3.4:1234", "5.6.7.8:5678", "1.5.3.8:6247") }, }, - "multiple access points (dupes)": { + "multiple MS replicas (dupes)": { extraConfig: func(c *Server) *Server { - return c.WithAccessPoints("1.2.3.4", "5.6.7.8", "1.2.3.4") + return c.WithMgmtSvcReplicas("1.2.3.4", "5.6.7.8", "1.2.3.4") }, - expErr: FaultConfigBadAccessPoints, + expErr: FaultConfigBadMgmtSvcReplicas, }, - "multiple access points (dupes with ports)": { + "multiple MS replicas (dupes with ports)": { extraConfig: func(c *Server) *Server { - return c.WithAccessPoints("1.2.3.4:1234", "5.6.7.8:5678", "1.2.3.4:1234") + return c.WithMgmtSvcReplicas("1.2.3.4:1234", "5.6.7.8:5678", "1.2.3.4:1234") }, - expErr: FaultConfigBadAccessPoints, + expErr: FaultConfigBadMgmtSvcReplicas, }, - "multiple access points (dupes with and without ports)": { + "multiple MS replicas (dupes with and without ports)": { extraConfig: func(c *Server) *Server { - return c.WithAccessPoints("1.2.3.4:10001", "5.6.7.8:5678", "1.2.3.4") + return c.WithMgmtSvcReplicas("1.2.3.4:10001", "5.6.7.8:5678", "1.2.3.4") }, - expErr: FaultConfigBadAccessPoints, + expErr: FaultConfigBadMgmtSvcReplicas, }, - "multiple access points (dupes with different ports)": { + "multiple MS replicas (dupes with different ports)": { extraConfig: func(c *Server) *Server { - return c.WithAccessPoints("1.2.3.4:10002", "5.6.7.8:5678", "1.2.3.4") + return c.WithMgmtSvcReplicas("1.2.3.4:10002", "5.6.7.8:5678", "1.2.3.4") }, }, - "no access points": { + "no MS replicas": { extraConfig: func(c *Server) *Server { - return c.WithAccessPoints() + return c.WithMgmtSvcReplicas() }, - expErr: FaultConfigBadAccessPoints, + expErr: FaultConfigBadMgmtSvcReplicas, }, - "single access point no port": { + "single MS replica no port": { extraConfig: func(c *Server) *Server { - return c.WithAccessPoints("1.2.3.4") + return c.WithMgmtSvcReplicas("1.2.3.4") }, }, - "single access point invalid port": { + "single MS replica invalid port": { extraConfig: func(c *Server) *Server { - return c.WithAccessPoints("1.2.3.4"). + return c.WithMgmtSvcReplicas("1.2.3.4"). 
WithControlPort(0) }, expErr: FaultConfigBadControlPort, }, - "single access point including invalid port (alphanumeric)": { + "single MS replica including invalid port (alphanumeric)": { extraConfig: func(c *Server) *Server { - return c.WithAccessPoints("1.2.3.4:0a0") + return c.WithMgmtSvcReplicas("1.2.3.4:0a0") }, expErr: FaultConfigBadControlPort, }, - "single access point including invalid port (zero)": { + "single MS replica including invalid port (zero)": { extraConfig: func(c *Server) *Server { - return c.WithAccessPoints("1.2.3.4:0") + return c.WithMgmtSvcReplicas("1.2.3.4:0") }, expErr: FaultConfigBadControlPort, }, - "single access point including negative port": { + "single MS replica including negative port": { extraConfig: func(c *Server) *Server { - return c.WithAccessPoints("1.2.3.4:-10002") + return c.WithMgmtSvcReplicas("1.2.3.4:-10002") }, expErr: FaultConfigBadControlPort, }, - "single access point hostname including negative port": { + "single MS replica hostname including negative port": { extraConfig: func(c *Server) *Server { - return c.WithAccessPoints("hostX:-10002") + return c.WithMgmtSvcReplicas("hostX:-10002") }, expErr: FaultConfigBadControlPort, }, @@ -703,8 +710,8 @@ func TestServerConfig_Validation(t *testing.T) { ), ) }, - expConfig: baseCfg(t, testFile). - WithAccessPoints("hostname1:10001"). + expConfig: baseCfg(t, log, testFile). + WithMgmtSvcReplicas("hostname1:10001", "hostname2:10001", "hostname3:10001"). WithControlMetadata(storage.ControlMetadata{ Path: testMetadataDir, DevicePath: "/dev/something", @@ -778,8 +785,8 @@ func TestServerConfig_Validation(t *testing.T) { ), ) }, - expConfig: baseCfg(t, testFile). - WithAccessPoints("hostname1:10001"). + expConfig: baseCfg(t, log, testFile). + WithMgmtSvcReplicas("hostname1:10001", "hostname2:10001", "hostname3:10001"). WithControlMetadata(storage.ControlMetadata{ Path: testMetadataDir, DevicePath: "/dev/something", @@ -863,8 +870,8 @@ func TestServerConfig_Validation(t *testing.T) { ), ) }, - expConfig: baseCfg(t, testFile). - WithAccessPoints("hostname1:10001"). + expConfig: baseCfg(t, log, testFile). + WithMgmtSvcReplicas("hostname1:10001", "hostname2:10001", "hostname3:10001"). WithControlMetadata(storage.ControlMetadata{ Path: testMetadataDir, }). 
@@ -956,7 +963,7 @@ func TestServerConfig_Validation(t *testing.T) { } // Apply test case changes to basic config - cfg := tt.extraConfig(baseCfg(t, testFile)) + cfg := tt.extraConfig(baseCfg(t, log, testFile)) log.Debugf("baseCfg metadata: %+v", cfg.Metadata) @@ -1165,7 +1172,7 @@ func TestServerConfig_SetNrHugepages(t *testing.T) { defer test.ShowBufferOnFailure(t, buf) // Apply test case changes to basic config - cfg := tc.extraConfig(baseCfg(t, testFile)) + cfg := tc.extraConfig(baseCfg(t, log, testFile)) mi := &common.MemInfo{ HugepageSizeKiB: defHpSizeKb, @@ -1357,7 +1364,7 @@ func TestServerConfig_SetRamdiskSize(t *testing.T) { defer test.ShowBufferOnFailure(t, buf) // Apply test case changes to basic config - cfg := tc.extraConfig(baseCfg(t, testFile)) + cfg := tc.extraConfig(baseCfg(t, log, testFile)) val := tc.memTotBytes / humanize.KiByte if val > math.MaxInt { @@ -1456,7 +1463,7 @@ func replaceFile(t *testing.T, name, oldTxt, newTxt string) { func TestServerConfig_Parsing(t *testing.T) { noopExtra := func(c *Server) *Server { return c } - cfgFromFile := func(t *testing.T, testFile string, matchText, replaceText []string) (*Server, error) { + cfgFromFile := func(t *testing.T, log logging.Logger, testFile string, matchText, replaceText []string) (*Server, error) { t.Helper() if len(matchText) != len(replaceText) { @@ -1470,17 +1477,17 @@ func TestServerConfig_Parsing(t *testing.T) { replaceFile(t, testFile, m, replaceText[i]) } - return mockConfigFromFile(t, testFile) + return mockConfigFromFile(t, log, testFile) } // load a config based on the server config with all options uncommented. - loadFromFile := func(t *testing.T, testDir string, matchText, replaceText []string) (*Server, error) { + loadFromFile := func(t *testing.T, log logging.Logger, testDir string, matchText, replaceText []string) (*Server, error) { t.Helper() defaultConfigFile := filepath.Join(testDir, sConfigUncomment) uncommentServerConfig(t, defaultConfigFile) - return cfgFromFile(t, defaultConfigFile, matchText, replaceText) + return cfgFromFile(t, log, defaultConfigFile, matchText, replaceText) } for name, tt := range map[string]struct { @@ -1593,7 +1600,7 @@ func TestServerConfig_Parsing(t *testing.T) { tt.outTxtList = []string{tt.outTxt} } - config, errParse := loadFromFile(t, testDir, tt.inTxtList, tt.outTxtList) + config, errParse := loadFromFile(t, log, testDir, tt.inTxtList, tt.outTxtList) test.CmpErr(t, tt.expParseErr, errParse) if tt.expParseErr != nil { return diff --git a/src/control/server/mgmt_pool_test.go b/src/control/server/mgmt_pool_test.go index d5a056b8938..0505279a7b4 100644 --- a/src/control/server/mgmt_pool_test.go +++ b/src/control/server/mgmt_pool_test.go @@ -48,6 +48,7 @@ var ( MemRatio: mockMemRatio, }, } + errNotReplica = errors.New("not a MS replica") ) func getPoolLockCtx(t *testing.T, parent context.Context, sysdb poolDatabase, poolUUID uuid.UUID) (*raft.PoolLock, context.Context) { @@ -442,9 +443,9 @@ func TestServer_MgmtSvc_PoolCreate(t *testing.T) { TierBytes: []uint64{100 * humanize.GiByte, 0}, Properties: testPoolLabelProp(), }, - expErr: errors.New("not an access point"), + expErr: errNotReplica, }, - "not access point": { + "not MS replica": { mgmtSvc: notAP, targetCount: 8, req: &mgmtpb.PoolCreateReq{ @@ -452,7 +453,7 @@ func TestServer_MgmtSvc_PoolCreate(t *testing.T) { TierBytes: []uint64{100 * humanize.GiByte, 0}, Properties: testPoolLabelProp(), }, - expErr: errors.New("not an access point"), + expErr: errNotReplica, }, "dRPC send fails": { targetCount: 8, @@ -873,7 
+874,7 @@ func TestServer_MgmtSvc_PoolDestroy(t *testing.T) { drpcResps: []*mockDrpcResponse{ &mockDrpcResponse{ Message: &mgmtpb.ListContResp{}, - Error: errors.New("not an access point"), + Error: errNotReplica, }, }, expDrpcListContReq: &mgmtpb.ListContReq{ @@ -881,7 +882,7 @@ func TestServer_MgmtSvc_PoolDestroy(t *testing.T) { Id: mockUUID, SvcRanks: []uint32{0, 1, 2}, }, - expErr: errors.New("not an access point"), + expErr: errNotReplica, }, // Note: evict dRPC fails as no pool service alive, remains in creating state. // getPoolService() returns TryAgain in resp before list-cont dRPC is issued. @@ -1392,11 +1393,11 @@ func TestServer_MgmtSvc_PoolExtend(t *testing.T) { }, "missing superblock": { mgmtSvc: missingSB, - expErr: errors.New("not an access point"), + expErr: errNotReplica, }, - "not access point": { + "not MS replica": { mgmtSvc: notAP, - expErr: errors.New("not an access point"), + expErr: errNotReplica, }, "dRPC send fails": { expErr: errors.New("send failure"), @@ -1508,11 +1509,11 @@ func TestServer_MgmtSvc_PoolReintegrate(t *testing.T) { }, "missing superblock": { mgmtSvc: missingSB, - expErr: errors.New("not an access point"), + expErr: errNotReplica, }, - "not access point": { + "not MS replica": { mgmtSvc: notAP, - expErr: errors.New("not an access point"), + expErr: errNotReplica, }, "dRPC send fails": { expErr: errors.New("send failure"), @@ -1624,12 +1625,12 @@ func TestServer_MgmtSvc_PoolExclude(t *testing.T) { "missing superblock": { mgmtSvc: missingSB, req: &mgmtpb.PoolExcludeReq{Id: mockUUID, Rank: 2, TargetIdx: []uint32{1, 2}}, - expErr: errors.New("not an access point"), + expErr: errNotReplica, }, - "not access point": { + "not MS replica": { mgmtSvc: notAP, req: &mgmtpb.PoolExcludeReq{Id: mockUUID, Rank: 2, TargetIdx: []uint32{1, 2}}, - expErr: errors.New("not an access point"), + expErr: errNotReplica, }, "dRPC send fails": { req: &mgmtpb.PoolExcludeReq{Id: mockUUID, Rank: 2, TargetIdx: []uint32{1, 2}}, @@ -1720,12 +1721,12 @@ func TestServer_MgmtSvc_PoolDrain(t *testing.T) { "missing superblock": { mgmtSvc: missingSB, req: &mgmtpb.PoolDrainReq{Id: mockUUID, Rank: 2, TargetIdx: []uint32{1, 2}}, - expErr: errors.New("not an access point"), + expErr: errNotReplica, }, - "not access point": { + "not MS replica": { mgmtSvc: notAP, req: &mgmtpb.PoolDrainReq{Id: mockUUID, Rank: 2, TargetIdx: []uint32{1, 2}}, - expErr: errors.New("not an access point"), + expErr: errNotReplica, }, "dRPC send fails": { req: &mgmtpb.PoolDrainReq{Id: mockUUID, Rank: 2, TargetIdx: []uint32{1, 2}}, @@ -1816,12 +1817,12 @@ func TestServer_MgmtSvc_PoolEvict(t *testing.T) { "missing superblock": { mgmtSvc: missingSB, req: &mgmtpb.PoolEvictReq{Id: mockUUID}, - expErr: errors.New("not an access point"), + expErr: errNotReplica, }, - "not access point": { + "not MS replica": { mgmtSvc: notAP, req: &mgmtpb.PoolEvictReq{Id: mockUUID}, - expErr: errors.New("not an access point"), + expErr: errNotReplica, }, "dRPC send fails": { req: &mgmtpb.PoolEvictReq{Id: mockUUID}, @@ -2339,7 +2340,7 @@ func TestServer_MgmtSvc_PoolQuery(t *testing.T) { req: &mgmtpb.PoolQueryReq{ Id: mockUUID, }, - expErr: errors.New("not an access point"), + expErr: errNotReplica, }, "dRPC send fails": { req: &mgmtpb.PoolQueryReq{ @@ -2708,12 +2709,12 @@ func TestServer_MgmtSvc_PoolUpgrade(t *testing.T) { "missing superblock": { mgmtSvc: missingSB, req: &mgmtpb.PoolUpgradeReq{Id: mockUUID}, - expErr: errors.New("not an access point"), + expErr: errNotReplica, }, - "not access point": { + "not MS replica": { mgmtSvc: 
notAP, req: &mgmtpb.PoolUpgradeReq{Id: mockUUID}, - expErr: errors.New("not an access point"), + expErr: errNotReplica, }, "dRPC send fails": { req: &mgmtpb.PoolUpgradeReq{Id: mockUUID}, diff --git a/src/control/server/mgmt_system.go b/src/control/server/mgmt_system.go index 4dab1724b37..e1a60a06d6a 100644 --- a/src/control/server/mgmt_system.go +++ b/src/control/server/mgmt_system.go @@ -132,7 +132,7 @@ func (svc *mgmtSvc) GetAttachInfo(ctx context.Context, req *mgmtpb.GetAttachInfo return resp, nil } -// LeaderQuery returns the system leader and access point replica details. +// LeaderQuery returns the system leader and MS replica details. func (svc *mgmtSvc) LeaderQuery(ctx context.Context, req *mgmtpb.LeaderQueryReq) (*mgmtpb.LeaderQueryResp, error) { if err := svc.checkSystemRequest(req); err != nil { return nil, err diff --git a/src/control/server/server.go b/src/control/server/server.go index bc54d31ef58..fdf7d4ea8e1 100644 --- a/src/control/server/server.go +++ b/src/control/server/server.go @@ -240,7 +240,7 @@ func (srv *server) createServices(ctx context.Context) (err error) { // Create event distribution primitives. srv.pubSub = events.NewPubSub(ctx, srv.log) srv.OnShutdown(srv.pubSub.Close) - srv.evtForwarder = control.NewEventForwarder(rpcClient, srv.cfg.AccessPoints) + srv.evtForwarder = control.NewEventForwarder(rpcClient, srv.cfg.MgmtSvcReplicas) srv.evtLogger = control.NewEventLogger(srv.log) srv.ctlSvc = NewControlService(srv.log, srv.harness, srv.cfg, srv.pubSub, @@ -321,7 +321,7 @@ func (srv *server) initNetwork() error { func (srv *server) createEngine(ctx context.Context, idx int, cfg *engine.Config) (*EngineInstance, error) { // Closure to join an engine instance to a system using control API. joinFn := func(ctxIn context.Context, req *control.SystemJoinReq) (*control.SystemJoinResp, error) { - req.SetHostList(srv.cfg.AccessPoints) + req.SetHostList(srv.cfg.MgmtSvcReplicas) req.SetSystem(srv.cfg.SystemName) req.ControlAddr = srv.ctlAddr diff --git a/src/control/server/server_utils.go b/src/control/server/server_utils.go index 269a5201e30..1e8c9e0c14d 100644 --- a/src/control/server/server_utils.go +++ b/src/control/server/server_utils.go @@ -112,12 +112,12 @@ func getBdevCfgsFromSrvCfg(cfg *config.Server) storage.TierConfigs { func cfgGetReplicas(cfg *config.Server, lookup ipLookupFn) ([]*net.TCPAddr, error) { var dbReplicas []*net.TCPAddr - for _, ap := range cfg.AccessPoints { - apAddr, err := resolveFirstAddr(ap, lookup) + for _, rep := range cfg.MgmtSvcReplicas { + repAddr, err := resolveFirstAddr(rep, lookup) if err != nil { - return nil, config.FaultConfigBadAccessPoints + return nil, config.FaultConfigBadMgmtSvcReplicas } - dbReplicas = append(dbReplicas, apAddr) + dbReplicas = append(dbReplicas, repAddr) } return dbReplicas, nil diff --git a/src/control/server/server_utils_test.go b/src/control/server/server_utils_test.go index 1c8eadb8856..66b2a4595cc 100644 --- a/src/control/server/server_utils_test.go +++ b/src/control/server/server_utils_test.go @@ -636,7 +636,7 @@ func TestServer_prepBdevStorage(t *testing.T) { cfg := config.DefaultServer(). WithFabricProvider("ofi+verbs"). 
- WithAccessPoints("foo", "bar", "baz") // Suppress redundancy NOTICE log msg + WithMgmtSvcReplicas("foo", "bar", "baz") // Suppress redundancy NOTICE log msg if tc.srvCfgExtra != nil { cfg = tc.srvCfgExtra(cfg) } diff --git a/src/control/server/util_test.go b/src/control/server/util_test.go index 67542c47d3a..cd6869c59b6 100644 --- a/src/control/server/util_test.go +++ b/src/control/server/util_test.go @@ -265,7 +265,7 @@ func newTestMgmtSvc(t *testing.T, log logging.Logger) *mgmtSvc { // newTestMgmtSvcMulti creates a mgmtSvc that contains the requested // number of EngineInstances. If requested, the first instance is -// configured as an access point. +// configured as a MS replica. func newTestMgmtSvcMulti(t *testing.T, log logging.Logger, count int, isAP bool) *mgmtSvc { harness := NewEngineHarness(log) provider := storage.MockProvider(log, 0, nil, nil, nil, nil, nil) diff --git a/src/tests/ftest/config_file_gen.py b/src/tests/ftest/config_file_gen.py index 58f97f3b902..66172cd2201 100755 --- a/src/tests/ftest/config_file_gen.py +++ b/src/tests/ftest/config_file_gen.py @@ -1,6 +1,6 @@ #!/usr/bin/env python3 """ - (C) Copyright 2020-2023 Intel Corporation. + (C) Copyright 2020-2024 Intel Corporation. SPDX-License-Identifier: BSD-2-Clause-Patent """ @@ -32,7 +32,7 @@ def generate_agent_config(args): common_cfg = CommonConfig(args.group_name, DaosAgentTransportCredentials()) config = DaosAgentYamlParameters(args.agent_file, common_cfg) # Update the configuration file access points - config.other_params.access_points.value = args.node_list.split(",") + config.access_points.value = args.node_list.split(",") return create_config(args, config) @@ -51,8 +51,8 @@ def generate_server_config(args): config.engine_params[0].storage.storage_tiers[0].storage_class.value = "ram" config.engine_params[0].storage.storage_tiers[0].scm_mount.value = "/mnt/daos" config.engine_params[0].storage.storage_tiers[0].scm_size.value = 0 - # Update the configuration file access points - config.other_params.access_points.value = args.node_list.split(",") + # Update the configuration file MS replicas + config.mgmt_svc_replicas.value = args.node_list.split(",") return create_config(args, config) @@ -153,13 +153,13 @@ def main(): action="store", type=str, default=None, - help="comma-separated list of node names to use as the access points") + help="comma-separated list of node names to use as the MS replicas") parser.add_argument( "-p", "--port", action="store", type=int, default=None, - help="the access point port") + help="the MS replica port") parser.add_argument( "-s", "--server_file", action="store", diff --git a/src/tests/ftest/control/config_generate_output.py b/src/tests/ftest/control/config_generate_output.py index b574088cf98..35ce1cf521f 100644 --- a/src/tests/ftest/control/config_generate_output.py +++ b/src/tests/ftest/control/config_generate_output.py @@ -119,11 +119,11 @@ def check_errors(self, errors): if errors: self.fail("\n----- Errors detected! -----\n{}".format("\n".join(errors))) - def verify_access_point(self, host_port_input, failure_expected=None): - """Run with given AP and verify the AP in the output. + def verify_ms_replica(self, host_port_input, failure_expected=None): + """Run with given MS replica and verify the MS replica in the output. Args: - host_port_input (str): Host:Port or just Host. Supports multiple APs + host_port_input (str): Host:Port or just Host. Supports multiple MS replicas that are separated by comma. failure_expected (str): Expected error message. 
Set it to None if not expecting any error. Defaults to None. @@ -147,17 +147,17 @@ def verify_access_point(self, host_port_input, failure_expected=None): try: result = dmg.config_generate( - access_points=host_port_input, net_provider=self.def_provider) + mgmt_svc_replicas=host_port_input, net_provider=self.def_provider) except CommandFailure as err: errors.append("Unexpected failure! {}".format(err)) if result.exit_status == 0 and failure_expected is None: try: yaml_data = yaml.safe_load(result.stdout) - check["actual"] = yaml_data["access_points"] + check["actual"] = yaml_data["mgmt_svc_replicas"] if sorted(check["expected"]) != sorted(check["actual"]): errors.append( - "Unexpected access point: {} != {}".format( + "Unexpected MS replica: {} != {}".format( check["expected"], check["actual"])) except yaml.YAMLError as error: errors.append("Error loading dmg generated config!: {}".format(error)) @@ -195,7 +195,7 @@ def test_basic_config(self): # 1. Call dmg config generate. result = self.get_dmg_command().config_generate( - access_points="wolf-a", net_provider=self.def_provider) + mgmt_svc_replicas="wolf-a", net_provider=self.def_provider) generated_yaml = yaml.safe_load(result.stdout) errors = [] @@ -275,7 +275,7 @@ def test_tmpfs_scm_config(self): # Call dmg config generate. result = self.get_dmg_command().config_generate( - access_points="wolf-a", net_provider=self.def_provider, use_tmpfs_scm=True, + mgmt_svc_replicas="wolf-a", net_provider=self.def_provider, use_tmpfs_scm=True, control_metadata_path=self.test_dir) if result.exit_status != 0: errors.append("Config generate failed with use_tmpfs_scm = True!") @@ -328,76 +328,77 @@ def test_tmpfs_scm_config(self): self.check_errors(errors) - def test_access_points_single(self): - """Test --access-points with single AP with and without port. + def test_mgmt_svc_replicas_single(self): + """Test --ms-replica with single MS replica with and without port. :avocado: tags=all,full_regression :avocado: tags=hw,large - :avocado: tags=control,dmg_config_generate,access_points - :avocado: tags=ConfigGenerateOutput,test_access_points_single + :avocado: tags=control,dmg_config_generate,mgmt_svc_replicas + :avocado: tags=ConfigGenerateOutput,test_mgmt_svc_replicas_single """ errors = [] # Single AP. - errors.extend(self.verify_access_point("wolf-a")) + errors.extend(self.verify_ms_replica("wolf-a")) # Single AP with a valid port. - errors.extend(self.verify_access_point("wolf-a:12345")) + errors.extend(self.verify_ms_replica("wolf-a:12345")) self.check_errors(errors) - def test_access_points_odd(self): - """Test --access-points with odd number of APs. + def test_mgmt_svc_replicas_odd(self): + """Test --ms-replicas with odd number of MS replicas. :avocado: tags=all,full_regression :avocado: tags=hw,large - :avocado: tags=control,dmg_config_generate,access_points - :avocado: tags=ConfigGenerateOutput,test_access_points_odd + :avocado: tags=control,dmg_config_generate,mgmt_svc_replicas + :avocado: tags=ConfigGenerateOutput,test_mgmt_svc_replicas_odd """ errors = [] # Odd AP. - errors.extend(self.verify_access_point("wolf-a,wolf-b,wolf-c")) + errors.extend(self.verify_ms_replica("wolf-a,wolf-b,wolf-c")) # Odd AP with port. - errors.extend(self.verify_access_point("wolf-a:12345,wolf-b:12345,wolf-c:12345")) + errors.extend(self.verify_ms_replica("wolf-a:12345,wolf-b:12345,wolf-c:12345")) self.check_errors(errors) - def test_access_points_invalid(self): - """Test --access-points with invalid port. 
+ def test_mgmt_svc_replicas_invalid(self): + """Test --ms-replicas with invalid port. :avocado: tags=all,full_regression :avocado: tags=hw,large - :avocado: tags=control,dmg_config_generate,access_points - :avocado: tags=ConfigGenerateOutput,test_access_points_invalid + :avocado: tags=control,dmg_config_generate,mgmt_svc_replicas + :avocado: tags=ConfigGenerateOutput,test_mgmt_svc_replicas_invalid """ errors = [] # Even AP. - errors.extend(self.verify_access_point("wolf-a,wolf-b", "non-odd")) + errors.extend(self.verify_ms_replica("wolf-a,wolf-b", "non-odd")) # Single AP with an invalid port. - errors.extend(self.verify_access_point("wolf-a:abcd", "invalid access point port")) + errors.extend(self.verify_ms_replica("wolf-a:abcd", "invalid MS replica port")) # Odd AP with both valid and invalid port. errors.extend( - self.verify_access_point( - "wolf-a:12345,wolf-b:12345,wolf-c:abcd", "invalid access point port")) + self.verify_ms_replica( + "wolf-a:12345,wolf-b:12345,wolf-c:abcd", "invalid MS replica port")) self.check_errors(errors) - def test_access_points_same_ap_repeated(self): - """Test --access-points with the same APs repeated. + def test_mgmt_svc_replicas_same_ap_repeated(self): + """Test --ms-replicas with the same MS replicas repeated. :avocado: tags=all,full_regression :avocado: tags=hw,large - :avocado: tags=control,dmg_config_generate,access_points - :avocado: tags=ConfigGenerateOutput,test_access_points_same_ap_repeated + :avocado: tags=control,dmg_config_generate,mgmt_svc_replicas + :avocado: tags=ConfigGenerateOutput,test_mgmt_svc_replicas_same_ap_repeated """ errors = [] - errors.extend(self.verify_access_point("wolf-a,wolf-a,wolf-a", "duplicate access points")) + errors.extend(self.verify_ms_replica("wolf-a,wolf-a,wolf-a", + "duplicate MS replica addresses")) self.check_errors(errors) @@ -432,7 +433,7 @@ def test_num_engines(self): # Call dmg config generate --num-engines=<1 to max_engine> for num_engines in range(1, max_engine + 1): result = dmg.config_generate( - access_points="wolf-a", num_engines=num_engines, net_provider=self.def_provider) + mgmt_svc_replicas="wolf-a", num_engines=num_engines, net_provider=self.def_provider) generated_yaml = yaml.safe_load(result.stdout) actual_num_engines = len(generated_yaml["engines"]) @@ -444,7 +445,7 @@ def test_num_engines(self): # Verify that max_engine + 1 fails. 
result = dmg.config_generate( - access_points="wolf-a", num_engines=max_engine + 1, net_provider=self.def_provider) + mgmt_svc_replicas="wolf-a", num_engines=max_engine + 1, net_provider=self.def_provider) if result.exit_status == 0: errors.append("Host + invalid num engines succeeded with {}!".format(max_engine + 1)) @@ -473,7 +474,7 @@ def test_scm_only(self): # Call dmg config generate with --scm-only=False result = dmg.config_generate( - access_points="wolf-a", scm_only=False, net_provider=self.def_provider) + mgmt_svc_replicas="wolf-a", scm_only=False, net_provider=self.def_provider) if result.exit_status != 0: errors.append("config generate failed with scm_only = False!") generated_yaml = yaml.safe_load(result.stdout) @@ -492,7 +493,7 @@ def test_scm_only(self): # Call dmg config generate with --scm-only=True result = dmg.config_generate( - access_points="wolf-a", scm_only=True, net_provider=self.def_provider) + mgmt_svc_replicas="wolf-a", scm_only=True, net_provider=self.def_provider) if result.exit_status != 0: errors.append("config generate failed with scm_only = True!") generated_yaml = yaml.safe_load(result.stdout) @@ -548,7 +549,7 @@ def test_net_class(self): for num_engines in range(1, ib_count + 1): # dmg config generate should pass. result = dmg.config_generate( - access_points="wolf-a", num_engines=num_engines, net_class="infiniband", + mgmt_svc_replicas="wolf-a", num_engines=num_engines, net_class="infiniband", net_provider=self.def_provider) if result.exit_status != 0: @@ -574,7 +575,7 @@ def test_net_class(self): # Call dmg config generate --num-engines= # --net-class=infiniband. Too many engines. Should fail. result = dmg.config_generate( - access_points="wolf-a", num_engines=ib_count + 1, net_class="infiniband", + mgmt_svc_replicas="wolf-a", num_engines=ib_count + 1, net_class="infiniband", net_provider=self.def_provider) if result.exit_status == 0: msg = "config generate succeeded with --net-class=infiniband num_engines = {}!".format( @@ -593,7 +594,7 @@ def test_net_class(self): for num_engines in range(1, eth_count + 1): # dmg config generate should pass. result = dmg.config_generate( - access_points="wolf-a", num_engines=num_engines, net_class="ethernet", + mgmt_svc_replicas="wolf-a", num_engines=num_engines, net_class="ethernet", net_provider=self.def_provider) if result.exit_status != 0: @@ -619,7 +620,7 @@ def test_net_class(self): # Call dmg config generate --num-engines= # --net-class=ethernet. Too many engines. Should fail. 
result = dmg.config_generate( - access_points="wolf-a", num_engines=eth_count + 1, net_class="ethernet", + mgmt_svc_replicas="wolf-a", num_engines=eth_count + 1, net_class="ethernet", net_provider=self.def_provider) if result.exit_status == 0: msg = "config generate succeeded with --net-class=ethernet, num_engines = {}!".format( diff --git a/src/tests/ftest/control/config_generate_run.py b/src/tests/ftest/control/config_generate_run.py index f73ae0f5c7e..cb03f7f33c5 100644 --- a/src/tests/ftest/control/config_generate_run.py +++ b/src/tests/ftest/control/config_generate_run.py @@ -55,7 +55,7 @@ def test_config_generate_run(self): self.log_step("Generating server configuration") server_host = self.hostlist_servers[0] result = self.get_dmg_command().config_generate( - access_points=server_host, num_engines=num_engines, scm_only=scm_only, + mgmt_svc_replicas=server_host, num_engines=num_engines, scm_only=scm_only, net_class=net_class, net_provider=net_provider, use_tmpfs_scm=use_tmpfs_scm, control_metadata_path=control_metadata) diff --git a/src/tests/ftest/control/config_generate_run.yaml b/src/tests/ftest/control/config_generate_run.yaml index 73a1476a0b4..2e27ee1a982 100644 --- a/src/tests/ftest/control/config_generate_run.yaml +++ b/src/tests/ftest/control/config_generate_run.yaml @@ -24,7 +24,7 @@ dmg: setup: start_servers_once: False config_generate_params: !mux - # 1. Access points only. Use default for others. + # 1. MS replicas only. Use default for others. all_default: net_provider: ofi+tcp # 2. Use one engine. diff --git a/src/tests/ftest/control/daos_control_config.py b/src/tests/ftest/control/daos_control_config.py index 2757816102e..c2d5b61c1c0 100644 --- a/src/tests/ftest/control/daos_control_config.py +++ b/src/tests/ftest/control/daos_control_config.py @@ -1,5 +1,5 @@ """ - (C) Copyright 2020-2023 Intel Corporation. + (C) Copyright 2020-2024 Intel Corporation. 
SPDX-License-Identifier: BSD-2-Clause-Patent """ @@ -43,7 +43,7 @@ def test_daos_control_config_basic(self): "Error setting the '{}' config file parameter to '{}'".format( c_val[0], c_val[1])) - # Setup the access points with the server hosts + # Setup the hostlist with the server hosts self.log.info( "Executing dmg config with %s = %s, expecting to %s", c_val[0], c_val[1], c_val[2]) diff --git a/src/tests/ftest/control/daos_system_query.py b/src/tests/ftest/control/daos_system_query.py index 52d9c50877e..89b81cd35ab 100644 --- a/src/tests/ftest/control/daos_system_query.py +++ b/src/tests/ftest/control/daos_system_query.py @@ -32,8 +32,8 @@ def test_daos_system_query(self): exp_sys_name = self.server_managers[0].get_config_value("name") exp_provider = self.server_managers[0].get_config_value("provider") - num_access_points = len(self.host_info.access_points) - exp_num_ap_ranks = num_access_points * engines_per_host + num_ms_replicas = len(self.host_info.mgmt_svc_replicas) + exp_num_ms_ranks = num_ms_replicas * engines_per_host query_output = daos_cmd.system_query()["response"] @@ -50,6 +50,6 @@ def test_daos_system_query(self): self.fail("expected {} rank URIs, got '{}'".format(exp_num_ranks, num_ranks)) num_ap_ranks = len(query_output["access_point_rank_uris"]) - if num_ap_ranks != exp_num_ap_ranks: - self.fail("expected {} access point rank URIs, got '{}'".format(exp_num_ap_ranks, + if num_ap_ranks != exp_num_ms_ranks: + self.fail("expected {} access point rank URIs, got '{}'".format(exp_num_ms_ranks, num_ap_ranks)) diff --git a/src/tests/ftest/control/ms_failover.py b/src/tests/ftest/control/ms_failover.py index 92c668aa607..58fa5cb9e37 100644 --- a/src/tests/ftest/control/ms_failover.py +++ b/src/tests/ftest/control/ms_failover.py @@ -1,5 +1,5 @@ """ -(C) Copyright 2021-2023 Intel Corporation. +(C) Copyright 2021-2024 Intel Corporation. SPDX-License-Identifier: BSD-2-Clause-Patent """ @@ -47,7 +47,7 @@ def verify_leader(self, replicas): """Verify the leader of the MS is in the replicas. Args: - replicas (list): list of hostnames representing the access points + replicas (list): list of hostnames representing the replicas for the MS. Returns: @@ -58,7 +58,7 @@ def verify_leader(self, replicas): start = time.time() while not l_hostname and (time.time() - start) < self.L_QUERY_TIMER: l_hostname = self.get_leader() - # Check that the new leader is in the access list + # Check that the new leader is in the MS replica list if l_hostname not in replicas: self.log.error("Selected leader <%s> is not within the replicas" " provided to servers", l_hostname) @@ -78,7 +78,7 @@ def launch_servers(self, replica_count=5): replica_count (int): Number of replicas to launch. Returns: - list: list of access point hosts where MS has been started. + list: list of replica hosts where MS has been started. """ self.log.info("*** launching %d servers", replica_count) @@ -87,7 +87,7 @@ def launch_servers(self, replica_count=5): self.server_group: { "hosts": self.hostlist_servers, - "access_points": replicas, + "mgmt_svc_replicas": replicas, "svr_config_file": None, "dmg_config_file": None, "svr_config_temp": None, diff --git a/src/tests/ftest/control/ms_resilience.py b/src/tests/ftest/control/ms_resilience.py index 8c646a40dbc..61e1f2701c1 100644 --- a/src/tests/ftest/control/ms_resilience.py +++ b/src/tests/ftest/control/ms_resilience.py @@ -1,5 +1,5 @@ """ -(C) Copyright 2021-2023 Intel Corporation. +(C) Copyright 2021-2024 Intel Corporation. 
SPDX-License-Identifier: BSD-2-Clause-Patent """ @@ -99,7 +99,7 @@ def verify_leader(self, replicas): """Verify the leader of the MS is in the replicas. Args: - replicas (NodeSet): host names representing the access points for the MS. + replicas (NodeSet): host names representing the replicas for the MS. Returns: NodeSet: hostname of the MS leader. @@ -165,7 +165,7 @@ def launch_servers(self, resilience_num): resiliency. Returns: - NodeSet: access point hosts where MS has been started. + NodeSet: replica hosts where MS has been started. """ self.log.info("*** launching %d servers", resilience_num) @@ -175,7 +175,7 @@ def launch_servers(self, resilience_num): self.server_group: { "hosts": self.hostlist_servers, - "access_points": replicas, + "mgmt_svc_replicas": replicas, "svr_config_file": None, "dmg_config_file": None, "svr_config_temp": None, diff --git a/src/tests/ftest/harness/config.py b/src/tests/ftest/harness/config.py index e6237c26e0b..4b93643e883 100644 --- a/src/tests/ftest/harness/config.py +++ b/src/tests/ftest/harness/config.py @@ -1,5 +1,5 @@ """ -(C) Copyright 2021-2022 Intel Corporation. +(C) Copyright 2021-2024 Intel Corporation. SPDX-License-Identifier: BSD-2-Clause-Patent """ @@ -19,7 +19,7 @@ def test_harness_config(self): """Verify the config handling. Verifies the following: - TestWithServers.access_points + TestWithServers.mgmt_svc_replicas DaosAgentYamlParameters.exclude_fabric_ifaces :avocado: tags=all @@ -27,39 +27,41 @@ def test_harness_config(self): :avocado: tags=harness :avocado: tags=HarnessConfigTest,test_harness_config """ - self.log.info('Verify access_points_suffix set from yaml') - access_points_suffix = self.params.get("access_points_suffix", "/run/setup/*") - self.assertEqual(self.access_points_suffix, access_points_suffix) + self.log.info('Verify mgmt_svc_replicas_suffix set from yaml') + mgmt_svc_replicas_suffix = self.params.get("mgmt_svc_replicas_suffix", "/run/setup/*") + self.assertEqual(self.mgmt_svc_replicas_suffix, mgmt_svc_replicas_suffix) - self.log.info('Verify access_points_suffix is appended exactly once') - access_points = nodeset_append_suffix(self.access_points, access_points_suffix) - self.assertEqual(sorted(self.access_points), sorted(access_points)) - access_points = nodeset_append_suffix(access_points, access_points_suffix) - self.assertEqual(sorted(self.access_points), sorted(access_points)) + self.log.info('Verify mgmt_svc_replicas_suffix is appended exactly once') + mgmt_svc_replicas = nodeset_append_suffix(self.mgmt_svc_replicas, mgmt_svc_replicas_suffix) + self.assertEqual(sorted(self.mgmt_svc_replicas), sorted(mgmt_svc_replicas)) + mgmt_svc_replicas = nodeset_append_suffix(mgmt_svc_replicas, mgmt_svc_replicas_suffix) + self.assertEqual(sorted(self.mgmt_svc_replicas), sorted(mgmt_svc_replicas)) self.log.info('Verify self.get_dmg_command().hostlist_suffix and hostlist') dmg = self.get_dmg_command() - self.assertEqual(dmg.hostlist_suffix, self.access_points_suffix) - expected_hostlist = sorted(nodeset_append_suffix(dmg.hostlist, self.access_points_suffix)) + self.assertEqual(dmg.hostlist_suffix, self.mgmt_svc_replicas_suffix) + expected_hostlist = sorted(nodeset_append_suffix(dmg.hostlist, + self.mgmt_svc_replicas_suffix)) self.assertEqual(sorted(dmg.hostlist), expected_hostlist) self.log.info('Verify self.get_dmg_command().yaml.get_yaml_data()["hostlist"]') self.assertEqual(sorted(dmg.yaml.get_yaml_data()['hostlist']), expected_hostlist) self.log.info('Verify DmgCommand().hostlist_suffix and hostlist') - dmg2 = 
DmgCommand(self.bin, hostlist_suffix=access_points_suffix) + dmg2 = DmgCommand(self.bin, hostlist_suffix=mgmt_svc_replicas_suffix) dmg2.hostlist = dmg.hostlist - self.assertEqual(dmg2.hostlist_suffix, self.access_points_suffix) + self.assertEqual(dmg2.hostlist_suffix, self.mgmt_svc_replicas_suffix) self.assertEqual(sorted(dmg2.hostlist), expected_hostlist) - self.log.info('Verify server_manager...get_yaml_data...access_points"]') + self.log.info('Verify server_manager...get_yaml_data...mgmt_svc_replicas"]') yaml_data = self.server_managers[0].manager.job.yaml.get_yaml_data() - self.assertEqual(sorted(yaml_data['access_points']), sorted(self.access_points)) + self.assertEqual(sorted(yaml_data['mgmt_svc_replicas']), sorted(self.mgmt_svc_replicas)) - self.log.info('Verify daos_server.yaml access_points') + self.log.info('Verify daos_server.yaml mgmt_svc_replicas') with open(self.server_managers[0].manager.job.temporary_file, 'r') as yaml_file: daos_server_yaml = yaml.safe_load(yaml_file.read()) - self.assertEqual(sorted(daos_server_yaml['access_points']), sorted(self.access_points)) + self.assertEqual(sorted(daos_server_yaml['mgmt_svc_replicas']), + sorted(self.mgmt_svc_replicas)) self.log.info('Verify daos_control.yaml hostlist') with open(self.get_dmg_command().temporary_file, 'r') as yaml_file: @@ -69,7 +71,7 @@ def test_harness_config(self): self.log.info('Verify daos_agent.yaml access_points') with open(self.agent_managers[0].manager.job.temporary_file, 'r') as yaml_file: daos_agent_yaml = yaml.safe_load(yaml_file.read()) - self.assertEqual(sorted(daos_agent_yaml['access_points']), sorted(self.access_points)) + self.assertEqual(sorted(daos_agent_yaml['access_points']), sorted(self.mgmt_svc_replicas)) self.log.info('Verify daos_agent.yaml exclude_fabric_ifaces') expected = self.params.get('exclude_fabric_ifaces', '/run/agent_config/*') diff --git a/src/tests/ftest/harness/config.yaml b/src/tests/ftest/harness/config.yaml index be19a641812..4c2f98a7572 100644 --- a/src/tests/ftest/harness/config.yaml +++ b/src/tests/ftest/harness/config.yaml @@ -3,7 +3,7 @@ hosts: test_clients: 1 timeout: 60 setup: - access_points_suffix: .wolf.hpdd.intel.com + mgmt_svc_replicas_suffix: .wolf.hpdd.intel.com server_config: name: daos_server engines_per_host: 1 diff --git a/src/tests/ftest/network/cart_self_test.py b/src/tests/ftest/network/cart_self_test.py index 9dbf0f5e884..afb74c142e7 100644 --- a/src/tests/ftest/network/cart_self_test.py +++ b/src/tests/ftest/network/cart_self_test.py @@ -62,7 +62,7 @@ def setUp(self): self.server_managers[-1], self.hostlist_servers, self.hostfile_servers_slots, - self.access_points) + self.mgmt_svc_replicas) # Setup additional environment variables for the server orterun command self.cart_env["CRT_CTX_NUM"] = "8" diff --git a/src/tests/ftest/pool/destroy.py b/src/tests/ftest/pool/destroy.py index e5dcb6cc521..d9a70534f95 100644 --- a/src/tests/ftest/pool/destroy.py +++ b/src/tests/ftest/pool/destroy.py @@ -1,5 +1,5 @@ """ -(C) Copyright 2018-2023 Intel Corporation. +(C) Copyright 2018-2024 Intel Corporation. SPDX-License-Identifier: BSD-2-Clause-Patent """ @@ -53,13 +53,13 @@ def get_group_info(hosts, svr_config_file=None, dmg_config_file=None, using the config_file specification. Defaults to None. 
     Returns:
-        dict: a dictionary identifying the hosts and access points for the
+        dict: a dictionary identifying the hosts and MS replicas for the
             server group dictionary
 
     """
     return {
         "hosts": hosts,
-        "access_points": hosts[:1],
+        "mgmt_svc_replicas": hosts[:1],
         "svr_config_file": svr_config_file,
         "dmg_config_file": dmg_config_file,
         "svr_config_temp": svr_config_temp,
diff --git a/src/tests/ftest/pool/destroy_rebuild.py b/src/tests/ftest/pool/destroy_rebuild.py
index 753e8fb009e..bb4c48ed8df 100644
--- a/src/tests/ftest/pool/destroy_rebuild.py
+++ b/src/tests/ftest/pool/destroy_rebuild.py
@@ -1,5 +1,5 @@
 '''
-  (C) Copyright 2018-2023 Intel Corporation.
+  (C) Copyright 2018-2024 Intel Corporation.
 
   SPDX-License-Identifier: BSD-2-Clause-Patent
 '''
@@ -54,27 +54,27 @@ def test_destroy_while_rebuilding(self):
         # 3.
         self.log_step("Start rebuild, system stop")
         all_ranks = self.server_managers[0].ranks.keys()
-        ap_ranks = self.server_managers[0].get_host_ranks(self.access_points)
-        non_ap_ranks = list(set(all_ranks) - set(ap_ranks))
+        ms_ranks = self.server_managers[0].get_host_ranks(self.mgmt_svc_replicas)
+        non_ms_ranks = list(set(all_ranks) - set(ms_ranks))
 
         # Get the pool leader rank
         pool.set_query_data()
         leader_rank = pool.query_data["response"]["svc_ldr"]
-        if leader_rank in ap_ranks:
-            ap_ranks.remove(leader_rank)
-        elif leader_rank in non_ap_ranks:
-            non_ap_ranks.remove(leader_rank)
+        if leader_rank in ms_ranks:
+            ms_ranks.remove(leader_rank)
+        elif leader_rank in non_ms_ranks:
+            non_ms_ranks.remove(leader_rank)
 
         # Select the following ranks to stop
         # - the pool leader rank
-        # - a random rank that is not an access point
-        # - a random rank this is an access point and not the pool leader
+        # - a random rank that is not an MS replica
+        # - a random rank that is an MS replica and not the pool leader
         self.log.debug(
-            "Engine ranks: pool leader=%s, access points=%s, other=%s",
-            leader_rank, ap_ranks, non_ap_ranks)
+            "Engine ranks: pool leader=%s, MS replicas=%s, other=%s",
+            leader_rank, ms_ranks, non_ms_ranks)
         ranks = [leader_rank]
-        ranks.append(random.choice(ap_ranks))  # nosec
-        ranks.append(random.choice(non_ap_ranks))  # nosec
+        ranks.append(random.choice(ms_ranks))  # nosec
+        ranks.append(random.choice(non_ms_ranks))  # nosec
         self.log.info("ranks to rebuild: %s", ranks)
         self.server_managers[0].stop_ranks(ranks, self.d_log, force=True)
diff --git a/src/tests/ftest/pool/destroy_rebuild.yaml b/src/tests/ftest/pool/destroy_rebuild.yaml
index e927f7357e5..f599737977b 100644
--- a/src/tests/ftest/pool/destroy_rebuild.yaml
+++ b/src/tests/ftest/pool/destroy_rebuild.yaml
@@ -5,7 +5,7 @@ hosts:
   test_servers: 7
   test_clients: 1
 setup:
-  access_points_qty: 5
+  mgmt_svc_replicas_qty: 5
 server_config:
   name: daos_server
   engines_per_host: 1
diff --git a/src/tests/ftest/server/daos_server_config.yaml b/src/tests/ftest/server/daos_server_config.yaml
index 0f6716c46ba..429789fcdc7 100644
--- a/src/tests/ftest/server/daos_server_config.yaml
+++ b/src/tests/ftest/server/daos_server_config.yaml
@@ -44,24 +44,24 @@ server_config_val: !mux
       - "name"
       - "amjustin, 12345, abcdef"
       - "FAIL"
-  access_point_noport:
+  mgmt_svc_replica_noport:
    config_val:
-      - "access_points"
+      - "mgmt_svc_replicas"
       - [localhost]
       - "PASS"
-  # access_point_repeated:
+  # mgmt_svc_replica_repeated:
   #   config_val:
-  #     - "access_points"
+  #     - "mgmt_svc_replicas"
   #     - [localhost:10001, localhost:10001]
  #     - "FAIL"
-  # access_point_wrong_port:
+  # mgmt_svc_replica_wrong_port:
   #   config_val:
-  #     - "access_points"
+  #     - "mgmt_svc_replicas"
   #     - 
[localhost:12345] # - "FAIL" - # access_point_wrong_name: + # mgmt_svc_replica_wrong_name: # config_val: - # - "access_points" + # - "mgmt_svc_replicas" # - [wrongName:10001] # - "FAIL" control_log_file_does_not_exist: diff --git a/src/tests/ftest/util/agent_utils_params.py b/src/tests/ftest/util/agent_utils_params.py index 9fe4ddd455e..bd091a162e6 100644 --- a/src/tests/ftest/util/agent_utils_params.py +++ b/src/tests/ftest/util/agent_utils_params.py @@ -72,6 +72,10 @@ def __init__(self, filename, common_yaml): # Enable client telemetry for all client processes. # - telemetry_retain: , e.g. 5m # Time to retain per-client telemetry data. + # - access_points: , e.g. ["hostname1:10001"] + # Hosts can be specified with or without port, default port below + # assumed if not specified. Defaults to the hostname of this node + # at port 10000 for local testing. self.runtime_dir = BasicParameter(None, default_runtime_dir) self.log_file = LogParameter(log_dir, None, "daos_agent.log") self.control_log_mask = BasicParameter(None, "debug") @@ -81,6 +85,7 @@ def __init__(self, filename, common_yaml): self.telemetry_port = BasicParameter(None) self.telemetry_enabled = BasicParameter(None) self.telemetry_retain = BasicParameter(None) + self.access_points = BasicParameter(None, ["localhost"]) def update_log_file(self, name): """Update the log file name for the daos agent. diff --git a/src/tests/ftest/util/apricot/apricot/test.py b/src/tests/ftest/util/apricot/apricot/test.py index bb4dba4f1e0..185849c0189 100644 --- a/src/tests/ftest/util/apricot/apricot/test.py +++ b/src/tests/ftest/util/apricot/apricot/test.py @@ -25,8 +25,9 @@ from environment_utils import TestEnvironment from exception_utils import CommandFailure from fault_config_utils import FaultInjection -from general_utils import (dict_to_str, dump_engines_stacks, get_avocado_config_value, - nodeset_append_suffix, set_avocado_config_value) +from general_utils import (DaosTestError, dict_to_str, dump_engines_stacks, + get_avocado_config_value, nodeset_append_suffix, + set_avocado_config_value) from host_utils import HostException, HostInfo, HostRole, get_host_parameters, get_local_host from logger_utils import TestLogger from pydaos.raw import DaosApiError, DaosContext, DaosLog @@ -665,8 +666,8 @@ def __init__(self, *args, **kwargs): self.__dump_engine_ult_on_failure = True # Whether engines ULT stacks have been already dumped self.__have_dumped_ult_stacks = False - # Suffix to append to each access point name - self.access_points_suffix = None + # Suffix to append to each MS replica name + self.mgmt_svc_replicas_suffix = None def setUp(self): """Set up each test case.""" @@ -723,23 +724,23 @@ def setUp(self): self.hostlist_servers = NodeSet(self.host_info.servers.hosts) self.hostlist_clients = NodeSet(self.host_info.clients.hosts) - # Access points to use by default when starting servers and agents - # - for 1 or 2 servers use 1 access point - # - for 3 or more servers use 3 access points - default_access_points_qty = 1 if len(self.hostlist_servers) < 3 else 3 - access_points_qty = self.params.get( - "access_points_qty", "/run/setup/*", default_access_points_qty) - if access_points_qty < 1 or access_points_qty > len(self.hostlist_servers): - self.fail("Invalid access points node quantity") - default_access_points = self.hostlist_servers[:access_points_qty] - self.access_points = NodeSet( - self.params.get("access_points", "/run/setup/*", default_access_points)) - self.access_points_suffix = self.params.get( - "access_points_suffix", 
"/run/setup/*", self.access_points_suffix) - if self.access_points_suffix: - self.access_points = nodeset_append_suffix( - self.access_points, self.access_points_suffix) - self.host_info.access_points = self.access_points + # MS replicas to use by default when starting servers and agents + # - for 1 or 2 servers use 1 replica + # - for 3 or more servers use 3 replicas + default_mgmt_svc_replicas_qty = 1 if len(self.hostlist_servers) < 3 else 3 + mgmt_svc_replicas_qty = self.params.get( + "mgmt_svc_replicas_qty", "/run/setup/*", default_mgmt_svc_replicas_qty) + if mgmt_svc_replicas_qty < 1 or mgmt_svc_replicas_qty > len(self.hostlist_servers): + self.fail("Invalid MS replica node quantity") + default_mgmt_svc_replicas = self.hostlist_servers[:mgmt_svc_replicas_qty] + self.mgmt_svc_replicas = NodeSet( + self.params.get("mgmt_svc_replicas", "/run/setup/*", default_mgmt_svc_replicas)) + self.mgmt_svc_replicas_suffix = self.params.get( + "mgmt_svc_replicas_suffix", "/run/setup/*", self.mgmt_svc_replicas_suffix) + if self.mgmt_svc_replicas_suffix: + self.mgmt_svc_replicas = nodeset_append_suffix( + self.mgmt_svc_replicas, self.mgmt_svc_replicas_suffix) + self.host_info.mgmt_svc_replicas = self.mgmt_svc_replicas # Toggle whether to dump server ULT stacks on failure self.__dump_engine_ult_on_failure = self.params.get( @@ -880,9 +881,9 @@ def start_agents(self, agent_groups=None, force=False): Args: agent_groups (dict, optional): dictionary of dictionaries, containing the list of hosts on which to start the daos agent - and the list of server access points, using a unique server + and the list of MS replicas, using a unique server group name key. Defaults to None which will use the server group - name, all of the client hosts, and the access points from the + name, all of the client hosts, and the MS replicas from the test's yaml file to define a single server group entry. force (bool, optional): whether or not to force starting the agents. Defaults to False. @@ -903,9 +904,9 @@ def start_servers(self, server_groups=None, force=False): Args: server_groups (dict, optional): dictionary of dictionaries, containing the list of hosts on which to start the daos server - and the list of access points, using a unique server group name + and the list of MS replicas, using a unique server group name key. Defaults to None which will use the server group name, all - of the server hosts, and the access points from the test's yaml + of the server hosts, and the MS replicas from the test's yaml file to define a single server group entry. force (bool, optional): whether or not to force starting the servers. Defaults to False. @@ -970,9 +971,9 @@ def setup_agents(self, agent_groups=None): Args: agent_groups (dict, optional): dictionary of dictionaries, containing the list of hosts on which to start the daos agent - and the list of server access points, using a unique server + and the list of server MS replicas, using a unique server group name key. Defaults to None which will use the server group - name, all of the client hosts, and the access points from the + name, all of the client hosts, and the MS replicas from the test's yaml file to define a single server group entry. 
Raises: @@ -986,7 +987,7 @@ def setup_agents(self, agent_groups=None): agent_groups = { self.server_group: { "hosts": include_local_host(self.hostlist_clients), - "access_points": self.access_points + "mgmt_svc_replicas": self.mgmt_svc_replicas } } @@ -1001,7 +1002,7 @@ def setup_agents(self, agent_groups=None): self.agent_managers[-1], info["hosts"], self.hostfile_clients_slots, - info["access_points"]) + info["mgmt_svc_replicas"]) def setup_servers(self, server_groups=None): """Start the daos_server processes. @@ -1009,9 +1010,9 @@ def setup_servers(self, server_groups=None): Args: server_groups (dict, optional): dictionary of dictionaries, containing the list of hosts on which to start the daos server - and the list of access points, using a unique server group name + and the list of MS replicas, using a unique server group name key. Defaults to None which will use the server group name, all - of the server hosts, and the access points from the test's yaml + of the server hosts, and the MS replicas from the test's yaml file to define a single server group entry. Raises: @@ -1023,7 +1024,7 @@ def setup_servers(self, server_groups=None): server_groups = { self.server_group: { "hosts": self.hostlist_servers, - "access_points": self.access_points, + "mgmt_svc_replicas": self.mgmt_svc_replicas, "svr_config_file": None, "dmg_config_file": None, "svr_config_temp": None, @@ -1044,7 +1045,7 @@ def setup_servers(self, server_groups=None): self.server_managers[-1], info["hosts"], self.hostfile_servers_slots, - info["access_points"]) + info["mgmt_svc_replicas"]) def get_config_file(self, name, command, path=None): """Get the yaml configuration file. @@ -1164,7 +1165,7 @@ def add_server_manager(self, group=None, svr_config_file=None, DaosServerManager( group, self.bin, svr_cert_dir, svr_config_file, dmg_cert_dir, dmg_config_file, svr_config_temp, dmg_config_temp, - self.server_manager_class, access_points_suffix=self.access_points_suffix) + self.server_manager_class, mgmt_svc_replicas_suffix=self.mgmt_svc_replicas_suffix) ) if self.server_config_namespace is not None: self.log.debug( @@ -1172,7 +1173,7 @@ def add_server_manager(self, group=None, svr_config_file=None, len(self.server_managers) - 1, self.server_config_namespace) self.server_managers[-1].manager.job.yaml.namespace = self.server_config_namespace - def configure_manager(self, name, manager, hosts, slots, access_points=None): + def configure_manager(self, name, manager, hosts, slots, mgmt_svc_replicas=None): """Configure the agent/server manager object. Defines the environment variables, host list, and hostfile settings used @@ -1183,17 +1184,22 @@ def configure_manager(self, name, manager, hosts, slots, access_points=None): manager (SubprocessManager): the daos agent/server process manager hosts (NodeSet): hosts on which to start the daos agent/server slots (int): number of slots per engine to define in the hostfile - access_points (NodeSet): access point hosts. Defaults to None which - uses self.access_points. + mgmt_svc_replicas (NodeSet): MS replica hosts. Defaults to None which + uses self.mgmt_svc_replicas. 
""" self.log.info("-" * 100) self.log.info("--- CONFIGURING %s MANAGER ---", name.upper()) - if access_points is None: - access_points = NodeSet(self.access_points) + if mgmt_svc_replicas is None: + mgmt_svc_replicas = NodeSet(self.mgmt_svc_replicas) # Calling get_params() will set the test-specific log names manager.get_params(self) - manager.set_config_value("access_points", list(access_points)) + if name == "server": + manager.set_config_value("mgmt_svc_replicas", list(mgmt_svc_replicas)) + elif name == "agent": + manager.set_config_value("access_points", list(mgmt_svc_replicas)) + else: + raise DaosTestError("invalid manager name: {}".format(name)) manager.manager.assign_environment( EnvironmentVariables({"PATH": None}), True) manager.hosts = (hosts, self.workdir, slots) @@ -1653,13 +1659,13 @@ def get_dmg_command(self, index=0): """Get a DmgCommand setup to interact with server manager index. Return a DmgCommand object configured with: - - the "-l" parameter assigned to the server's access point list + - the "-l" parameter assigned to the server's MS replica list - the "-i" parameter assigned to the server's interactive mode This method is intended to be used by tests that wants to use dmg to create and destroy pool. Pass in the object to TestPool constructor. - Access point should be passed in to -l regardless of the number of + MS replica should be passed in to -l regardless of the number of servers. Args: @@ -1683,8 +1689,8 @@ def get_dmg_command(self, index=0): dmg_cmd = get_dmg_command( self.server_group, dmg_cert_dir, self.bin, dmg_config_file, - dmg_config_temp, self.access_points_suffix) - dmg_cmd.hostlist = self.access_points + dmg_config_temp, self.mgmt_svc_replicas_suffix) + dmg_cmd.hostlist = self.mgmt_svc_replicas return dmg_cmd def get_daos_command(self): @@ -1829,7 +1835,7 @@ def add_container_qty(self, quantity, pool, namespace=CONT_NAMESPACE, create=Tru self.container.append( self.get_container(pool=pool, namespace=namespace, create=create)) - def start_additional_servers(self, additional_servers, index=0, access_points=None): + def start_additional_servers(self, additional_servers, index=0, mgmt_svc_replicas=None): """Start additional servers. This method can be used to start a new daos_server during a test. @@ -1838,8 +1844,8 @@ def start_additional_servers(self, additional_servers, index=0, access_points=No additional_servers (NodeSet): hosts on which to start daos_server. index (int): Determines which server_managers to use when creating the new server. - access_points (NodeSet): access point hosts. Defaults to None which - uses self.access_points. + mgmt_svc_replicas (NodeSet): MS replica hosts. Defaults to None which + uses self.mgmt_svc_replicas. 
""" self.add_server_manager( self.server_managers[index].manager.job.get_config_value("name"), @@ -1853,6 +1859,6 @@ def start_additional_servers(self, additional_servers, index=0, access_points=No self.server_managers[-1], additional_servers, self.hostfile_servers_slots, - access_points + mgmt_svc_replicas ) self._start_manager_list("server", [self.server_managers[-1]]) diff --git a/src/tests/ftest/util/command_utils_base.py b/src/tests/ftest/util/command_utils_base.py index 837e61b339f..d867fbba4c8 100644 --- a/src/tests/ftest/util/command_utils_base.py +++ b/src/tests/ftest/util/command_utils_base.py @@ -794,7 +794,6 @@ class CommonConfig(YamlParameters): Includes: - the daos system name (name) - - a list of access point nodes (access_points) - the default port number (port) - the transport credentials """ @@ -812,18 +811,12 @@ def __init__(self, name, transport): # - name: , e.g. "daos_server" # Name associated with the DAOS system. # - # - access_points: , e.g. ["hostname1:10001"] - # Hosts can be specified with or without port, default port below - # assumed if not specified. Defaults to the hostname of this node - # at port 10000 for local testing - # # - port: , e.g. 10001 # Default port number with with to bind the daos_server. This # will also be used when connecting to access points if the list # only contains host names. # self.name = BasicParameter(None, name) - self.access_points = BasicParameter(None, ["localhost"]) self.port = BasicParameter(None, 10001) def _get_new(self): diff --git a/src/tests/ftest/util/dmg_utils.py b/src/tests/ftest/util/dmg_utils.py index 881cb298a0a..cbca403895a 100644 --- a/src/tests/ftest/util/dmg_utils.py +++ b/src/tests/ftest/util/dmg_utils.py @@ -1216,13 +1216,13 @@ def pool_evict(self, pool): """ return self._get_result(("pool", "evict"), pool=pool) - def config_generate(self, access_points, num_engines=None, scm_only=False, + def config_generate(self, mgmt_svc_replicas, num_engines=None, scm_only=False, net_class=None, net_provider=None, use_tmpfs_scm=False, control_metadata_path=None): """Produce a server configuration. Args: - access_points (str): Comma separated list of access point addresses. + mgmt_svc_replicas (str): Comma separated list of MS replica addresses. num_pmem (int): Number of SCM (pmem) devices required per storage host in DAOS system. Defaults to None. scm_only (bool, option): Whether to omit NVMe from generated config. 
@@ -1242,7 +1242,7 @@ def config_generate(self, access_points, num_engines=None, scm_only=False,
         """
         return self._get_result(
-            ("config", "generate"), access_points=access_points,
+            ("config", "generate"), mgmt_svc_replicas=mgmt_svc_replicas,
             num_engines=num_engines, scm_only=scm_only,
             net_class=net_class, net_provider=net_provider,
             use_tmpfs_scm=use_tmpfs_scm, control_metadata_path=control_metadata_path)
diff --git a/src/tests/ftest/util/dmg_utils_base.py b/src/tests/ftest/util/dmg_utils_base.py
index 7e5d2300a53..4787c05d393 100644
--- a/src/tests/ftest/util/dmg_utils_base.py
+++ b/src/tests/ftest/util/dmg_utils_base.py
@@ -235,7 +235,7 @@ def __init__(self):
                 super(
                     DmgCommandBase.ConfigSubCommand.GenerateSubCommand,
                     self).__init__("/run/dmg/config/generate/*", "generate")
-                self.access_points = FormattedParameter("--access-points={}", None)
+                self.mgmt_svc_replicas = FormattedParameter("--ms-replicas={}", None)
                 self.num_engines = FormattedParameter("--num-engines={}", None)
                 self.scm_only = FormattedParameter("--scm-only", False)
                 self.net_class = FormattedParameter("--net-class={}", None)
diff --git a/src/tests/ftest/util/host_utils.py b/src/tests/ftest/util/host_utils.py
index 83add3924c2..5277321df4a 100644
--- a/src/tests/ftest/util/host_utils.py
+++ b/src/tests/ftest/util/host_utils.py
@@ -1,5 +1,5 @@
 """
-(C) Copyright 2018-2023 Intel Corporation.
+(C) Copyright 2018-2024 Intel Corporation.

 SPDX-License-Identifier: BSD-2-Clause-Patent
 """
@@ -75,7 +75,7 @@ def __init__(self):
         """Initialize a HostInfo object."""
         self._servers = HostRole()
         self._clients = HostRole()
-        self.access_points = NodeSet()
+        self.mgmt_svc_replicas = NodeSet()

     @property
     def all_hosts(self):
@@ -119,7 +119,7 @@ def display(self, log):
         log.info("client_partition: %s", self.clients.partition.name)
         log.info("server_reservation: %s", self.servers.partition.reservation)
         log.info("client_reservation: %s", self.clients.partition.reservation)
-        log.info("access_points: %s", self.access_points)
+        log.info("mgmt_svc_replicas: %s", self.mgmt_svc_replicas)

     def set_hosts(self, log, control_host, server_hosts, server_partition, server_reservation,
                   client_hosts, client_partition, client_reservation, include_local_host=False):
diff --git a/src/tests/ftest/util/server_utils.py b/src/tests/ftest/util/server_utils.py
index 752473021a3..ec79f029c6e 100644
--- a/src/tests/ftest/util/server_utils.py
+++ b/src/tests/ftest/util/server_utils.py
@@ -71,7 +71,7 @@ class DaosServerManager(SubprocessManager):
     def __init__(self, group, bin_dir,
                  svr_cert_dir, svr_config_file, dmg_cert_dir, dmg_config_file,
                  svr_config_temp=None, dmg_config_temp=None, manager="Orterun",
-                 namespace="/run/server_manager/*", access_points_suffix=None):
+                 namespace="/run/server_manager/*", mgmt_svc_replicas_suffix=None):
         # pylint: disable=too-many-arguments
         """Initialize a DaosServerManager object.

@@ -92,7 +92,7 @@ def __init__(self, group, bin_dir,
                 manage the YamlCommand defined through the "job" attribute.
                 Defaults to "Orterun".
             namespace (str): yaml namespace (path to parameters)
-            access_points_suffix (str, optional): Suffix to append to each access point name.
+            mgmt_svc_replicas_suffix (str, optional): Suffix to append to each MS replica name.
                 Defaults to None.
""" self.group = group @@ -104,7 +104,8 @@ def __init__(self, group, bin_dir, # Dmg command to access this group of servers which will be configured # to access the daos_servers when they are started self.dmg = get_dmg_command( - group, dmg_cert_dir, bin_dir, dmg_config_file, dmg_config_temp, access_points_suffix) + group, dmg_cert_dir, bin_dir, dmg_config_file, dmg_config_temp, + mgmt_svc_replicas_suffix) # Set the correct certificate file ownership if manager == "Systemctl": @@ -166,7 +167,7 @@ def management_service_hosts(self): NodeSet: the hosts running the management service """ - return NodeSet.fromlist(self.get_config_value('access_points')) + return NodeSet.fromlist(self.get_config_value('mgmt_svc_replicas')) @property def management_service_ranks(self): @@ -202,7 +203,7 @@ def prepare_dmg(self, hosts=None): Args: hosts (list, optional): dmg hostlist value. Defaults to None which - results in using the 'access_points' host list. + results in using the 'mgmt_svc_replicas' host list. """ self._prepare_dmg_certificates() self._prepare_dmg_hostlist(hosts) diff --git a/src/tests/ftest/util/server_utils_params.py b/src/tests/ftest/util/server_utils_params.py index 46db4891220..19dd8ea4df3 100644 --- a/src/tests/ftest/util/server_utils_params.py +++ b/src/tests/ftest/util/server_utils_params.py @@ -107,6 +107,10 @@ def __init__(self, filename, common_yaml): # is set for the running process. If group look up fails or user # is not member, use uid return from user lookup. # + # - mgmt_svc_replicas: , e.g. ["hostname1:10001"] + # Hosts can be specified with or without port, default port below + # assumed if not specified. Defaults to the hostname of this node + # at port 10000 for local testing. default_provider = os.environ.get("D_PROVIDER", "ofi+tcp") # All log files should be placed in the same directory on each host to @@ -133,6 +137,7 @@ def __init__(self, filename, common_yaml): self.helper_log_file = LogParameter(log_dir, None, "daos_server_helper.log") self.telemetry_port = BasicParameter(None, 9191) self.client_env_vars = BasicParameter(None) + self.mgmt_svc_replicas = BasicParameter(None, ["localhost"]) # Used to drop privileges before starting data plane # (if started as root to perform hardware provisioning) diff --git a/src/tests/ftest/util/test_utils_pool.py b/src/tests/ftest/util/test_utils_pool.py index e75510ec8b5..71f05bb131e 100644 --- a/src/tests/ftest/util/test_utils_pool.py +++ b/src/tests/ftest/util/test_utils_pool.py @@ -194,7 +194,7 @@ def __init__(self, context, dmg_command, label_generator=None, namespace=POOL_NA self.context when calling from a test. dmg_command (DmgCommand): DmgCommand used to call dmg command. This value can be obtained by calling self.get_dmg_command() from a - test. It'll return the object with -l + test. It'll return the object with -l and --insecure. label_generator (LabelGenerator, optional): Generates label by adding number to the end of the prefix set in self.label. diff --git a/utils/config/daos_server.yml b/utils/config/daos_server.yml index ea6bf7e5154..a9642631f2d 100644 --- a/utils/config/daos_server.yml +++ b/utils/config/daos_server.yml @@ -15,16 +15,20 @@ #name: daos_server # # -## Access points +## MS replicas ## Immutable after running "dmg storage format". # -## To operate, DAOS will need a quorum of access point nodes to be available. -## Must have the same value for all agents and servers in a system. +## To operate, DAOS requires a quorum of Management Service (MS) replica +## hosts to be available. 
All servers (replica or otherwise) must have the +## same list of replicas in order for the system to operate correctly. Choose +## 3-5 hosts to serve as replicas, preferably not co-located within the same +## fault domains. +## ## Hosts can be specified with or without port. The default port that is set ## up in port: will be used if a port is not specified here. # ## default: hostname of this node -#access_points: ['hostname1'] +#mgmt_svc_replicas: ['hostname1', 'hostname2', 'hostname3'] # # ## Control plane metadata @@ -46,7 +50,7 @@ ## Default control plane port # ## Port number to bind daos_server to. This will also be used when connecting -## to access points, unless a port is specified in access_points: +## to MS replicas, unless a port is specified in mgmt_svc_replicas: # ## default: 10001 #port: 10001 diff --git a/utils/config/examples/daos_server_local.yml b/utils/config/examples/daos_server_local.yml index 814ac659824..ac5bb6ee808 100644 --- a/utils/config/examples/daos_server_local.yml +++ b/utils/config/examples/daos_server_local.yml @@ -1,7 +1,7 @@ # For a single-server system name: daos_server -access_points: ['localhost'] +mgmt_svc_replicas: ['localhost'] provider: ofi+tcp control_log_file: /tmp/daos_server.log transport_config: diff --git a/utils/config/examples/daos_server_mdonssd.yml b/utils/config/examples/daos_server_mdonssd.yml index 090df281a6b..8b73e53e431 100644 --- a/utils/config/examples/daos_server_mdonssd.yml +++ b/utils/config/examples/daos_server_mdonssd.yml @@ -1,8 +1,11 @@ # Example configuration file for Metadata on SSD. -name: daos_server # sys group daos_server -access_points: ['example'] # management service leader (bootstrap) -# port: 10001 # control listen port, default 10001 +# sys group daos_server +name: daos_server +# management service replicas +mgmt_svc_replicas: ['example1', 'example2', 'example3'] +# control listen port, default 10001 +# port: 10001 provider: ofi+tcp control_log_mask: INFO control_log_file: /tmp/daos_server.log diff --git a/utils/config/examples/daos_server_tcp.yml b/utils/config/examples/daos_server_tcp.yml index 635abe89dce..38f40d7ec67 100644 --- a/utils/config/examples/daos_server_tcp.yml +++ b/utils/config/examples/daos_server_tcp.yml @@ -1,8 +1,11 @@ # Example configuration file using TCP sockets -name: daos_server # sys group daos_server -access_points: ['example'] # management service leader (bootstrap) -# port: 10001 # control listen port, default 10001 +# sys group daos_server +name: daos_server +# management service replicas +mgmt_svc_replicas: ['example1', 'example2', 'example3'] +# control listen port, default 10001 +# port: 10001 provider: ofi+tcp control_log_mask: DEBUG control_log_file: /tmp/daos_server.log diff --git a/utils/config/examples/daos_server_ucx.yml b/utils/config/examples/daos_server_ucx.yml index bdd35a4c647..bd413af495e 100644 --- a/utils/config/examples/daos_server_ucx.yml +++ b/utils/config/examples/daos_server_ucx.yml @@ -1,8 +1,11 @@ # Example configuration file for UCX -name: daos_server # sys group daos_server -access_points: ['example'] # management service leader (bootstrap) -# port: 10001 # control listen port, default 10001 +# sys group daos_server +name: daos_server +# management service replicas +mgmt_svc_replicas: ['example1', 'example2', 'example3'] +# control listen port, default 10001 +# port: 10001 # UCX providers: # diff --git a/utils/config/examples/daos_server_verbs.yml b/utils/config/examples/daos_server_verbs.yml index 667992351fc..32146674739 100644 --- 
a/utils/config/examples/daos_server_verbs.yml +++ b/utils/config/examples/daos_server_verbs.yml @@ -1,8 +1,11 @@ # Example configuration file for verbs -name: daos_server # sys group daos_server -access_points: ['example'] # management service leader (bootstrap) -# port: 10001 # control listen port, default 10001 +# sys group daos_server +name: daos_server +# management service replicas +mgmt_svc_replicas: ['example1', 'example2', 'example3'] +# control listen port, default 10001 +# port: 10001 provider: ofi+verbs control_log_mask: INFO control_log_file: /tmp/daos_server.log diff --git a/utils/docker/examples/daos-server/el8/daos_server.yml.example b/utils/docker/examples/daos-server/el8/daos_server.yml.example index 64d3baf7e79..8d0e90f8f07 100644 --- a/utils/docker/examples/daos-server/el8/daos_server.yml.example +++ b/utils/docker/examples/daos-server/el8/daos_server.yml.example @@ -6,7 +6,7 @@ # https://github.com/daos-stack/daos/blob/master/utils/config/daos_server.yml name: daos_server -access_points: ['localhost'] +mgmt_svc_replicas: ['localhost'] port: 10001 provider: ofi+tcp;ofi_rxm diff --git a/utils/docker/vcluster/daos-server/el8/daos_server.yml.in b/utils/docker/vcluster/daos-server/el8/daos_server.yml.in index 0d9ea2086f9..a0b1bd3ef01 100644 --- a/utils/docker/vcluster/daos-server/el8/daos_server.yml.in +++ b/utils/docker/vcluster/daos-server/el8/daos_server.yml.in @@ -1,7 +1,7 @@ # DAOS Server configuration file name: daos_server -access_points: ['daos-server'] +mgmt_svc_replicas: ['daos-server'] port: 10001 provider: ofi+tcp diff --git a/utils/nlt_server.yaml b/utils/nlt_server.yaml index d30dd9721bf..1e24e4d9c3c 100644 --- a/utils/nlt_server.yaml +++ b/utils/nlt_server.yaml @@ -3,7 +3,7 @@ port: 10001 provider: ofi+tcp disable_hugepages: true control_log_mask: DEBUG -access_points: ['localhost:10001'] +mgmt_svc_replicas: ['localhost:10001'] engines: - targets: 4 diff --git a/utils/node_local_test.py b/utils/node_local_test.py index 4d58999e5c1..727f2214ecf 100755 --- a/utils/node_local_test.py +++ b/utils/node_local_test.py @@ -779,7 +779,7 @@ def _start(self): agent_config = join(self.agent_dir, 'nlt_agent.yaml') with open(agent_config, 'w') as fd: agent_data = { - 'access_points': scyaml['access_points'], + 'access_points': scyaml['mgmt_svc_replicas'], 'control_log_mask': 'NOTICE', # INFO logs every client process connection } json.dump(agent_data, fd)
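
Editor's note on the last hunk: only the server-side key is renamed, while `daos_agent` configuration (and the agent manager in the test harness) still uses `access_points`, so the NLT helper maps the new server key back onto the old agent key. The standalone sketch below illustrates that mapping; the file paths, host list, and the PyYAML dependency are illustrative assumptions and are not part of this patch.

```python
# Sketch only (not part of this patch): derive an agent config from a server
# config after the rename, mirroring the node_local_test.py hunk above.
import json

import yaml  # PyYAML, assumed available in the test environment

# Assumed path; the server config now carries "mgmt_svc_replicas".
with open('/etc/daos/daos_server.yml', 'r') as fd:
    server_config = yaml.safe_load(fd)

agent_data = {
    # The agent keeps the old "access_points" key; only the server side is renamed.
    'access_points': server_config['mgmt_svc_replicas'],
    'control_log_mask': 'NOTICE',
}

# JSON is valid YAML, which is how the NLT helper writes its agent config file.
with open('nlt_agent.yaml', 'w') as fd:
    json.dump(agent_data, fd)
```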
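Similarly, the `dmg config generate` test wrapper now exposes `--ms-replicas` in place of `--access-points`. The self-contained sketch below shows how the renamed ftest parameter expands into that flag; `FormattedParameter` here is a simplified stand-in for the real helper class, and the host names are hypothetical.

```python
# Simplified stand-in for the ftest FormattedParameter helper, shown only to
# illustrate how the renamed attribute expands into the new dmg flag.
class FormattedParameter:
    def __init__(self, str_format, default=None):
        self.str_format = str_format
        self.value = default

    def __str__(self):
        # Render the option only when a value has been assigned.
        return self.str_format.format(self.value) if self.value is not None else ""


# Mirrors DmgCommandBase.ConfigSubCommand.GenerateSubCommand after the rename.
mgmt_svc_replicas = FormattedParameter("--ms-replicas={}", None)
mgmt_svc_replicas.value = "host-1,host-2,host-3"  # hypothetical MS replica hosts
print(mgmt_svc_replicas)  # prints: --ms-replicas=host-1,host-2,host-3
```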