DAOS-16840 control: Deprecate access_points in server config (#15548)
The access_points name in the server configuration is
an ongoing source of confusion. Rename it to the more
descriptive mgmt_svc_replicas and emit deprecation
notices for older configurations.

Signed-off-by: Michael MacDonald <[email protected]>
mjmac authored Dec 17, 2024
1 parent b31f634 commit b020347
Showing 66 changed files with 582 additions and 502 deletions.
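The rename described in the commit message can be pictured with a minimal Python sketch. This is not the DAOS implementation (the control plane is written in Go); `migrate_replica_key` is a hypothetical helper, and rejecting a config that sets both keys is an assumption made for illustration.

```python
import warnings

OLD_KEY = "access_points"      # deprecated name
NEW_KEY = "mgmt_svc_replicas"  # replacement name


def migrate_replica_key(cfg):
    """Return a copy of cfg that uses NEW_KEY, warning if OLD_KEY was present."""
    out = dict(cfg)
    if OLD_KEY in out:
        if NEW_KEY in out:
            # Assumption: a config naming both keys is treated as an error.
            raise ValueError(f"set only one of {OLD_KEY!r} and {NEW_KEY!r}")
        warnings.warn(
            f"{OLD_KEY} is deprecated; use {NEW_KEY} instead",
            DeprecationWarning,
            stacklevel=2,
        )
        out[NEW_KEY] = out.pop(OLD_KEY)
    return out
```

An older configuration keeps loading under this scheme, but the caller sees a deprecation notice pointing at the new key.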
4 changes: 2 additions & 2 deletions docs/QSG/qemu-vms.md
@@ -199,10 +199,10 @@ I follow these [steps](https://docs.daos.io/latest/QSG/setup_rhel/) to install b

5. Update config files.

- Update the daos-server config file `/etc/daos/daos_server.yml` on daos-server. You may need to update "access\_points", "fabric\_iface" and "bdev\_list". Update "access\_points" accordingly if you name daos-server differently. Check if the network device has the same name as listed under "fabric\_iface". Look in the output of `lspci` for "bdev\_list". The info for our NVMe controller is like *??:??:? Non-Volatile memory controller: Red Hat, Inc. QEMU NVM Express Controller (rev 02)*. Prefix *??:??.?* is the address of the NVMe devices.
+ Update the daos-server config file `/etc/daos/daos_server.yml` on daos-server. You may need to update "mgmt\_svc\_replicas", "fabric\_iface" and "bdev\_list". Update "mgmt\_svc\_replicas" accordingly if you name daos-server differently. Check if the network device has the same name as listed under "fabric\_iface". Look in the output of `lspci` for "bdev\_list". The info for our NVMe controller is like *??:??:? Non-Volatile memory controller: Red Hat, Inc. QEMU NVM Express Controller (rev 02)*. Prefix *??:??.?* is the address of the NVMe devices.
```
name: daos_server
- access_points:
+ mgmt_svc_replicas:
- daos-server
port: 10001
2 changes: 1 addition & 1 deletion docs/QSG/setup_rhel.md
@@ -273,7 +273,7 @@ Examples are available on [github](https://github.com/daos-stack/daos/tree/maste


name: daos_server
- access_points:
+ mgmt_svc_replicas:
- server-1
port: 10001

2 changes: 1 addition & 1 deletion docs/QSG/setup_suse.md
@@ -292,7 +292,7 @@ Examples are available on [github](https://github.com/daos-stack/daos/tree/maste
An example of the daos_server.yml is presented below. Copy the modified server yaml file to all the server nodes at `/etc/daos/daos_server.yml`.

name: daos_server
- access_points:
+ mgmt_svc_replicas:
- node-4
port: 10001

2 changes: 1 addition & 1 deletion docs/QSG/tour.md
@@ -223,7 +223,7 @@ bring-up DAOS servers and clients.
### Run dfuse

# Bring up 4 hosts server with appropriate daos_server.yml and
- # access-point, reference to DAOS Set-Up
+ # MS replicas, reference to DAOS Set-Up
# After DAOS servers, DAOS admin and client started.

$ dmg storage format
20 changes: 11 additions & 9 deletions docs/admin/administration.md
@@ -825,9 +825,9 @@ device would remain in this state until replaced by a new device.
## System Operations
- The DAOS server acting as the access point records details of engines
- that join the DAOS system. Once an engine has joined the DAOS system, it is
- identified by a unique system "rank". Multiple ranks can reside on the same
+ The DAOS server acting as the Management Service (MS) leader records details
+ of engines that join the DAOS system. Once an engine has joined the DAOS system,
+ it is identified by a unique system "rank". Multiple ranks can reside on the same
host machine, accessible via the same network address.
A DAOS system can be shutdown and restarted to perform maintenance and/or
@@ -837,14 +837,14 @@ made to the rank's metadata stored on persistent memory.
Storage reformat can also be performed after system shutdown. Pools will be
removed and storage wiped.
- System commands will be handled by a DAOS Server acting as access point and
+ System commands will be handled by a DAOS Server acting as the MS leader and
listening on the address specified in the DMG config file "hostlist" parameter.
See
[`daos_control.yml`](https://github.com/daos-stack/daos/blob/master/utils/config/daos_control.yml)
for details.
At least one of the addresses in the hostlist parameters should match one of the
- "access point" addresses specified in the server config file
+ `mgmt_svc_replicas` addresses specified in the server config file
[`daos_server.yml`](https://github.com/daos-stack/daos/blob/master/utils/config/daos_server.yml)
that is supplied when starting `daos_server` instances.
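The hostlist requirement above can be checked with a short Python sketch. Both functions are hypothetical helpers, not part of `dmg`; the sketch assumes entries are plain `host` or `host:port` strings and does no hostlist range expansion, IPv6 handling, or name resolution.

```python
def strip_port(addr):
    """Drop a trailing ':<port>' from 'host:port'; leave bare hosts untouched."""
    host, sep, port = addr.rpartition(":")
    return host if sep and port.isdigit() else addr


def hostlist_reaches_ms(hostlist, mgmt_svc_replicas):
    """True if at least one hostlist entry names a configured MS replica."""
    replicas = {strip_port(a) for a in mgmt_svc_replicas}
    return any(strip_port(h) in replicas for h in hostlist)
```

For example, a hostlist of `wolf-71, wolf-72` satisfies the check against a replica list containing `wolf-71:10001`, while a hostlist of only `wolf-72` does not.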
@@ -1028,13 +1028,15 @@ formatted again by running `dmg storage format`.
To add a new server to an existing DAOS system, one should install:
- - the relevant certificates
- - the server yaml file pointing to the access points of the running
- DAOS system
+ - A copy of the relevant certificates from an existing server. All servers must
+ share the same set of certificates in order to provide services.
+ - A copy of the server yaml file from an existing server (DAOS server configurations
+ should be homogeneous) -- the `mgmt_svc_replicas` entry is used by the new server in
+ order to know which servers should handle its SystemJoin request.
The daos\_control.yml file should also be updated to include the new DAOS server.
- Then starts the daos\_server via systemd and format the new server via
+ Then start the daos\_server via systemd and format the new server via
dmg as follows:
```
4 changes: 2 additions & 2 deletions docs/admin/common_tasks.md
@@ -9,7 +9,7 @@ This section describes some of the common tasks handled by admins at a high leve
3. Install `daos-server` and `daos-client` RPMs.
4. Generate certificate files.
5. Copy one of the example configs from `utils/config/examples` to
- `/etc/daos` and adjust it based on the environment. E.g., `access_points`,
+ `/etc/daos` and adjust it based on the environment. E.g., `mgmt_svc_replicas`,
`class`.
6. Check that the directory where the log files will be created exists. E.g.,
`control_log_file`, `log_file` field in `engines` section.
@@ -38,7 +38,7 @@ to server hosts and `daos-client` to client hosts.
4. Generate certificate files and distribute them to all the hosts.
5. Copy one of the example configs from `utils/config/examples` to
`/etc/daos` of one of the server hosts and adjust it based on the environment.
- E.g., `access_points`, `class`.
+ E.g., `mgmt_svc_replicas`, `class`.
6. Check that the directory where the log files will be created exists. E.g.,
`control_log_file`, `log_file` field in `engines` section.
7. Start `daos_server`.
24 changes: 11 additions & 13 deletions docs/admin/deployment.md
@@ -50,7 +50,7 @@ A recommended workflow to get up and running is as follows:
server config file (default location at `/etc/daos/daos_server.yml`) has not
yet been populated.

- * Run `dmg config generate -l <hostset> -a <access_points>` across the entire
+ * Run `dmg config generate -l <hostset> -r <ms_replicas>` across the entire
hostset (all the storage servers that are now running the `daos_server` service
after RPM install).
The command will only generate a config if hardware setups on all the hosts are
@@ -285,7 +285,7 @@ Help Options:

[generate command options]
-l, --helper-log-file= Log file location for debug from daos_server_helper binary
- -a, --access-points= Comma separated list of access point addresses
+ -r, --ms-replicas= Comma separated list of MS replica addresses
<ipv4addr/hostname> (default: localhost)
-e, --num-engines= Set the number of DAOS Engine sections to be populated in the
config file output. If unset then the value will be set to the
@@ -331,7 +331,7 @@ Help Options:

[generate command options]
-l, --host-list= A comma separated list of addresses <ipv4addr/hostname> to connect to
- -a, --access-points= Comma separated list of access point addresses <ipv4addr/hostname>
+ -r, --ms-replicas= Comma separated list of MS replica addresses <ipv4addr/hostname>
to host management service (default: localhost)
-e, --num-engines= Set the number of DAOS Engine sections to be populated in the
config file output. If unset then the value will be set to the
@@ -371,8 +371,8 @@ engines:

The options that can be supplied to the config generate command are as follows:

- - `--access-points` specifies the access points (identified storage servers that will host the
- management service for the DAOS system across the cluster).
+ - `--ms-replicas` specifies the MS replicas (identified storage servers that will host the
+ Management Service for the DAOS system across the cluster).

- `--num-engines` specifies the number of engine sections to populate in the config file output.
If not set explicitly on the commandline, default is the number of NUMA nodes detected on the host.
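As a rough illustration of how the renamed option fits into the command line, the Python sketch below assembles an argument list for `dmg config generate`. The function `config_generate_argv` is a hypothetical helper; the `-l`, `-r`, and `--num-engines=` spellings are taken from the help text shown in this diff, and whether an installed `dmg` accepts `-r` depends on it including this change.

```python
def config_generate_argv(hostset, ms_replicas, num_engines=None):
    """Build the argv for `dmg config generate` from Python lists."""
    argv = [
        "dmg", "config", "generate",
        "-l", ",".join(hostset),      # hosts to connect to
        "-r", ",".join(ms_replicas),  # MS replica addresses (was -a)
    ]
    if num_engines is not None:
        # Optional: override the default of one engine per detected NUMA node.
        argv.append(f"--num-engines={num_engines}")
    return argv
```

The resulting list could be passed to `subprocess.run` on a host where `dmg` is installed; the sketch itself only builds the arguments and runs nothing.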
@@ -502,7 +502,7 @@ core_dump_filter: 19
name: daos_server
socket_dir: /var/run/daos_server
provider: ofi+tcp
- access_points:
+ mgmt_svc_replicas:
- localhost:10001
fault_cb: ""
hyperthreads: false
@@ -515,7 +515,7 @@ and runs until the point where a storage format is required, as expected.
[user@wolf-226 daos]$ install/bin/daos_server start -i -o ~/configs/tmp.yml
DAOS Server config loaded from /home/user/configs/tmp.yml
install/bin/daos_server logging to file /tmp/daos_server.log
- NOTICE: Configuration includes only one access point. This provides no redundancy in the event of an access point failure.
+ NOTICE: Configuration includes only one MS replica. This provides no redundancy in the event of a MS replica failure.
DAOS Control Server v2.3.101 (pid 1211553) listening on 127.0.0.1:10001
Checking DAOS I/O Engine instance 0 storage ...
Checking DAOS I/O Engine instance 1 storage ...
@@ -821,8 +821,6 @@ To set the addresses of which DAOS Servers to task, provide either:

Where `<hostlist>` represents a slurm-style hostlist string e.g.
`foo-1[28-63],bar[256-511]`.
- The first entry in the hostlist (after alphabetic then numeric sorting) will be
- assumed to be the access point as set in the server configuration file.

Local configuration files stored in the user directory will be used in
preference to the default location e.g. `~/.daos_control.yml`.
@@ -1322,7 +1320,7 @@ as follows to establish 2-tier storage:
```yaml
<snip>
port: 10001
- access_points: ["wolf-71"] # <----- updated
+ mgmt_svc_replicas: ["wolf-71"] # <----- updated
<snip>
engines:
-
@@ -1367,10 +1365,10 @@ information, please refer to the [DAOS build documentation][6].
DAOS Control Servers will need to be restarted on all hosts after updates to the server
configuration file.

- Pick an odd number of hosts in the system and set `access_points` to list of that host's
- hostname or IP address (don't need to specify port).
+ Pick an odd number (3-7) of hosts in the system and set the `mgmt_svc_replicas` list to
+ include the hostnames or IP addresses (don't need to specify port) of those hosts.

- This will be the host which bootstraps the DAOS management service (MS).
+ This will be the set of servers which host the replicated DAOS management service (MS).
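The sizing guidance above (an odd number of hosts, 3 to 7) can be expressed as a small advisory check. This Python sketch is illustrative only: `replica_count_advice` is a hypothetical helper, its message strings are made up, and it returns advice instead of raising because the sample server output in this diff treats a single replica as a NOTICE, not an error.

```python
def replica_count_advice(mgmt_svc_replicas):
    """Return advisory strings about the size of the MS replica list."""
    n = len(set(mgmt_svc_replicas))  # count distinct entries only
    advice = []
    if n == 0:
        advice.append("no MS replicas configured")
    elif n == 1:
        advice.append("only one MS replica: no redundancy if it fails")
    elif n % 2 == 0:
        advice.append(f"{n} MS replicas is an even count; use an odd number")
    if n > 7:
        advice.append(f"{n} MS replicas exceeds the suggested range of 3-7")
    return advice
```

A list of three, five, or seven distinct hosts yields an empty result; anything else yields one or more advisory strings.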

>The support of the optional providers is not guarantee and can be removed
>without further notification.
6 changes: 3 additions & 3 deletions docs/admin/troubleshooting.md
@@ -313,7 +313,7 @@ sudo ipcrm -M 0x10242049
1. Format the SCMs defined in the config file.
1. Generate the config file using `dmg config generate`. The various requirements will be populated without a syntax error.
1. Try starting with `allow_insecure: true`. This will rule out the credential certificate issue.
- 1. Verify that the `access_points` host is accessible and the port is not used.
+ 1. Verify that the `mgmt_svc_replicas` host is accessible and the port is not used.
1. Check the `provider` entry. See the "Network Scan and Configuration" section of the admin guide for determining the right provider to use.
1. Check `fabric_iface` in `engines`. They should be available and enabled.
1. Check that `socket_dir` is writable by the daos_server.
@@ -327,7 +327,7 @@ sudo ipcrm -M 0x10242049
1. When the server configuration is changed, it's necessary to restart the agent.
1. `DER_UNREACH(-1006)`: Check the socket ID consistency between PMem and NVMe. First, determine which socket you're using with `daos_server network scan -p all`. e.g., if the interface you're using in the engine section is eth0, find which NUMA Socket it belongs to. Next, determine the disks you can use with this socket by calling `daos_server nvme scan` or `dmg storage scan`. e.g., if eth0 belongs to NUMA Socket 0, use only the disks with 0 in the Socket ID column.
1. Check the interface used in the server config (`fabric_iface`) also exists in the client and can communicate with the server.
- 1. Check the access_points of the agent config points to the correct server host.
+ 1. Check the `access_points` of the agent config points to the correct server hosts.
1. Call `daos pool query` and check that the pool exists and has free space.

### Applications run slow
@@ -512,7 +512,7 @@ fabric providers.

After starting `daos_server`, ranks will be unable to join if their configuration's fabric provider
does not match that of the system. The system configuration is determined by the management service
- (MS) leader node, which may be arbitrarily chosen from the configured access points.
+ (MS) leader node, which may be arbitrarily chosen from the configured MS replicas.

The error message will include the string: `fabric provider <provider1> does not match system provider <provider2>`

2 changes: 1 addition & 1 deletion src/control/cmd/daos_server/README.md
@@ -120,7 +120,7 @@ The control API is responsible for working out which `daos_server` instance
is the MS leader and issuing the request, `dmg` uses the control API.
The `dmg` tool requires the hostlist of all hosts in the DAOS system to be
specified either on the command line or in the `daos_control.yml` config file.
- A list of access point servers is defined in the server's config file.
+ A list of MS replica servers is defined in the server's config file.

## Functionality

2 changes: 2 additions & 0 deletions src/control/cmd/daos_server/auto.go
@@ -122,6 +122,8 @@ func (cmd *configGenCmd) confGen(ctx context.Context, getFabric getFabricFn, get
}
cmd.Debugf("fetched host storage info on localhost: %+v", hs)

+ cmd.CheckDeprecated(cmd.Logger)
+
req := new(control.ConfGenerateReq)
if err := convert.Types(cmd, req); err != nil {
return nil, err