From 32dde7653c7d76274b7db46effca4779c0c2c50a Mon Sep 17 00:00:00 2001 From: Michael MacDonald Date: Tue, 3 Dec 2024 22:37:33 +0000 Subject: [PATCH] doc updates Required-githooks: true Signed-off-by: Michael MacDonald --- docs/QSG/qemu-vms.md | 4 ++-- docs/QSG/setup_rhel.md | 2 +- docs/QSG/setup_suse.md | 2 +- docs/QSG/tour.md | 2 +- docs/admin/administration.md | 20 +++++++++++--------- docs/admin/common_tasks.md | 4 ++-- docs/admin/deployment.md | 22 ++++++++++------------ docs/admin/troubleshooting.md | 6 +++--- 8 files changed, 31 insertions(+), 31 deletions(-) diff --git a/docs/QSG/qemu-vms.md b/docs/QSG/qemu-vms.md index 108d89295d9..3531a82fb3a 100644 --- a/docs/QSG/qemu-vms.md +++ b/docs/QSG/qemu-vms.md @@ -199,10 +199,10 @@ I follow these [steps](https://docs.daos.io/latest/QSG/setup_rhel/) to install b 5. Update config files. -Update the daos-server config file `/etc/daos/daos_server.yml` on daos-server. You may need to update "access\_points", "fabric\_iface" and "bdev\_list". Update "access\_points" accordingly if you name daos-server differently. Check if the network device has the same name as listed under "fabric\_iface". Look in the output of `lspci` for "bdev\_list". The info for our NVMe controller is like *??:??:? Non-Volatile memory controller: Red Hat, Inc. QEMU NVM Express Controller (rev 02)*. Prefix *??:??.?* is the address of the NVMe devices. +Update the daos-server config file `/etc/daos/daos_server.yml` on daos-server. You may need to update "mgmt\_svc\_replicas", "fabric\_iface" and "bdev\_list". Update "mgmt\_svc\_replicas" accordingly if you name daos-server differently. Check if the network device has the same name as listed under "fabric\_iface". Look in the output of `lspci` for "bdev\_list". The info for our NVMe controller is like *??:??:? Non-Volatile memory controller: Red Hat, Inc. QEMU NVM Express Controller (rev 02)*. Prefix *??:??.?* is the address of the NVMe devices. 
``` name: daos_server -access_points: +mgmt_svc_replicas: - daos-server port: 10001 diff --git a/docs/QSG/setup_rhel.md b/docs/QSG/setup_rhel.md index 9ed73ce8ec8..dbe855ae762 100644 --- a/docs/QSG/setup_rhel.md +++ b/docs/QSG/setup_rhel.md @@ -273,7 +273,7 @@ Examples are available on [github](https://github.com/daos-stack/daos/tree/maste name: daos_server - access_points: + mgmt_svc_replicas: - server-1 port: 10001 diff --git a/docs/QSG/setup_suse.md b/docs/QSG/setup_suse.md index c05b30945ac..32500dfe4fb 100644 --- a/docs/QSG/setup_suse.md +++ b/docs/QSG/setup_suse.md @@ -292,7 +292,7 @@ Examples are available on [github](https://github.com/daos-stack/daos/tree/maste An example of the daos_server.yml is presented below. Copy the modified server yaml file to all the server nodes at `/etc/daos/daos_server.yml`. name: daos_server - access_points: + mgmt_svc_replicas: - node-4 port: 10001 diff --git a/docs/QSG/tour.md b/docs/QSG/tour.md index e3254ac25cd..78dd290cc03 100644 --- a/docs/QSG/tour.md +++ b/docs/QSG/tour.md @@ -223,7 +223,7 @@ bring-up DAOS servers and clients. ### Run dfuse # Bring up 4 hosts server with appropriate daos_server.yml and - # access-point, reference to DAOS Set-Up + # MS replicas, reference to DAOS Set-Up # After DAOS servers, DAOS admin and client started. $ dmg storage format diff --git a/docs/admin/administration.md b/docs/admin/administration.md index 9859baebd2a..c472a9996c2 100644 --- a/docs/admin/administration.md +++ b/docs/admin/administration.md @@ -825,9 +825,9 @@ device would remain in this state until replaced by a new device. ## System Operations -The DAOS server acting as the access point records details of engines -that join the DAOS system. Once an engine has joined the DAOS system, it is -identified by a unique system "rank". Multiple ranks can reside on the same +The DAOS server acting as the Management Service (MS) leader records details +of engines that join the DAOS system. 
Once an engine has joined the DAOS system, +it is identified by a unique system "rank". Multiple ranks can reside on the same host machine, accessible via the same network address. A DAOS system can be shutdown and restarted to perform maintenance and/or @@ -837,14 +837,14 @@ made to the rank's metadata stored on persistent memory. Storage reformat can also be performed after system shutdown. Pools will be removed and storage wiped. -System commands will be handled by a DAOS Server acting as access point and +System commands will be handled by a DAOS Server acting as the MS leader and listening on the address specified in the DMG config file "hostlist" parameter. See [`daos_control.yml`](https://github.com/daos-stack/daos/blob/master/utils/config/daos_control.yml) for details. At least one of the addresses in the hostlist parameters should match one of the -"access point" addresses specified in the server config file +`mgmt_svc_replicas` addresses specified in the server config file [`daos_server.yml`](https://github.com/daos-stack/daos/blob/master/utils/config/daos_server.yml) that is supplied when starting `daos_server` instances. @@ -1028,13 +1028,15 @@ formatted again by running `dmg storage format`. To add a new server to an existing DAOS system, one should install: -- the relevant certificates -- the server yaml file pointing to the access points of the running - DAOS system +- A copy of the relevant certificates from an existing server. All servers must + share the same set of certificates in order to provide services. +- A copy of the server yaml file from an existing server (DAOS server configurations + should be homogeneous) -- the `mgmt_svc_replicas` entry is used by the new server in + order to know which servers will handle its SystemJoin request. The daos\_control.yml file should also be updated to include the new DAOS server. 
-Then starts the daos\_server via systemd and format the new server via +Then start the daos\_server via systemd and format the new server via dmg as follows: ``` diff --git a/docs/admin/common_tasks.md b/docs/admin/common_tasks.md index 065edd13d11..fef3dbe9e60 100644 --- a/docs/admin/common_tasks.md +++ b/docs/admin/common_tasks.md @@ -9,7 +9,7 @@ This section describes some of the common tasks handled by admins at a high leve 3. Install `daos-server` and `daos-client` RPMs. 4. Generate certificate files. 5. Copy one of the example configs from `utils/config/examples` to -`/etc/daos` and adjust it based on the environment. E.g., `access_points`, +`/etc/daos` and adjust it based on the environment. E.g., `mgmt_svc_replicas`, `class`. 6. Check that the directory where the log files will be created exists. E.g., `control_log_file`, `log_file` field in `engines` section. @@ -38,7 +38,7 @@ to server hosts and `daos-client` to client hosts. 4. Generate certificate files and distribute them to all the hosts. 5. Copy one of the example configs from `utils/config/examples` to `/etc/daos` of one of the server hosts and adjust it based on the environment. -E.g., `access_points`, `class`. +E.g., `mgmt_svc_replicas`, `class`. 6. Check that the directory where the log files will be created exists. E.g., `control_log_file`, `log_file` field in `engines` section. 7. Start `daos_server`. diff --git a/docs/admin/deployment.md b/docs/admin/deployment.md index 0f5c128b1f4..c131c6020ce 100644 --- a/docs/admin/deployment.md +++ b/docs/admin/deployment.md @@ -50,7 +50,7 @@ A recommended workflow to get up and running is as follows: server config file (default location at `/etc/daos/daos_server.yml`) has not yet been populated. -* Run `dmg config generate -l -a ` across the entire +* Run `dmg config generate -l -r ` across the entire hostset (all the storage servers that are now running the `daos_server` service after RPM install). 
The command will only generate a config if hardware setups on all the hosts are @@ -285,7 +285,7 @@ Help Options: [generate command options] -l, --helper-log-file= Log file location for debug from daos_server_helper binary - -a, --access-points= Comma separated list of access point addresses + -r, --ms-replicas= Comma separated list of MS replica addresses (default: localhost) -e, --num-engines= Set the number of DAOS Engine sections to be populated in the config file output. If unset then the value will be set to the @@ -331,7 +331,7 @@ Help Options: [generate command options] -l, --host-list= A comma separated list of addresses to connect to - -a, --access-points= Comma separated list of access point addresses + -r, --ms-replicas= Comma separated list of MS replica addresses to host management service (default: localhost) -e, --num-engines= Set the number of DAOS Engine sections to be populated in the config file output. If unset then the value will be set to the @@ -371,8 +371,8 @@ engines: The options that can be supplied to the config generate command are as follows: -- `--access-points` specifies the access points (identified storage servers that will host the -management service for the DAOS system across the cluster). +- `--ms-replicas` specifies the MS replicas (identified storage servers that will host the +Management Service for the DAOS system across the cluster). - `--num-engines` specifies the number of engine sections to populate in the config file output. If not set explicitly on the commandline, default is the number of NUMA nodes detected on the host. @@ -502,7 +502,7 @@ core_dump_filter: 19 name: daos_server socket_dir: /var/run/daos_server provider: ofi+tcp -access_points: +mgmt_svc_replicas: - localhost:10001 fault_cb: "" hyperthreads: false @@ -515,7 +515,7 @@ and runs until the point where a storage format is required, as expected. 
[user@wolf-226 daos]$ install/bin/daos_server start -i -o ~/configs/tmp.yml DAOS Server config loaded from /home/user/configs/tmp.yml install/bin/daos_server logging to file /tmp/daos_server.log -NOTICE: Configuration includes only one access point. This provides no redundancy in the event of an access point failure. +NOTICE: Configuration includes only one MS replica. This provides no redundancy in the event of a MS replica failure. DAOS Control Server v2.3.101 (pid 1211553) listening on 127.0.0.1:10001 Checking DAOS I/O Engine instance 0 storage ... Checking DAOS I/O Engine instance 1 storage ... @@ -821,8 +821,6 @@ To set the addresses of which DAOS Servers to task, provide either: Where `` represents a slurm-style hostlist string e.g. `foo-1[28-63],bar[256-511]`. -The first entry in the hostlist (after alphabetic then numeric sorting) will be -assumed to be the access point as set in the server configuration file. Local configuration files stored in the user directory will be used in preference to the default location e.g. `~/.daos_control.yml`. @@ -1322,7 +1320,7 @@ as follows to establish 2-tier storage: ```yaml port: 10001 -access_points: ["wolf-71"] # <----- updated +mgmt_svc_replicas: ["wolf-71"] # <----- updated engines: - @@ -1367,8 +1365,8 @@ information, please refer to the [DAOS build documentation][6]. DAOS Control Servers will need to be restarted on all hosts after updates to the server configuration file. - Pick an odd number of hosts in the system and set `access_points` to list of that host's - hostname or IP address (don't need to specify port). + Pick an odd number of hosts in the system and set `mgmt_svc_replicas` to a list of + those hosts' hostnames or IP addresses (the port does not need to be specified). This will be the host which bootstraps the DAOS management service (MS). 
diff --git a/docs/admin/troubleshooting.md b/docs/admin/troubleshooting.md index 79173e782f7..5de7b95412d 100644 --- a/docs/admin/troubleshooting.md +++ b/docs/admin/troubleshooting.md @@ -313,7 +313,7 @@ sudo ipcrm -M 0x10242049 1. Format the SCMs defined in the config file. 1. Generate the config file using `dmg config generate`. The various requirements will be populated without a syntax error. 1. Try starting with `allow_insecure: true`. This will rule out the credential certificate issue. -1. Verify that the `access_points` host is accessible and the port is not used. +1. Verify that the `mgmt_svc_replicas` host is accessible and the port is not used. 1. Check the `provider` entry. See the "Network Scan and Configuration" section of the admin guide for determining the right provider to use. 1. Check `fabric_iface` in `engines`. They should be available and enabled. 1. Check that `socket_dir` is writable by the daos_server. @@ -327,7 +327,7 @@ sudo ipcrm -M 0x10242049 1. When the server configuration is changed, it's necessary to restart the agent. 1. `DER_UNREACH(-1006)`: Check the socket ID consistency between PMem and NVMe. First, determine which socket you're using with `daos_server network scan -p all`. e.g., if the interface you're using in the engine section is eth0, find which NUMA Socket it belongs to. Next, determine the disks you can use with this socket by calling `daos_server nvme scan` or `dmg storage scan`. e.g., if eth0 belongs to NUMA Socket 0, use only the disks with 0 in the Socket ID column. 1. Check the interface used in the server config (`fabric_iface`) also exists in the client and can communicate with the server. -1. Check the access_points of the agent config points to the correct server host. +1. Check the `access_points` of the agent config points to the correct server hosts. 1. Call `daos pool query` and check that the pool exists and has free space. ### Applications run slow @@ -512,7 +512,7 @@ fabric providers. 
After starting `daos_server`, ranks will be unable to join if their configuration's fabric provider does not match that of the system. The system configuration is determined by the management service -(MS) leader node, which may be arbitrarily chosen from the configured access points. +(MS) leader node, which may be arbitrarily chosen from the configured MS replicas. The error message will include the string: `fabric provider does not match system provider `
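Reviewer note: the hunks above rename the server config key `access_points` to `mgmt_svc_replicas`, while the troubleshooting hunk still refers to `access_points` in the agent config. For anyone who wants to bring an existing `daos_server.yml` in line with the updated docs, the rename could be sketched roughly as below. This is only an illustration: `rename_access_points` is a hypothetical helper, not part of DAOS, and it assumes the key appears at the start of a line (optionally indented) as in the examples in this patch. It is meant for server config text only.

```python
import re

# Hypothetical helper, not part of DAOS. It matches the old key only when it
# starts a line (optionally indented), so commented-out lines are left alone.
_OLD_KEY = re.compile(r"^([ \t]*)access_points([ \t]*:)", flags=re.MULTILINE)


def rename_access_points(server_yaml: str) -> str:
    """Return server config text with access_points renamed to mgmt_svc_replicas."""
    return _OLD_KEY.sub(r"\1mgmt_svc_replicas\2", server_yaml)


# Sample text modelled on the setup_rhel.md example in this patch.
old_config = "name: daos_server\naccess_points:\n- server-1\nport: 10001\n"
new_config = rename_access_points(old_config)
print(new_config)
```

The helper works on plain text rather than a parsed YAML tree so that comments and formatting in the file are preserved.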