Merge remote-tracking branch 'origin/master' into amd/scons-methods
ashleypittman committed Oct 26, 2022
2 parents d0289d4 + 3cd2643 commit 4be9ce5
Showing 20 changed files with 288 additions and 61 deletions.
2 changes: 1 addition & 1 deletion Jenkinsfile
@@ -252,7 +252,7 @@ pipeline {
env.COMMIT_MESSAGE.split('\n').each { line ->
String key, value
try {
-(key, value) = line.split(':')
+(key, value) = line.split(':', 2)
if (key.contains(' ')) {
return
}
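
The switch from `line.split(':')` to `line.split(':', 2)` matters when a trailer's value itself contains a colon, such as a URL. With a limit of 2, Java/Groovy `String.split` returns at most two pieces and leaves every later colon inside the value. Without the limit, the value is cut into several pieces and Groovy's multiple assignment keeps only the first two, silently truncating it. The C sketch below illustrates the same "split at the first colon only" rule; `split_trailer` is a hypothetical helper, not code from the DAOS Jenkinsfile.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper mirroring String.split(':', 2): split "line" at its
 * FIRST colon only. Copies the key and the (possibly colon-containing)
 * value into the caller's buffers and returns 0; returns -1 if the line
 * has no colon or a buffer is too small.
 */
static int split_trailer(const char *line, char *key, size_t key_len,
                         char *value, size_t value_len)
{
    const char *colon;
    size_t klen;

    assert(line != NULL && key != NULL && value != NULL);
    colon = strchr(line, ':');            /* first colon only */
    if (colon == NULL)
        return -1;                        /* not a "key: value" line */

    klen = (size_t)(colon - line);
    if (klen >= key_len || strlen(colon + 1) >= value_len)
        return -1;                        /* caller buffer too small */

    memcpy(key, line, klen);
    key[klen] = '\0';
    strcpy(value, colon + 1);             /* later colons are kept */
    return 0;
}
```

For example, `Link: https://example.com` yields the key `Link` and the value ` https://example.com`, leading space included, just as `split` would return it.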
7 changes: 7 additions & 0 deletions docs/admin/pool_operations.md
@@ -104,6 +104,13 @@ The capacity of the pool can be specified in three different ways:
So in the first example above, specifying `--scm-size=256GB`
would fail as 256 GB is smaller than the minimum 256 GiB.

!!! warning
    Concurrent creation of pools using **size percentage** could lead to
    `ENOSPACE` errors. These operations are not atomic: the overall
    available size retrieved in the first step could differ from the size
    actually available when the second step (allocation of space for the
    pool) is performed.

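To see why, consider two pool creations that each request 60% of the free capacity, where both read the free size before either one allocates. The C sketch below models the two-step sequence (query, then allocate) with hypothetical `query_free` and `allocate` helpers over an in-memory counter; it is not DAOS code, and the 1000-unit capacity is an arbitrary illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a storage tier: just a free-space counter. */
struct tier {
    uint64_t free_bytes;
};

/* Step 1: read the currently available size. */
static uint64_t query_free(const struct tier *t)
{
    return t->free_bytes;
}

/* Step 2: reserve space; returns -1 (think ENOSPACE) if it no longer fits. */
static int allocate(struct tier *t, uint64_t bytes)
{
    if (bytes > t->free_bytes)
        return -1;
    t->free_bytes -= bytes;
    return 0;
}

/*
 * Two creations interleaved the unlucky way: both query before either
 * allocates, so both size their request from the same stale snapshot.
 * Returns how many of the two allocations succeeded.
 */
static int interleaved_creates(struct tier *t, unsigned pct)
{
    uint64_t snap_a, snap_b;
    int ok = 0;

    assert(pct <= 100);
    snap_a = query_free(t);                       /* create A, step 1 */
    snap_b = query_free(t);                       /* create B, step 1: same value */
    ok += allocate(t, snap_a * pct / 100) == 0;   /* create A, step 2 */
    ok += allocate(t, snap_b * pct / 100) == 0;   /* create B, step 2 */
    return ok;
}
```

With 1000 free units and 60%, A takes 600 and B's stale request for 600 no longer fits in the remaining 400, so B fails. Running each creation's two steps back to back instead would let B size its request from the remaining 400 units; the failure comes purely from the interleaving.
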
Examples:

To create a pool labeled `tank`:
156 changes: 156 additions & 0 deletions docs/admin/troubleshooting.md
@@ -971,6 +971,162 @@ To configure a Syslog daemon to resolve the delivery errors and receive messages
consult the relevant operating system specific documentation for installing and/or enabling a syslog
server package e.g. 'rsyslog'.

## Tools to debug connectivity issues across nodes

### ifconfig
```
$ ifconfig
lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
inet 127.0.0.1 netmask 255.0.0.0
inet6 ::1 prefixlen 128 scopeid 0x10<host>
loop txqueuelen 1000 (Local Loopback)
RX packets 127 bytes 9664 (9.4 KiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 127 bytes 9664 (9.4 KiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 9000
inet 10.165.192.121 netmask 255.255.255.128 broadcast 10.165.192.127
inet6 fe80::9a03:9bff:fea2:9716 prefixlen 64 scopeid 0x20<link>
ether 98:03:9b:a2:97:16 txqueuelen 1000 (Ethernet)
RX packets 2347 bytes 766600 (748.6 KiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 61 bytes 4156 (4.0 KiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
eth1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 9000
inet 10.165.192.122 netmask 255.255.255.128 broadcast 10.165.192.127
inet6 fe80::9a03:9bff:fea2:967e prefixlen 64 scopeid 0x20<link>
ether 98:03:9b:a2:96:7e txqueuelen 1000 (Ethernet)
RX packets 2346 bytes 766272 (748.3 KiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 61 bytes 4156 (4.0 KiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
```
You can get the IP address and the network interface card (NIC) name with `ifconfig`. Important: run `ifconfig` on both the DAOS server and client nodes to make sure the MTU sizes of the network interfaces are the same on all nodes. Mismatched MTU sizes could lead to DAOS hangs on RDMA over Converged Ethernet (RoCE) interfaces.
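
When many nodes are involved, the MTU check can also be scripted. The C sketch below reads an interface's MTU with the Linux `SIOCGIFMTU` ioctl, which reports the same value `ifconfig` prints as `mtu`. `get_mtu` and `mtu_match` are hypothetical helpers; gathering the values from the server and client nodes is left to whatever remote-execution tool you already use.

```c
#define _DEFAULT_SOURCE          /* expose struct ifreq in <net/if.h> */
#include <assert.h>
#include <net/if.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Return the MTU of network interface "ifname" (e.g. "eth0"), or -1 if
 * the interface does not exist or the query fails. Linux-specific.
 */
static int get_mtu(const char *ifname)
{
    struct ifreq ifr;
    int fd, rc;

    assert(ifname != NULL);
    if (strlen(ifname) >= IFNAMSIZ)
        return -1;

    fd = socket(AF_INET, SOCK_DGRAM, 0);   /* any socket works for the ioctl */
    if (fd < 0)
        return -1;

    memset(&ifr, 0, sizeof(ifr));
    strncpy(ifr.ifr_name, ifname, IFNAMSIZ - 1);   /* memset keeps the NUL */
    rc = ioctl(fd, SIOCGIFMTU, &ifr);
    close(fd);

    return rc < 0 ? -1 : ifr.ifr_mtu;
}

/* Non-zero when two MTU readings are both valid and equal. */
static int mtu_match(int local_mtu, int remote_mtu)
{
    return local_mtu > 0 && local_mtu == remote_mtu;
}
```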

### lstopo-no-graphics
```
$ lstopo-no-graphics
...
HostBridge
PCIBridge
PCI 18:00.0 (Ethernet)
Net "eth0"
OpenFabrics "mlx5_0"
...
HostBridge
PCIBridge
PCI af:00.0 (Ethernet)
Net "eth1"
OpenFabrics "mlx5_1"
...
```
You can get the domain name (e.g. `mlx5_0`) and the NUMA node information of your NICs.
If `lstopo-no-graphics` is not installed, you can install the "hwloc" package with yum/dnf or another package manager.
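
As a cross-check of the `lstopo-no-graphics` output, Linux also exposes the NUMA node of a PCI NIC in sysfs at `/sys/class/net/<interface>/device/numa_node`, where the kernel reports `-1` if it has no affinity information. The C sketch below reads that file; `nic_numa_node` is a hypothetical helper. Virtual interfaces such as `lo` have no `device` entry, so the helper reports an error for them.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper: NUMA node of the PCI device behind "ifname".
 *   >= 0 : NUMA node number
 *   -1   : kernel reports no NUMA affinity
 *   -2   : no such file (virtual interface, unknown NIC) or read error
 */
static int nic_numa_node(const char *ifname)
{
    char path[256];
    FILE *f;
    int node;
    int n;

    assert(ifname != NULL);
    n = snprintf(path, sizeof(path), "/sys/class/net/%s/device/numa_node", ifname);
    if (n < 0 || (size_t)n >= sizeof(path))
        return -2;

    f = fopen(path, "r");
    if (f == NULL)
        return -2;
    if (fscanf(f, "%d", &node) != 1)
        node = -2;                 /* unreadable contents */
    fclose(f);
    return node < -1 ? -2 : node;
}
```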

### ping
```
client_node $ ping -c 3 -I eth1 10.165.192.121
PING 10.165.192.121 (10.165.192.121) from 10.165.192.2 ens102: 56(84) bytes of data.
64 bytes from 10.165.192.121: icmp_seq=1 ttl=64 time=0.177 ms
64 bytes from 10.165.192.121: icmp_seq=2 ttl=64 time=0.120 ms
64 bytes from 10.165.192.121: icmp_seq=3 ttl=64 time=0.083 ms
```
Make sure ping can reach the NIC your DAOS server is bound to.

### fi_pingpong
Make sure communications with tcp can go through:
```
server_node $ fi_pingpong -p "tcp;ofi_rxm" -e rdm -d eth0
client_node $ fi_pingpong -p "tcp;ofi_rxm" -e rdm -d eth0 ip_of_eth0_server
bytes   #sent   #ack     total       time     MB/sec    usec/xfer   Mxfers/sec
64      10      =10      1.2k        0.03s      0.05    1378.30       0.00
256     10      =10      5k          0.00s     22.26      11.50       0.09
1k      10      =10      20k         0.00s     89.04      11.50       0.09
4k      10      =10      80k         0.00s    320.00      12.80       0.08
64k     10      =10      1.2m        0.01s    154.89     423.10       0.00
1m      10      =10      20m         0.01s   2659.00     394.35       0.00
```
Make sure communications with verbs can go through:
```
server_node $ fi_pingpong -p "verbs;ofi_rxm" -e rdm -d mlx5_0
client_node $ fi_pingpong -p "verbs;ofi_rxm" -e rdm -d mlx5_0 ip_of_mlx5_0_server
```

### ib_send_lat
```
server_node $ ib_send_lat -d mlx5_0 -s 16384 -D 3
client_node $ ib_send_lat -d mlx5_0 -s 16384 -D 3 ip_of_server
```
This test checks whether verbs communication goes through on InfiniBand or RoCE cards. If `ib_send_lat` is not installed, you can install the "perftest" package with yum/dnf or another package manager.

## Tools to measure the network latency and bandwidth across nodes

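### The tools in perftest for InfiniBand and RoCE
If the "perftest" package is not available, you can install it with yum/dnf or another package manager.

Examples for measuring bandwidth (in each pair, run the command without an address on the server node first, then the command with the server's IP address on the client node):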
```
ib_read_bw -a
ib_read_bw -a 192.168.1.46
ib_write_bw -a
ib_write_bw -a 192.168.1.46
ib_send_bw -a
ib_send_bw -a 192.168.1.46
```
Examples for measuring latency:
```
ib_read_lat -a
ib_read_lat -a 192.168.1.46
ib_write_lat -a
ib_write_lat -a 192.168.1.46
ib_send_lat -a
ib_send_lat -a 192.168.1.46
```

### fi_pingpong for Ethernet
If it is not available, you can install the "libfabric" package with yum/dnf or another package manager.

Example:
```
server_node $ fi_pingpong -p "tcp;ofi_rxm" -e rdm -d eth0 -I 1000
client_node $ fi_pingpong -p "tcp;ofi_rxm" -e rdm -d eth0 -I 1000 ip_of_eth0_server
```
This reports the network bandwidth for each message size together with the time per transfer (usec/xfer), from which the latency for a given message size can be deduced.
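
In the sample fi_pingpong output in the connectivity section, the MB/sec column equals the message size divided by the usec/xfer column, with MB = 10^6 bytes (for example, 4096 bytes / 12.80 usec = 320 MB/sec). Rearranged, the time per transfer for a given size follows from the reported bandwidth. A minimal C sketch of that arithmetic, with hypothetical helper names:

```c
#include <assert.h>

/*
 * Relationship observed in the sample fi_pingpong output:
 *   MB/sec = bytes / usec_per_xfer   (1 byte/usec == 1 MB/sec, MB = 10^6 B)
 */
static double mb_per_sec(double bytes, double usec_per_xfer)
{
    assert(usec_per_xfer > 0.0);
    return bytes / usec_per_xfer;
}

/* Inverse: time per transfer in usec for a message size and bandwidth. */
static double usec_per_xfer(double bytes, double mb_per_sec_val)
{
    assert(mb_per_sec_val > 0.0);
    return bytes / mb_per_sec_val;
}
```

Applied to the 1m row, 1048576 bytes at 2659.00 MB/sec gives about 394.35 usec per transfer, matching the usec/xfer column.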

## Tools to diagnose network issues for a large cluster

### [Intel Cluster Checker](https://www.intel.com/content/www/us/en/developer/tools/oneapi/cluster-checker.html)
This suite contains multiple useful tools, including network_time_uniformity, for debugging network issues.

### [mpi-benchmarks](https://github.com/intel/mpi-benchmarks)
Tools like IMB-P2P, IMB-MPI1, and IMB-RMA are helpful for sanity-checking latency and bandwidth. For example, to run the PingPong benchmark for message sizes from 1 byte to 64 KiB in powers of 4:
```
$ for((i=1;i<=65536;i*=4)); do echo "$i"; done &> msglen
$ mpirun -np 4 -f hostlist ./IMB-P2P -msglen msglen PingPong
#----------------------------------------------------------------
# Benchmarking PingPong
# #processes = 4
#----------------------------------------------------------------
#bytes #repetitions t[usec] Mbytes/sec Msg/sec
1 100000 24.50 0.08 81627
4 100000 24.50 0.33 81631
16 100000 24.50 1.31 81629
64 100000 24.50 5.22 81631
256 100000 24.60 20.73 80983
1024 100000 49.50 41.37 40404
4096 100000 224.05 36.43 8894
16384 51200 230.22 141.65 8646
65536 12800 741.47 176.58 2694
```

## Bug Report

Bugs should be reported through our [issue tracker](https://jira.daos.io/)
25 changes: 16 additions & 9 deletions src/client/dfs/dfs.c
@@ -478,10 +478,10 @@ dfs_suggest_oclass(dfs_t *dfs, const char *hint, daos_oclass_id_t *cid)
if (rc)
D_GOTO(out, rc);

-*cid = daos_obj_get_oclass(dfs->coh, type, obj_hint, 0);
-if (*cid < 0) {
-D_ERROR("Failed to generate object class from hints %s\n", hint);
-return EINVAL;
+rc = daos_obj_get_oclass(dfs->coh, type, obj_hint, 0, cid);
+if (rc) {
+D_ERROR("daos_obj_get_oclass() failed "DF_RC"\n", DP_RC(rc));
+return daos_der2errno(rc);
}
out:
D_FREE(local);
@@ -1789,8 +1789,10 @@ dfs_cont_create(daos_handle_t poh, uuid_t *cuuid, dfs_attr_t *attr,
else
dattr.da_chunk_size = DFS_DEFAULT_CHUNK_SIZE;

-if (attr->da_hints[0] != 0)
+if (attr->da_hints[0] != 0) {
strncpy(dattr.da_hints, attr->da_hints, DAOS_CONT_HINT_MAX_LEN);
+dattr.da_hints[DAOS_CONT_HINT_MAX_LEN - 1] = '\0';
+}
} else {
dattr.da_oclass_id = 0;
dattr.da_dir_oclass_id = 0;
@@ -2988,12 +2990,17 @@ dfs_obj_get_info(dfs_t *dfs, dfs_obj_t *obj, dfs_obj_info_t *info)

switch (obj->mode & S_IFMT) {
case S_IFDIR:
-if (obj->d.oclass)
+if (obj->d.oclass) {
info->doi_oclass_id = obj->d.oclass;
-else if (dfs->attr.da_dir_oclass_id)
+} else if (dfs->attr.da_dir_oclass_id) {
info->doi_oclass_id = dfs->attr.da_dir_oclass_id;
-else
-info->doi_oclass_id = daos_obj_get_oclass(dfs->coh, 0, 0, 0);
+} else {
+rc = daos_obj_get_oclass(dfs->coh, 0, 0, 0, &info->doi_oclass_id);
+if (rc) {
+D_ERROR("daos_obj_get_oclass() failed "DF_RC"\n", DP_RC(rc));
+return daos_der2errno(rc);
+}
+}

if (obj->d.chunk_size)
info->doi_chunk_size = obj->d.chunk_size;
4 changes: 3 additions & 1 deletion src/client/dfs/duns.c
@@ -674,8 +674,10 @@ create_cont(daos_handle_t poh, struct duns_attr_t *attrp, bool create_with_label
dfs_attr.da_dir_oclass_id = attrp->da_dir_oclass_id;
dfs_attr.da_chunk_size = attrp->da_chunk_size;
dfs_attr.da_props = attrp->da_props;
-if (attrp->da_hints[0] != 0)
+if (attrp->da_hints[0] != 0) {
strncpy(dfs_attr.da_hints, attrp->da_hints, DAOS_CONT_HINT_MAX_LEN);
+dfs_attr.da_hints[DAOS_CONT_HINT_MAX_LEN - 1] = '\0';
+}
if (create_with_label)
rc = dfs_cont_create_with_label(poh, attrp->da_cont, &dfs_attr,
&attrp->da_cuuid, NULL, NULL);
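
The added `da_hints[DAOS_CONT_HINT_MAX_LEN - 1] = '\0';` lines in `create_cont()` and `dfs_cont_create()` guard against a property of `strncpy()`: when the source string is at least as long as the count, it copies exactly that many bytes and writes no terminating NUL. The C sketch below shows the pattern with a hypothetical 8-byte `HINT_MAX` buffer standing in for the real `DAOS_CONT_HINT_MAX_LEN`.

```c
#include <assert.h>
#include <string.h>

#define HINT_MAX 8   /* hypothetical size; stands in for DAOS_CONT_HINT_MAX_LEN */

/* Copy like the pre-fix code: dst may end up without a terminating NUL. */
static void copy_hint_unsafe(char dst[HINT_MAX], const char *src)
{
    assert(dst != NULL && src != NULL);
    strncpy(dst, src, HINT_MAX);
}

/* Copy like the fixed code: always terminate, truncating long input. */
static void copy_hint_safe(char dst[HINT_MAX], const char *src)
{
    assert(dst != NULL && src != NULL);
    strncpy(dst, src, HINT_MAX);
    dst[HINT_MAX - 1] = '\0';
}
```

With a 10-character source such as `abcdefghij`, the unsafe copy fills all 8 bytes with no NUL, while the safe copy yields the truncated string `abcdefg`.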
9 changes: 9 additions & 0 deletions src/control/lib/control/pool.go
@@ -23,6 +23,8 @@ import (

"github.com/daos-stack/daos/src/control/common/proto/convert"
mgmtpb "github.com/daos-stack/daos/src/control/common/proto/mgmt"
"github.com/daos-stack/daos/src/control/fault"
"github.com/daos-stack/daos/src/control/fault/code"
"github.com/daos-stack/daos/src/control/lib/daos"
"github.com/daos-stack/daos/src/control/lib/ranklist"
"github.com/daos-stack/daos/src/control/logging"
@@ -205,6 +207,13 @@ func (r *poolRequest) canRetry(reqErr error, try uint) bool {
default:
return false
}
case *fault.Fault:
switch e.Code {
case code.ServerDataPlaneNotStarted:
return true
default:
return false
}
default:
return false
}
23 changes: 23 additions & 0 deletions src/control/lib/control/pool_test.go
@@ -20,6 +20,8 @@ import (

mgmtpb "github.com/daos-stack/daos/src/control/common/proto/mgmt"
"github.com/daos-stack/daos/src/control/common/test"
"github.com/daos-stack/daos/src/control/fault"
"github.com/daos-stack/daos/src/control/fault/code"
"github.com/daos-stack/daos/src/control/lib/daos"
"github.com/daos-stack/daos/src/control/lib/ranklist"
"github.com/daos-stack/daos/src/control/logging"
@@ -86,6 +88,17 @@ func TestControl_PoolDestroy(t *testing.T) {
},
},
},
"DataPlaneNotStarted error is retried": {
req: &PoolDestroyReq{
ID: test.MockUUID(),
},
mic: &MockInvokerConfig{
UnaryResponseSet: []*UnaryResponse{
MockMSResponse("host1", &fault.Fault{Code: code.ServerDataPlaneNotStarted}, nil),
MockMSResponse("host1", nil, &mgmtpb.PoolDestroyResp{}),
},
},
},
"success": {
req: &PoolDestroyReq{
ID: test.MockUUID(),
@@ -397,6 +410,16 @@ func TestControl_PoolCreate(t *testing.T) {
},
expResp: &PoolCreateResp{},
},
"create DataPlaneNotStarted error is retried": {
req: &PoolCreateReq{TotalBytes: 10},
mic: &MockInvokerConfig{
UnaryResponseSet: []*UnaryResponse{
MockMSResponse("host1", &fault.Fault{Code: code.ServerDataPlaneNotStarted}, nil),
MockMSResponse("host1", nil, &mgmtpb.PoolCreateResp{}),
},
},
expResp: &PoolCreateResp{},
},
"success": {
req: &PoolCreateReq{
TotalBytes: 10,
2 changes: 1 addition & 1 deletion src/control/server/mgmt_pool.go
@@ -404,7 +404,7 @@ func (svc *mgmtSvc) PoolCreate(ctx context.Context, req *mgmtpb.PoolCreateReq) (
}

switch errors.Cause(err) {
-case errInstanceNotReady, FaultDataPlaneNotStarted:
+case errInstanceNotReady:
// If the pool create failed because there was no available instance
// to service the request, signal to the client that it should try again.
resp.Status = int32(daos.TryAgain)
4 changes: 2 additions & 2 deletions src/include/daos/object.h
@@ -237,8 +237,8 @@ unsigned int daos_oclass_grp_nr(struct daos_oclass_attr *oc_attr,
int daos_oclass_fit_max(daos_oclass_id_t oc_id, int domain_nr, int target_nr,
enum daos_obj_redun *ord, uint32_t *nr);
bool daos_oclass_is_valid(daos_oclass_id_t oc_id);
-daos_oclass_id_t daos_obj_get_oclass(daos_handle_t coh, enum daos_otype_t type,
-daos_oclass_hints_t hints, uint32_t args);
+int daos_obj_get_oclass(daos_handle_t coh, enum daos_otype_t type, daos_oclass_hints_t hints,
+uint32_t args, daos_oclass_id_t *cid);
#define daos_oclass_grp_off_by_shard(oca, shard) \
(rounddown(shard, daos_oclass_grp_size(oca)))
