Merge branch 'master' into mjmac/DAOS-8331
Features: telemetry
Required-githooks: true

Change-Id: I18754a81a93c9ce055aec0c399c9f8b193db393e
Signed-off-by: Michael MacDonald <[email protected]>
mjmac committed Apr 16, 2024
2 parents 35d0330 + c555ef0 commit e781790
Showing 85 changed files with 2,057 additions and 1,419 deletions.
3 changes: 2 additions & 1 deletion .github/actions/provision-cluster/action.yml
@@ -40,7 +40,8 @@ runs:
      run: |
        . ci/gha_functions.sh
        inst_repos="${{ env.CP_PR_REPOS }} ${{ github.event.inputs.pr-repos }}"
-       if [[ $inst_repos != *daos@* ]]; then
+       if [ -z "${{ env.CP_RPM_TEST_VERSION }}" ] &&
+          [[ $inst_repos != *daos@* ]]; then
          inst_repos+=" daos@PR-${{ github.event.pull_request.number }}"
          inst_repos+=":${{ github.run_number }}"
        fi

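The reworked guard only auto-appends the PR's own `daos@` repo when no pinned RPM test version is set and no `daos@` repo was requested explicitly. A minimal standalone sketch of that short-circuit, with placeholder values standing in for the GitHub Actions expressions:

```bash
#!/bin/bash
# Placeholder values; in the action these come from env.CP_RPM_TEST_VERSION,
# env.CP_PR_REPOS and github.event.inputs.pr-repos.
CP_RPM_TEST_VERSION=""            # empty: no pinned RPM version under test
inst_repos="other-repo@PR-12:3"   # hypothetical requested repos

# Mirror of the new condition: both checks must pass before the PR's own
# daos@ repo is appended (the PR/run numbers below are placeholders).
if [ -z "$CP_RPM_TEST_VERSION" ] &&
   [[ $inst_repos != *daos@* ]]; then
    inst_repos+=" daos@PR-456:78"
fi
echo "$inst_repos"
```
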
2 changes: 2 additions & 0 deletions .github/workflows/bash_unit_testing.yml
@@ -11,6 +11,8 @@ defaults:
  run:
    shell: bash --noprofile --norc -ueo pipefail {0}

+permissions: {}
+
jobs:
  Test-gha-functions:
    name: Tests in ci/gha_functions.sh

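Most of the workflow edits in this commit follow the same least-privilege pattern: an empty top-level `permissions:` block strips every scope from the workflow's `GITHUB_TOKEN`, and individual jobs re-grant only the scopes they need. A minimal sketch of the pattern (workflow name, job name, and step are illustrative):

```yaml
name: example            # illustrative workflow
on: pull_request

# Deny by default: the GITHUB_TOKEN for every job starts with no scopes.
permissions: {}

jobs:
  report-results:        # illustrative job name
    runs-on: ubuntu-22.04
    permissions:
      checks: write           # re-grant only what this job needs,
      pull-requests: write    # e.g. to publish unit-test results
    steps:
      - run: echo "runs with a minimally scoped token"
```
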
6 changes: 6 additions & 0 deletions .github/workflows/ci2.yml
@@ -7,13 +7,19 @@ concurrency:
  group: ci2-${{ github.head_ref }}
  cancel-in-progress: true

+permissions: {}
+
jobs:

  # reuse the cache from the landing-builds workflow if available, if not then build the images
  # from scratch, but do not save them.
  Build-and-test:
    name: Run DAOS/NLT tests
    runs-on: ubuntu-22.04
+   permissions:
+     # https://github.com/EnricoMi/publish-unit-test-result-action#permissions
+     checks: write
+     pull-requests: write
    strategy:
      matrix:
        distro: [ubuntu]

2 changes: 2 additions & 0 deletions .github/workflows/clang-format.yml
@@ -18,6 +18,8 @@ name: clang-format
on:
  pull_request:

+permissions: {}
+
jobs:
  pylint:
    name: Clang Format

2 changes: 2 additions & 0 deletions .github/workflows/create_release.yml
@@ -8,6 +8,8 @@ on:
      - master
      - 'release/**'

+permissions: {}
+
jobs:
  make_release:
    name: Create Release

2 changes: 2 additions & 0 deletions .github/workflows/doxygen.yml
@@ -4,6 +4,8 @@ name: Doxygen
on:
  pull_request:

+permissions: {}
+
jobs:

  Doxygen:

2 changes: 2 additions & 0 deletions .github/workflows/flake.yml
@@ -4,6 +4,8 @@ name: Flake
on:
  pull_request:

+permissions: {}
+
jobs:
  flake8-lint:
    runs-on: ubuntu-22.04

6 changes: 6 additions & 0 deletions .github/workflows/landing-builds.yml
@@ -17,6 +17,8 @@ on:
      - requirements-build.txt
      - requirements-utest.txt

+permissions: {}
+
jobs:

  # Build a base Docker image, and save it with a key based on the hash of the dependencies, and a
@@ -86,6 +88,10 @@ jobs:
    name: Run DAOS/NLT tests
    needs: Prepare
    runs-on: ubuntu-22.04
+   permissions:
+     # https://github.com/EnricoMi/publish-unit-test-result-action#permissions
+     checks: write
+     pull-requests: write
    strategy:
      matrix:
        distro: [ubuntu]

2 changes: 2 additions & 0 deletions .github/workflows/linting.yml
@@ -9,6 +9,8 @@ on:
      - 'release/*'
  pull_request:

+permissions: {}
+
jobs:
  # Run isort on the tree.
  # This checks .py files only so misses SConstruct and SConscript files are not checked, rather

76 changes: 76 additions & 0 deletions .github/workflows/ossf-scorecard.yml
@@ -0,0 +1,76 @@
+# This workflow uses actions that are not certified by GitHub. They are provided
+# by a third-party and are governed by separate terms of service, privacy
+# policy, and support documentation.
+
+name: Scorecard supply-chain security
+on:
+  # For Branch-Protection check. Only the default branch is supported. See
+  # https://github.com/ossf/scorecard/blob/main/docs/checks.md#branch-protection
+  branch_protection_rule:
+  # To guarantee Maintained check is occasionally updated. See
+  # https://github.com/ossf/scorecard/blob/main/docs/checks.md#maintained
+  schedule:
+    - cron: '45 8 * * 0'
+  push:
+    branches: ["master"]
+  pull_request:
+
+# Declare default permissions as nothing.
+permissions: {}
+
+jobs:
+  analysis:
+    name: Scorecard analysis
+    runs-on: ubuntu-latest
+    permissions:
+      # Needed to upload the results to code-scanning dashboard.
+      security-events: write
+      # Needed to publish results and get a badge (see publish_results below).
+      id-token: write
+      # Uncomment the permissions below if installing in a private repository.
+      # contents: read
+      # actions: read
+
+    steps:
+      - name: "Checkout code"
+        uses: actions/checkout@b4ffde65f46336ab88eb53be808477a3936bae11 # v4.1.1
+        with:
+          persist-credentials: false
+
+      - name: "Run analysis"
+        uses: ossf/scorecard-action@0864cf19026789058feabb7e87baa5f140aac736 # v2.3.1
+        with:
+          results_file: results.sarif
+          results_format: sarif
+          # (Optional) "write" PAT token. Uncomment the `repo_token` line below if:
+          # - you want to enable the Branch-Protection check on a *public* repository, or
+          # - you are installing Scorecard on a *private* repository
+          # To create the PAT, follow the steps in
+          # https://github.com/ossf/scorecard-action?tab=readme-ov-file#authentication-with-fine-grained-pat-optional.
+          # repo_token: ${{ secrets.SCORECARD_TOKEN }}
+
+          # Public repositories:
+          #   - Publish results to OpenSSF REST API for easy access by consumers
+          #   - Allows the repository to include the Scorecard badge.
+          #   - See https://github.com/ossf/scorecard-action#publishing-results.
+          # For private repositories:
+          #   - `publish_results` will always be set to `false`, regardless
+          #     of the value entered here.
+          publish_results: true
+
+      # Upload the results as artifacts (optional). Commenting out will disable
+      # uploads of run results in SARIF
+      # format to the repository Actions tab.
+      - name: "Upload artifact"
+        uses: actions/upload-artifact@97a0fba1372883ab732affbe8f94b823f91727db # v3.pre.node20
+        with:
+          name: SARIF file
+          path: results.sarif
+          retention-days: 5
+
+      # Upload the results to GitHub's code scanning dashboard (optional).
+      # Commenting out will disable upload of results to your repo's Code Scanning dashboard
+      - name: "Upload to code-scanning"
+        uses: github/codeql-action/upload-sarif@1b1aada464948af03b950897e5eb522f92603cc2 # v3.24.9
+        with:
+          sarif_file: results.sarif

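The SARIF file this workflow uploads can be inspected locally before it reaches the dashboard. A short sketch using `jq`; the queries assume the standard SARIF 2.1.0 layout that scorecard-action emits:

```bash
# Inspect a downloaded results.sarif (paths assume standard SARIF 2.1.0).
jq '.runs[0].tool.driver.name' results.sarif        # which tool produced it
jq '.runs[0].results | length' results.sarif        # number of findings
jq -r '.runs[0].results[].ruleId' results.sarif     # which checks fired
```
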
3 changes: 3 additions & 0 deletions .github/workflows/pr-metadata.yml
@@ -9,10 +9,13 @@ on:
  pull_request_target:
    types: [opened, synchronize, reopened, edited]

+permissions: {}

jobs:
  example_comment_pr:
    runs-on: ubuntu-22.04
+   permissions:
+     pull-requests: write
    name: Report Jira data to PR comment
    steps:
      - name: Checkout

2 changes: 2 additions & 0 deletions .github/workflows/pylint.yml
@@ -5,6 +5,8 @@ name: Pylint
on:
  pull_request:

+permissions: {}
+
jobs:
  pylint:
    name: Pylint check

8 changes: 4 additions & 4 deletions .github/workflows/rpm-build-and-test-report.yml
@@ -8,14 +8,14 @@ on:
  # for testing before landing
  workflow_dispatch:

-permissions:
-  contents: read
-  actions: read
-  checks: write
+permissions: {}

jobs:
  report-vm-1:
    runs-on: [self-hosted, docker]
+   # https://github.com/dorny/test-reporter/issues/149
+   permissions:
+     checks: write
    strategy:
      matrix:
        # TODO: figure out how to determine this matrix

16 changes: 9 additions & 7 deletions .github/workflows/rpm-build-and-test.yml
@@ -26,13 +26,7 @@ defaults:
  run:
    shell: bash --noprofile --norc -ueo pipefail {0}

-# https://github.com/dorny/test-reporter/issues/149
-permissions:
-  id-token: write
-  contents: read
-  checks: write
-  # https://github.com/EnricoMi/publish-unit-test-result-action#permissions
-  pull-requests: write
+permissions: {}

jobs:
  # it's a real shame that this step is even needed. push events have the commit message # in
@@ -363,6 +357,10 @@ jobs:
  Functional:
    name: Functional Testing
    runs-on: [self-hosted, wolf]
+   permissions:
+     # https://github.com/EnricoMi/publish-unit-test-result-action#permissions
+     checks: write
+     pull-requests: write
    timeout-minutes: 7200
    needs: [Build-RPM, Import-commit-message, Calc-functional-matrix, Import-commit-pragmas]
    strategy:
@@ -594,6 +592,10 @@ jobs:
  Functional_Hardware:
    name: Functional Testing on Hardware
    runs-on: [self-hosted, wolf]
+   permissions:
+     # https://github.com/EnricoMi/publish-unit-test-result-action#permissions
+     checks: write
+     pull-requests: write
    timeout-minutes: 7200
    needs: [Import-commit-message, Build-RPM, Calc-functional-hardware-matrix,
            Import-commit-pragmas, Functional]

2 changes: 2 additions & 0 deletions .github/workflows/spelling.yml
@@ -3,6 +3,8 @@ name: Codespell
on:
  pull_request:

+permissions: {}
+
jobs:

  Codespell:

2 changes: 2 additions & 0 deletions .github/workflows/version-checks.yml
@@ -8,6 +8,8 @@ on:
  paths:
    - 'utils/cq/requirements.txt'

+permissions: {}
+
jobs:
  upgrade-check:
    name: Check for updates

2 changes: 2 additions & 0 deletions .github/workflows/yaml.yml
@@ -13,6 +13,8 @@ on:
      - '**/*.yml'
      - utils/cq/requirements.txt

+permissions: {}
+
jobs:
  yaml-lint:
    runs-on: ubuntu-22.04

3 changes: 2 additions & 1 deletion Jenkinsfile
@@ -1017,7 +1017,8 @@ pipeline {
        post {
            always {
                discoverGitReferenceBuild referenceJob: 'daos-stack/daos/master',
-                                          scm: 'daos-stack/daos'
+                                          scm: 'daos-stack/daos',
+                                          requiredResult: hudson.model.Result.UNSTABLE
                recordIssues enabledForFailure: true,
                             failOnError: false,
                             ignoreQualityGate: true,

7 changes: 5 additions & 2 deletions docs/QSG/setup_rhel.md
@@ -127,16 +127,19 @@ used by DAOS and NVME SSDs will be identified.
        pmem0            0         3.2 TB
        pmem1            0         3.2 TB

-4. Scan the available storage on the Server nodes:
+4. Scan the available nvme storage on the Server nodes:

-        daos_server storage scan
+        daos_server nvme scan
        Scanning locally-attached storage\...

        NVMe PCI     Model               FW Revision Socket ID Capacity
        --------     -----               ----------- --------- --------
        0000:81:00.0 INTEL SSDPE2KE016T8 VDV10170    0         1.6 TB
        0000:83:00.0 INTEL SSDPE2KE016T8 VDV10170    1         1.6 TB

+5. Scan the available scm storage on the Server nodes:
+
+        daos_server scm scan
        SCM Namespace Socket ID Capacity
        ------------- --------- --------
        pmem0         0         3.2 TB

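Together, the revised steps 4 and 5 report everything the old single command covered. A sketch of the sequence as it might be run on one server node (output formats are shown in the steps above):

```bash
# Run both scans on each server node; together they replace the
# old combined `daos_server storage scan` report.
daos_server nvme scan   # NVMe SSDs: PCI address, model, socket, capacity
daos_server scm scan    # PMem (SCM) namespaces: socket and capacity
```
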
7 changes: 5 additions & 2 deletions docs/QSG/setup_suse.md
@@ -148,16 +148,19 @@ used by DAOS and NVME SSDs will be identified.
        pmem0            0         3.2 TB
        pmem1            0         3.2 TB

-4. Scan the available storage on the Server nodes:
+4. Scan the available nvme storage on the Server nodes:

-        daos_server storage scan
+        daos_server nvme scan
        Scanning locally-attached storage\...

        NVMe PCI     Model               FW Revision Socket ID Capacity
        --------     -----               ----------- --------- --------
        0000:81:00.0 INTEL SSDPE2KE016T8 VDV10170    0         1.6 TB
        0000:83:00.0 INTEL SSDPE2KE016T8 VDV10170    1         1.6 TB

+5. Scan the available scm storage on the Server nodes:
+
+        daos_server scm scan
        SCM Namespace Socket ID Capacity
        ------------- --------- --------
        pmem0         0         3.2 TB

4 changes: 2 additions & 2 deletions docs/admin/troubleshooting.md
@@ -308,7 +308,7 @@ sudo ipcrm -M 0x10242049

!!! note
    A server must be started with minimum setup.
-    You can also obtain the addresses with `daos_server storage scan`.
+    You can also obtain the addresses with `daos_server nvme scan`.

1. Format the SCMs defined in the config file.
1. Generate the config file using `dmg config generate`. The various requirements will be populated without a syntax error.
@@ -325,7 +325,7 @@ sudo ipcrm -M 0x10242049
### Problems creating a container
1. Check that the path to daos is your intended binary. It's usually `/usr/bin/daos`.
1. When the server configuration is changed, it's necessary to restart the agent.
-1. `DER_UNREACH(-1006)`: Check the socket ID consistency between PMem and NVMe. First, determine which socket you're using with `daos_server network scan -p all`. e.g., if the interface you're using in the engine section is eth0, find which NUMA Socket it belongs to. Next, determine the disks you can use with this socket by calling `daos_server storage scan` or `dmg storage scan`. e.g., if eth0 belongs to NUMA Socket 0, use only the disks with 0 in the Socket ID column.
+1. `DER_UNREACH(-1006)`: Check the socket ID consistency between PMem and NVMe. First, determine which socket you're using with `daos_server network scan -p all`. e.g., if the interface you're using in the engine section is eth0, find which NUMA Socket it belongs to. Next, determine the disks you can use with this socket by calling `daos_server nvme scan` or `dmg storage scan`. e.g., if eth0 belongs to NUMA Socket 0, use only the disks with 0 in the Socket ID column.
1. Check the interface used in the server config (`fabric_iface`) also exists in the client and can communicate with the server.
1. Check the access_points of the agent config points to the correct server host.
1. Call `daos pool query` and check that the pool exists and has free space.

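For the `DER_UNREACH(-1006)` item, the check boils down to three commands run on the server; a sketch, with `eth0` and socket `0` as the placeholder interface and NUMA node from the example above:

```bash
# 1. Find the NUMA socket of the fabric interface named in the engine
#    section of the server config (eth0 here is a placeholder).
daos_server network scan -p all

# 2. List NVMe devices and their Socket ID column; keep only those whose
#    socket matches the fabric interface's NUMA node (0 in this example).
daos_server nvme scan

# 3. The same view is available through the management tool.
dmg storage scan
```
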
10 changes: 5 additions & 5 deletions docs/admin/vmd.md
@@ -40,7 +40,7 @@ non-VMD setup to VMD is not possible without reformatting the DAOS storage.

The following is an example of the `lspci` view on a server with eight
NVMe SSDs, when VMD is _disabled_. This is the status when the devices are
-still bound to the kernel (before running `daos_server storage prepare –n`):
+still bound to the kernel (before running `daos_server nvme prepare`):

```bash
[root@nvm0806 ~]# lspci -vv | grep -i nvme
@@ -54,7 +54,7 @@ e5:00.0 Non-Volatile memory controller: Intel Corporation NVMe DC SSD [3DNAND, Sentinel Rock Controller] (prog-if 02 [NVM Express])
e6:00.0 Non-Volatile memory controller: Intel Corporation NVMe DC SSD [3DNAND, Sentinel Rock Controller] (prog-if 02 [NVM Express])
```

-After running `daos_server storage prepare -n`, the NVMe SSDs are bound
+After running `daos_server nvme prepare`, the NVMe SSDs are bound
to SPDK, and `lspci` or `nvme list` no longer show them.

@@ -101,7 +101,7 @@ additional NVMe drive slots, but those slots are not populated with NVMe SSDs.

## NVMe view with VMD enabled (after binding to SPDK)

-After `daos_server storage prepare -n` has been run on a VMD-enabled DAOS server,
+After `daos_server nvme prepare` has been run on a VMD-enabled DAOS server,
the NVMe disks are unbound from the Linux kernel and no longer show up in `lspci` or
`nvme list` (just like in the non-VMD case).
However, the VMD controller devices are still visible with `lspci`:
@@ -116,8 +116,8 @@ However, the VMD controller devices are still visible with `lspci`:
The VMD-managed NVMe backing devices now show up in the DAOS storage scan, with their VMD IDs:

```bash
-[root@nvm0806 ~]# daos_server storage scan
-Scanning locally-attached storage...
+[root@nvm0806 ~]# daos_server nvme scan
+Scanning locally-attached NVMe storage...
NVMe PCI       Model          FW Revision Socket ID Capacity
--------       -----          ----------- --------- --------
640005:81:00.0 SSDPF2KX038T9L 2CV1L028    0         3.8 TB

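The before/after views above compress to a short sequence on a VMD-enabled server; a sketch, assuming VMD is already enabled in the BIOS and in the server config:

```bash
# Bind the NVMe SSDs (including VMD-backed ones) to SPDK; afterwards they
# disappear from `lspci` and `nvme list`.
daos_server nvme prepare

# Confirm the drives reappear in the DAOS view under VMD-style PCI
# addresses such as 640005:81:00.0.
daos_server nvme scan
```
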