DAOS-16265 test: Split erasurecode/multiple_failure.py (#15355) #15369

Merged
daltonbohning merged 1 commit into release/2.6 from pahender/DAOS-16265_split
Dec 11, 2024

phender
Contributor

@phender phender commented Oct 22, 2024

Split erasurecode/multiple_failure.py into two separate tests to reduce the chance that a large number of ERR messages in the server log file causes other test variants to fail due to out-of-space errors.

Skip-unit-tests: true
Skip-fault-injection-test: true
Test-tag: EcodFioRebuild EcodOnlineMultFail
Skip-func-hw-test-large-md-on-ssd: false

Before requesting gatekeeper:

  • Two review approvals and any prior change requests have been resolved.
  • Testing is complete and all tests passed or there is a reason documented in the PR why it should be force landed and forced-landing tag is set.
  • Features: (or Test-tag*) commit pragma was used or there is a reason documented that there are no appropriate tags for this PR.
  • Commit messages follow the guidelines outlined here.
  • Any tests skipped by the ticket being addressed have been run and passed in the PR.

Gatekeeper:

  • You are the appropriate gatekeeper to be landing the patch.
  • The PR has 2 reviews by people familiar with the code, including appropriate owners.
  • Githooks were used. If not, request that user install them and check copyright dates.
  • Checkpatch issues are resolved. Pay particular attention to ones that will show up on future PRs.
  • All builds have passed. Check non-required builds for any new compiler warnings.
  • Sufficient testing is done. Check feature pragmas and test tags and that tests skipped for the ticket are run and now pass with the changes.
  • If applicable, the PR has addressed any potential version compatibility issues.
  • Check the target branch. If it is master branch, should the PR go to a feature branch? If it is a release branch, does it have merge approval in the JIRA ticket?
  • Extra checks if forced landing is requested
    • Review comments are sufficiently resolved, particularly by prior reviewers that requested changes.
    • No new NLT or valgrind warnings. Check the classic view.
    • Quick-build or Quick-functional is not used.
  • Fix the commit message upon landing. Check the standard here. Edit it to create a single commit. If necessary, ask submitter for a new summary.

Signed-off-by: Phil Henderson <[email protected]>
@phender phender requested review from a team as code owners October 22, 2024 22:44

Ticket title is '[12-24]-./erasurecode/rebuild_fio.py:EcodFioRebuild.test_ec_online_rebuild_fio tests fail due to daos_server startup problem.'
Status is 'Awaiting backport'
Labels: 'ci_master_weekly,md_on_ssd,scrubbed_2.8,weekly_test'
Job should run at elevated priority (1)
https://daosio.atlassian.net/browse/DAOS-16265

@github-actions github-actions bot added the priority label (Ticket has high priority; automatically managed) Oct 22, 2024
@phender phender requested a review from daltonbohning October 22, 2024 22:44
@phender phender added the clean-cherry-pick (Cherry-pick from another branch that did not require additional edits) and forced-landing (The PR has known failures or has intentionally reduced testing, but should still be landed) labels Oct 22, 2024
@daosbuild1
Collaborator

Test stage Functional Hardware Large MD on SSD completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-15369/2/testReport/

@phender
Contributor Author

phender commented Oct 24, 2024

There was one failure in https://build.hpdd.intel.com/job/daos-stack/job/daos/job/PR-15369/2/artifact/Functional%20Hardware%20Large%20MD%20on%20SSD/erasurecode/multiple_rank_failure.py, but it was caused by IOR reporting warnings, not by the issue this PR fixes:

2024-10-24 00:16:24,405 process          L0677 INFO | Command /usr/lib64/mpich/bin/mpirun -genv COVFILE=/tmp/test.cov -genv D_LOG_FILE=/var/tmp/daos_testing/ior_daos.log -genv MPI_LIB="" -genv DAOS_UNS_PREFIX=daos://TestPool_1/49B11B48-F80A-42F8-B98A-4DF4C9C80A66 -genv IOR_HINT__MPI__romio_daos_obj_class=EC_2P2GX -hostfile /var/tmp/avocado_owogoxw7/avocado_job_seddzcdg/1-._erasurecode_multiple_rank_failure.py_EcodOnlineMultiRankFail.test_ec_multiple_rank_failure_run-container-hosts-ior-client_processes-iorflags-objectclass-EC_2P2GX-sizes-Full_Striped-pool-server_config-control_metadata-engines-0-storage-0-1-setup-bba5/hostfile_mtvbemws -np 32 ior -a DFS -b 8G -r -R -F -k -G 1 -vv -i 1 -o /testfile -t 8M --dfs.chunk_size 32M --dfs.cont 49B11B48-F80A-42F8-B98A-4DF4C9C80A66 --dfs.dir_oclass EC_2P2GX --dfs.oclass EC_2P2GX --dfs.pool TestPool_1 running on a thread
2024-10-24 00:16:25,045 process          L0416 DEBUG| [stdout] IOR-3.4.0+dev: MPI Coordinated Test of Parallel I/O
2024-10-24 00:16:25,045 process          L0416 DEBUG| [stdout] Began               : Thu Oct 24 00:16:25 2024
2024-10-24 00:16:25,045 process          L0416 DEBUG| [stdout] Command line        : ior -a DFS -b 8G -r -R -F -k -G 1 -vv -i
2024-10-24 00:16:25,046 process          L0416 DEBUG| [stdout]  1 -o /testfile -t 8M --dfs.chunk_size 32M --dfs.cont 49B11B48-F80A-42F8-B98A-4DF4C9C80A66 --dfs.dir_oclass EC_2P2GX --dfs.oclass EC_2P2GX --dfs.pool TestPool_1
2024-10-24 00:16:25,046 process          L0416 DEBUG| [stdout] Machine             : Linux wolf-324.wolf.hpdd.intel.com 4.18.0-477.27.1.el8_8.x86_64 #1 SMP Wed Sep 20 15:55:39 UTC 2023 x86_64
2024-10-24 00:16:28,656 process          L0416 DEBUG| [stdout] [0] DFS Pool = TestPool_1
2024-10-24 00:16:28,656 process          L0416 DEBUG| [stdout] [0] DFS Container = 49B11B48-F80A-42F8-B98A-4DF4C9C80A66
2024-10-24 00:16:28,750 process          L0416 DEBUG| [stdout] TestID              : 0
2024-10-24 00:16:28,750 process          L0416 DEBUG| [stdout] StartTime           : Thu Oct 24 00:16:28 2024
2024-10-24 00:16:28,751 process          L0416 DEBUG| [stdout] Path                : /testfile.00000000
2024-10-24 00:16:28,751 process          L0416 DEBUG| [stdout] FS                  : 32.0 TiB   Used FS: 2.4%   Inodes: -0.0 Mi   Used Inodes: 0.0%
2024-10-24 00:16:28,751 process          L0416 DEBUG| [stdout] Participating tasks : 32
2024-10-24 00:16:28,751 process          L0416 DEBUG| [stdout] 
2024-10-24 00:16:28,751 process          L0416 DEBUG| [stdout] Options: 
2024-10-24 00:16:28,751 process          L0416 DEBUG| [stdout] api                 : DFS
2024-10-24 00:16:28,751 process          L0416 DEBUG| [stdout] apiVersion          : DAOS
2024-10-24 00:16:28,751 process          L0416 DEBUG| [stdout] test filename       : /testfile
2024-10-24 00:16:28,751 process          L0416 DEBUG| [stdout] access              : file-per-process
2024-10-24 00:16:28,751 process          L0416 DEBUG| [stdout] type                : independent
2024-10-24 00:16:28,751 process          L0416 DEBUG| [stdout] segments            : 1
2024-10-24 00:16:28,751 process          L0416 DEBUG| [stdout] ordering in a file  : sequential
2024-10-24 00:16:28,751 process          L0416 DEBUG| [stdout] ordering inter file : no tasks offsets
2024-10-24 00:16:28,751 process          L0416 DEBUG| [stdout] nodes               : 2
2024-10-24 00:16:28,751 process          L0416 DEBUG| [stdout] tasks               : 32
2024-10-24 00:16:28,752 process          L0416 DEBUG| [stdout] clients per node    : 16
2024-10-24 00:16:28,752 process          L0416 DEBUG| [stdout] repetitions         : 1
2024-10-24 00:16:28,752 process          L0416 DEBUG| [stdout] xfersize            : 8 MiB
2024-10-24 00:16:28,752 process          L0416 DEBUG| [stdout] blocksize           : 8 GiB
2024-10-24 00:16:28,752 process          L0416 DEBUG| [stdout] aggregate filesize  : 256 GiB
2024-10-24 00:16:28,752 process          L0416 DEBUG| [stdout] verbose             : 2
2024-10-24 00:16:28,752 process          L0416 DEBUG| [stdout] 
2024-10-24 00:16:28,752 process          L0416 DEBUG| [stdout] Results: 
2024-10-24 00:16:28,753 process          L0416 DEBUG| [stdout] Using Time Stamp 1 (0x1) for Data Signature
2024-10-24 00:16:28,753 process          L0416 DEBUG| [stdout] 
2024-10-24 00:16:28,753 process          L0416 DEBUG| [stdout] access    bw(MiB/s)  IOPS       Latency(s)  block(KiB) xfer(KiB)  open(s)    wr/rd(s)   close(s)   total(s)   iter
2024-10-24 00:16:28,753 process          L0416 DEBUG| [stdout] ------    ---------  ----       ----------  ---------- ---------  --------   --------   --------   --------   ----
2024-10-24 00:16:38,160 process          L0416 DEBUG| [stdout] Commencing read performance test: Thu Oct 24 00:16:38 2024
2024-10-24 00:16:38,160 process          L0416 DEBUG| [stdout] 
2024-10-24 00:16:48,837 process          L0416 DEBUG| [stdout] WARNING: Expected aggregate file size       = 274877906944
2024-10-24 00:16:48,837 process          L0416 DEBUG| [stdout] WARNING: Stat() of aggregate file size      = 274877775872
2024-10-24 00:16:48,837 process          L0416 DEBUG| [stdout] WARNING: Using actual aggregate bytes moved = 274877906944
2024-10-24 00:16:48,838 process          L0416 DEBUG| [stdout] read      24525      
2024-10-24 00:16:48,838 process          L0416 DEBUG| [stdout] 3066.68    0.010426    8388608    8192       0.016944   10.69      0.000112   10.69      0   
2024-10-24 00:16:48,838 process          L0416 DEBUG| [stdout] Max Read:  24524.95 MiB/sec (25716.27 MB/sec)
2024-10-24 00:16:48,840 process          L0416 DEBUG| [stdout] [0] Disconnecting from DAOS POOL
2024-10-24 00:16:48,841 process          L0416 DEBUG| [stdout] [0] Finalizing DAOS..
2024-10-24 00:16:49,012 process          L0416 DEBUG| [stdout] 
2024-10-24 00:16:49,012 process          L0416 DEBUG| [stdout] Summary of all tests:
2024-10-24 00:16:49,012 process          L0416 DEBUG| [stdout] Operation   Max(MiB)   Min(MiB)  Mean(MiB)     StdDev   Max(OPs)   Min(OPs)  Mean(OPs)     StdDev    Mean(s) Stonewall(s) Stonewall(MiB) Test# #Tasks tPN reps fPP reord reordoff reordrand seed segcnt   blksiz    xsize aggs(MiB)   API RefNum
2024-10-24 00:16:49,012 process          L0416 DEBUG| [stdout] read        24524.95   24524.95   24524.95       0.00    3065.62    3065.62    3065.62       0.00 
2024-10-24 00:16:49,012 process          L0416 DEBUG| [stdout]   10.68887         NA            NA     0     32  16    1   1     0        1         0    0      1 8589934592  8388608  262144.0 DFS      0
2024-10-24 00:16:49,012 process          L0416 DEBUG| [stdout] Finished            : Thu Oct 24 00:16:49 2024
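
For reference, the three WARNING lines in the log above can be compared with a short script. This is only a minimal sketch for reading the numbers: the helper name `parse_ior_size_warnings` is hypothetical, it is not part of the DAOS test framework or of IOR, and the log excerpt is copied from the output above with the timestamps trimmed.

```python
import re

# Hypothetical helper for illustration only; not DAOS test framework or IOR code.
# Matches lines of the form "WARNING: <label> = <byte count>".
WARNING_RE = re.compile(r"WARNING:\s*(?P<label>[^=]+?)\s*=\s*(?P<value>\d+)")


def parse_ior_size_warnings(log_text):
    """Return {label: byte_count} for each size warning found in log_text."""
    return {
        match.group("label"): int(match.group("value"))
        for match in WARNING_RE.finditer(log_text)
    }


# Excerpt of the IOR stdout shown above (timestamps removed).
LOG_EXCERPT = """\
[stdout] WARNING: Expected aggregate file size       = 274877906944
[stdout] WARNING: Stat() of aggregate file size      = 274877775872
[stdout] WARNING: Using actual aggregate bytes moved = 274877906944
"""

sizes = parse_ior_size_warnings(LOG_EXCERPT)
expected = sizes["Expected aggregate file size"]
stat_size = sizes["Stat() of aggregate file size"]
shortfall = expected - stat_size

print(f"expected : {expected} bytes ({expected / 2**30:.0f} GiB)")
print(f"stat()   : {stat_size} bytes")
print(f"shortfall: {shortfall} bytes ({shortfall / 1024:.0f} KiB)")
```

With the values from this run, the stat() size is 131072 bytes (128 KiB) smaller than the expected 256 GiB, while the bytes actually moved match the expected size.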

@phender phender requested a review from a team October 24, 2024 12:51
@daltonbohning daltonbohning added the release-2.6.2 (Targeted for release 2.6.2) label Nov 12, 2024
@daltonbohning daltonbohning added release-2.6.3 (Targeted for 2.6.3) and removed release-2.6.2 (Targeted for release 2.6.2) labels Dec 10, 2024
@daltonbohning daltonbohning merged commit 612e227 into release/2.6 Dec 11, 2024
45 of 47 checks passed
@daltonbohning daltonbohning deleted the pahender/DAOS-16265_split branch December 11, 2024 20:07
Labels
  • clean-cherry-pick: Cherry-pick from another branch that did not require additional edits
  • forced-landing: The PR has known failures or has intentionally reduced testing, but should still be landed.
  • priority: Ticket has high priority (automatically managed)
  • release-2.6.3: Targeted for 2.6.3