DAOS-16076 test: Automate dmg scale test to be run on Aurora #14616

Merged
merged 11 commits into from
Jul 9, 2024

Conversation

shimizukko
Contributor

@shimizukko shimizukko commented Jun 19, 2024

Steps:

  1. Format storage
  2. System query
  3. Create a 100% pool that spans all engines
  4. Pool query
  5. Pool destroy
  6. Create 49 pools spanning all the engines, with each pool using 1/50th of the capacity
  7. Pool list
  8. Get around 80 pool metrics
  9. Destroy all 49 pools
  10. System stop
  11. System start
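The sequence above can be sketched as an ordered plan. This is only an illustration, not the code in dmg_scale.py: the `Step` structure, the `build_plan` helper, and the pool labels are hypothetical, the per-pool size is assumed to be the total capacity divided by 50, and the dmg subcommand names are recorded as plain strings rather than executed.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Step:
    """One step of the scale run (hypothetical structure, not from the PR)."""
    name: str
    dmg_subcommand: str
    pool_label: Optional[str] = None
    size_bytes: Optional[int] = None


def build_plan(total_bytes: int, pool_count: int = 49, divisor: int = 50) -> List[Step]:
    """Return the ordered steps; each small pool gets total_bytes // divisor."""
    per_pool = total_bytes // divisor
    labels = [f"pool_{i}" for i in range(pool_count)]  # hypothetical labels
    plan = [
        Step("Format storage", "storage format"),
        Step("System query", "system query"),
        Step("Create 100% pool", "pool create", "full_pool", total_bytes),
        Step("Pool query", "pool query", "full_pool"),
        Step("Pool destroy", "pool destroy", "full_pool"),
    ]
    plan += [Step(f"Create {label}", "pool create", label, per_pool) for label in labels]
    plan.append(Step("Pool list", "pool list"))
    # Assumed subcommand name for the metrics step; the test may gather metrics differently.
    plan.append(Step("Get pool metrics", "telemetry metrics query"))
    plan += [Step(f"Destroy {label}", "pool destroy", label) for label in labels]
    plan.append(Step("System stop", "system stop"))
    plan.append(Step("System start", "system start"))
    return plan
```

With these numbers, the 49 small pools together request 49/50 of the capacity.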

Skip-unit-tests: true
Skip-fault-injection-test: true

Before requesting gatekeeper:

  • Two review approvals and any prior change requests have been resolved.
  • Testing is complete and all tests passed or there is a reason documented in the PR why it should be force landed and forced-landing tag is set.
  • Features: (or Test-tag*) commit pragma was used or there is a reason documented that there are no appropriate tags for this PR.
  • Commit messages follow the guidelines outlined here.
  • Any tests skipped by the ticket being addressed have been run and passed in the PR.

Gatekeeper:

  • You are the appropriate gatekeeper to be landing the patch.
  • The PR has 2 reviews by people familiar with the code, including appropriate owners.
  • Githooks were used. If not, request that user install them and check copyright dates.
  • Checkpatch issues are resolved. Pay particular attention to ones that will show up on future PRs.
  • All builds have passed. Check non-required builds for any new compiler warnings.
  • Sufficient testing is done. Check feature pragmas and test tags and that tests skipped for the ticket are run and now pass with the changes.
  • If applicable, the PR has addressed any potential version compatibility issues.
  • Check the target branch. If it is master branch, should the PR go to a feature branch? If it is a release branch, does it have merge approval in the JIRA ticket?
  • Extra checks if forced landing is requested
    • Review comments are sufficiently resolved, particularly by prior reviewers that requested changes.
    • No new NLT or valgrind warnings. Check the classic view.
    • Quick-build or Quick-functional is not used.
  • Fix the commit message upon landing. Check the standard here. Edit it to create a single commit. If necessary, ask submitter for a new summary.

Steps:
1. Format storages
2. System query
3. Create a 100% pool that spans all engines
4. Pool query
5. Pool destroy
6. Create 50 pools spanning all the engines with each pool using a 1/50th of the capacity
7. Pool list
8. Get around 80 pool metrics
9. Destroy all 50 pools
10. System stop
11. System start

Skip-unit-tests: true
Skip-fault-injection-test: true
Signed-off-by: Makito Kano <[email protected]>

github-actions bot commented Jun 19, 2024

Ticket title is 'Automate dmg scale test to be run on Aurora'
Status is 'In Review'
https://daosio.atlassian.net/browse/DAOS-16076

Skip-unit-tests: true
Skip-fault-injection-test: true
Signed-off-by: Makito Kano <[email protected]>
… for the remaining 48 pools

Skip-unit-tests: true
Skip-fault-injection-test: true
Signed-off-by: Makito Kano <[email protected]>
Skip-unit-tests: true
Skip-fault-injection-test: true
"engine_pool_block_allocator_frags_small",
"engine_pool_block_allocator_free_blks",
"engine_pool_ops_key2anchor"
]
Contributor

Any reason why we want to keep this list here? I feel we should use the metrics lists available in TelemetryUtils.py.

Contributor Author

I think it's better to keep them here because these are scattered across different variables in TelemetryUtils.py. Also, they can be moved around or removed by someone else in TelemetryUtils.py.

Contributor

FWIW, @phender and I have gone back and forth on this too. I tend to agree with @shimizukko here: keeping them here makes it much less likely that someone accidentally breaks them in the utils.
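As a rough illustration of the approach argued for in this thread, the test can own an explicit list and check it against whatever names the telemetry query returns. This is a minimal sketch, not the PR's code: `POOL_METRICS` holds only the three names visible in the excerpt above, and `missing_metrics` is a hypothetical helper.

```python
from typing import Iterable, List

# Explicit list owned by the test. Only the three names visible in the
# reviewed excerpt are included; the real list has around 80 entries.
POOL_METRICS = [
    "engine_pool_block_allocator_frags_small",
    "engine_pool_block_allocator_free_blks",
    "engine_pool_ops_key2anchor",
]


def missing_metrics(expected: Iterable[str], reported: Iterable[str]) -> List[str]:
    """Return the expected metric names that are absent from the reported names."""
    reported_set = set(reported)
    return [name for name in expected if name not in reported_set]
```

Because the list lives in the test, a rename or removal in the shared utilities does not silently change what the test expects.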

src/tests/ftest/control/dmg_scale.py (outdated)
"""
# This is a manual test and we need to find the durations from job.log, so add "##" to make
# it easy to search. The log is usually over 1 million lines.
self.log_step("## System query")
Contributor

Better not to add formatting to log_step messages, because log_step already formats them.

Suggested change
- self.log_step("## System query")
+ self.log_step("System query")

Contributor Author

This is printed as:
==> Step 4: ## System query [elapsed since last step: 0.00s]
If we don't use ##, we could search with ==>, but I'm using ## in other places, such as the total pool create duration. In my experience, it's easier to search the entire job.log with one search string than to switch strings for different values.

Contributor

Thoughts, @phender? Similar rationale has come up before.

Contributor Author

I think the cleaner way is to measure the duration for each step, but now we use the harness for some operations such as self.server_managers[0].system_stop(), so measuring the command duration isn't straightforward. Also, this test is manually executed only at RC (4 times in each RC), so I'm not sure if I want to put more effort into it.
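For the manual workflow described in this thread, the "##"-marked step lines could be pulled out of job.log with a small script instead of an editor search. This is a sketch that assumes every marked line follows the format quoted above (`==> Step N: ## <name> [elapsed since last step: X.XXs]`); `STEP_RE` and `marked_steps` are hypothetical names.

```python
import re
from typing import Iterable, Iterator, Tuple

# Pattern built from the single log line quoted in the thread; other "##"
# lines (e.g. total pool create duration) may use a different layout.
STEP_RE = re.compile(
    r"==> Step (\d+): ## (.+?) \[elapsed since last step: ([0-9.]+)s\]"
)


def marked_steps(lines: Iterable[str]) -> Iterator[Tuple[int, str, float]]:
    """Yield (step number, step name, elapsed seconds) for each marked step line."""
    for line in lines:
        match = STEP_RE.search(line)
        if match:
            yield int(match.group(1)), match.group(2), float(match.group(3))
```

Passing an open file handle for job.log as `lines` streams the log line by line, which matters for a log of over a million lines.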

Remove unnecessary tags.

Skip-unit-tests: true
Skip-fault-injection-test: true
Signed-off-by: Makito Kano <[email protected]>
Skip-unit-tests: true
Skip-fault-injection-test: true
…_start()

Also update variable names and comment.

Skip-unit-tests: true
Skip-fault-injection-test: true
Signed-off-by: Makito Kano <[email protected]>
@shimizukko shimizukko marked this pull request as ready for review June 28, 2024 01:01
@shimizukko shimizukko requested review from a team as code owners June 28, 2024 01:01
@shimizukko shimizukko requested a review from daltonbohning June 28, 2024 01:06
Skip-unit-tests: true
Skip-fault-injection-test: true
Skip-unit-tests: true
Skip-fault-injection-test: true
Signed-off-by: Makito Kano <[email protected]>
@shimizukko shimizukko requested a review from daltonbohning June 29, 2024 01:28
Contributor

@saurabhtandan saurabhtandan left a comment

LGTM

@shimizukko shimizukko requested a review from a team July 5, 2024 04:58
@daltonbohning daltonbohning added the forced-landing The PR has known failures or has intentionally reduced testing, but should still be landed. label Jul 9, 2024
@daltonbohning daltonbohning merged commit 406f35b into master Jul 9, 2024
43 checks passed
@daltonbohning daltonbohning deleted the makito/DAOS-16076 branch July 9, 2024 15:28
@daltonbohning
Contributor

@shimizukko I didn't think about it until after merging, but couldn't this be made to work in CI so we know if the test is accidentally broken?

grom72 pushed a commit to grom72/daos that referenced this pull request Jul 25, 2024
…ack#14616)

Steps:
1. Format storages
2. System query
3. Create a 100% pool that spans all engines
4. Pool query
5. Pool destroy
6. Create 49 pools spanning all the engines with each pool using a 1/50th of the capacity
7. Pool list
8. Get around 80 pool metrics
9. Destroy all 49 pools
10. System stop
11. System start

Signed-off-by: Makito Kano <[email protected]>
shimizukko added a commit that referenced this pull request Sep 13, 2024
Steps:
1. Format storages
2. System query
3. Create a 100% pool that spans all engines
4. Pool query
5. Pool destroy
6. Create 49 pools spanning all the engines with each pool using a 1/50th of the capacity
7. Pool list
8. Get around 80 pool metrics
9. Destroy all 49 pools
10. System stop
11. System start

Skip-unit-tests: true
Skip-fault-injection-test: true
Signed-off-by: Makito Kano <[email protected]>
daltonbohning pushed a commit that referenced this pull request Sep 13, 2024
…15126

Skip-test: true
Skip-build: true

Steps:
1. Format storages
2. System query
3. Create a 100% pool that spans all engines
4. Pool query
5. Pool destroy
6. Create 49 pools spanning all the engines with each pool using a 1/50th of the capacity
7. Pool list
8. Get around 80 pool metrics
9. Destroy all 49 pools
10. System stop
11. System start

Skip-unit-tests: true
Skip-fault-injection-test: true
Signed-off-by: Makito Kano <[email protected]>
Signed-off-by: Dalton Bohning <[email protected]>
daltonbohning pushed a commit that referenced this pull request Sep 24, 2024
…15126

Skip-test: true
Skip-build: true

Steps:
1. Format storages
2. System query
3. Create a 100% pool that spans all engines
4. Pool query
5. Pool destroy
6. Create 49 pools spanning all the engines with each pool using a 1/50th of the capacity
7. Pool list
8. Get around 80 pool metrics
9. Destroy all 49 pools
10. System stop
11. System start

Skip-unit-tests: true
Skip-fault-injection-test: true
Signed-off-by: Makito Kano <[email protected]>
Signed-off-by: Dalton Bohning <[email protected]>
daltonbohning pushed a commit that referenced this pull request Oct 4, 2024
…15126

Skip-test: true
Skip-build: true

Steps:
1. Format storages
2. System query
3. Create a 100% pool that spans all engines
4. Pool query
5. Pool destroy
6. Create 49 pools spanning all the engines with each pool using a 1/50th of the capacity
7. Pool list
8. Get around 80 pool metrics
9. Destroy all 49 pools
10. System stop
11. System start

Skip-unit-tests: true
Skip-fault-injection-test: true
Signed-off-by: Makito Kano <[email protected]>
Signed-off-by: Dalton Bohning <[email protected]>
daltonbohning pushed a commit that referenced this pull request Oct 10, 2024
…#15126)

Steps:
1. Format storages
2. System query
3. Create a 100% pool that spans all engines
4. Pool query
5. Pool destroy
6. Create 49 pools spanning all the engines with each pool using a 1/50th of the capacity
7. Pool list
8. Get around 80 pool metrics
9. Destroy all 49 pools
10. System stop
11. System start

Signed-off-by: Makito Kano <[email protected]>
Labels
forced-landing The PR has known failures or has intentionally reduced testing, but should still be landed.
4 participants