DAOS-16076 test: Automate dmg scale test to be run on Aurora #14616
Steps:
1. Format storages
2. System query
3. Create a 100% pool that spans all engines
4. Pool query
5. Pool destroy
6. Create 50 pools spanning all the engines with each pool using a 1/50th of the capacity
7. Pool list
8. Get around 80 pool metrics
9. Destroy all 50 pools
10. System stop
11. System start

Skip-unit-tests: true
Skip-fault-injection-test: true
Signed-off-by: Makito Kano <[email protected]>
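The sizing in step 6 can be sketched as plain arithmetic. This is only an illustration, not the test's actual code: `per_pool_bytes` and `plan_pools` are hypothetical helper names, and the total capacity is assumed to be known in bytes.

```python
def per_pool_bytes(total_bytes: int, denominator: int = 50) -> int:
    """Return the size of one pool that uses 1/denominator of the capacity."""
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    # Integer division so the slices can never add up to more than the total.
    return total_bytes // denominator


def plan_pools(total_bytes: int, pool_count: int = 50, denominator: int = 50) -> list:
    """Return the planned size of each pool (hypothetical helper)."""
    size = per_pool_bytes(total_bytes, denominator)
    if size * pool_count > total_bytes:
        raise ValueError("requested pools exceed the available capacity")
    return [size] * pool_count
```

For example, `plan_pools(1000)` yields 50 slices of 20 bytes each, and asking for more pools than the capacity allows raises `ValueError`.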
Ticket title is 'Automate dmg scale test to be run on Aurora'
Skip-unit-tests: true Skip-fault-injection-test: true Signed-off-by: Makito Kano <[email protected]>
… for the remaining 48 pools Skip-unit-tests: true Skip-fault-injection-test: true Signed-off-by: Makito Kano <[email protected]>
Skip-unit-tests: true Skip-fault-injection-test: true
"engine_pool_block_allocator_frags_small",
"engine_pool_block_allocator_free_blks",
"engine_pool_ops_key2anchor"
]
Any reason why we want to keep this here? I feel we have to use the metrics list available under TelemetryUtils.py.
I think it's better to keep them here because these are scattered across different variables in TelemetryUtils.py. Also, they can be moved around or removed by someone else in TelemetryUtils.py.
FWIW @phender and I have gone back and forth on this too. I tend to agree with @shimizukko here: keeping them here makes it much less likely that someone accidentally breaks them in the utils.
"""
# This is a manual test and we need to find the durations from job.log, so add "##" to make
# it easy to search. The log is usually over 1 million lines.
self.log_step("## System query")
Better to not put formatting for log_step because it already formats the messages
- self.log_step("## System query")
+ self.log_step("System query")
This is printed as:
==> Step 4: ## System query [elapsed since last step: 0.00s]
If we don't use ##, we could search with ==>, but I'm using ## in other places, such as the total pool create duration. In my experience, it's easier to search with the same string across the entire job.log than to switch search strings for different values.
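The search described above could be scripted instead of done by hand. A minimal sketch, assuming the step lines look exactly like the example printed above; `marked_lines` and `parse_steps` are hypothetical helper names, and the format of other "##" lines (such as the total pool create duration) is not shown here, so those lines are only filtered, not parsed.

```python
import re

# Matches lines such as:
# ==> Step 4: ## System query [elapsed since last step: 0.00s]
STEP_RE = re.compile(
    r"==> Step (\d+): ## (.+?) \[elapsed since last step: ([0-9.]+)s\]"
)


def marked_lines(lines, marker="##"):
    """Return every line that contains the search marker."""
    return [line.rstrip("\n") for line in lines if marker in line]


def parse_steps(lines):
    """Return (step number, step name, elapsed seconds) for each step line."""
    steps = []
    for line in lines:
        match = STEP_RE.search(line)
        if match:
            steps.append(
                (int(match.group(1)), match.group(2), float(match.group(3)))
            )
    return steps
```

Both helpers accept any iterable of lines, so an open file handle on job.log can be passed directly and the million-line log is streamed rather than loaded at once.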
Thoughts @phender ? Similar rationale has come up before
I think the cleaner way is to measure the duration for each step, but now we use the harness for some operations such as self.server_managers[0].system_stop(), so measuring the command duration isn't straightforward. Also, this test is manually executed only at RC (4 times in each RC), so I'm not sure if I want to put more effort into it.
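For reference, one way to time a step without depending on how the command is issued is to wrap the call in a context manager. This is a sketch only: `timed_step` is a hypothetical helper that is not part of the test harness, and it measures wall-clock time around the whole call, including any harness overhead, rather than the dmg command alone.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed_step(name, log=print):
    """Log the wall-clock duration of the wrapped block with a '##' marker."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Runs even if the wrapped block raises, so failures are timed too.
        elapsed = time.perf_counter() - start
        log(f"## {name} duration: {elapsed:.2f}s")
```

Inside the test this might be used as `with timed_step("System stop", log=self.log.info): self.server_managers[0].system_stop()`, assuming `self.log.info` is the logger in use there.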
Remove unnecessary tags. Skip-unit-tests: true Skip-fault-injection-test: true Signed-off-by: Makito Kano <[email protected]>
…_start() Also update variable names and comment. Skip-unit-tests: true Skip-fault-injection-test: true Signed-off-by: Makito Kano <[email protected]>
LGTM
@shimizukko I didn't think about it until after merging, but couldn't this be made to work in CI so we know if the test is accidentally broken?
…ack#14616) Steps: 1. Format storages 2. System query 3. Create a 100% pool that spans all engines 4. Pool query 5. Pool destroy 6. Create 49 pools spanning all the engines with each pool using a 1/50th of the capacity 7. Pool list 8. Get around 80 pool metrics 9. Destroy all 49 pools 10. System stop 11. System start Signed-off-by: Makito Kano <[email protected]>
Steps: 1. Format storages 2. System query 3. Create a 100% pool that spans all engines 4. Pool query 5. Pool destroy 6. Create 49 pools spanning all the engines with each pool using a 1/50th of the capacity 7. Pool list 8. Get around 80 pool metrics 9. Destroy all 49 pools 10. System stop 11. System start Skip-unit-tests: true Skip-fault-injection-test: true Signed-off-by: Makito Kano <[email protected]>
…15126 Skip-test: true Skip-build: true Steps: 1. Format storages 2. System query 3. Create a 100% pool that spans all engines 4. Pool query 5. Pool destroy 6. Create 49 pools spanning all the engines with each pool using a 1/50th of the capacity 7. Pool list 8. Get around 80 pool metrics 9. Destroy all 49 pools 10. System stop 11. System start Skip-unit-tests: true Skip-fault-injection-test: true Signed-off-by: Makito Kano <[email protected]> Signed-off-by: Dalton Bohning <[email protected]>
…#15126) Steps: 1. Format storages 2. System query 3. Create a 100% pool that spans all engines 4. Pool query 5. Pool destroy 6. Create 49 pools spanning all the engines with each pool using a 1/50th of the capacity 7. Pool list 8. Get around 80 pool metrics 9. Destroy all 49 pools 10. System stop 11. System start Signed-off-by: Makito Kano <[email protected]>