DAOS-15937 test: Automate metadata duplicate rpc detection time consuming #14473

Open: wants to merge 111 commits into base: master
Changes from 88 commits
Commits
238e8e4
recommit
dinghwah Apr 21, 2023
95100ab
DAOS-15396 test: Coverage Report on Master March 2024
dinghwah Mar 8, 2024
9a67543
retore pool_security_test_base.py
dinghwah Mar 8, 2024
09ff0cb
restore pool_security_test_base.py
dinghwah Apr 24, 2024
129205b
recommit
dinghwah Apr 21, 2023
a98bce9
DAOS-15396 test: Coverage Report on Master March 2024
dinghwah Mar 8, 2024
9f02561
retore pool_security_test_base.py
dinghwah Mar 8, 2024
8ed567c
restore pool_security_test_base.py
dinghwah Apr 24, 2024
02a265a
Restore pool_security_test_base.py
dinghwah May 23, 2024
d5e7422
DAOS-15937 test: Automate metadata duplicate rpc detection time consu…
dinghwah May 29, 2024
297eae5
Update metadata_svc_ops.py
dinghwah May 29, 2024
b8e750a
Update metadata_svc_ops.yaml with manual tests.
dinghwah May 31, 2024
bd3f199
Update test script and yaml per Ken's comments.
dinghwah Jun 5, 2024
5d067a9
recommit
dinghwah Apr 21, 2023
195afed
DAOS-15396 test: Coverage Report on Master March 2024
dinghwah Mar 8, 2024
477ceee
retore pool_security_test_base.py
dinghwah Mar 8, 2024
f09de12
restore pool_security_test_base.py
dinghwah Apr 24, 2024
f53b72f
recommit
dinghwah Apr 21, 2023
4f8efdd
DAOS-15396 test: Coverage Report on Master March 2024
dinghwah Mar 8, 2024
bf0102e
retore pool_security_test_base.py
dinghwah Mar 8, 2024
f560bbd
restore pool_security_test_base.py
dinghwah Apr 24, 2024
bf956d2
Restore pool_security_test_base.py
dinghwah May 23, 2024
de22b28
Merge branch 'master' into dinghwah/DAOS-15937-mdtest
dinghwah Jun 5, 2024
d332009
recommit
dinghwah Apr 21, 2023
e97e68c
DAOS-15396 test: Coverage Report on Master March 2024
dinghwah Mar 8, 2024
a48eae2
retore pool_security_test_base.py
dinghwah Mar 8, 2024
e849863
restore pool_security_test_base.py
dinghwah Apr 24, 2024
61a6a4f
recommit
dinghwah Apr 21, 2023
28babe2
DAOS-15396 test: Coverage Report on Master March 2024
dinghwah Mar 8, 2024
aadba94
retore pool_security_test_base.py
dinghwah Mar 8, 2024
c71fe72
restore pool_security_test_base.py
dinghwah Apr 24, 2024
1d01888
Restore pool_security_test_base.py
dinghwah May 23, 2024
e320a22
Merge branch 'master' into dinghwah/DAOS-15937-mdtest
dinghwah Jun 5, 2024
350276f
Update metadata_svc_ops.py per Ken's comments.
dinghwah Jun 10, 2024
f70788c
recommit
dinghwah Apr 21, 2023
c582b41
DAOS-15396 test: Coverage Report on Master March 2024
dinghwah Mar 8, 2024
9b3eb4e
retore pool_security_test_base.py
dinghwah Mar 8, 2024
adf7ef7
restore pool_security_test_base.py
dinghwah Apr 24, 2024
426b046
recommit
dinghwah Apr 21, 2023
7fcdd1e
DAOS-15396 test: Coverage Report on Master March 2024
dinghwah Mar 8, 2024
d38fd75
retore pool_security_test_base.py
dinghwah Mar 8, 2024
c2a2276
restore pool_security_test_base.py
dinghwah Apr 24, 2024
38ce1eb
Restore pool_security_test_base.py
dinghwah May 23, 2024
56a5af1
Merge branch 'master' into dinghwah/DAOS-15937-mdtest
dinghwah Jun 10, 2024
32479ea
recommit
dinghwah Apr 21, 2023
2e4f13d
DAOS-15396 test: Coverage Report on Master March 2024
dinghwah Mar 8, 2024
701b48a
retore pool_security_test_base.py
dinghwah Mar 8, 2024
091faef
restore pool_security_test_base.py
dinghwah Apr 24, 2024
ae6008f
recommit
dinghwah Apr 21, 2023
4d21e0b
DAOS-15396 test: Coverage Report on Master March 2024
dinghwah Mar 8, 2024
11c14ea
retore pool_security_test_base.py
dinghwah Mar 8, 2024
cddc526
restore pool_security_test_base.py
dinghwah Apr 24, 2024
70ea60d
Restore pool_security_test_base.py
dinghwah May 23, 2024
2db908d
Merge branch 'master' into dinghwah/DAOS-15937-mdtest
dinghwah Jun 11, 2024
b9194d3
Update script per Dalton's comments.
dinghwah Jun 12, 2024
ba9e54d
recommit
dinghwah Apr 21, 2023
966f5c7
DAOS-15396 test: Coverage Report on Master March 2024
dinghwah Mar 8, 2024
c42229e
retore pool_security_test_base.py
dinghwah Mar 8, 2024
8c2046e
restore pool_security_test_base.py
dinghwah Apr 24, 2024
7ac504f
recommit
dinghwah Apr 21, 2023
7c57f3f
DAOS-15396 test: Coverage Report on Master March 2024
dinghwah Mar 8, 2024
4ac3116
retore pool_security_test_base.py
dinghwah Mar 8, 2024
8e2a751
restore pool_security_test_base.py
dinghwah Apr 24, 2024
1a052ff
Restore pool_security_test_base.py
dinghwah May 23, 2024
2d2a1a4
Merge branch 'master' into dinghwah/DAOS-15937-mdtest
dinghwah Jul 29, 2024
b1c7728
Update metadata_svc_ops.py per Dalton's comment.
dinghwah Aug 6, 2024
682dfa6
recommit
dinghwah Apr 21, 2023
51f0ef2
DAOS-15396 test: Coverage Report on Master March 2024
dinghwah Mar 8, 2024
5e08dc4
retore pool_security_test_base.py
dinghwah Mar 8, 2024
7e2124f
restore pool_security_test_base.py
dinghwah Apr 24, 2024
23b096f
recommit
dinghwah Apr 21, 2023
536001d
DAOS-15396 test: Coverage Report on Master March 2024
dinghwah Mar 8, 2024
95939ab
retore pool_security_test_base.py
dinghwah Mar 8, 2024
c5b8717
restore pool_security_test_base.py
dinghwah Apr 24, 2024
a8ea862
Restore pool_security_test_base.py
dinghwah May 23, 2024
5e0c8db
Merge branch 'master' into dinghwah/DAOS-15937-mdtest
dinghwah Aug 6, 2024
1e646ef
Update metadata_svc_ops.py for pylint.
dinghwah Aug 6, 2024
d56e5b1
Recommit with factor to 2.5
dinghwah Aug 8, 2024
8250041
recommit
dinghwah Apr 21, 2023
8ec32ff
DAOS-15396 test: Coverage Report on Master March 2024
dinghwah Mar 8, 2024
0928f3a
retore pool_security_test_base.py
dinghwah Mar 8, 2024
e118912
restore pool_security_test_base.py
dinghwah Apr 24, 2024
384efe0
recommit
dinghwah Apr 21, 2023
7bb1dea
DAOS-15396 test: Coverage Report on Master March 2024
dinghwah Mar 8, 2024
6d40d39
retore pool_security_test_base.py
dinghwah Mar 8, 2024
519d4c0
restore pool_security_test_base.py
dinghwah Apr 24, 2024
8ab5f43
Restore pool_security_test_base.py
dinghwah May 23, 2024
0649717
Merge branch 'master' into dinghwah/DAOS-15937-mdtest
dinghwah Aug 8, 2024
f0e9314
recommit
dinghwah Apr 21, 2023
4efb7a9
DAOS-15396 test: Coverage Report on Master March 2024
dinghwah Mar 8, 2024
50444a1
retore pool_security_test_base.py
dinghwah Mar 8, 2024
542bca9
restore pool_security_test_base.py
dinghwah Apr 24, 2024
834d90a
recommit
dinghwah Apr 21, 2023
0f7ae5b
DAOS-15396 test: Coverage Report on Master March 2024
dinghwah Mar 8, 2024
ae16475
retore pool_security_test_base.py
dinghwah Mar 8, 2024
8d4602c
restore pool_security_test_base.py
dinghwah Apr 24, 2024
76eb46f
Restore pool_security_test_base.py
dinghwah May 23, 2024
984aefc
Merge branch 'master' into dinghwah/DAOS-15937-mdtest
dinghwah Aug 22, 2024
2b8dd5d
Update metadata_svc_ops.yaml per Ken's comment.
dinghwah Aug 22, 2024
62ad7aa
recommit
dinghwah Apr 21, 2023
595fcaa
DAOS-15396 test: Coverage Report on Master March 2024
dinghwah Mar 8, 2024
a0a7468
retore pool_security_test_base.py
dinghwah Mar 8, 2024
735a5fc
restore pool_security_test_base.py
dinghwah Apr 24, 2024
fa3564b
recommit
dinghwah Apr 21, 2023
223c922
DAOS-15396 test: Coverage Report on Master March 2024
dinghwah Mar 8, 2024
28525a1
retore pool_security_test_base.py
dinghwah Mar 8, 2024
4b89f5a
restore pool_security_test_base.py
dinghwah Apr 24, 2024
fcd1952
Restore pool_security_test_base.py
dinghwah May 23, 2024
5490200
Update metadata_svc_ops.yaml with silent: true
dinghwah Aug 27, 2024
e428682
Merge branch 'master' into dinghwah/DAOS-15937-mdtest
dinghwah Aug 27, 2024
f6addc1
Update metadata_svc_ops.yaml to test svc-ops 150
dinghwah Aug 27, 2024
133 changes: 133 additions & 0 deletions src/tests/ftest/server/metadata_svc_ops.py
@@ -0,0 +1,133 @@
"""
  (C) Copyright 2022-2024 Intel Corporation.

  SPDX-License-Identifier: BSD-2-Clause-Patent
"""

import statistics
import time

from apricot import TestWithServers
from avocado.core.exceptions import TestFail
from general_utils import DaosTestError
from thread_manager import ThreadManager


class DuplicateRpcDetection(TestWithServers):
    """Compare metadata handling performance between pools with and without duplicate rpc
    detection feature enabled.

    Test Class Description:
        Create pools, run the test with and without the duplicate rpc detection feature, and
        verify the time consumed by the metadata workload before and after svc ops full.

    :avocado: recursive
    """

    def metadata_workload_test(self, pool, cont_num, workload_cycles, test_loops):
        """Create a single container and perform metadata workload tests.

        Args:
            pool (TestPool): pool in which to create the container.
            cont_num (int): Container number for logging.
            workload_cycles (int): Number of metadata workload test cycles per test loop.
            test_loops (int): Number of metadata workload test loops.

        Returns:
            list: Time consumed by each test loop of metadata workload test cycles.

        """
        test_time = []
        try:
            daos = self.get_daos_command()
            daos.verbose = False
            container = self.get_container(pool, daos=daos)
            self.log.info("Successfully created #%s container", cont_num)
        except (DaosTestError, TestFail) as err:
            self.fail(
                "#({}.{}) container create failed. err={}".format(pool.label, cont_num, err))
        for ind in range(test_loops):
            start = time.time()
            # One workload cycle is a container open followed by a close.
            for _ in range(workload_cycles):
                container.open()
                container.close()
            elapsed_time = time.time() - start
            self.log.info("Completed container Metadata test-loop: %d, elapsed_time: %f",
                          ind + 1, elapsed_time)
            test_time.append(elapsed_time)
        for ind in range(test_loops):
            # Use 1-based loop numbers, matching the per-loop message above.
            self.log.info("Test time of Metadata test-loop: %d, %f",
                          ind + 1, test_time[ind])
        return test_time

    def test_metadata_dup_rpc(self):
        """JIRA ID: DAOS-15937 metadata duplicate rpc detection time consuming.

        Test Steps:
            1. Bring up DAOS server.
            2. Create pool1 with specified property svc_ops_entry_age.
            3. Create containers by ThreadManager.
            4. Run specified metadata workload cycles in multiple test loops (N cycles per loop).
            5. Create pool2 with property svc_ops_enabled:0.
            6. Create containers by ThreadManager on pool2.
            7. To establish a "baseline" time (without duplicate rpc detection), perform test
               step 4 on pool2, calculating average time per loop executed.
            8. Compare all metadata workload times (with duplicate rpc detection) to the average
               baseline time (without duplicate rpc detection).

        :avocado: tags=all,full_regression
        :avocado: tags=hw,medium,md_on_ssd
        :avocado: tags=server,metadata
        :avocado: tags=DuplicateRpcDetection,test_metadata_dup_rpc
        """
        number_thread = self.params.get("number_thread", '/run/metadata/*', default=1)
        w_cycles = self.params.get("workload_test_cycles", '/run/metadata/*', default=5000)
        t_loops = self.params.get("test_loops", '/run/metadata/*', default=10)
        threshold_factor = self.params.get("threshold_factor", '/run/metadata/*', default=1.75)

        self.log_step("Create pool with properties svc_ops_entry_age.")
        pool1 = self.get_pool(dmg=self.get_dmg_command().copy())

        self.log_step("Create containers by ThreadManager on pool1.")
        container_manager = ThreadManager(
            self.metadata_workload_test, self.get_remaining_time() - 30)
        for cont_num in range(1, number_thread + 1):
            container_manager.add(
                pool=pool1, cont_num=cont_num, workload_cycles=w_cycles, test_loops=t_loops)

        self.log_step("Run specified metadata workload cycles in multiple test loops.")
        results = container_manager.run()
        num_failed = len(list(filter(lambda r: not r.passed, results)))
        if num_failed > 0:
            self.fail('#{} container create threads failed'.format(num_failed))

        self.log_step("Create pool2 with property svc_ops_enabled:0.")
        self.add_pool(properties="svc_ops_enabled:0")

        self.log_step("Create containers by ThreadManager on pool2.")
        container_manager = ThreadManager(
            self.metadata_workload_test, self.get_remaining_time() - 30)
        for cont_num in range(1, number_thread + 1):
            container_manager.add(
                pool=self.pool, cont_num=cont_num, workload_cycles=w_cycles, test_loops=t_loops)

        # The two string literals are concatenated into a single message argument.
        self.log_step(
            "To establish a baseline time without duplicate rpc detection, "
            "calculating average time per loop executed on pool2.")
        base_results = container_manager.run()
        num_failed = len(list(filter(lambda r: not r.passed, base_results)))
        if num_failed > 0:
            self.fail('#{} container create threads failed'.format(num_failed))
        base_average_time = statistics.mean(base_results[0].result)

        self.log_step(
            "Compare metadata workload test time with and without duplicate rpc detection.")
        self.log.info("pool1 results = %s", results[0].result)
        self.log.info("baseline results = %s", base_results[0].result)
        self.log.info("average baseline result= %s", base_average_time)
        for result in results[0].result:
Contributor:

Test steps documentation for step 4 should not specify that it is calculating an average for those loops following svc_ops_entry_age time.

This verification seems OK, since we expect the "early" loops before svc_ops_entry_age to be quicker than what the performance will "stabilize" to after svc_ops_entry_age time. That is, all timings before and after svc_ops_entry_age time should fall below the baseline time multiplied by threshold_factor. But we particularly care about the iterations after svc_ops_entry_age time has passed.

Contributor, Author:

Please see the new commit, which should have addressed all the comments.

            if result > base_average_time * threshold_factor:
                self.fail(
                    "#Dup rpc detection time {} > baseline_time {} * threshold_factor {}".format(
                        result, base_average_time, threshold_factor))
        self.log.info("Test passed")
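
The pass/fail criterion above can be tried outside the DAOS test framework. The following is a minimal sketch, assuming plain lists of per-loop timings in seconds; `find_slow_loops` and the sample values are hypothetical and not part of this PR.

```python
import statistics


def find_slow_loops(dup_rpc_times, baseline_times, threshold_factor):
    """Return (index, time) pairs whose time exceeds mean(baseline) * threshold_factor."""
    limit = statistics.mean(baseline_times) * threshold_factor
    return [(ind, t) for ind, t in enumerate(dup_rpc_times) if t > limit]


# Hypothetical timings (seconds per loop), for illustration only.
baseline = [10.0, 10.0, 10.0, 10.0]
with_detection = [12.0, 18.0, 26.0, 24.0]

# The limit is mean(baseline) * 2.5 = 25.0, so only the 26.0 s loop is reported.
print(find_slow_loops(with_detection, baseline, threshold_factor=2.5))
```

Returning every offending loop, rather than failing on the first one as the test does, may make a failure report easier to read, but that is a design choice and not something the PR requires.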
56 changes: 56 additions & 0 deletions src/tests/ftest/server/metadata_svc_ops.yaml
@@ -0,0 +1,56 @@
hosts:
  test_servers: 3
  test_clients: 1
timeout: 850
server_config:
  name: daos_server
  engines_per_host: 2
  engines:
    0:
      targets: 8
      nr_xs_helpers: 4
      first_core: 0
      pinned_numa_node: 0
      fabric_iface: ib0
      fabric_iface_port: 31317
      log_file: daos_server0.log
      log_mask: DEBUG,MEM=ERR
Contributor:

We want log_mask: ERR when running a performance-sensitive test.

Contributor, Author:

Please see the new commit.

      env_vars:
        - RDB_COMPACT_THRESHOLD=64
        - DD_MASK=group_metadata_only
      storage:
        0:
          class: dcpm
          scm_list: ["/dev/pmem0"]
          scm_mount: /mnt/daos0
    1:
      targets: 8
      nr_xs_helpers: 4
      first_core: 0
      pinned_numa_node: 1
      fabric_iface: ib1
      fabric_iface_port: 31417
      log_file: daos_server1.log
      log_mask: DEBUG,MEM=ERR
Contributor:

We want log_mask: ERR when running a performance-sensitive test.

Contributor, Author:

Please see the new commit.

      env_vars:
        - RDB_COMPACT_THRESHOLD=64
        - DD_MASK=group_metadata_only
      storage:
        0:
          class: dcpm
          scm_list: ["/dev/pmem1"]
          scm_mount: /mnt/daos1
pool:
  scm_size: 10G
  label: pool
  set_logmasks: False
  properties: svc_ops_entry_age:60
  # Uncomment the following for manual test with different svc_ops_entry_age value
  # properties: svc_ops_entry_age:150
  # properties: svc_ops_entry_age:300
  # properties: svc_ops_entry_age:600
metadata:
  number_thread: 1
  workload_test_cycles: 5000
  test_loops: 10
  threshold_factor: 2.5
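
The review discussion singles out the loops that run after svc_ops_entry_age has passed. A rough way to see which loops those are, given the per-loop timings the test logs, is sketched below. This assumes the clock starts with the first loop, which only approximates when entries begin to age; `loops_after_entry_age` and the sample timings are hypothetical and not part of this PR.

```python
def loops_after_entry_age(loop_times, entry_age):
    """Return indexes of loops that start at or after entry_age seconds of cumulative run time."""
    selected = []
    started_at = 0.0  # cumulative time at which the current loop starts
    for ind, elapsed in enumerate(loop_times):
        if started_at >= entry_age:
            selected.append(ind)
        started_at += elapsed
    return selected


# Hypothetical per-loop timings in seconds, with svc_ops_entry_age:60 as in the yaml above.
loop_times = [20.0, 25.0, 30.0, 30.0]

# Loops start at 0, 20, 45 and 75 seconds, so only the last one begins after 60 seconds.
print(loops_after_entry_age(loop_times, entry_age=60))
```

With workload_test_cycles and test_loops fixed, a larger svc_ops_entry_age leaves fewer loops in the "after" group, which may be worth checking before running the manual 150, 300 or 600 second variants.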