mjmac/DAOS 8331 no agent #14288
Closed
mjmac (Contributor) commented Apr 30, 2024
- DAOS-8331 client: Telemetry dump should go to unique file paths
- incorporate review feedback, tweak variable name
- DAOS-9576 test: remove path to ddb src in ut (#14238)
- DAOS-623 dfuse: Update dfuse thread names. (#14223)
- DAOS-15499 dtx: cleanup DTX for failure (#14224)
- DAOS-15648 test: Avoid failures with virtual NVMe (#14233)
- DAOS-15642 test: Implement TestContainer register cleanup (#14159)
- DAOS-15605 vos: Add version param to pool create (#14133)
- DAOS-15595 cart: Remove SEP setting (#14110)
- DAOS-15654 control: Ignore NEW state NVMe devices when processing space stats (#14168)
- DAOS-15750 test: Missing dfuse/mu_perms.py execution (#14249)
- DAOS-15747 test: Quote filenames when creating stack traces (#14246)
- DAOS-15717 bug: Fix memory leak cid 2555536 (#14231)
- DAOS-15329 cq: Disable debug locking macros for coverity. (#14207)
- DAOS-15622 test: enhance co_op_dup_timing() predictability (#14180)
- DAOS-15059 test: reduce parameter for rank_failure test (#14236)
- DAOS-15749 test: Don't destroy an orphaned contianer (#14250)
- DAOS-15718 dfuse: Fix invalid read in error path. (#14237)
- DAOS-15768 test: skip cont cleanup in dmg_system_cleanup (#14264)
- DAOS-15723 test: Fix coverity warning 2555531 (#14240)
- DAOS-15048 control: Display NSID only when populated in storage query (#14239)
- DAOS-15670 vos: SV overwrite missed tx_add_range() (#14241)
- DAOS-623 test: fix gid typo check in unit test (#14258)
- DAOS-13151 client: cache and reuse the attach info for the default system (#14172)
- DAOS-15753 dfuse: Do not deadlock when failing to mount. (#14252)
- DAOS-14149 client: add compatible mode for libpil4dfs (#13294)
- DAOS-14657 test: ftest for libpil4dfs with fio (#13797)
- DAOS-13292 control: Use cart API to detect fabric (#13989)
- DAOS-15745 dfuse: Add the pre_read metrics whilst holding reference. (#14256)
- DAOS-15628 test: Verify maximum containers create with and without dup metadata ops (#14243)
- DAOS-13520 control: Fix UUID filter for dmg check query (#13050)
- DAOS-623 test: fix avocado run --failfast (DAOS-15754 test: fix avocado run --failfast #14253)
- DAOS-15684 test: add test case for custom server name (#14225)
- DAOS-14823 test: Changing scm-size for pool create (DAOS-14823 tests: Changing scm-size for pool create #13871)
- DAOS-15759 test: Remove utils/cr_demo (#14265)
- DAOS-15659 test: fix local ftest prefix (#14173)
- DAOS-15713 chk: fix kinds of coverity issues (#14242)
- DAOS-15661 object: set correct map version for layout create (#14222)
- DAOS-15616 test: Update dfuse/find.py to work in a python venv (#14262)
- Build(deps): Bump golang.org/x/net from 0.17.0 to 0.23.0 in /src/control (#14197)
- DAOS-15655 test: stop passing server group (#14201)
- DAOS-15781 test: fix pool_acl and pool_groups (DAOS-15774 test: fix pool_acl and pool_groups #14284)
- DAOS-4139 Coverity: fix Unchecked return value[2555519] (#14232)
- DAOS-15655 test: fix set_daos_params conflict (#14286)
- DAOS-8331 metrics: Support client metrics dump without agent
Rename D_CLIENT_METRICS_DUMP_PATH to D_CLIENT_METRICS_DUMP_DIR and update the logic to create a unique file for each process under that directory. Required-githooks: true Change-Id: If7854c9906fad213a12b94e09fc9974392f5bcde Signed-off-by: Michael MacDonald <[email protected]>
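The idea of one dump file per process under a shared directory can be sketched as follows. This is only an illustration: the function name and the `client-metrics-<pid>.csv` naming scheme are assumptions made for the example, not the file layout the patch actually uses.

```python
import os


def client_metrics_dump_path(dump_dir: str, pid: int) -> str:
    """Build a per-process dump file path under dump_dir.

    The file name pattern is hypothetical; the point is only that the
    process ID makes the path unique for each client process.
    """
    return os.path.join(dump_dir, f"client-metrics-{pid}.csv")


# Two processes sharing the same directory get distinct files.
path_a = client_metrics_dump_path("/tmp/daos_metrics", 1001)
path_b = client_metrics_dump_path("/tmp/daos_metrics", 1002)
```

With a single shared file path, concurrent client processes would overwrite each other's output; keying the file name on the process avoids that.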
Required-githooks: true Change-Id: I094f15fc76b2fcee1618e66a7793061db84c235b Signed-off-by: Michael MacDonald <[email protected]>
Required-githooks: true Change-Id: Id88cfa64bea35b3d85348d45a14e214c1d76f582
Required-githooks: true Change-Id: I68ee55be724f6ba50a8d4ec9a18e7ec4a226c1a7
required_src was added to avoid conflicts on the file during feature development. It is not necessary any longer (and wrong since ddb has moved from src to src/utils now). Signed-off-by: Johann Lombardi <[email protected]>
When setting these previously I thought they only appeared in debugger output so they have names which are only meaningful in that context, but the thread names are also visible in ps and top, and having a process called "main" does not make sense here. Do not rename the main dfuse thread, and use a dfuse prefix for other thread names. Signed-off-by: Ashley Pittman <[email protected]>
That will drop partial modification, remove the pinned DTX entry, evict related stale cache. Signed-off-by: Fan Yong <[email protected]>
Avoid using storage: auto on vm tests until DAOS-15233 can be addressed. Signed-off-by: Phil Henderson <[email protected]>
Add registering calls for a container destroy for each TestContainer object created by the test. Using the register cleanup method will ensure proper order of operations when tearing down the test case. Signed-off-by: Phil Henderson <[email protected]>
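A rough sketch of why registered cleanup gives a predictable teardown order is shown below. The `CleanupRegistry` class is hypothetical and is not the DAOS test framework API; it assumes cleanup steps run in reverse registration order, so a container registered after its pool is destroyed before the pool.

```python
class CleanupRegistry:
    """Hypothetical stand-in for a test's cleanup registration."""

    def __init__(self):
        self._steps = []

    def register(self, name, func):
        # Each created resource registers its own destroy call.
        self._steps.append((name, func))

    def run(self):
        """Run cleanup steps last-registered-first and return the order used."""
        order = []
        while self._steps:
            name, func = self._steps.pop()
            func()
            order.append(name)
        return order


registry = CleanupRegistry()
registry.register("pool destroy", lambda: None)
registry.register("container destroy", lambda: None)
teardown_order = registry.run()
```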
vos: Add version param to pool create

In a DAOS pool using the old pool global version, we need to create new VOS pools using the old DF version. See the Jira ticket for the details. This patch adds a version parameter to vos_pool_create and vos_pool_create_ex.

rsvc: Create rsvc with VOS DF version (#14156)

If a pool with an old layout version is served by a DAOS version with a new default layout version, for instance, a 2.4-layout pool served by DAOS 2.5, then any new VOS pools created for this DAOS pool must use the old layout, or downgrading back to the old DAOS version would become impossible.

Signed-off-by: Li Wei <[email protected]>
- SEP is currently not supported by any active provider.
- Remove how we expose SEP, as its setting is based on sockets provider limitations.
Signed-off-by: Alexander A Oganezov <[email protected]> Co-authored-by: Kris Jacque <[email protected]>
…ce stats (#14168) NEW devices should be ignored rather than causing a failure. This situation occurs when the number of targets is less than the number of SSDs. Signed-off-by: Tom Nabarro <[email protected]>
Add missing ':avocado: recursive' from test class docstrings. Signed-off-by: Phil Henderson <[email protected]>
Support filenames with spaces when generating stack traces from core files detected after running tests. Signed-off-by: Phil Henderson <[email protected]>
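Quoting can be illustrated with Python's standard `shlex.quote`. The gdb command line below is an assumption made for the example; the command the test harness actually builds is not shown in this PR.

```python
import shlex


def stack_trace_command(exe: str, core_file: str) -> str:
    """Build a shell command string that is safe for paths with spaces.

    The gdb invocation is illustrative only.
    """
    return (
        "gdb -batch -ex 'thread apply all bt' "
        f"{shlex.quote(exe)} {shlex.quote(core_file)}"
    )


cmd = stack_trace_command("/usr/bin/app", "/tmp/core files/core.app.1234")
```

Without the quoting, the shell would split `/tmp/core files/core.app.1234` into two arguments at the space.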
- Fix mem leak for coverity 2555536 Signed-off-by: Alexander A Oganezov <[email protected]>
The coverity tool gets confused about the use of assert in the debug version of the logging macros and can think a lock is being unlocked twice, which it reports as an API usage error. Disable the complex macros for coverity to reduce the instances of false positives in the tool. Fixes coverity ID 1975167 and others. Signed-off-by: Ashley Pittman <[email protected]>
Inject more faults in the non-baseline workload loops (10% / 20% fault rate change to 33% / 50%), so there is more separation in baseline loop timing compared to the fault-injection loops timing. Also, turn down engine logging during execution of the timed metadata workloads in co_op_dup_timing(). Restore to the originally-configured setting after the timed operations. This is done with the addition of a new tests dmg helper function, dmg_server_set_logmasks(), called from co_op_dup_timing(). Signed-off-by: Kenneth Cain <[email protected]>
The original parameters "-g 11 -t 7 -o 3 -a 3 -d 3" for daos_gen_io_conf generate 437 cmd lines, including 54 exclude/add cmds, each of which triggers one rebuild. The total time of 2100 seconds is possibly not enough to run those cmds (most of the time is spent on the 54 rebuilds). This patch reduces the parameters to "-g 11 -t 4 -o 3 -a 2 -d 2", which generate 181 cmd lines including 24 exclude/rebuild cmds, to reduce testing time. Reduce the timeout value accordingly. Signed-off-by: Xuezhao Liu <[email protected]>
The recovery/container_list_consolidation.py test orphans a container so we need to indicate to the TestContainer object that we don't need to call a daos container destroy during tearDown. Signed-off-by: Phil Henderson <[email protected]>
Fixes coverity ID 2555535 Signed-off-by: Ashley Pittman <[email protected]>
"dmg system cleanup" will cleanup the pools and containers so skip teardown cleanup. Signed-off-by: Dalton Bohning <[email protected]>
CID: 2555531 Unchecked return value Signed-off-by: Tom Nabarro <[email protected]>
…#14239) In certain situations a zero-value NVMe namespace ID will be returned in dmg output; in this case it should be omitted from the display output, as valid values are non-zero. Signed-off-by: Tom Nabarro <[email protected]>
In the SV overwrite case, btr_update_record() defers freeing the original record and allocates a new record to replace it. However, btr_node_tx_add() is mistakenly skipped in btr_update(), which leads to: 1. In md-on-ssd mode, tree node changes are missed in WAL. 2. In pmem mode, tree node snapshot is missed in undo log. Signed-off-by: Niu Yawei <[email protected]>
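The display rule can be sketched as below. The field labels and function name are hypothetical and do not reproduce the real dmg storage query output; the sketch only shows the "omit when zero" behaviour described above.

```python
def format_namespace(nsid: int, size_bytes: int) -> str:
    """Format one namespace entry, showing NSID only when it is populated.

    Labels are illustrative; valid namespace IDs are non-zero, so zero
    is treated as "not populated".
    """
    fields = []
    if nsid != 0:
        fields.append(f"NSID:{nsid}")
    fields.append(f"Size:{size_bytes}")
    return " ".join(fields)


populated = format_namespace(1, 4096)
unpopulated = format_namespace(0, 4096)
```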
Signed-off-by: Mohamad Chaarawi <[email protected]>
…stem (#14172) Improve the daos_init() and pool_connect() process to reuse the attach info instead of doing agent drpc upcalls multiple times. Signed-off-by: Mohamad Chaarawi <[email protected]>
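The caching pattern described here, fetch once and reuse afterwards, might look roughly like this. `AttachInfoCache` and `fake_upcall` are hypothetical names standing in for the client library and the agent dRPC upcall, and the returned dictionary is placeholder data.

```python
import threading


class AttachInfoCache:
    """Fetch the default system's attach info once, then reuse it."""

    def __init__(self, fetch):
        self._fetch = fetch          # callable that performs the expensive upcall
        self._lock = threading.Lock()
        self._info = None

    def get(self):
        with self._lock:
            if self._info is None:
                self._info = self._fetch()
            return self._info


upcalls = []


def fake_upcall():
    # Stand-in for the agent upcall; records each invocation.
    upcalls.append(1)
    return {"system": "placeholder", "ranks": ["rank0"]}


cache = AttachInfoCache(fake_upcall)
first = cache.get()    # simulates the lookup during init
second = cache.get()   # simulates a later lookup during pool connect
```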
Signed-off-by: Ashley Pittman <[email protected]>
Regular mode of the interception library (libpil4dfs) uses fake file descriptors (fd) allocated in user space. If some libc functions are not intercepted, applications could get errors or even crash due to the fake fd. Compatibility mode is introduced to alleviate such issues and increase compatibility. open, openat, opendir, etc. rely on dfuse to get real fds and return them to applications, with better compatibility but somewhat degraded performance. The environment variable "D_IL_COMPATIBLE=1" turns on compatible mode. Regular mode is the default if "D_IL_COMPATIBLE" is unset. Signed-off-by: Lei Huang <[email protected]>
Add a functional test running fio with different configurations:
- using pil4dfs or dfs engine
- using pthread or fork
- using block size of 256KiB or 1MiB
The performance of fork and pil4dfs is compared. This PR also adds a new python utils class allowing to dynamically retrieve the CPU configurations of the client nodes. Signed-off-by: Cedric Koch-Hofer <[email protected]>
- Add a lib/hardware package to collect fabric interface information through CART API.
- Remove custom OFI and UCX packages and dependencies.
- Update Go githook to ignore deleted files.
- Compensate for DAOS-15588: For systems without Infiniband, getting info for verbs produces a Mercury error. For all other providers, including UCX verbs, it returns no error and instead returns no results. We'll simulate that behavior here until the underlying bug is fixed.
Signed-off-by: Kris Jacque <[email protected]>
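As a sketch of how a launcher could select the mode, the helper below builds a process environment. Loading the library through `LD_PRELOAD` and the library path used here are assumptions for the example; only the `D_IL_COMPATIBLE=1` switch comes from the description above.

```python
def interception_env(base_env: dict, lib_path: str, compatible: bool) -> dict:
    """Return a copy of base_env set up for the interception library.

    lib_path is a hypothetical install location supplied by the caller.
    """
    env = dict(base_env)
    env["LD_PRELOAD"] = lib_path
    if compatible:
        env["D_IL_COMPATIBLE"] = "1"
    else:
        # Regular mode is the default when the variable is unset.
        env.pop("D_IL_COMPATIBLE", None)
    return env


regular = interception_env({"PATH": "/usr/bin"}, "/opt/daos/lib64/libpil4dfs.so", False)
compat = interception_env({"PATH": "/usr/bin"}, "/opt/daos/lib64/libpil4dfs.so", True)
```

The resulting dictionary could be passed as `env=` to `subprocess.run` when starting the application.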
CID: 2555541 2555529 2555524 2555517 2555545 2555527 Signed-off-by: Fan Yong <[email protected]>
In obj_layout_create, the pl_map is obtained by pl_map_find() without holding dp_map_lock, and then "omd_ver = dc_pool_get_version(pool)" is set. The map version of the pl_map is possibly not the same as dc_pool_get_version() if another thread refreshed the dc_pool's pool map. Signed-off-by: Xuezhao Liu <[email protected]>
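The race can be modelled in a few lines. This is a simplified Python model with hypothetical class names, not the C code, and taking the version from the map snapshot is just one way to keep the two values consistent; the patch itself may do this differently.

```python
import threading
from dataclasses import dataclass


@dataclass(frozen=True)
class PlacementMap:
    version: int


class Pool:
    """Toy pool whose placement map can be refreshed by another thread."""

    def __init__(self):
        self._lock = threading.Lock()
        self._map = PlacementMap(version=1)

    def refresh(self, new_version: int):
        with self._lock:
            self._map = PlacementMap(version=new_version)

    def find_map(self) -> PlacementMap:
        with self._lock:
            return self._map

    def get_version(self) -> int:
        with self._lock:
            return self._map.version


def layout_create(pool: Pool, between=None):
    pl_map = pool.find_map()
    if between is not None:
        between()  # simulate a concurrent pool map refresh
    # Read the version from the map we actually hold, not from the pool,
    # so the layout version always matches the map used to build it.
    return pl_map, pl_map.version


pool = Pool()
used_map, layout_ver = layout_create(pool, between=lambda: pool.refresh(2))
```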
Move the self-executed part of dfuse/find.py into its own file. Also enable running python files setup using the ExecutableCommand with the python executable. Signed-off-by: Phil Henderson <[email protected]>
…rol (#14197) Bumps [golang.org/x/net](https://github.com/golang/net) from 0.17.0 to 0.23.0. - [Commits](golang/net@v0.17.0...v0.23.0) --- updated-dependencies: - dependency-name: golang.org/x/net dependency-type: direct:production ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Stop passing --dfs.group to ior and mdtest since they get this information from the agent. Signed-off-by: Dalton Bohning <[email protected]>
Skip container cleanup in teardown since the pool is destroyed explicitly first. Signed-off-by: Dalton Bohning <[email protected]>
Signed-off-by: Samir Raval <[email protected]>
Fix an uncaught merge conflict between #14100 and #14201 Signed-off-by: Dalton Bohning <[email protected]>
Enable scenarios where client telemetry is collected and dumped to a CSV without agent config changes or involvement. Setting D_CLIENT_METRICS_DUMP_DIR in the client process environment will enable client telemetry dump to the specified directory even if the agent is not configured to export telemetry. Features: telemetry Required-githooks: true Change-Id: I243d11a2e00059ef3115d392d63c523048477122 Signed-off-by: Michael MacDonald <[email protected]>
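The enablement rule described above can be sketched as follows. The function name, the returned dictionary, and the `agent_export_enabled` flag are illustrative assumptions; only the `D_CLIENT_METRICS_DUMP_DIR` variable name comes from this change.

```python
def client_metrics_settings(environ: dict, agent_export_enabled: bool) -> dict:
    """Decide whether client telemetry is on and where it is dumped.

    Telemetry is treated as enabled if the agent exports it OR the
    dump directory variable is set in the client's own environment.
    """
    dump_dir = environ.get("D_CLIENT_METRICS_DUMP_DIR") or None
    return {
        "enabled": agent_export_enabled or dump_dir is not None,
        "dump_dir": dump_dir,
    }


no_agent = client_metrics_settings({"D_CLIENT_METRICS_DUMP_DIR": "/tmp/metrics"}, False)
neither = client_metrics_settings({}, False)
```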
Errors are: component not formatted correctly, ticket number prefix incorrect, PR title is malformatted (see https://daosio.atlassian.net/wiki/spaces/DC/pages/11133911069/Commit+Comments), unable to load ticket data.