
mjmac/DAOS 8331 no agent #14288

Closed
wants to merge 47 commits into from


mjmac
Contributor

@mjmac mjmac commented Apr 30, 2024

mjmac and others added 30 commits April 19, 2024 17:43
Rename D_CLIENT_METRICS_DUMP_PATH to D_CLIENT_METRICS_DUMP_DIR
and update the logic to create a unique file for each process
under that directory.

Required-githooks: true
Change-Id: If7854c9906fad213a12b94e09fc9974392f5bcde
Signed-off-by: Michael MacDonald <[email protected]>
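A minimal sketch of the per-process naming idea, assuming the dump file name is built from a prefix, the process ID, and a `.csv` suffix. The helper `client_metrics_dump_file()` and the `client-metrics` prefix are hypothetical; the actual DAOS file naming may differ.

```python
import os


def client_metrics_dump_file(env=None, pid=None, prefix="client-metrics"):
    """Return a per-process CSV path under D_CLIENT_METRICS_DUMP_DIR, or None.

    Hypothetical helper: the real DAOS naming scheme may differ.
    """
    env = os.environ if env is None else env
    dump_dir = env.get("D_CLIENT_METRICS_DUMP_DIR")
    if not dump_dir:
        return None  # variable unset or empty: no dump requested
    pid = os.getpid() if pid is None else pid
    # Embedding the PID keeps concurrent client processes from
    # overwriting each other's output in the shared directory.
    return os.path.join(dump_dir, f"{prefix}-{pid}.csv")
```

Using a directory rather than a single path is what makes this possible: each process derives its own file name instead of all processes writing to one fixed file.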
Required-githooks: true

Change-Id: I094f15fc76b2fcee1618e66a7793061db84c235b
Signed-off-by: Michael MacDonald <[email protected]>
Required-githooks: true

Change-Id: Id88cfa64bea35b3d85348d45a14e214c1d76f582
Required-githooks: true

Change-Id: I68ee55be724f6ba50a8d4ec9a18e7ec4a226c1a7
required_src was added to avoid conflicts on the file
during feature development. It is not necessary any longer
(and wrong since ddb has moved from src to src/utils now).

Signed-off-by: Johann Lombardi <[email protected]>
When setting these previously I thought they only appeared in
debugger output, so they have names that are only meaningful
in that context; however, the thread names are also visible in
ps and top, and having a process called "main" does not make
sense here.

Do not rename the main dfuse thread, and use a dfuse prefix
for other thread names.

Signed-off-by: Ashley Pittman <[email protected]>
That will drop the partial modification, remove the pinned DTX entry,
and evict the related stale cache.

Signed-off-by: Fan Yong <[email protected]>
Avoid using storage: auto on vm tests until DAOS-15233 can be addressed.

Signed-off-by: Phil Henderson <[email protected]>
Add registering calls for a container destroy for each TestContainer
object created by the test.  Using the register cleanup method will
ensure proper order of operations when tearing down the test case.

Signed-off-by: Phil Henderson <[email protected]>
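The ordering benefit of registered cleanups can be illustrated with a minimal last-in, first-out registry, assuming cleanups run in reverse registration order. `CleanupRegistry` and its methods are hypothetical and are not the DAOS test framework's actual class.

```python
class CleanupRegistry:
    """Hypothetical registry that runs teardown callbacks in LIFO order."""

    def __init__(self):
        self._cleanups = []

    def register_cleanup(self, func, *args, **kwargs):
        self._cleanups.append((func, args, kwargs))

    def run_cleanups(self):
        """Run every cleanup, newest first; collect errors instead of stopping."""
        errors = []
        while self._cleanups:
            func, args, kwargs = self._cleanups.pop()
            try:
                func(*args, **kwargs)
            except Exception as error:  # keep tearing down the remaining objects
                errors.append(error)
        return errors
```

Because a container is created after its pool, its destroy is registered later and therefore runs first, so the pool is never destroyed out from under a live container.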
vos: Add version param to pool create

In a DAOS pool using the old pool global version, we need to create new
VOS pools using the old DF version. See the Jira ticket for the details.
This patch adds a version parameter to vos_pool_create and
vos_pool_create_ex.

rsvc: Create rsvc with VOS DF version (#14156)

If a pool with an old layout version is served by a DAOS version with a
new default layout version, for instance, a 2.4-layout pool served by
DAOS 2.5, then any new VOS pools created for this DAOS pool must use the
old layout, or downgrading back to the old DAOS version would become
impossible.

Signed-off-by: Li Wei <[email protected]>
- SEP is currently not supported by any active provider.
- Remove how we expose SEP, as its setting is based on sockets provider limitations.

Signed-off-by: Alexander A Oganezov <[email protected]>
Co-authored-by: Kris Jacque <[email protected]>
…ce stats (#14168)

NEW devices should be ignored rather than causing a failure; this situation
occurs when the number of targets is less than the number of SSDs.

Signed-off-by: Tom Nabarro <[email protected]>
Add missing ':avocado: recursive' from test class docstrings.

Signed-off-by: Phil Henderson <[email protected]>
Support filenames with spaces when generating stack traces from core
files detected after running tests.

Signed-off-by: Phil Henderson <[email protected]>
- Fix mem leak for coverity 2555536

Signed-off-by: Alexander A Oganezov <[email protected]>
The coverity tool gets confused about the use of assert in the
debug version of the logging macros and can think a lock is
being unlocked twice, which it reports as an API usage error.

Disable the complex macros for coverity to reduce the instances
of false positives in the tool.

Fixes coverity ID 1975167 and others.

Signed-off-by: Ashley Pittman <[email protected]>
Inject more faults in the non-baseline workload loops
(10% / 20% fault rate change to 33% / 50%), so there is
more separation in baseline loop timing compared to the
fault-injection loops timing.

Also, turn down engine logging during execution of the timed
metadata workloads in co_op_dup_timing(). Restore to the
originally-configured setting after the timed operations.
This is done with the addition of a new test dmg helper function,
dmg_server_set_logmasks(), called from co_op_dup_timing().

Signed-off-by: Kenneth Cain <[email protected]>
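The "turn down, then restore" pattern can be sketched as a Python context manager, assuming a callable that applies a log mask. `quiet_engine_logs()` and the `set_logmasks` callback are hypothetical stand-ins for dmg_server_set_logmasks(), and the mask strings are example values only.

```python
from contextlib import contextmanager


@contextmanager
def quiet_engine_logs(set_logmasks, quiet_mask, original_mask):
    """Temporarily lower engine logging, then restore the original mask."""
    set_logmasks(quiet_mask)
    try:
        yield
    finally:
        # Restore even if the timed workload raises.
        set_logmasks(original_mask)
```

Restoring in `finally` keeps a failing timed loop from leaving the engines at the reduced log level for later tests.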
Originally, the parameters "-g 11 -t 7 -o 3 -a 3 -d 3" for daos_gen_io_conf
generated 437 command lines, including 54 exclude/add commands that each
trigger one rebuild. The total time of 2100 seconds is possibly not enough to
run those commands (most of the time is spent on the 54 rebuilds).
This patch reduces the parameters to "-g 11 -t 4 -o 3 -a 2 -d 2", which
generates 181 command lines including 24 exclude/rebuild commands, to reduce
testing time.
Reduce the timeout value accordingly.

Signed-off-by: Xuezhao Liu <[email protected]>
The recovery/container_list_consolidation.py test orphans a container so
we need to indicate to the TestContainer object that we don't need to
call a daos container destroy during tearDown.

Signed-off-by: Phil Henderson <[email protected]>
Fixes coverity ID 2555535

Signed-off-by: Ashley Pittman <[email protected]>
"dmg system cleanup" will cleanup the pools and containers so skip
teardown cleanup.

Signed-off-by: Dalton Bohning <[email protected]>
CID: 2555531 Unchecked return value

Signed-off-by: Tom Nabarro <[email protected]>
…#14239)

In certain situations a zero-value NVMe namespace ID will be returned
in dmg output; in this case it should be omitted from the display output,
as valid values are non-zero.

Signed-off-by: Tom Nabarro <[email protected]>
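A minimal sketch of the display rule, assuming device details are assembled into a field map before printing. `device_display_fields()` and the field labels are hypothetical; the real dmg output code is written in Go and may be structured differently.

```python
def device_display_fields(pci_addr, ns_id):
    """Build display fields for an NVMe device, omitting an invalid namespace ID."""
    fields = {"PCI Address": pci_addr}
    # Valid NVMe namespace IDs are non-zero, so 0 means "not reported".
    if ns_id != 0:
        fields["Namespace ID"] = ns_id
    return fields
```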
In the SV overwrite case, btr_update_record() defers freeing
the original record and allocates a new record to replace it;
however, btr_node_tx_add() is mistakenly skipped in btr_update(),
which leads to:
1. In md-on-ssd mode, tree node changes are missed in WAL.
2. In pmem mode, tree node snapshot is missed in undo log.

Signed-off-by: Niu Yawei <[email protected]>
…stem (#14172)

Improve the daos_init() and pool_connect() process to reuse the attach info instead of making multiple agent dRPC upcalls.

Signed-off-by: Mohamad Chaarawi <[email protected]>
Regular mode of the interception library (libpil4dfs) uses fake file descriptors (fds) allocated in user space. If some libc functions are not intercepted, applications could get errors or even crash due to the fake fds. Compatibility mode is introduced to alleviate such issues and increase compatibility: open, openat, opendir, etc. rely on dfuse to obtain real fds and return them to applications, giving better compatibility at the cost of some performance.
The environment variable "D_IL_COMPATIBLE=1" turns on compatibility mode. Regular mode is the default if "D_IL_COMPATIBLE" is unset.

Signed-off-by: Lei Huang <[email protected]>
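The mode selection can be sketched as follows, assuming only the exact value "1" enables compatibility mode. The description does not say how other values are handled, so treating them as regular mode is an assumption, and `pil4dfs_mode()` is a hypothetical helper (libpil4dfs itself is C).

```python
import os


def pil4dfs_mode(env=None):
    """Return which libpil4dfs mode D_IL_COMPATIBLE selects (hypothetical helper)."""
    env = os.environ if env is None else env
    # Assumption: only "1" selects compatibility mode; any other value
    # (or leaving it unset) falls back to the default regular mode.
    if env.get("D_IL_COMPATIBLE") == "1":
        return "compatible"  # real fds obtained through dfuse
    return "regular"  # fake fds allocated in user space
```

Usage: launching an application with `D_IL_COMPATIBLE=1` set in its environment would make this helper report `"compatible"`.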
Add a functional test running fio with different configurations:
- using the pil4dfs or dfs engine
- using pthread or fork
- using a block size of 256KiB or 1MiB
The performance of fork and pil4dfs is compared.

This PR also adds a new Python utils class that dynamically retrieves the CPU configuration of the client nodes.

Signed-off-by: Cedric Koch-Hofer <[email protected]>
- Add a lib/hardware package to collect fabric interface
  information through CART API.
- Remove custom OFI and UCX packages and dependencies.
- Update Go githook to ignore deleted files.

* Compensate for DAOS-15588

For systems without Infiniband, getting info for verbs produces a Mercury
error. For all other providers, including UCX verbs, it returns no error
and instead returns no results. We'll simulate that behavior here until
the underlying bug is fixed.

Signed-off-by: Kris Jacque <[email protected]>
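The workaround can be sketched as mapping the verbs-specific error to an empty result. `FabricInfoError`, `get_fabric_interfaces()`, the injected `query` callable, and the `"ofi+verbs"` provider string are all hypothetical; the actual change lives in the Go lib/hardware package.

```python
class FabricInfoError(Exception):
    """Hypothetical stand-in for the Mercury error returned for verbs."""


def get_fabric_interfaces(provider, query):
    """Return interfaces for a provider, treating the verbs error as no results."""
    try:
        return query(provider)
    except FabricInfoError:
        if provider == "ofi+verbs":
            # Workaround: behave like the other providers, which return
            # no results instead of an error when the fabric is absent.
            return []
        raise  # errors from any other provider still propagate
```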
Nasf-Fan and others added 9 commits April 30, 2024 18:00
CID: 2555541 2555529 2555524 2555517 2555545 2555527

Signed-off-by: Fan Yong <[email protected]>
In obj_layout_create(), the pl_map is obtained via pl_map_find() without holding
dp_map_lock, and then "omd_ver = dc_pool_get_version(pool)" is set.
The map version of the pl_map may therefore differ from dc_pool_get_version()
if another thread has refreshed the dc_pool's pool map in between.

Signed-off-by: Xuezhao Liu <[email protected]>
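One way to rule out this mismatch is to read the map and its version in a single critical section. The sketch below models that invariant in Python, with a lock standing in for dp_map_lock; `PoolHandle`, `refresh()`, and `snapshot()` are hypothetical, and the actual fix in the C client may be structured differently.

```python
import threading


class PoolHandle:
    """Hypothetical client pool handle whose map and version change together."""

    def __init__(self, pool_map, version):
        self._map_lock = threading.Lock()  # plays the role of dp_map_lock
        self._map = pool_map
        self._version = version

    def refresh(self, pool_map, version):
        with self._map_lock:
            self._map = pool_map
            self._version = version

    def snapshot(self):
        # Reading both under the same lock guarantees they come from the
        # same refresh; reading them separately is the race described above.
        with self._map_lock:
            return self._map, self._version
```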
Move the self-executed part of dfuse/find.py into its own file. Also
enable running Python files set up with ExecutableCommand using the
python executable.

Signed-off-by: Phil Henderson <[email protected]>
…rol (#14197)

Bumps [golang.org/x/net](https://github.com/golang/net) from 0.17.0 to 0.23.0.
- [Commits](golang/net@v0.17.0...v0.23.0)

---
updated-dependencies:
- dependency-name: golang.org/x/net
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Stop passing --dfs.group to ior and mdtest since they get this information from the agent.

Signed-off-by: Dalton Bohning <[email protected]>
Skip container cleanup in teardown since the pool is destroyed
explicitly first.

Signed-off-by: Dalton Bohning <[email protected]>
Fix an uncaught merge conflict between #14100 and #14201

Signed-off-by: Dalton Bohning <[email protected]>
Enable scenarios where client telemetry is collected and dumped
to a CSV without agent config changes or involvement.

Setting D_CLIENT_METRICS_DUMP_DIR in the client process
environment will enable client telemetry dump to the
specified directory even if the agent is not configured
to export telemetry.

Features: telemetry
Required-githooks: true
Change-Id: I243d11a2e00059ef3115d392d63c523048477122
Signed-off-by: Michael MacDonald <[email protected]>
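The enablement rule described above can be sketched as a small decision function, assuming collection is on when either the agent exports telemetry or the dump directory is set. `client_telemetry_plan()` and its return shape are hypothetical, not the DAOS client code.

```python
def client_telemetry_plan(env, agent_exports_telemetry):
    """Decide whether to collect client telemetry and where to dump it."""
    dump_dir = env.get("D_CLIENT_METRICS_DUMP_DIR") or None
    return {
        # The environment variable alone is enough; the agent's telemetry
        # configuration is no longer required for a local CSV dump.
        "collect": agent_exports_telemetry or dump_dir is not None,
        "dump_dir": dump_dir,
    }
```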
@mjmac mjmac requested review from a team as code owners April 30, 2024 22:33

Errors are: component not formatted correctly, Ticket number prefix incorrect, PR title is malformatted. See https://daosio.atlassian.net/wiki/spaces/DC/pages/11133911069/Commit+Comments, Unable to load ticket data
https://daosio.atlassian.net/browse/mjmac/DAOS

@mjmac mjmac closed this Apr 30, 2024
@mjmac mjmac deleted the mjmac/DAOS-8331-no_agent branch April 30, 2024 22:34