Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merge upstream/release/2.6 into upstream/google/2.6 #15107

Merged
merged 13 commits into from
Sep 10, 2024

Conversation

mjmac
Copy link
Contributor

@mjmac mjmac commented Sep 9, 2024

phender and others added 12 commits September 4, 2024 19:02
…15071)

The dfuse/ioctl_pool_handles.py test is overloading the VM so reduce the number of engine targets.

Signed-off-by: Phil Henderson <[email protected]>
It is possible that the DTX modified nothing when stop currnet backend
transaction. Under such case, we may not generate persistent DTX entry.
Then need to bypass such case before checking on-disk DTX entry status.

The patch makes some clean and removed redundant metrics for committed
DTX entries.

Enhance vos_dtx_deregister_record() to handle GC case.

Signed-off-by: Fan Yong <[email protected]>
display_memory_info was added to debug an issue when starting the servers,
but resolved by #14295.
It is no longer needed and consumes too much log space and time.

Signed-off-by: Dalton Bohning <[email protected]>
…#15054)

For EC object update via CPD RPC, when calculate the bitmap to skip
some iods for current EC data shard, we may input NULL for "*skips"
parameter. It may cause the old logic in obj_get_iods_offs_by_oid()
to generate some undefined DRAM for "skips" bitmap. Such bitmap may
be over-written by others, as to subsequent obj_bulk_transfer() may
be misguided.

The patch also fixes a bug inside obj_bulk_transfer() that cast any
input RPC as UPDATE/FETCH by force.

Signed-off-by: Fan Yong <[email protected]>
* DAOS-15863 container: fix a race for container cache

while destroying a container, cont_child_destroy_one() releases
its own refcount before waiting, if another ULT releases its
refcount, which is the last one, wakes up the waiting ULT and frees
it ds_cont_child straightaway, because no one else has refcount.

When the waiting ULT is waken up, it will try to change the already
freed ds_cont_child.

This patch changes the LRU eviction logic and fixes this race.

Signed-off-by: Liang Zhen <[email protected]>
Signed-off-by: Jeff Olivier <[email protected]>
Co-authored-by: Jeff Olivier <[email protected]>
…ace (#15050) (#15080)

Allow selecting a default interface that is running at a different speed
on different hosts.  Primarily this is to support selecting the ib0
interface by default when the launch node has a slower ib0 interface
than the cluster hosts.

Signed-off-by: Phil Henderson <[email protected]>
…5057)

* DAOS-16467 rebuild: add DAOS_PW_RF ENV for massive failure case

Allow user to set DAOS_PW_RF as pw_rf (pool wise RF).
If SWIM detected engine failure is going to break pw_rf, don't change
pool map, also don't trigger rebuild.
With critical log message to ask administrator to bring back those
engines in top priority (just "system start --ranks=xxx", need not to
reintegrate those engines).

a few functions renamed to avoid confuse -
pool_map_find_nodes() -> pool_map_find_ranks()
pool_map_find_node_by_rank() -> pool_map_find_dom_by_rank()
pool_map_node_nr() -> pool_map_rank_nr()

Signed-off-by: Xuezhao Liu <[email protected]>
…5084)

Client with stale pool map may try to send RPC to a DOWN target, if the
target was brought DOWN due to faulty NVMe device, the ds_pool_child could
have been stopped on the NVMe faulty reaction, We'd ensure proper error
code is returned for such case.

Signed-off-by: Niu Yawei <[email protected]>
Fix coverity 2555843 explict null dereferenced.

Signed-off-by: Niu Yawei <[email protected]>
Tag first release candidate for 2.6.1.

Signed-off-by: Phil Henderson <[email protected]>
…/2.6

Change-Id: If9f2160c76e5961a8e76e92e9ff5e3ddc1d489b4
Copy link

github-actions bot commented Sep 9, 2024

Errors are component not formatted correctly,Ticket number prefix incorrect,PR title is malformatted. See https://daosio.atlassian.net/wiki/spaces/DC/pages/11133911069/Commit+Comments,Unable to load ticket data
https://daosio.atlassian.net/browse/Merge

@daosbuild1
Copy link
Collaborator

Test stage Build RPM on EL 9 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15107/1/execution/node/343/log

@daosbuild1
Copy link
Collaborator

Test stage Build DEB on Ubuntu 20.04 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15107/1/execution/node/368/log

@daosbuild1
Copy link
Collaborator

Test stage Build RPM on EL 8 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15107/1/execution/node/299/log

@daosbuild1
Copy link
Collaborator

Test stage Build RPM on Leap 15.5 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15107/1/execution/node/346/log

@@ -30,3 +30,4 @@ spdk=https://github.com/spdk/spdk/commit/b0aba3fcd5aceceea530a702922153bc7566497
ofi=https://github.com/ofiwg/libfabric/commit/d827c6484cc5bf67dfbe395890e258860c3f0979.diff
mercury=https://raw.githubusercontent.com/daos-stack/mercury/857f1d5d2ca72d4c1b8d7be5e7fd26d6292b495f/na_ucx_am_send_retry.patch,https://github.com/mercury-hpc/mercury/commit/b8c26fd86281f3b0883c31bd2d0cb467a12b860d.diff,https://github.com/mercury-hpc/mercury/commit/a35589c3d1134d9c80640e78247e210162ac4a3c.diff,https://github.com/mercury-hpc/mercury/commit/fa4abbb6273d975b2ef17ac4e561fd4255d384db.diff
fuse=https://github.com/libfuse/libfuse/commit/c9905341ea34ff9acbc11b3c53ba8bcea35eeed8.diff
mercury=https://raw.githubusercontent.com/daos-stack/mercury/481297621bafbbcac4cc6f8feab3f1b6f8b14b59/na_ucx_keyres_epchk.patch
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like a bad merge and we don't need the ucx patch. I believe we have patches though because I reverted 2.4.0rc5

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, wait, this is going to just overwrite line 31.

@daosbuild1
Copy link
Collaborator

Test stage NLT on EL 8.8 completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-15107/2/testReport/

… resolve. (#15077)"

Not needed in our build (b/364929445).

This reverts commit 3d9e2d0.
@jolivier23 jolivier23 merged commit 5d14963 into google/2.6 Sep 10, 2024
49 of 50 checks passed
@jolivier23 jolivier23 deleted the mjmac/google/2.6 branch September 10, 2024 02:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

10 participants