
DAOS-13559 vos: MD-on-SSD phase2 landing #15429

Merged
merged 67 commits into master from niu/vos_on_blob_p2/p2_landing
Nov 4, 2024

Conversation

NiuYawei
Contributor

@NiuYawei NiuYawei commented Oct 30, 2024

Landing MD-on-SSD phase2 branch to master.

Signed-off-by: Tom Nabarro [email protected]
Signed-off-by: Sherin T George [email protected]
Signed-off-by: Niu Yawei [email protected]

Before requesting gatekeeper:

  • Two review approvals and any prior change requests have been resolved.
  • Testing is complete and all tests passed, or there is a reason documented in the PR why it should be force-landed and the forced-landing tag is set.
  • The Features: (or Test-tag*) commit pragma was used, or there is a reason documented that there are no appropriate tags for this PR.
  • Commit messages follow the guidelines outlined here.
  • Any tests skipped by the ticket being addressed have been run and passed in the PR.

Gatekeeper:

  • You are the appropriate gatekeeper to be landing the patch.
  • The PR has 2 reviews by people familiar with the code, including appropriate owners.
  • Githooks were used. If not, request that the user install them and check copyright dates.
  • Checkpatch issues are resolved. Pay particular attention to ones that will show up on future PRs.
  • All builds have passed. Check non-required builds for any new compiler warnings.
  • Sufficient testing is done. Check feature pragmas and test tags and that tests skipped for the ticket are run and now pass with the changes.
  • If applicable, the PR has addressed any potential version compatibility issues.
  • Check the target branch. If it is the master branch, should the PR go to a feature branch? If it is a release branch, does it have merge approval in the JIRA ticket?
  • Extra checks if forced landing is requested
    • Review comments are sufficiently resolved, particularly by prior reviewers that requested changes.
    • No new NLT or valgrind warnings. Check the classic view.
    • Quick-build or Quick-functional is not used.
  • Fix the commit message upon landing. Check the standard here. Edit it to create a single commit. If necessary, ask the submitter for a new summary.

sherintg and others added 30 commits October 11, 2023 17:28
- New umem macros are exported to do allocations within a
  memory bucket. umem now internally calls the modified backend
  allocator routines with the memory bucket ID passed as an argument.
- umem_get_mb_evictable() and dav_get_zone_evictable() are
  added so the allocator can return the preferred zone to be
  used as the evictable memory bucket for current allocations. Right
  now these routines always return zero.
- The dav heap runtime is cleaned up to make provision for the
  memory bucket implementation.

Signed-off-by: Sherin T George <[email protected]>
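
A minimal sketch of how these interfaces compose; only
umem_get_mb_evictable() is named above, and its flags argument plus
the allocation macro are placeholders rather than the exported API:

    /* Ask the allocator for the preferred evictable memory bucket;
     * per the commit message this currently always returns zero. */
    uint32_t mb_id = umem_get_mb_evictable(umm, 0 /* flags: assumed */);

    /* Allocate within that bucket; the macro name and argument
     * order are assumptions. */
    umem_off_t off = umem_alloc_from_bucket(umm, size, mb_id);
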
Four sets of umem cache APIs will be exported for md-on-ssd phase II:

1. Cache initialization & finalization
   - umem_cache_alloc()
   - umem_cache_free()

2. Cache map, load and pin
   - umem_cache_map()
   - umem_cache_load()
   - umem_cache_pin()
   - umem_cache_unpin()

3. Offset and memory address conversion
   - umem_cache_off2ptr()
   - umem_cache_ptr2off()

4. Misc
   - umem_cache_commit()
   - umem_cache_reserve()

Required-githooks: true

Signed-off-by: Niu Yawei <[email protected]>
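
A rough sketch of how the four sets might compose along a
pin-and-access path; only the function names come from the list above,
and the argument lists (store, rg, pin_hdl, tx_id) are assumptions:

    umem_cache_alloc(store, 0);                 /* 1. init cache for store  */

    umem_cache_map(store, &rg, 1);              /* 2. map region into cache */
    umem_cache_load(store, &rg, 1, 0);          /* 2. load pages from blob  */
    umem_cache_pin(store, &rg, 1, &pin_hdl);    /* 2. keep pages resident   */

    void *ptr = umem_cache_off2ptr(store, off); /* 3. offset -> address     */
    /* ... access the pinned pages through ptr ... */

    umem_cache_commit(store, tx_id);            /* 4. commit dirty pages    */
    umem_cache_unpin(store, pin_hdl);           /* 2. release the pin       */
    umem_cache_free(store);                     /* 1. tear down             */
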
The phase-2 DAV allocator is placed under the subdirectory
src/common/dav_v2. This allocator is built as a standalone shared
library and linked to the libdaos_common_pmem library.
umem will now support one more mode, DAOS_MD_BMEM_V2. Setting
this mode on a umem instance will result in using the phase-2 DAV
allocator interfaces.

Signed-off-by: Sherin T George <[email protected]>
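
A minimal sketch of selecting the new mode; DAOS_MD_BMEM_V2 comes from
the commit message, while the setter below is a stand-in for whatever
umem entry point actually consumes the mode:

    /* Route this instance's allocations through the phase-2 DAV
     * allocator (libdav_v2, built from src/common/dav_v2). */
    rc = umem_set_store_type(umm, DAOS_MD_BMEM_V2);  /* assumed setter */
    if (rc == 0)
            off = umem_zalloc(umm, size);  /* served by dav_v2 */
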
…#13032)

Use meta blob size if set when creating tgts on rank

Signed-off-by: Tom Nabarro <[email protected]>
Fixed a race involving dav_reserve() that violated the rule that
checkpointing must be done only after the WAL is committed. Additionally
removed the atomic-copy functionality along with the UMEM_COMMIT_DEFER flag.

Signed-off-by: Sherin T George <[email protected]>
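
The invariant being restored can be sketched as a two-step ordering
(the helper names below are illustrative, not the DAV API):

    /* Required ordering for a dav_reserve()'d allocation: */
    wal_commit(tx);        /* 1. the WAL record is durable            */
    checkpoint_flush(tx);  /* 2. only now may the checkpointer flush  */

    /* The race let step 2 see pages touched by a reservation whose
     * WAL commit had not completed, i.e. checkpoint before commit. */
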
- Added strict tx_id checking in touch_page().
- Fixed two merge glitches in pool_child_recreate() and DAV2 open.

Signed-off-by: Niu Yawei <[email protected]>
- Remove the redundant check in umem_cache_map() that could
  mistakenly fail the call when mapping an evictable page.
- Fix need_reserve() to handle the case where
  "max_ne_pgs - ne_pgs < UMEM_CACHE_RSRVD_PAGES".
- Add a callback invoked when a page is loaded; the allocator can
  build its runtime and perform Valgrind chores through this callback.

Signed-off-by: Niu Yawei <[email protected]>
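
A sketch of the shape this hook might take; the typedef and callback
below are assumptions, since the commit message only states that a
callback is invoked when a page is loaded:

    /* Assumed signature: invoked after a page is loaded into cache. */
    typedef int (*umem_cache_pgload_cb_t)(void *pg_addr, uint32_t pg_id,
                                          void *cb_arg);

    static int
    dav_pgload_cb(void *pg_addr, uint32_t pg_id, void *cb_arg)
    {
            /* rebuild the allocator runtime for this page, then
             * mark the address range for Valgrind */
            return 0;
    }
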
- The phase-2 allocator will now support evictable and non-evictable
  memory buckets. The new allocator can be enabled using
  DAOS_MD_ON_SSD_MODE=3.
- Unit tests added to test the functionality.

Signed-off-by: Sherin T George <[email protected]>
In md-on-ssd phase 2 the scm_sz (VOS file size) can be smaller
than the meta_sz (meta blob size), so we need to store an extra
scm_sz in SMD; on engine start this scm_sz can then be retrieved
from SMD for VOS file re-creation.

To keep SMD compatible with pmem and md-on-ssd phase 1, a new
table named "meta_pool_ex" is introduced for storing scm_sz.

Signed-off-by: Niu Yawei <[email protected]>
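
A sketch of the start-up path this enables; the "meta_pool_ex" table
name is from the commit message, while the getter, the error handling
and the re-create helper are hypothetical:

    uint64_t scm_sz;
    int      rc;

    /* Look up the extra scm_sz stored in the "meta_pool_ex" table. */
    rc = smd_pool_get_scm_sz(pool_uuid, &scm_sz);  /* hypothetical getter */
    if (rc == -DER_NONEXIST)
            /* pmem or md-on-ssd phase-1 pool: no extra entry, the
             * VOS file is the same size as the meta blob */
            scm_sz = meta_sz;

    vos_file_recreate(pool_uuid, scm_sz);          /* hypothetical helper */
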
Required-githooks: true

Signed-off-by: Tom Nabarro <[email protected]>
Use the user-specified backend type when possible; if the user
specifies the BMEM V1 backend and tries to create a pool with
"meta_size > scm_size", fall back to the BMEM V2 backend instead.

Store the per-pool backend type in meta blob header for pool open.

Signed-off-by: Niu Yawei <[email protected]>
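
The selection rule reads roughly as below (the enum and function are
illustrative, not the actual code):

    enum bmem_type { BMEM_V1, BMEM_V2 };

    static enum bmem_type
    pick_backend(enum bmem_type requested, uint64_t meta_size, uint64_t scm_size)
    {
            /* Honor the user's choice unless the meta blob is larger
             * than the VOS file, which the V1 backend cannot handle. */
            if (requested == BMEM_V1 && meta_size > scm_size)
                    return BMEM_V2;
            return requested;
    }
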
Show MD-on-SSD-specific output on pool create and add new syntax to
specify the ratio between the SSD capacity reserved for MD in a new
DAOS pool and the (static) size of memory reserved for MD in the form
of VOS index files (previously held on SCM but now in tmpfs on
ramdisk). The memory-file size is now printed when creating a pool in
MD-on-SSD mode.

The new --{meta,data}-size params can be specified in decimal or
binary units, e.g. GB or GiB, and refer to per-rank allocations. These
manual size parameters are only for advanced use cases; in most
situations the --size (X%|XTB|XTiB) syntax is recommended when
creating a pool. The --meta-size param is the number of bytes to use
for metadata on SSD and --data-size is for data on SSD (similar to
--nvme-size).

The new --mem-ratio param is specified as a percentage with up to two
decimal places of precision. It defines the proportion of the metadata
capacity reserved on SSD (i.e. --meta-size) that will be used when
allocating the VOS index (one blob and one memory file per target).

Enabling MD-on-SSD phase-2 pool creation requires the env var
DAOS_MD_ON_SSD_MODE=3 to be set in the server config file.

Signed-off-by: Tom Nabarro <[email protected]>
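
An illustrative invocation with the arithmetic spelled out (the sizes
and pool label are made up; only the flag names come from this change):

    dmg pool create --meta-size 200GiB --data-size 1TiB --mem-ratio 25% tank

    # Per rank: 200 GiB of SSD is reserved for metadata; a 25%
    # mem-ratio sizes the VOS-index memory files (tmpfs) at
    # 200 GiB * 0.25 = 50 GiB in aggregate per rank.
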
- The 80% rule for NE buckets will not be applied if the scm_sz is
  almost equal to meta_sz.
- Corrected the check for toggling between the V1 and V2 store types
  when the scm_sz passed is zero.
- Added an assert to catch incorrect computation of chunk_id if zone
  counts are not set correctly during boot.

Signed-off-by: Sherin T George <[email protected]>
@daosbuild1
Collaborator

Test stage Functional Hardware Medium Verbs Provider completed with status FAILURE. https://build.hpdd.intel.com/job/daos-stack/job/daos/job/PR-15429/2/display/redirect

tanabarr
tanabarr previously approved these changes Nov 1, 2024
@tanabarr
Contributor

tanabarr commented Nov 1, 2024

@brianjmurrell do the build related changes look okay?

debian/changelog Outdated
@@ -135,7 +142,7 @@ daos (2.5.100-12) unstable; urgency=medium

-- Jerome Soumagne <[email protected]> Wed, 15 Nov 2023 10:30:00 -0600

-daos (2.5.100-10) unstable; urgency=medium
+daos (2.5.100-11) unstable; urgency=medium
Contributor


This should be reverted

Suggested change
daos (2.5.100-11) unstable; urgency=medium
daos (2.5.100-10) unstable; urgency=medium

Contributor


done

@tanabarr
Contributor

tanabarr commented Nov 1, 2024

@tanabarr any thoughts on the test failures? The two NLT memcheck failures look related to Go; are they a known issue? The other two list_verbose.py failures are related to the new 'mem_file_bytes'; will you fix that after this is landed?

Do you have the memcheck suppressions from the master branch? The race_amd64.s warnings are usually handled by adding a new suppression, but I'm surprised it would be different on the MD-on-SSD branch.

Yes, the suppressions file seems to be the same on both branches.

Test-tag: pr daily_regression hw,medium,ListVerboseTest
Skip-func-hw-test-medium-md-on-ssd: false
Skip-func-hw-test-large-md-on-ssd: false
Allow-unstable-test: true
Required-githooks: true

Signed-off-by: Tom Nabarro <[email protected]>
@tanabarr
Contributor

tanabarr commented Nov 1, 2024

@NiuYawei @gnailzenh I've pushed again with the following fixes, since the previous CI run failed to run the hardware stages because of Jenkins issues.

  • Fixed changelog merge mistake identified by @phender
  • Updated test tags to
    Test-tag: pr daily_regression hw,medium,ListVerboseTest
    Skip-func-hw-test-medium-md-on-ssd: false
    Skip-func-hw-test-large-md-on-ssd: false
    Allow-unstable-test: true
  • Removed two unnecessary changes on feature branch

Contributor

@phender phender left a comment


Ftest changes look good.

@tanabarr tanabarr self-requested a review November 1, 2024 21:31
@daosbuild1
Collaborator

Test stage Functional Hardware Medium Verbs Provider completed with status FAILURE. https://build.hpdd.intel.com/job/daos-stack/job/daos/job/PR-15429/3/display/redirect

@daosbuild1
Collaborator

Test stage Functional Hardware Medium completed with status FAILURE. https://build.hpdd.intel.com/job/daos-stack/job/daos/job/PR-15429/3/display/redirect

@daosbuild1
Collaborator

Test stage Functional Hardware Medium MD on SSD completed with status FAILURE. https://build.hpdd.intel.com/job/daos-stack/job/daos/job/PR-15429/3/display/redirect

@daosbuild1
Collaborator

Test stage Functional Hardware Large MD on SSD completed with status FAILURE. https://build.hpdd.intel.com/job/daos-stack/job/daos/job/PR-15429/3/display/redirect

@daosbuild1
Collaborator

Test stage Functional Hardware Large completed with status FAILURE. https://build.hpdd.intel.com/job/daos-stack/job/daos/job/PR-15429/3/display/redirect

@daosbuild1
Collaborator

Test stage Functional Hardware Medium Verbs Provider completed with status FAILURE. https://build.hpdd.intel.com/job/daos-stack/job/daos/job/PR-15429/4/display/redirect

@daosbuild1
Collaborator

Test stage Functional Hardware Medium MD on SSD completed with status FAILURE. https://build.hpdd.intel.com/job/daos-stack/job/daos/job/PR-15429/4/display/redirect

@daosbuild1
Collaborator

Test stage Functional Hardware Large MD on SSD completed with status FAILURE. https://build.hpdd.intel.com/job/daos-stack/job/daos/job/PR-15429/4/display/redirect

@daosbuild1
Collaborator

Test stage Functional Hardware Large completed with status FAILURE. https://build.hpdd.intel.com/job/daos-stack/job/daos/job/PR-15429/4/display/redirect

@daosbuild1
Collaborator

Test stage Functional Hardware Medium completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15429/4/execution/node/584/log

@daosbuild1
Collaborator

Test stage Functional Hardware Medium Verbs Provider completed with status FAILURE. https://build.hpdd.intel.com/job/daos-stack/job/daos/job/PR-15429/5/display/redirect

@daosbuild1
Collaborator

Test stage Functional Hardware Large completed with status FAILURE. https://build.hpdd.intel.com/job/daos-stack/job/daos/job/PR-15429/5/display/redirect

@daosbuild1
Collaborator

Test stage Functional Hardware Medium MD on SSD completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15429/5/execution/node/1064/log

@tanabarr
Contributor

tanabarr commented Nov 4, 2024

@NiuYawei @gnailzenh @phender
I've been trying to get coverage over the weekend by rerunning, to avoid the intermittent issue that stops tests from starting. https://build.hpdd.intel.com/blue/organizations/jenkins/daos-stack%2Fdaos/detail/PR-15429/

The following builds were all on the tip of the PR branch, without any code changes in between.

Build 4:

  • Functional Hardware Large DID NOT RUN
  • Functional Hardware Large MD-on-SSD DID NOT RUN
  • Functional Hardware Medium failed with a single known issue DAOS-16768
  • Functional Hardware Medium MD-on-SSD DID NOT RUN
  • Functional Hardware Verbs Provider DID NOT RUN

Build 5:

  • Functional Hardware Large DID NOT RUN
  • Functional Hardware Large MD-on-SSD PASSED
  • Functional Hardware Medium MD-on-SSD failed because of faulty hardware in cluster on wolf-217
  • Functional Hardware Verbs Provider DID NOT RUN

Build 7:

  • Functional Hardware Large ???
  • Functional Hardware Medium MD-on-SSD ???
  • Functional Hardware Verbs Provider ???

Summary:

  • Functional Hardware Large ???
  • Functional Hardware Large MD-on-SSD PASSED
  • Functional Hardware Medium ACCEPTABLE
  • Functional Hardware Medium MD-on-SSD ???
  • Functional Hardware Verbs Provider ???

I don't think there is huge value in waiting for the Verbs Provider test results, but maybe the other two from build 7?

@daosbuild1
Collaborator

Test stage Functional Hardware Medium Verbs Provider completed with status FAILURE. https://build.hpdd.intel.com/job/daos-stack/job/daos/job/PR-15429/7/display/redirect

@gnailzenh gnailzenh merged commit cb8b19a into master Nov 4, 2024
53 of 61 checks passed
@gnailzenh gnailzenh deleted the niu/vos_on_blob_p2/p2_landing branch November 4, 2024 13:02