DAOS-13559 vos: MD-on-SSD phase2 landing #15429
Conversation
- New umem macros are exported to do allocations within a memory bucket; umem now internally calls the modified backend allocator routines with the memory bucket id passed as an argument.
- umem_get_mb_evictable() and dav_get_zone_evictable() are added so the allocator can return the preferred zone to be used as the evictable memory bucket for current allocations. Right now these routines always return zero.
- The DAV heap runtime is cleaned up to make provision for the memory bucket implementation.
Signed-off-by: Sherin T George <[email protected]>
Required-githooks: true
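The commit above routes allocations through a memory-bucket id, with the evictable-zone hint currently hard-wired to zero (the non-evictable bucket). As a rough illustration of that contract — the class, method names, and bucket representation below are hypothetical, not the actual DAOS code — a Python model might look like:

```python
# Hypothetical model of memory-bucket-aware allocation routing.
# Bucket id 0 stands for the non-evictable bucket; a non-zero id
# would select an evictable bucket (not implemented yet upstream).

NON_EVICTABLE = 0

def get_mb_evictable():
    """Preferred evictable zone for current allocations.
    Mirrors umem_get_mb_evictable()/dav_get_zone_evictable(),
    which always return zero at this stage."""
    return 0

class Heap:
    def __init__(self):
        self.buckets = {NON_EVICTABLE: []}  # bucket id -> allocations

    def alloc(self, size, mb_id):
        # The backend allocator now takes the memory bucket id
        # as an argument and allocates within that bucket.
        self.buckets.setdefault(mb_id, []).append(size)
        return (mb_id, len(self.buckets[mb_id]) - 1)

heap = Heap()
mb_id, slot = heap.alloc(128, get_mb_evictable())
print(mb_id, slot)  # all allocations currently land in bucket 0
```

Since the hint is always zero today, every allocation lands in the non-evictable bucket; the indirection exists so evictable buckets can be introduced later without changing callers.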
Four sets of umem cache APIs will be exported for md-on-ssd phase II:
1. Cache initialization & finalization: umem_cache_alloc(), umem_cache_free()
2. Cache map, load and pin: umem_cache_map(), umem_cache_load(), umem_cache_pin(), umem_cache_unpin()
3. Offset and memory address conversion: umem_cache_off2ptr(), umem_cache_ptr2off()
4. Misc: umem_cache_commit(), umem_cache_reserve()
Required-githooks: true
Signed-off-by: Niu Yawei <[email protected]>
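The API groups above can be illustrated with a toy model of group 3 (offset/address conversion) and group 2 (pinning). The real umem_cache_* signatures are not shown in this thread, so everything below is a simplified, hypothetical sketch:

```python
# Hypothetical sketch of a cache that converts between a stable
# offset (as stored on media) and a transient memory address, and
# pins pages so they cannot be evicted while in use.

class UmemCache:
    def __init__(self, base_addr, page_size=4096):
        self.base = base_addr
        self.page_size = page_size
        self.pins = {}              # page index -> pin refcount

    def off2ptr(self, off):         # cf. umem_cache_off2ptr()
        return self.base + off

    def ptr2off(self, ptr):         # cf. umem_cache_ptr2off()
        return ptr - self.base

    def pin(self, off):             # cf. umem_cache_pin()
        pg = off // self.page_size
        self.pins[pg] = self.pins.get(pg, 0) + 1

    def unpin(self, off):           # cf. umem_cache_unpin()
        pg = off // self.page_size
        assert self.pins.get(pg, 0) > 0
        self.pins[pg] -= 1

    def is_pinned(self, off):
        return self.pins.get(off // self.page_size, 0) > 0

cache = UmemCache(base_addr=0x1000)
off = cache.ptr2off(0x1840)
cache.pin(off)
print(hex(cache.off2ptr(off)), cache.is_pinned(off))  # 0x1840 True
```

The key invariant is that off2ptr() and ptr2off() are inverses relative to the mapping base, while pin/unpin bracket any access so eviction cannot invalidate the address in between.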
The phase-2 DAV allocator is placed under the subdirectory src/common/dav_v2. This allocator is built as a standalone shared library and linked to the libdaos_common_pmem library. The umem will now support one more mode, DAOS_MD_BMEM_V2; setting this mode in a umem instance will result in using the phase-2 DAV allocator interfaces. Signed-off-by: Sherin T George <[email protected]>
Required-githooks: true
…#13032) Use meta blob size if set when creating tgts on rank Signed-off-by: Tom Nabarro <[email protected]>
Fixed a race involving dav_reserve() which violated the rule that "checkpointing must be done after the WAL is committed". Additionally removed the atomic copy functionality with the UMEM_COMMIT_DEFER flag. Signed-off-by: Sherin T George <[email protected]>
Required-githooks: true
Required-githooks: true
Required-githooks: true
- Added strict tx_id checking in touch_page().
- Fixed two merge glitches in pool_child_recreate() and DAV2 open.
Signed-off-by: Niu Yawei <[email protected]>
- Remove the redundant check in umem_cache_map() that could mistakenly fail the call when mapping an evictable page.
- Fix need_reserve() to deal with the case where "max_ne_pgs - ne_pgs < UMEM_CACHE_RSRVD_PAGES".
- Add a callback invoked when a page is loaded; the allocator can build runtime state and perform valgrind chores through this callback.
Signed-off-by: Niu Yawei <[email protected]>
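The need_reserve() fix above handles the case where max_ne_pgs - ne_pgs is smaller than UMEM_CACHE_RSRVD_PAGES. With C unsigned arithmetic, computing "pages left after the reserve" by subtraction is a classic wrap-around hazard; the actual fix is not shown in this thread, but this hypothetical Python model (using a 64-bit modulus to mimic unsigned C) shows why the comparison has to be written carefully:

```python
# Hypothetical illustration of the need_reserve() corner case:
# when max_ne_pgs - ne_pgs < UMEM_CACHE_RSRVD_PAGES, a naive
# unsigned computation of "pages left after the reserve" wraps
# around instead of going negative. (The real fix may differ.)

UMEM_CACHE_RSRVD_PAGES = 4
U64 = 2**64

def spare_pages_buggy(max_ne_pgs, ne_pgs):
    # C-style unsigned subtraction: wraps to a huge value when
    # fewer than UMEM_CACHE_RSRVD_PAGES pages remain.
    return (max_ne_pgs - ne_pgs - UMEM_CACHE_RSRVD_PAGES) % U64

def need_reserve_fixed(max_ne_pgs, ne_pgs):
    # Compare instead of subtracting through zero.
    return max_ne_pgs - ne_pgs < UMEM_CACHE_RSRVD_PAGES

# 10 non-evictable pages in use out of 12: only 2 remain, below
# the 4 reserved pages, so a reserve is needed.
print(spare_pages_buggy(12, 10))   # huge wrapped value, not -2
print(need_reserve_fixed(12, 10))  # True
```

Writing the condition as a comparison rather than a subtraction keeps the logic correct across the whole input range.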
Required-githooks: true
Required-githooks: true
- The phase-2 allocator will now support evictable and non-evictable memory buckets. The new allocator can be enabled using DAOS_MD_ON_SSD_MODE=3.
- Unit tests added to test the functionality.
Signed-off-by: Sherin T George <[email protected]>
In md-on-ssd phase 2, the scm_sz (VOS file size) can be smaller than the meta_sz (meta blob size); in that case we need to store an extra scm_sz in SMD, so that on engine start this scm_sz can be retrieved from SMD for VOS file re-creation. To keep the SMD compatible with pmem and md-on-ssd phase 1, a new table named "meta_pool_ex" is introduced for storing scm_sz. Signed-off-by: Niu Yawei <[email protected]>
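The compatibility scheme above — only recording scm_sz separately when it differs from meta_sz — could be modeled roughly as follows. Apart from the "meta_pool_ex" table name and the scm_sz/meta_sz fields, the structure here is hypothetical:

```python
# Hypothetical model of SMD storage for phase 2: the extra
# "meta_pool_ex" table carries scm_sz only when the VOS file size
# is smaller than the meta blob size; older pools have no entry
# and fall back to meta_sz, keeping pmem/phase-1 layouts intact.

smd = {"meta_pool": {}, "meta_pool_ex": {}}

def smd_store_pool(pool_id, meta_sz, scm_sz):
    smd["meta_pool"][pool_id] = meta_sz
    if scm_sz < meta_sz:                    # phase-2-only case
        smd["meta_pool_ex"][pool_id] = scm_sz

def smd_vos_file_size(pool_id):
    # On engine start: use the stored scm_sz for VOS file
    # re-creation, defaulting to meta_sz for older pools.
    meta_sz = smd["meta_pool"][pool_id]
    return smd["meta_pool_ex"].get(pool_id, meta_sz)

smd_store_pool("pool-a", meta_sz=1 << 30, scm_sz=1 << 28)  # phase 2
smd_store_pool("pool-b", meta_sz=1 << 30, scm_sz=1 << 30)  # phase 1 / pmem
print(smd_vos_file_size("pool-a") == 1 << 28)
print(smd_vos_file_size("pool-b") == 1 << 30)
```

Because pools with scm_sz == meta_sz simply have no "meta_pool_ex" entry, existing SMD contents need no migration.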
Required-githooks: true
Required-githooks: true Signed-off-by: Tom Nabarro <[email protected]>
Use the user-specified backend type when possible; if the user specifies the BMEM V1 backend and tries to create a pool with meta_size > scm_size, fall back to the BMEM V2 backend instead. Store the per-pool backend type in the meta blob header for pool open. Signed-off-by: Niu Yawei <[email protected]>
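The selection rule above is small enough to sketch directly. This is a hypothetical Python model, not the actual C implementation; the enum values are illustrative:

```python
# Hypothetical model of the backend-type selection described above:
# honor the user's choice when possible, but a BMEM V1 pool cannot
# have meta_size > scm_size, so such pools switch to BMEM V2.

BMEM_V1, BMEM_V2 = 1, 2

def pick_backend(user_backend, scm_size, meta_size):
    if user_backend == BMEM_V1 and meta_size > scm_size:
        return BMEM_V2   # V1 can't back a meta blob larger than scm
    return user_backend

# The chosen type is then stored in the meta blob header so pool
# open uses the same backend the pool was created with.
print(pick_backend(BMEM_V1, scm_size=100, meta_size=200))  # 2
print(pick_backend(BMEM_V1, scm_size=200, meta_size=200))  # 1
```

Persisting the decision in the meta blob header means pool open never has to re-derive it from sizes or environment settings.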
Show MD-on-SSD specific output on pool create, and add new syntax to specify the ratio between the SSD capacity reserved for MD in a new DAOS pool and the (static) size of memory reserved for MD in the form of VOS index files (previously held on SCM but now in tmpfs on ramdisk). The memory-file size is now printed when creating a pool in MD-on-SSD mode.

The new --meta-size and --data-size params can be specified in decimal or binary units (e.g. GB or GiB) and refer to per-rank allocations. These manual size parameters are only for advanced use cases; in most situations the --size (X%|XTB|XTiB) syntax is recommended when creating a pool. The --meta-size param is the number of bytes to use for metadata on SSD and --data-size is for data on SSD (similar to --nvme-size).

The new --mem-ratio param is specified as a percentage with up to two decimal places of precision. It defines the proportion of the metadata capacity reserved on SSD (i.e. --meta-size) that will be used when allocating the VOS index (one blob and one memory file per target).

Enabling MD-on-SSD phase2 pool creation requires the env var DAOS_MD_ON_SSD_MODE=3 to be set in the server config file. Signed-off-by: Tom Nabarro <[email protected]>
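The --mem-ratio arithmetic described above reduces to simple proportions. This hypothetical Python sketch shows the idea; the exact rounding and per-target split performed by dmg/the control plane may differ:

```python
# Hypothetical arithmetic behind --mem-ratio: the memory-file size
# (VOS index held in tmpfs) is the given percentage of the per-rank
# metadata capacity on SSD (--meta-size), split across the rank's
# targets (one memory file per target).

def mem_file_bytes(meta_size, mem_ratio_pct, targets_per_rank):
    per_rank = meta_size * mem_ratio_pct / 100.0
    return int(per_rank) // targets_per_rank

# 32 GiB of per-rank metadata capacity, 50% mem-ratio, 8 targets:
per_target = mem_file_bytes(32 * 2**30, 50.0, 8)
print(per_target == 2 * 2**30)  # 2 GiB memory file per target
```

A mem-ratio below 100% is what makes phase 2 meaningful: the in-memory VOS index is then smaller than the metadata blob, with the remainder paged in on demand.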
- The 80% rule for NE buckets will not be applied if the scm_sz is almost equal to the meta_sz.
- Corrected the check for toggling between V1 and V2 store types when the scm_sz passed is zero.
- Added an assert to catch incorrect computation of chunk_id if zone counts are not set correctly during boot.
Signed-off-by: Sherin T George <[email protected]>
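The "80% rule" exception above can be sketched as a guard on the non-evictable (NE) bucket budget. The thread does not spell out the threshold or the exact quantity being capped, so both are illustrative assumptions here:

```python
# Hypothetical model of the "80% rule" for non-evictable (NE)
# buckets: normally only a fraction of the budget is usable for NE
# buckets (the rest kept for evictable pages), but when scm_sz is
# almost equal to meta_sz nothing needs eviction, so the cap is
# waived. The 95% "almost equal" cutoff is illustrative only.

def ne_bucket_cap(scm_sz, meta_sz):
    if scm_sz >= meta_sz * 0.95:   # scm_sz "almost equal" to meta_sz
        return scm_sz              # cap waived: everything can be NE
    return int(scm_sz * 0.8)       # 80% rule applies

print(ne_bucket_cap(100, 100))  # 100 - rule waived
print(ne_bucket_cap(50, 100))   # 40 - 80% of scm_sz
```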
Test stage Functional Hardware Medium Verbs Provider completed with status FAILURE. https://build.hpdd.intel.com/job/daos-stack/job/daos/job/PR-15429/2/display/redirect
@brianjmurrell do the build related changes look okay?
debian/changelog (outdated)

@@ -135,7 +142,7 @@ daos (2.5.100-12) unstable; urgency=medium
 -- Jerome Soumagne <[email protected]>  Wed, 15 Nov 2023 10:30:00 -0600
-daos (2.5.100-10) unstable; urgency=medium
+daos (2.5.100-11) unstable; urgency=medium

Review comment: This should be reverted

Suggested change:
-daos (2.5.100-11) unstable; urgency=medium
+daos (2.5.100-10) unstable; urgency=medium

Reply: done
Yes, the suppressions file seems to be the same on both branches.
Test-tag: pr daily_regression hw,medium,ListVerboseTest
Skip-func-hw-test-medium-md-on-ssd: false
Skip-func-hw-test-large-md-on-ssd: false
Allow-unstable-test: true
Required-githooks: true
Signed-off-by: Tom Nabarro <[email protected]>
@NiuYawei @gnailzenh I've pushed again with the following fixes, since the previous CI run failed to run the hardware stages due to Jenkins issues.
Ftest changes look good.
Test stage Functional Hardware Medium Verbs Provider completed with status FAILURE. https://build.hpdd.intel.com/job/daos-stack/job/daos/job/PR-15429/3/display/redirect
Test stage Functional Hardware Medium completed with status FAILURE. https://build.hpdd.intel.com/job/daos-stack/job/daos/job/PR-15429/3/display/redirect
Test stage Functional Hardware Medium MD on SSD completed with status FAILURE. https://build.hpdd.intel.com/job/daos-stack/job/daos/job/PR-15429/3/display/redirect
Test stage Functional Hardware Large MD on SSD completed with status FAILURE. https://build.hpdd.intel.com/job/daos-stack/job/daos/job/PR-15429/3/display/redirect
Test stage Functional Hardware Large completed with status FAILURE. https://build.hpdd.intel.com/job/daos-stack/job/daos/job/PR-15429/3/display/redirect
Test stage Functional Hardware Medium Verbs Provider completed with status FAILURE. https://build.hpdd.intel.com/job/daos-stack/job/daos/job/PR-15429/4/display/redirect
Test stage Functional Hardware Medium MD on SSD completed with status FAILURE. https://build.hpdd.intel.com/job/daos-stack/job/daos/job/PR-15429/4/display/redirect
Test stage Functional Hardware Large MD on SSD completed with status FAILURE. https://build.hpdd.intel.com/job/daos-stack/job/daos/job/PR-15429/4/display/redirect
Test stage Functional Hardware Large completed with status FAILURE. https://build.hpdd.intel.com/job/daos-stack/job/daos/job/PR-15429/4/display/redirect
Test stage Functional Hardware Medium completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15429/4/execution/node/584/log
Test stage Functional Hardware Medium Verbs Provider completed with status FAILURE. https://build.hpdd.intel.com/job/daos-stack/job/daos/job/PR-15429/5/display/redirect
Test stage Functional Hardware Large completed with status FAILURE. https://build.hpdd.intel.com/job/daos-stack/job/daos/job/PR-15429/5/display/redirect
Test stage Functional Hardware Medium MD on SSD completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15429/5/execution/node/1064/log
@NiuYawei @gnailzenh @phender The following builds were all run on the tip of the PR branch with no code changes between them. Build 4:
Build 5:
Build 7:
Summary:
I don't think there is huge value in waiting for the verbs provider test results, but maybe the other two from build 7?
Test stage Functional Hardware Medium Verbs Provider completed with status FAILURE. https://build.hpdd.intel.com/job/daos-stack/job/daos/job/PR-15429/7/display/redirect
Landing MD-on-SSD phase2 branch to master.
Signed-off-by: Tom Nabarro [email protected]
Signed-off-by: Sherin T George [email protected]
Signed-off-by: Niu Yawei [email protected]
Before requesting gatekeeper:
- Features: (or Test-tag*) commit pragma was used, or there is a reason documented that there are no appropriate tags for this PR.

Gatekeeper: