DAOS-13703 umem: umem cache APIs for phase II #13138

NiuYawei · 2023-10-07T02:51:01Z

Four sets of umem cache APIs will be exported for md-on-ssd phase II:

Cache initialization & finalization
- umem_cache_alloc()
- umem_cache_free()
Cache map, load and pin
- umem_cache_map();
- umem_cache_load();
- umem_cache_pin();
- umem_cache_unpin();
Offset and memory address converting
- umem_cache_off2ptr();
- umem_cache_ptr2off();
Misc
- umem_cache_commit();

Test-nvme: auto_md_on_ssd

Required-githooks: true

Before requesting gatekeeper:

Two review approvals and any prior change requests have been resolved.
Testing is complete and all tests passed or there is a reason documented in the PR why it should be force landed and forced-landing tag is set.
Features: (or Test-tag*) commit pragma was used or there is a reason documented that there are no appropriate tags for this PR.
Commit messages follows the guidelines outlined here.
Any tests skipped by the ticket being addressed have been run and passed in the PR.

Gatekeeper:

github-actions · 2023-10-07T02:51:20Z

Bug-tracker data:
Ticket title is 'umem cache and memory bucket allocator'
Status is 'In Progress'
Labels: 'md_on_ssd2'
Errors are Unknown component
https://daosio.atlassian.net/browse/DAOS-13703

daosbuild1

LGTM. No errors found by checkpatch.

daosbuild1

LGTM. No errors found by checkpatch.

daosbuild1 · 2023-10-08T04:39:44Z

Test stage Functional Hardware Medium Verbs Provider completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-13138/2/testReport/

daosbuild1

LGTM. No errors found by checkpatch.

Four sets of umem cache APIs will be exported for md-on-ssd phase II: 1. Cache initialization & finalization - umem_cache_alloc() - umem_cache_free() 2. Cache map, load and pin - umem_cache_map(); - umem_cache_load(); - umem_cache_pin(); - umem_cache_unpin(); 3. Offset and memory address converting - umem_cache_off2ptr(); - umem_cache_ptr2off(); 4. Misc - umem_cache_commit(); Test-nvme: auto_md_on_ssd Required-githooks: true Signed-off-by: Niu Yawei <[email protected]>

daosbuild1

LGTM. No errors found by checkpatch.

sherintg

The design and code looks well thought out. I have few comments/requirements from allocator perspective. Can you pls address the same?

sherintg · 2023-10-09T11:12:39Z

src/common/mem.c

-	uint64_t           num_pages;
-	int                rc = 0;
-	int                idx;
+	uint32_t idx = offset >> cache->ca_page_shift;


The allocator will use the first 4k bytes for heap header and this will not be part of the memory bucket. The actual buckets starts at offset 4096. To handle this can we add one more field cache->ca_base_offset which indicates the offset where zone0 starts and change the above id calculation as (offset - cache->ca_base_offset) >> cache->ca_page_shift. Similar change is required at other places where the offset is used for translation.

You mean the this 4k header is part of bucket 0 but would never be accessed by VOS (The offset returned to VOS starts from 4k)? Then I don't see why umem cache needs this tweak for page ID calculation, I think it just need to ensure never load/flush the first 4k of page 0, right?

If bucket 0 starts from 4k (the 4k header isn't part of bucket 0), then allocator should just use the start of bucket 0 as exported LBA 0, did I miss anything?

src/common/mem.c

Address review comments from Sherin: 1. Handle the allocator header in umem cache, the offset of first bucket will be passed in to umem by umem_cache_alloc(). 2. Support single evict-able page for umem_cache_map(). 3. Introduce umem_cache_reserve() for DAV, it needs be called before starting a non-nested transaction to deal with the potential non-evictable zone grow. Test-nvme: auto_md_on_ssd Required-githooks: true Signed-off-by: Niu Yawei <[email protected]>

daosbuild1

LGTM. No errors found by checkpatch.

Bump the number of reserved pages to 4 as per Sherin's request. Test-nvme: auto_md_on_ssd Required-githooks: true Signed-off-by: Niu Yawei <[email protected]>

Fix description of umem_cache_reserve(). Test-nvme: auto_md_on_ssd Required-githooks: true Signed-off-by: Niu Yawei <[email protected]>

daosbuild1

LGTM. No errors found by checkpatch.

* DAOS-13701: Memory bucket allocator API definition (#13152) - New umem macros are exported to do the allocation within memory bucket. umem internally now calls the modified backend allocator routines with memory bucket id passed as argument. - umem_get_mb_evictable() and dav_get_zone_evictable() are added to support allocator returning preferred zone to be used as evictable memory bucket for current allocations. Right now these routines always return zero. - The dav heap runtime is cleaned up to make provision for memory bucket implementation. * DAOS-13703 umem: umem cache APIs for phase II (#13138) Four sets of umem cache APIs will be exported for md-on-ssd phase II: 1. Cache initialization & finalization - umem_cache_alloc() - umem_cache_free() 2. Cache map, load and pin - umem_cache_map(); - umem_cache_load(); - umem_cache_pin(); - umem_cache_unpin(); 3. Offset and memory address converting - umem_cache_off2ptr(); - umem_cache_ptr2off(); 4. Misc - umem_cache_commit(); - umem_cache_reserve(); * DAOS-14491: Retain support for phase-1 DAV heap (#13158) The phase-2 DAV allocator is placed under the subdirectory src/common/dav_v2. This allocator is built as a standalone shared library and linked to the libdaos_common_pmem library. The umem will now support one more mode DAOS_MD_BMEM_V2. Setting this mode in umem instance will result in using phase-2 DAV allocator interfaces. * DAOS-15681 bio: store scm_sz in SMD (#14330) In md-on-ssd phase 2, the scm_sz (VOS file size) could be smaller than the meta_sz (meta blob size), then we need to store an extra scm_sz in SMD, so that on engine start, this scm_sz could be retrieved from SMD for VOS file re-creation. To make the SMD compatible with pmem & md-on-ssd phase 1, a new table named "meta_pool_ex" is introduced for storing scm_sz. * DAOS-14422 control: Update pool create UX for MD-on-SSD phase2 (#14740) Show MD-on-SSD specific output on pool create and add new syntax to specify ratio between SSD capacity reserved for MD in new DAOS pool and the (static) size of memory reserved for MD in the form of VOS index files (previously held on SCM but now in tmpfs on ramdisk). Memory-file size is now printed when creating a pool in MD-on--SSD mode. The new --{meta,data}-size params can be specified in decimal or binary units e.g. GB or GiB and refer to per-rank allocations. These manual size parameters are only for advanced use cases and in most situations the --size (X%|XTB|XTiB) syntax is recommended when creating a pool. --meta-size param is bytes to use for metadata on SSD and --data-size is for data on SSD (similar to --nvme-size). The new --mem-ratio param is specified as a percentage with up to two decimal places precision. This defines the proportion of the metadata capacity reserved on SSD (i.e. --meta-size) that will be used when allocating the VOS-index (one blob and one memory file per target). Enable MD-on-SSD phase2 pool creation requires envar DAOS_MD_ON_SSD_MODE=3 to be set in server config file. * DAOS-14317 vos: initial changes for the phase2 object pre-load (#15001) - Introduced new durable format 'vos_obj_p2_df' for the md-on-ssd phase2 object, at most 4 evict-able bucket IDs could be stored. - Changed vos_obj_hold() & vos_obj_release() to pin or unpin object respectively. - Changed the private data of VOS dkey/akey/value trees from 'vos_pool' to 'vos_object', the private data will be used for allocating/reserving from the evict-able bucket. - Move the vos_obj_hold() call from vos_update_end() to vos_update_begin() for the phase2 pool, reserve value from the object evict-able bucket. * DAOS-14316 vos: object preload for GC (#15059) - Use the reserved vos_gc_item.it_args to store 2 bucket IDs for GC_OBJ, GC_DKEY and GC_AKEY, so that GC drain will be able to tell the what buckets need be pinned by looking up bucket numbers stored in vos_obj_df. - Once GC drain needs to pin a different bucket, it will have to commit current tx; unpin current bucket; pin required bucket; start new tx; - Forge a dummy object as the private data for the btree opened by GC, so that the 'ti_destroy' hack could be removed. - Store evict-able bucket ID persistently for newly created object, this was missed in prior PR. * DAOS-14315 vos: Pin objects for DTX commit & CPD RPC (#15118) Introduced two new VOS APIs vos_pin_objects() & vos_unpin_objects() for pin or unpin objects. Changed DTX commit/abort & CPD RPC handler code to ensure objects pinned before starting local transaction. - Bug fix in vos_pmemobj_create(), the actual scm_size should be passed to bio_mc_create(). - Use vos_obj_acquire() instead of vos_obj_hold() in vos_update_begin() to avoid the complication of object ilog adding in ts_set. We could simplify it in future cleanup PRs. - Handle concurrent object bucket alloting & loading. * DAOS-16160 control: Update pool create --size % opt for MD-on-SSD p2 (#14957) Update calculation of usable pool META and DATA component sizes for MD-on-SSD phase-2 mode; when meta-blob-size > vos-file-size. - Use mem-ratio when making NVMe size adjustments to calculate usable pool capacity from raw stats. - Use mem-ratio when auto-sizing to determine META component from percentage of usable rank-RAM-disk capacity. - Apportion cluster count reductions to SSDs based on number of assigned targets to take account of target striping across a tier. - Fix pool query ftest. - Improve test coverage for meta and rdb size calculations. * DAOS-16763 common: Tunable to control max NEMB (#15422) A new tunable, DAOS_MD_ON_SSD_NEMB_PCT is introuced, to define the percentage of memory cache that non-evictable memory buckets can expand to. This tunable will be read during pool creation and persisted, ensuring that each time the pool is reopened, it retains the value set during its creation. Signed-off-by: Niu Yawei <[email protected]> Signed-off-by: Tom Nabarro <[email protected]> Signed-off-by: Sherin T George <[email protected]> Co-authored-by: Tom Nabarro <[email protected]> Co-authored-by: sherintg <[email protected]>

NiuYawei requested a review from a team as a code owner October 7, 2023 02:51

NiuYawei mentioned this pull request Oct 7, 2023

DAOS-13703 umem: define the umem cache APIs #13030

Closed

18 tasks

daosbuild1 reviewed Oct 7, 2023

View reviewed changes

NiuYawei force-pushed the niu/vos_on_blob_p2/DAOS-13703 branch from c9fd2ca to f086077 Compare October 7, 2023 07:17

daosbuild1 reviewed Oct 7, 2023

View reviewed changes

NiuYawei force-pushed the niu/vos_on_blob_p2/DAOS-13703 branch from f086077 to 7e80476 Compare October 8, 2023 09:20

daosbuild1 reviewed Oct 8, 2023

View reviewed changes

NiuYawei requested review from sherintg, jolivier23, wangshilong and gnailzenh October 8, 2023 09:22

NiuYawei force-pushed the niu/vos_on_blob_p2/DAOS-13703 branch from 7e80476 to e85d05f Compare October 9, 2023 03:00

daosbuild1 reviewed Oct 9, 2023

View reviewed changes

sherintg requested changes Oct 9, 2023

View reviewed changes

NiuYawei requested a review from sherintg October 11, 2023 04:39

daosbuild1 reviewed Oct 11, 2023

View reviewed changes

NiuYawei added 2 commits October 11, 2023 02:24

DAOS-13703 umem: bump the number of reserved pages to 4

1e0dcab

Bump the number of reserved pages to 4 as per Sherin's request. Test-nvme: auto_md_on_ssd Required-githooks: true Signed-off-by: Niu Yawei <[email protected]>

DAOS-13703 umem: Fix comment

6884af9

Fix description of umem_cache_reserve(). Test-nvme: auto_md_on_ssd Required-githooks: true Signed-off-by: Niu Yawei <[email protected]>

daosbuild1 reviewed Oct 11, 2023

View reviewed changes

jolivier23 approved these changes Oct 11, 2023

View reviewed changes

sherintg approved these changes Oct 13, 2023

View reviewed changes

NiuYawei merged commit 4294ce7 into feature/vos_on_blob_p2 Oct 13, 2023

NiuYawei deleted the niu/vos_on_blob_p2/DAOS-13703 branch October 13, 2023 06:36

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

DAOS-13703 umem: umem cache APIs for phase II #13138

DAOS-13703 umem: umem cache APIs for phase II #13138

NiuYawei commented Oct 7, 2023 •

edited

Loading

github-actions bot commented Oct 7, 2023

daosbuild1 left a comment

daosbuild1 left a comment

daosbuild1 commented Oct 8, 2023

daosbuild1 left a comment

daosbuild1 left a comment

sherintg left a comment

sherintg Oct 9, 2023

NiuYawei Oct 9, 2023

daosbuild1 left a comment

daosbuild1 left a comment

DAOS-13703 umem: umem cache APIs for phase II #13138

DAOS-13703 umem: umem cache APIs for phase II #13138

Conversation

NiuYawei commented Oct 7, 2023 • edited Loading

Before requesting gatekeeper:

Gatekeeper:

github-actions bot commented Oct 7, 2023

daosbuild1 left a comment

Choose a reason for hiding this comment

daosbuild1 left a comment

Choose a reason for hiding this comment

daosbuild1 commented Oct 8, 2023

daosbuild1 left a comment

Choose a reason for hiding this comment

daosbuild1 left a comment

Choose a reason for hiding this comment

sherintg left a comment

Choose a reason for hiding this comment

sherintg Oct 9, 2023

Choose a reason for hiding this comment

NiuYawei Oct 9, 2023

Choose a reason for hiding this comment

daosbuild1 left a comment

Choose a reason for hiding this comment

daosbuild1 left a comment

Choose a reason for hiding this comment

NiuYawei commented Oct 7, 2023 •

edited

Loading