DAOS-13559 vos: MD-on-SSD phase2 landing #15429

Merged 67 commits on Nov 4, 2024

Commits
4951549
DAOS-13701: Memory bucket allocator API definition (#13152)
sherintg Oct 11, 2023
33d05b3
Merge branch 'master' into feature/vos_on_blob_p2
NiuYawei Oct 13, 2023
4294ce7
DAOS-13703 umem: umem cache APIs for phase II (#13138)
NiuYawei Oct 13, 2023
04acb8c
Merge branch 'master' into feature/vos_on_blob_p2
NiuYawei Oct 19, 2023
d99eae7
DAOS-14491: Retain support for phase-1 DAV heap (#13158)
sherintg Oct 25, 2023
db21e5f
Merge branch 'master' into feature/vos_on_blob_p2
NiuYawei Oct 25, 2023
20404b2
DAOS-14223 mgmt: Use meta blob size if set when creating tgts on rank…
tanabarr Oct 25, 2023
83d8652
DAOS-14510 umem: dav_reserve now commits WAL immediately (#13234)
sherintg Oct 26, 2023
bf5bab2
Merge branch 'master' into feature/vos_on_blob_p2
NiuYawei Nov 2, 2023
c33daaa
Merge branch 'master' into feature/vos_on_blob_p2
NiuYawei Nov 15, 2023
b670e30
Merge branch 'master' into feature/vos_on_blob_p2
NiuYawei Nov 30, 2023
0ca62d9
Merge branch 'master' into feature/vos_on_blob_p2
NiuYawei Dec 8, 2023
7b7a04c
Merge branch 'master' into feature/vos_on_blob_p2
NiuYawei Dec 13, 2023
3fd7290
DAOS-14317 umem: add tx_id checking in touch_page() (#13511)
NiuYawei Dec 20, 2023
33865c8
Merge branch 'master' into feature/vos_on_blob_p2
NiuYawei Jan 17, 2024
7481bba
Merge branch 'master' into feature/vos_on_blob_p2
NiuYawei Feb 20, 2024
c64d008
Merge branch 'master' into feature/vos_on_blob_p2
NiuYawei Mar 21, 2024
28b3fb1
DAOS-14317 umem: few fixes for umem cache (#14090)
NiuYawei Apr 8, 2024
c770cde
Merge branch 'master' into feature/vos_on_blob_p2
NiuYawei Apr 17, 2024
cae6592
Merge branch 'master' into feature/vos_on_blob_p2
NiuYawei May 7, 2024
4753dde
Merge branch 'master' into feature/vos_on_blob_p2
NiuYawei May 23, 2024
29826e2
DAOS-14363 umem: Phase-2 allocator with MB support (#14151)
sherintg May 28, 2024
9f3b928
DAOS-15681 bio: store scm_sz in SMD (#14330)
NiuYawei Jun 4, 2024
38024fb
Merge branch 'master' into feature/vos_on_blob_p2
NiuYawei Jul 10, 2024
f9a8ae8
Merge master branch into feature/vos_on_blob_p2
tanabarr Aug 6, 2024
35f7652
Merge remote-tracking branch 'origin/master' into tanabarr/feature/vo…
tanabarr Aug 6, 2024
0ba3bcf
DAOS-16278 vos: per pool backend type (#14946)
NiuYawei Aug 20, 2024
3be7857
DAOS-14422 control: Update pool create UX for MD-on-SSD phase2 (#14740)
tanabarr Aug 20, 2024
b955fdf
Merge remote-tracking branch 'origin/master' into feature/vos_on_blob_p2
tanabarr Aug 20, 2024
4fb7d87
DAOS-14416 umem: Handle scm_sz ~ meta_sz with the v2 allocator (#14977)
sherintg Aug 22, 2024
f6c4816
Merge remote-tracking branch 'origin/master' into feature/vos_on_blob_p2
NiuYawei Aug 22, 2024
49141e5
Merge remote-tracking branch 'origin/master' into feature/vos_on_blob_p2
NiuYawei Aug 30, 2024
741fb9f
DAOS-14317 vos: initial changes for the phase2 object pre-load (#15001)
NiuYawei Sep 3, 2024
16df8c9
DAOS-14316 vos: object preload for GC (#15059)
NiuYawei Sep 6, 2024
aa2cba4
DAOS-14422 control: Update pool query UX for MD-on-SSD phase2 (#14844)
tanabarr Sep 8, 2024
d0a295c
Merge remote-tracking branch 'origin/master' into feature/vos_on_blob_p2
tanabarr Sep 8, 2024
ec58263
DAOS-14313 vos: Various changes for VOS local tx (#15082)
NiuYawei Sep 9, 2024
9f49593
Correct master branch merge conflict by removing
tanabarr Sep 9, 2024
3864300
DAOS-16489 common: bmem_v2 API to fetch MB stats (#15061)
sherintg Sep 12, 2024
be4ad88
DAOS-14315 vos: Pin objects for DTX commit & CPD RPC (#15118)
NiuYawei Sep 14, 2024
b7ec6f7
DAOS-16562 vos: umem cache metrics (#15155)
NiuYawei Sep 23, 2024
af71f88
DAOS-16591 mgmt, vos, common: Align scm/meta size (#15146)
sherintg Sep 24, 2024
b7c83cf
DAOS-16569 vos: fix few defects (#15175)
NiuYawei Sep 25, 2024
b9ee9e3
DAOS-16633 common: SEGV in heap_mbrt_incrmb_usage() (#15197)
sherintg Sep 26, 2024
b19a915
DAOS-16569 test: phase2 VOS unit tests (#15195)
NiuYawei Sep 29, 2024
331c077
Merge remote-tracking branch 'origin/master' into feature/vos_on_blob_p2
tanabarr Oct 2, 2024
6a54028
DAOS-16640 vos: NE usage on space pressure checking (#15219)
NiuYawei Oct 8, 2024
e266b38
DAOS-16668 bio: avoid nonexist error message on pool destroy (#15273)
NiuYawei Oct 10, 2024
63a6c24
DAOS-16633 common: SEGV in heap_mbrt_incrmb_usage() (#15271)
sherintg Oct 10, 2024
ebf5dc1
DAOS-16668 umem: race on page evicting (#15268)
NiuYawei Oct 11, 2024
f09755a
Merge remote-tracking branch 'origin/master' into feature/vos_on_blob_p2
tanabarr Oct 14, 2024
597ff01
Fix typo in resolving merge conflicts
tanabarr Oct 14, 2024
1dc5250
DAOS-16160 control: Update pool create --size % opt for MD-on-SSD p2 …
tanabarr Oct 16, 2024
bb1945b
DAOS-16725 common: Fast failpath for allocation in EMB (#15332)
sherintg Oct 21, 2024
c6660dc
DAOS-16160 control: Fix count calcs for MD-on-SSD dev tgt IDs (#15333)
tanabarr Oct 21, 2024
05e9175
DAOS-16668 vos: remove unnecessary object pin (#15363)
NiuYawei Oct 23, 2024
7b18e61
DAOS-16668 vos: few defects on phase2 dtx_commit() (#15353)
NiuYawei Oct 23, 2024
61404e3
DAOS-16692 vos: optimize GC for phase2 pool (#15331)
NiuYawei Oct 23, 2024
b8a115e
DAOS-16740 common: Change in EMB selection for new objects. (#15372)
sherintg Oct 24, 2024
726d23a
Merge branch 'master' into feature/vos_on_blob_p2
tanabarr Oct 24, 2024
ea00873
Merge remote-tracking branch 'origin/master' into feature/vos_on_blob_p2
tanabarr Oct 25, 2024
41f8a32
DAOS-16712 test: Fix list_verbose mem_file_bytes comparison (#15344)
tanabarr Oct 25, 2024
5187fa9
DAOS-16712 test: Add mdonssd cond check to list_verbose (#15405)
tanabarr Oct 31, 2024
54c018a
DAOS-16763 common: Tunable to control max NEMB (#15422)
sherintg Oct 31, 2024
4f2f322
DAOS-13559 rpm: bump changelog & release version
NiuYawei Nov 1, 2024
7f7a979
Merge remote-tracking branch 'origin/master' into niu/vos_on_blob_p2/…
tanabarr Oct 30, 2024
8d6b451
fix changelog merge issue
tanabarr Nov 1, 2024
2 changes: 1 addition & 1 deletion Jenkinsfile
@@ -876,7 +876,7 @@ pipeline {
}
steps {
job_step_update(
unitTest(timeout_time: 60,
unitTest(timeout_time: 180,
unstash_opt: true,
ignore_failure: true,
inst_repos: prRepos(),
9 changes: 8 additions & 1 deletion debian/changelog
Contributor @brianjmurrell commented on Oct 30, 2024:

You need to add an entry to this file to recognize the update to debian/daos-server.install.

Contributor replied:

"file file"? could you please elaborate

Contributor replied:

Comment updated.

Contributor (PR author) replied:

Isn't the following entry sufficient? (see the "Add DAV v2 lib" entry below).

@@ -1,3 +1,10 @@
daos (2.7.100-10) unstable; urgency=medium

[ Sherin T George ]
* Add DAV v2 lib

-- Sherin T George <[email protected]> Fri, 1 Nov 2024 11:54:00 +0530

daos (2.7.100-9) unstable; urgency=medium
[ Brian J. Murrell ]
* Remove Build-Depends: for UCX as they were obsoleted as of e01970d
@@ -135,7 +142,7 @@ daos (2.5.100-12) unstable; urgency=medium

-- Jerome Soumagne <[email protected]> Wed, 15 Nov 2023 10:30:00 -0600

daos (2.5.100-10) unstable; urgency=medium
daos (2.5.100-11) unstable; urgency=medium
Contributor commented:

This should be reverted

Suggested change
daos (2.5.100-11) unstable; urgency=medium
daos (2.5.100-10) unstable; urgency=medium

Contributor replied:

done

[ Phillip Henderson ]
* Move verify_perms.py location

1 change: 1 addition & 0 deletions debian/daos-server.install
@@ -28,6 +28,7 @@ usr/lib64/daos_srv/libbio.so
usr/lib64/daos_srv/libplacement.so
usr/lib64/daos_srv/libpipeline.so
usr/lib64/libdaos_common_pmem.so
usr/lib64/libdav_v2.so
usr/share/daos/control/setup_spdk.sh
usr/lib/systemd/system/daos_server.service
usr/lib/sysctl.d/10-daos_server.conf
190 changes: 190 additions & 0 deletions docs/admin/pool_operations.md
@@ -26,6 +26,7 @@ Its subcommands can be grouped into the following areas:
* An upgrade command to upgrade a pool's format version
after a DAOS software upgrade.


### Creating a Pool

A DAOS pool can be created through the `dmg pool create` command.
@@ -170,6 +171,195 @@ on pool size, but also on number of targets, target size, object class,
storage redundancy factor, etc.


#### Creating a pool in MD-on-SSD mode

In MD-on-SSD mode, a pool is made up of a single component in memory (the
RAM-disk associated with each engine) and three components on storage (NVMe
SSD). The on-storage components correspond to the "roles" WAL, META and DATA;
roles are assigned to hardware devices in the
[server configuration file](https://docs.daos.io/v2.6/admin/deployment/#server-configuration-file).

In MD-on-SSD mode, pools are created by default with equal allocations for
metadata-in-memory and metadata-on-SSD, but this can be changed. To create a
pool whose metadata-on-SSD allocation is double its in-memory allocation, set
the `dmg pool create --mem-ratio` option to `50%`. This sets the ratio of
in-memory to on-SSD metadata to 0.5, so the metadata-on-SSD allocation is
twice the metadata-in-memory allocation.
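
As a quick worked illustration of the ratio (hypothetical shell arithmetic,
not `dmg` output; the variable names are illustrative and the sizes match the
50% mem-ratio examples further below):

```bash
# --mem-ratio 50%: memory-file size / metadata-on-SSD size = 0.5
mem_file_gb=70      # per-rank memory (VOS index) file size
mem_ratio_pct=50    # --mem-ratio value in percent
# A 70 GB per-rank memory file at --mem-ratio 50% implies a 140 GB per-rank META allocation.
echo "metadata-on-SSD per rank: $(( mem_file_gb * 100 / mem_ratio_pct )) GB"   # 140 GB
```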

An MD-on-SSD pool created with a `--mem-ratio` between 0 and 100 percent is
said to be operating in "phase-2" mode.

#### MD-on-SSD phase-2 pool create examples

These examples cover the recommended way to create a pool in MD-on-SSD phase-2
mode using the `--size` percentage option.

The following example is run on a single host with dual engines where bdev
roles META and DATA are not shared. Two pools are created, each with a VOS
index file size equal to half the meta-blob size (`--mem-ratio 50%`). Together
the pools consume roughly all of the originally available capacity: the first
requests 50% of it and the second requests 100% of the remainder, so each ends
up with roughly half.

Rough calculations: `dmg storage scan` shows that for each rank, one 800GB SSD
is assigned to each tier (first: WAL+META, second: DATA). `df -h /mnt/daos*`
reports a usable ramdisk capacity of 66GiB for each rank (the sketch after
this list reproduces these numbers).
- Expected Data storage per rank is then 400GB for both the first (50%
  capacity) and the second (100% capacity) pool.
- Expected Meta storage at 50% mem-ratio would be `66GiB*2 = 132GiB` (~141GB)
  per rank, giving ~70GB each for the first (50%) and second (100%) pools.
- Expected Memory file size (aggregated per rank) is `66GiB/2 = 33GiB` (~35GB)
  for each of the first (50%) and second (100%) pools.
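
A rough sketch of the same arithmetic (plain shell arithmetic, illustrative
only; integer division, so the values are approximate):

```bash
ramdisk_gib=66        # usable ramdisk per rank, from `df -h /mnt/daos*`
mem_ratio_pct=50      # --mem-ratio value in percent

# META capacity per rank if the whole ramdisk backed a single pool:
echo "meta total:      $(( ramdisk_gib * 100 / mem_ratio_pct )) GiB"      # 132 GiB ~= 141 GB
# Two pools (50%, then 100% of the remainder) each take about half of that:
echo "meta per pool:   $(( ramdisk_gib * 100 / mem_ratio_pct / 2 )) GiB"  # 66 GiB ~= 70 GB
# Memory (VOS index) file per pool, per rank:
echo "memory per pool: $(( ramdisk_gib / 2 )) GiB"                        # 33 GiB ~= 35 GB
```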

```bash
$ dmg pool create bob --size 50% --mem-ratio 50%

Pool created with 14.86%,85.14% storage tier ratio
--------------------------------------------------
UUID : 47060d94-c689-4981-8c89-011beb063f8f
Service Leader : 0
Service Ranks : [0-1]
Storage Ranks : [0-1]
Total Size : 940 GB
Metadata Storage : 140 GB (70 GB / rank)
Data Storage : 800 GB (400 GB / rank)
Memory File Size : 70 GB (35 GB / rank)

$ dmg pool create bob2 --size 100% --mem-ratio 50%

Pool created with 14.47%,85.53% storage tier ratio
--------------------------------------------------
UUID : bdbef091-f0f8-411d-8995-f91c4efc690f
Service Leader : 1
Service Ranks : [0-1]
Storage Ranks : [0-1]
Total Size : 935 GB
Metadata Storage : 135 GB (68 GB / rank)
Data Storage : 800 GB (400 GB / rank)
Memory File Size : 68 GB (34 GB / rank)

$ dmg pool query bob

Pool 47060d94-c689-4981-8c89-011beb063f8f, ntarget=32, disabled=0, leader=0, version=1, state=Ready
Pool health info:
- Rebuild idle, 0 objs, 0 recs
Pool space info:
- Target count:32
- Total memory-file size: 70 GB
- Metadata storage:
Total size: 140 GB
Free: 131 GB, min:4.1 GB, max:4.1 GB, mean:4.1 GB
- Data storage:
Total size: 800 GB
Free: 799 GB, min:25 GB, max:25 GB, mean:25 GB

$ dmg pool query bob2

Pool bdbef091-f0f8-411d-8995-f91c4efc690f, ntarget=32, disabled=0, leader=1, version=1, state=Ready
Pool health info:
- Rebuild idle, 0 objs, 0 recs
Pool space info:
- Target count:32
- Total memory-file size: 68 GB
- Metadata storage:
Total size: 135 GB
Free: 127 GB, min:4.0 GB, max:4.0 GB, mean:4.0 GB
- Data storage:
Total size: 800 GB
Free: 799 GB, min:25 GB, max:25 GB, mean:25 GB
```

The following examples use a single host with dual engines where bdev roles
WAL, META and DATA are shared.

A single pool with VOS index file size equal to the meta-blob size
(`--mem-ratio 100%`):

```bash
$ dmg pool create bob --size 100% --mem-ratio 100%

Pool created with 5.93%,94.07% storage tier ratio
-------------------------------------------------
UUID : bad54f1d-8976-428b-a5dd-243372dfa65c
Service Leader : 1
Service Ranks : [0-1]
Storage Ranks : [0-1]
Total Size : 2.4 TB
Metadata Storage : 140 GB (70 GB / rank)
Data Storage : 2.2 TB (1.1 TB / rank)
Memory File Size : 140 GB (70 GB / rank)

```

Rough calculations: 1.2TB of usable space is returned from storage scan.
Because roles are shared, the required META capacity (70GB) is reserved first,
so only 1.1TB is provided for data.

Logging shows:
```bash
DEBUG 2024/09/24 15:44:38.554431 pool.go:1139: added smd device c7da7391-9077-4eb6-9f4a-a3d656166236 (rank 1, ctrlr 0000:d8:00.0, roles "data,meta,wal") as usable: device state="NORMAL", smd-size 623 GB (623307128832), ctrlr-total-free 623 GB (623307128832)
DEBUG 2024/09/24 15:44:38.554516 pool.go:1139: added smd device 18c7bf45-7586-49ba-93c0-cbc08caed901 (rank 1, ctrlr 0000:d9:00.0, roles "data,meta,wal") as usable: device state="NORMAL", smd-size 554 GB (554050781184), ctrlr-total-free 1.2 TB (1177357910016)
DEBUG 2024/09/24 15:44:38.554603 pool.go:1246: based on minimum available ramdisk capacity of 70 GB and mem-ratio 1.00 with 70 GB of reserved metadata capacity, the maximum per-rank sizes for a pool are META=70 GB (69792169984 B) DATA=1.1 TB (1107565740032 B)
```
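
A rough sketch of the reservation arithmetic in the log above, using the
per-rank byte counts it reports (plain shell arithmetic, not a DAOS tool):

```bash
ctrlr_total_free=1177357910016   # per-rank free bytes across both SMD devices
meta_reserved=69792169984        # per-rank META reservation at --mem-ratio 100%
echo "DATA per rank: $(( ctrlr_total_free - meta_reserved )) bytes"   # 1107565740032 ~= 1.1 TB
```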

The same setup as above, but with the VOS index file size equal to a quarter
of the meta-blob size (`--mem-ratio 25%`).

```bash
$ dmg pool create bob --size 100% --mem-ratio 25%

Pool created with 23.71%,76.29% storage tier ratio
--------------------------------------------------
UUID : 999ecf55-474e-4476-9f90-0b4c754d4619
Service Leader : 0
Service Ranks : [0-1]
Storage Ranks : [0-1]
Total Size : 2.4 TB
Metadata Storage : 558 GB (279 GB / rank)
Data Storage : 1.8 TB (898 GB / rank)
Memory File Size : 140 GB (70 GB / rank)

```

Rough calculations: 1.2TB of usable space is returned from storage scan.
Because roles are shared, the required META capacity (279GB) is reserved first,
so only ~900GB is provided for data.

Logging shows:
```bash
DEBUG 2024/09/24 16:16:00.172719 pool.go:1246: based on minimum available ramdisk capacity of 70 GB and mem-ratio 0.25 with 279 GB of reserved metadata capacity, the maximum per-rank sizes for a pool are META=279 GB (279168679936 B) DATA=898 GB (898189230080 B)
```

Now with 6 ranks and a single pool with VOS index file size equal to half the
meta-blob size (`--mem-ratio 50%`).

```bash
$ dmg pool create bob --size 100% --mem-ratio 50%

Pool created with 11.86%,88.14% storage tier ratio
--------------------------------------------------
UUID : 4fa38199-23a9-4b4d-aa9a-8b9838cad1d6
Service Leader : 1
Service Ranks : [0-2,4-5]
Storage Ranks : [0-5]
Total Size : 7.1 TB
Metadata Storage : 838 GB (140 GB / rank)
Data Storage : 6.2 TB (1.0 TB / rank)
Memory File Size : 419 GB (70 GB / rank)

```

Rough calculations: 1177 GB of usable space per rank is returned from storage
scan. Because roles are shared, the required META capacity (140 GB) is
reserved first, so only 1037 GB per rank is provided for data. The sketch
after the log below aggregates these per-rank values across the 6 ranks.

Logging shows:
```bash
DEBUG 2024/09/24 16:40:41.570331 pool.go:1139: added smd device c921c7b9-5f5c-4332-a878-0ebb8191c160 (rank 1, ctrlr 0000:d8:00.0, roles "data,meta,wal") as usable: device state="NORMAL", smd-size 623 GB (623307128832), ctrlr-total-free 623 GB (623307128832)
DEBUG 2024/09/24 16:40:41.570447 pool.go:1139: added smd device a071c3cf-5de1-4911-8549-8c5e8f550554 (rank 1, ctrlr 0000:d9:00.0, roles "data,meta,wal") as usable: device state="NORMAL", smd-size 554 GB (554050781184), ctrlr-total-free 1.2 TB (1177357910016)
DEBUG 2024/09/24 16:40:41.570549 pool.go:1246: based on minimum available ramdisk capacity of 70 GB and mem-ratio 0.50 with 140 GB of reserved metadata capacity, the maximum per-rank sizes for a pool are META=140 GB (139584339968 B) DATA=1.0 TB (1037773570048 B)
```
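
A rough sketch of how these per-rank values aggregate into the
`dmg pool create` summary above (plain shell arithmetic, illustrative only;
rounding accounts for the small differences):

```bash
ranks=6
meta_per_rank_gb=140    # per-rank META at --mem-ratio 50%
data_per_rank_gb=1037   # per-rank DATA from the log above
mem_per_rank_gb=70      # per-rank memory (VOS index) file size

echo "Metadata Storage: $(( ranks * meta_per_rank_gb )) GB"   # ~838 GB reported
echo "Data Storage:     $(( ranks * data_per_rank_gb )) GB"   # ~6.2 TB reported
echo "Memory File Size: $(( ranks * mem_per_rank_gb )) GB"    # ~419 GB reported
```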


### Listing Pools

To see a list of the pools in the DAOS system:
47 changes: 35 additions & 12 deletions src/bio/bio_context.c
@@ -1,5 +1,5 @@
/**
* (C) Copyright 2018-2023 Intel Corporation.
* (C) Copyright 2018-2024 Intel Corporation.
*
* SPDX-License-Identifier: BSD-2-Clause-Patent
*/
@@ -457,7 +457,8 @@ int bio_mc_destroy(struct bio_xs_context *xs_ctxt, uuid_t pool_id, enum bio_mc_f

static int
bio_blob_create(uuid_t uuid, struct bio_xs_context *xs_ctxt, uint64_t blob_sz,
enum smd_dev_type st, enum bio_mc_flags flags, spdk_blob_id *blob_id)
enum smd_dev_type st, enum bio_mc_flags flags, spdk_blob_id *blob_id,
uint64_t scm_sz)
{
struct blob_msg_arg bma = { 0 };
struct blob_cp_arg *ba = &bma.bma_cp_arg;
@@ -541,9 +542,10 @@ bio_blob_create(uuid_t uuid, struct bio_xs_context *xs_ctxt, uint64_t blob_sz,
blob_sz);
else
rc = smd_pool_add_tgt(uuid, xs_ctxt->bxc_tgt_id, ba->bca_id, st,
blob_sz);
blob_sz, scm_sz);
} else {
rc = smd_pool_add_tgt(uuid, xs_ctxt->bxc_tgt_id, ba->bca_id, st, blob_sz);
rc = smd_pool_add_tgt(uuid, xs_ctxt->bxc_tgt_id, ba->bca_id, st, blob_sz,
0);
}

if (rc != 0) {
@@ -611,14 +613,14 @@ __bio_ioctxt_open(struct bio_io_context **pctxt, struct bio_xs_context *xs_ctxt,
/*
* Calculate a reasonable WAL size based on following assumptions:
* - Single target update IOPS can be up to 65k;
* - Each TX consumes 2 WAL blocks in average;
* - Each TX consumes 2 WAL blocks on average;
* - Checkpointing interval is 5 seconds, and the WAL should have at least
* half free space before next checkpoint;
*/
uint64_t
default_wal_sz(uint64_t meta_sz)
{
uint64_t wal_sz = (6ULL << 30); /* 6GB */
uint64_t wal_sz = (6ULL << 30); /* 6GiB */

/* The WAL size could be larger than meta size for tiny pool */
if ((meta_sz * 2) <= wal_sz)
@@ -627,8 +629,8 @@ default_wal_sz(uint64_t meta_sz)
return wal_sz;
}

int bio_mc_create(struct bio_xs_context *xs_ctxt, uuid_t pool_id, uint64_t meta_sz,
uint64_t wal_sz, uint64_t data_sz, enum bio_mc_flags flags)
int bio_mc_create(struct bio_xs_context *xs_ctxt, uuid_t pool_id, uint64_t scm_sz, uint64_t meta_sz,
uint64_t wal_sz, uint64_t data_sz, enum bio_mc_flags flags, uint8_t backend_type)
{
int rc = 0, rc1;
spdk_blob_id data_blobid = SPDK_BLOBID_INVALID;
@@ -637,12 +639,13 @@ int bio_mc_create(struct bio_xs_context *xs_ctxt, uuid_t pool_id, uint64_t meta_
struct bio_meta_context *mc = NULL;
struct meta_fmt_info *fi = NULL;
struct bio_xs_blobstore *bxb;
uint32_t meta_flags = 0;

D_ASSERT(xs_ctxt != NULL);
if (data_sz > 0 && bio_nvme_configured(SMD_DEV_TYPE_DATA)) {
D_ASSERT(!(flags & BIO_MC_FL_RDB));
rc = bio_blob_create(pool_id, xs_ctxt, data_sz, SMD_DEV_TYPE_DATA, flags,
&data_blobid);
&data_blobid, 0);
if (rc)
return rc;
}
@@ -656,9 +659,28 @@ int bio_mc_create(struct bio_xs_context *xs_ctxt, uuid_t pool_id, uint64_t meta_
meta_sz, default_cluster_sz());
rc = -DER_INVAL;
goto delete_data;
} else if (meta_sz < scm_sz) {
D_ERROR("Meta blob size("DF_U64") is less than scm size("DF_U64")\n",
meta_sz, scm_sz);
rc = -DER_INVAL;
goto delete_data;
} else if (scm_sz == meta_sz) {
scm_sz = 0;
}

/* scm_sz < meta_sz case */
if (scm_sz != 0) {
if (flags & BIO_MC_FL_RDB) {
D_ERROR("RDB doesn't allow scm_sz("DF_U64") != meta_sz("DF_U64")\n",
scm_sz, meta_sz);
rc = -DER_INVAL;
goto delete_data;
}
meta_flags |= META_HDR_FL_EVICTABLE;
}

rc = bio_blob_create(pool_id, xs_ctxt, meta_sz, SMD_DEV_TYPE_META, flags, &meta_blobid);
rc = bio_blob_create(pool_id, xs_ctxt, meta_sz, SMD_DEV_TYPE_META, flags, &meta_blobid,
scm_sz);
if (rc)
goto delete_data;

@@ -671,7 +693,7 @@ int bio_mc_create(struct bio_xs_context *xs_ctxt, uuid_t pool_id, uint64_t meta_
if (wal_sz == 0 || wal_sz < default_cluster_sz())
wal_sz = default_wal_sz(meta_sz);

rc = bio_blob_create(pool_id, xs_ctxt, wal_sz, SMD_DEV_TYPE_WAL, flags, &wal_blobid);
rc = bio_blob_create(pool_id, xs_ctxt, wal_sz, SMD_DEV_TYPE_WAL, flags, &wal_blobid, 0);
if (rc)
goto delete_meta;

@@ -717,8 +739,9 @@ int bio_mc_create(struct bio_xs_context *xs_ctxt, uuid_t pool_id, uint64_t meta_
fi->fi_wal_size = wal_sz;
fi->fi_data_size = data_sz;
fi->fi_vos_id = xs_ctxt->bxc_tgt_id;
fi->fi_backend_type = backend_type;

rc = meta_format(mc, fi, true);
rc = meta_format(mc, fi, meta_flags, true);
if (rc)
D_ERROR("Unable to format newly created blob for xs:%p pool:"DF_UUID"\n",
xs_ctxt, DP_UUID(pool_id));
9 changes: 6 additions & 3 deletions src/bio/bio_wal.c
@@ -1861,13 +1861,15 @@ bio_wal_checkpoint(struct bio_meta_context *mc, uint64_t tx_id, uint64_t *purged

void
bio_meta_get_attr(struct bio_meta_context *mc, uint64_t *capacity, uint32_t *blk_sz,
uint32_t *hdr_blks)
uint32_t *hdr_blks, uint8_t *backend_type, bool *evictable)
{
/* The mc could be NULL when md on SSD not enabled & data blob not existing */
if (mc != NULL) {
*blk_sz = mc->mc_meta_hdr.mh_blk_bytes;
*capacity = mc->mc_meta_hdr.mh_tot_blks * (*blk_sz);
*hdr_blks = mc->mc_meta_hdr.mh_hdr_blks;
*backend_type = mc->mc_meta_hdr.mh_backend_type;
*evictable = mc->mc_meta_hdr.mh_flags & META_HDR_FL_EVICTABLE;
}
}

@@ -2022,7 +2024,7 @@ get_wal_gen(uuid_t pool_id, uint32_t tgt_id)
}

int
meta_format(struct bio_meta_context *mc, struct meta_fmt_info *fi, bool force)
meta_format(struct bio_meta_context *mc, struct meta_fmt_info *fi, uint32_t flags, bool force)
{
struct meta_header *meta_hdr = &mc->mc_meta_hdr;
struct wal_super_info *si = &mc->mc_wal_info;
@@ -2068,7 +2070,8 @@ meta_format(struct bio_meta_context *mc, struct meta_fmt_info *fi, bool force)
meta_hdr->mh_hdr_blks = META_HDR_BLKS;
meta_hdr->mh_tot_blks = (fi->fi_meta_size / META_BLK_SZ) - META_HDR_BLKS;
meta_hdr->mh_vos_id = fi->fi_vos_id;
meta_hdr->mh_flags = META_HDR_FL_EMPTY;
meta_hdr->mh_flags = (flags | META_HDR_FL_EMPTY);
meta_hdr->mh_backend_type = fi->fi_backend_type;

rc = write_header(mc, mc->mc_meta, meta_hdr, sizeof(*meta_hdr), &meta_hdr->mh_csum);
if (rc) {