zdb and zpool disagree on DDT size; how do I estimate additional memory requirements for DDT based on their output? #7323

Closed
gf-mse opened this issue Mar 21, 2018 · 3 comments


gf-mse commented Mar 21, 2018

First of all, I am having this problem with a fairly old ZFS version, 0.6.5.6. However, I have searched the closed issues and can't seem to find anything related yet, so any hints are very much welcome.


System information

Type                  Version/Name
Distribution Name     Ubuntu
Distribution Version  LTS 16.04
Linux Kernel          4.4.0-21-generic
Architecture          x86-64 / 64-bit
ZFS Version           0.6.5.6-0ubuntu3
SPL Version           0.6.5.6-0ubuntu1

Describe the problem you're observing

  • zdb -DD $poolname and zpool status -D $poolname show different data
  • the output of both utilities for essentially the same dataset does not appear to be stable

Describe how to reproduce the problem

  • see the listings below

Include any warning/errors/backtraces from the system logs

  • none attached

I am trying to estimate the memory requirements for enabling deduplication on our datasets, based on known statistics for the number of blocks used by the existing data and an estimated size of a per-block DDT entry.

To estimate the latter, I created a test pool pool1 with dedup=verify and am trying to make sense of the zdb -bDD pool1 and zpool status -D pool1 output.

However, while they seem to agree on the block count, the reported "on disk" / "in core" numbers differ dramatically -- probably because of slightly different ways they are calculated in the code.

So I wonder why they are different, and how I can use either of them to estimate the DDT memory requirements for the ZFS version at hand.
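For reference, the estimate I am after is just this multiplication -- a minimal sketch with placeholder numbers; the per-entry size is exactly the figure I am trying to extract from the listings below:

/* Back-of-envelope DDT RAM estimate -- my own sketch, not ZFS code.
 * Both inputs are placeholders: n_blocks is the known count of unique
 * blocks in the data, per_entry is the assumed in-core bytes per DDT entry. */
#include <stdio.h>

int main(void)
{
        unsigned long long n_blocks  = 100ULL * 1000 * 1000;  /* e.g. 100M unique blocks */
        unsigned long long per_entry = 320;                   /* assumed bytes per entry */

        printf("estimated DDT footprint: %llu MiB\n",
            (n_blocks * per_entry) >> 20);
        return 0;
}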

I apologize for the lengthy listings -- please just look at the "on disk" / "in core" lines there, and feel free to ignore the rest:


(1) zpool create -f pool1 -m /local -O dedup=verify -O atime=off 'wwn-0x6b8ca3a0f7ce210019f34ed0155b160f-part2', then just add some data:

# zdb -DDD pool1
DDT-sha256-zap-unique: 1018 entries, size 273 on disk, 156 in core

bucket              allocated                       referenced          
______   ______________________________   ______________________________
refcnt   blocks   LSIZE   PSIZE   DSIZE   blocks   LSIZE   PSIZE   DSIZE
------   ------   -----   -----   -----   ------   -----   -----   -----
     1     1018    127M    127M    127M     1018    127M    127M    127M


DDT histogram (aggregated over all DDTs):

bucket              allocated                       referenced          
______   ______________________________   ______________________________
refcnt   blocks   LSIZE   PSIZE   DSIZE   blocks   LSIZE   PSIZE   DSIZE
------   ------   -----   -----   -----   ------   -----   -----   -----
     1     1018    127M    127M    127M     1018    127M    127M    127M
 Total     1018    127M    127M    127M     1018    127M    127M    127M

dedup = 1.00, compress = 1.00, copies = 1.00, dedup * compress / copies = 1.00


# zpool status -D pool1
  pool: pool1
 state: ONLINE
  scan: none requested
config: 

        NAME                                            STATE     READ WRITE CKSUM
        pool1                                           ONLINE       0     0     0
          wwn-0x6b8ca3a0f7ce210019f34ed0155b160f-part2  ONLINE       0     0     0

errors: No known data errors

 dedup: DDT entries 1018, size 129 on disk, 72 in core

bucket              allocated                       referenced
______   ______________________________   ______________________________
refcnt   blocks   LSIZE   PSIZE   DSIZE   blocks   LSIZE   PSIZE   DSIZE
------   ------   -----   -----   -----   ------   -----   -----   -----
     1     1018    127M    127M    127M     1018    127M    127M    127M
 Total     1018    127M    127M    127M     1018    127M    127M    127M


(2) added more data and observed that the DDT is filling up
(not showing the second DDT dump, since in both utilities it is produced by the same zpool_dump_ddt() call):

# zdb -DD pool1
DDT-sha256-zap-duplicate: 216 entries, size 277 on disk, 170 in core
DDT-sha256-zap-unique: 2657 entries, size 276 on disk, 155 in core

DDT histogram (aggregated over all DDTs):

bucket              allocated                       referenced
______   ______________________________   ______________________________
refcnt   blocks   LSIZE   PSIZE   DSIZE   blocks   LSIZE   PSIZE   DSIZE
------   ------   -----   -----   -----   ------   -----   -----   -----
     1    2.59K    316M    316M    316M    2.59K    316M    316M    316M
     2      216   25.5M   25.5M   25.5M      434   51.0M   51.0M   51.0M
 Total    2.81K    342M    342M    342M    3.02K    367M    367M    367M

dedup = 1.07, compress = 1.00, copies = 1.00, dedup * compress / copies = 1.07



# zpool status -D pool1
  pool: pool1
 state: ONLINE
  scan: none requested
config: 

        NAME                                            STATE     READ WRITE CKSUM
        pool1                                           ONLINE       0     0     0
          wwn-0x6b8ca3a0f7ce210019f34ed0155b160f-part2  ONLINE       0     0     0

errors: No known data errors

 dedup: DDT entries 2873, size 217 on disk, 116 in core


(3) snapshotted the previous state, and then deleted all data:

# zdb -DD pool1
DDT-sha256-zap-duplicate: 216 entries, size 1784 on disk, 2104 in core
DDT-sha256-zap-unique: 2657 entries, size 271 on disk, 155 in core

DDT histogram (aggregated over all DDTs):

bucket              allocated                       referenced          
______   ______________________________   ______________________________
refcnt   blocks   LSIZE   PSIZE   DSIZE   blocks   LSIZE   PSIZE   DSIZE
------   ------   -----   -----   -----   ------   -----   -----   -----
     1    2.59K    316M    316M    316M    2.59K    316M    316M    316M
     2      216   25.5M   25.5M   25.5M      434   51.0M   51.0M   51.0M
 Total    2.81K    342M    342M    342M    3.02K    367M    367M    367M

dedup = 1.07, compress = 1.00, copies = 1.00, dedup * compress / copies = 1.07


# zpool status -D pool1
  pool: pool1
 state: ONLINE
  scan: none requested
config:

        NAME                                            STATE     READ WRITE CKSUM
        pool1                                           ONLINE       0     0     0
          wwn-0x6b8ca3a0f7ce210019f34ed0155b160f-part2  ONLINE       0     0     0

errors: No known data errors

 dedup: DDT entries 2873, size 390 on disk, 302 in core


(4) wrote it back:


# zdb -DD pool1
DDT-sha256-zap-duplicate: 2873 entries, size 276 on disk, 158 in core

DDT histogram (aggregated over all DDTs):

bucket              allocated                       referenced          
______   ______________________________   ______________________________
refcnt   blocks   LSIZE   PSIZE   DSIZE   blocks   LSIZE   PSIZE   DSIZE
------   ------   -----   -----   -----   ------   -----   -----   -----
     2    2.59K    316M    316M    316M    5.19K    632M    632M    632M
     4      216   25.5M   25.5M   25.5M      868    102M    102M    102M
 Total    2.81K    342M    342M    342M    6.04K    734M    734M    734M

dedup = 2.15, compress = 1.00, copies = 1.00, dedup * compress / copies = 2.15


# zpool status -D pool1

...

 dedup: DDT entries 2873, size 378 on disk, 290 in core


(5) Finally, deleted all data, took another snapshot (of the empty space), and added the same data back again:

# zdb -DD pool1
DDT-sha256-zap-duplicate: 2873 entries, size 278 on disk, 158 in core

DDT histogram (aggregated over all DDTs):

bucket              allocated                       referenced          
______   ______________________________   ______________________________
refcnt   blocks   LSIZE   PSIZE   DSIZE   blocks   LSIZE   PSIZE   DSIZE
------   ------   -----   -----   -----   ------   -----   -----   -----
     2    2.59K    316M    316M    316M    5.19K    632M    632M    632M
     4      216   25.5M   25.5M   25.5M      868    102M    102M    102M
 Total    2.81K    342M    342M    342M    6.04K    734M    734M    734M

dedup = 2.15, compress = 1.00, copies = 1.00, dedup * compress / copies = 2.15


# zpool status -D pool1
  pool: pool1
 state: ONLINE
  scan: none requested
config:

        NAME                                            STATE     READ WRITE CKSUM
        pool1                                           ONLINE       0     0     0
          wwn-0x6b8ca3a0f7ce210019f34ed0155b160f-part2  ONLINE       0     0     0

errors: No known data errors

 dedup: DDT entries 2873, size 402 on disk, 302 in core


Please note that these reported numbers are not consistent with each other at all:

  • in the first case, for the same 1018 entries, the "in core" size is reported as either 156 (zdb) or 72 (zpool);
  • in the second case, 216 entries of size 170 and 2657 entries of size 155 (zdb) cannot average out to 2873 entries of size 116 (zpool): 216 + 2657 == 2873, but the average cannot be smaller than min(170, 155) -- see the quick check right after this list;
  • comparing (2) and (3) -- how come the "duplicate" table for the same 216 entries -- and basically the same data -- grew roughly tenfold?
DDT-sha256-zap-duplicate: 216 entries, size 277 on disk, 170 in core
DDT-sha256-zap-unique: 2657 entries, size 276 on disk, 155 in core
---
DDT-sha256-zap-duplicate: 216 entries, size 1784 on disk, 2104 in core
DDT-sha256-zap-unique: 2657 entries, size 271 on disk, 155 in core
  • for (4) and (5) the results are almost the same, although one may observe that the zdb statistics are more stable.
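A quick check of the second bullet -- a throwaway sketch using the per-entry "in core" figures that zdb reports in (2):

/* Weighted average of the per-entry "in core" sizes that zdb reports in (2). */
#include <stdio.h>

int main(void)
{
        unsigned long long dup_n  = 216,  dup_sz  = 170;  /* DDT-sha256-zap-duplicate */
        unsigned long long uniq_n = 2657, uniq_sz = 155;  /* DDT-sha256-zap-unique */

        printf("%llu entries, average in-core size %llu\n",
            dup_n + uniq_n,
            (dup_n * dup_sz + uniq_n * uniq_sz) / (dup_n + uniq_n));
        /* prints "2873 entries, average in-core size 156" --
         * nowhere near the 116 that zpool status -D reports for the same pool */
        return 0;
}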

gf-mse commented Mar 23, 2018

Looking at the zdb code at the first link, which essentially is

        error = ddt_object_info(ddt, type, class, &doi);
        error = ddt_object_count(ddt, type, class, &count);

        /* per-object totals: physical on-disk bytes, and filled blocks x block size as the in-core estimate */
        dspace = doi.doi_physical_blocks_512 << 9;
        mspace = doi.doi_fill_count * doi.doi_data_block_size;

        ddt_object_name(ddt, type, class, name);

        /* the per-entry "on disk" / "in core" figures are plain integer divisions of those totals */
        (void) printf("%s: %llu entries, size %llu on disk, %llu in core\n",
            name,
            (u_longlong_t)count,
            (u_longlong_t)(dspace / count),
            (u_longlong_t)(mspace / count));

it feels like zdb in fact collects per-object totals and then just divides them by the entry count, truncating the result with C integer division.
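If that reading is right, the printed per-entry value is a truncated quotient, so multiplying it back by the entry count only recovers a lower bound on the actual total. A small illustration (the total here is made up, not a real measurement):

/* Illustration of the truncation; mspace is a made-up total, not a real measurement. */
#include <stdio.h>

int main(void)
{
        unsigned long long count  = 1018;
        unsigned long long mspace = 159 * 1024;       /* pretend total in-core bytes */
        unsigned long long per    = mspace / count;   /* what would be printed: 159 */

        printf("printed per-entry size: %llu\n", per);
        printf("reconstructed total: %llu of actual %llu bytes\n",
            per * count, mspace);
        return 0;
}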

I started with the original (and old) instructions from Oracle, which basically say "multiply the number of DDT entries by 320". I assume that the value of 320 has changed since then, but I was hoping to get an idea of that change. (I have also tried to run zdb dedup simulations on our real data, as recommended by the article; in our case zdb reliably crashes on that data -- a different story -- so I resorted to this.)
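For what it's worth, here is how that 320-bytes-per-entry rule of thumb compares with the figures above for this small test pool (the 2873 entries from (2), zpool's reported 116, and the zdb weighted average of 156 computed earlier):

/* Compare the old 320-bytes-per-entry rule of thumb with the figures observed in (2). */
#include <stdio.h>

int main(void)
{
        unsigned long long entries = 2873;

        printf("320 B/entry rule of thumb: %llu KiB\n", (entries * 320) >> 10);  /* 897 KiB */
        printf("zpool 'in core' figure:    %llu KiB\n", (entries * 116) >> 10);  /* 325 KiB */
        printf("zdb weighted average:      %llu KiB\n", (entries * 156) >> 10);  /* 437 KiB */
        return 0;
}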

However, I still struggle to understand why the same data, which should produce basically the same DDT, can show a ten-fold difference in the reported statistics.


As for other considerations -- I understand that there are plenty. Saso Kiselkov seems to cover most of them in his OpenZFS conference talk -- see the first few minutes -- and I understand that one should consider not only memory/space requirements, but also performance and even data integrity.

However, from what I gather, understanding the memory footprint is critical for dedup users, since the "undo" options are painful or limited (basically, re-send all the streams with dedup=off).


As for #5182 -- I am putting very high hopes in it, and from what I see it just went through a review by Matt Ahrens (finally -- Don presented it some three years ago, I think), so it looks like it has started its way to trunk. (And I'm still not sure how one would restore the DDT if, e.g., the dedicated DDT storage fails, but that should probably be discussed in a different thread.)


gf-mse commented Mar 23, 2018

There are mixed reports, including some positive.

Was that 1 TB SSD the (only) primary storage -- or an L2ARC device?

Anyway, I would still be interested to know how to estimate how much memory we need -- both for RAM-only and L2ARC SSD configurations.


stale bot commented Aug 25, 2020

This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions.

stale bot added the Status: Stale label on Aug 25, 2020
stale bot closed this as completed on Nov 24, 2020