zdb and zpool disagree on DDT size; how do I estimate additional memory requirements for DDT based on their output? #7323

Closed
gf-mse opened this issue Mar 21, 2018 · 3 comments


gf-mse commented Mar 21, 2018

First of all, I am having this problem with a fairly old ZFS version, 0.6.5.6. However, I have searched the closed issues and can't seem to find anything related yet, so any hints are very much welcome.


System information

Type                  Version/Name
Distribution Name     Ubuntu
Distribution Version  LTS 16.04
Linux Kernel          4.4.0-21-generic
Architecture          x86-64 / 64-bit
ZFS Version           0.6.5.6-0ubuntu3
SPL Version           0.6.5.6-0ubuntu1

Describe the problem you're observing

  • zdb -DD $poolname and zpool status -D $poolname show different data
  • the output of both utilities for essentially the same dataset does not appear to be stable

Describe how to reproduce the problem

  • see the listings below

Include any warning/errors/backtraces from the system logs

  • none attached

I am trying to estimate the memory requirements for enabling deduplication on our datasets, based on known statistics for the number of blocks used by the existing data and an estimated size of a per-block DDT entry.

To estimate the latter, I created a test pool pool1 with dedup=verify and am trying to make sense of the zdb -bDD pool1 and zpool status -D pool1 output.

However, while they seem to agree on the block count, the reported "on disk" / "in core" numbers differ dramatically -- probably because of slightly different ways they are calculated in the code.

So I wonder why they are different, and how I can use either of them to estimate the DDT memory requirements for the ZFS version at hand.
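For reference, the estimate I am after is just this multiplication -- a minimal sketch with placeholder numbers; the per-entry size is exactly the figure I am trying to extract from the listings below:

/* Back-of-envelope DDT RAM estimate -- my own sketch, not ZFS code.
 * Both inputs are placeholders: n_blocks is the known count of unique
 * blocks in the data, per_entry is the assumed in-core bytes per DDT entry. */
#include <stdio.h>

int main(void)
{
        unsigned long long n_blocks  = 100ULL * 1000 * 1000;  /* e.g. 100M unique blocks */
        unsigned long long per_entry = 320;                   /* assumed bytes per entry */

        printf("estimated DDT footprint: %llu MiB\n",
            (n_blocks * per_entry) >> 20);
        return 0;
}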

I apologize for the lengthy listings -- please just look at the "on disk" / "in core" lines there, and feel free to ignore the rest:


(1) zpool create -f pool1 -m /local -O dedup=verify -O atime=off 'wwn-0x6b8ca3a0f7ce210019f34ed0155b160f-part2', then just add some data:

# zdb -DDD pool1
DDT-sha256-zap-unique: 1018 entries, size 273 on disk, 156 in core

bucket              allocated                       referenced          
______   ______________________________   ______________________________
refcnt   blocks   LSIZE   PSIZE   DSIZE   blocks   LSIZE   PSIZE   DSIZE
------   ------   -----   -----   -----   ------   -----   -----   -----
     1     1018    127M    127M    127M     1018    127M    127M    127M


DDT histogram (aggregated over all DDTs):

bucket              allocated                       referenced          
______   ______________________________   ______________________________
refcnt   blocks   LSIZE   PSIZE   DSIZE   blocks   LSIZE   PSIZE   DSIZE
------   ------   -----   -----   -----   ------   -----   -----   -----
     1     1018    127M    127M    127M     1018    127M    127M    127M
 Total     1018    127M    127M    127M     1018    127M    127M    127M

dedup = 1.00, compress = 1.00, copies = 1.00, dedup * compress / copies = 1.00


# zpool status -D pool1
  pool: pool1
 state: ONLINE
  scan: none requested
config: 

        NAME                                            STATE     READ WRITE CKSUM
        pool1                                           ONLINE       0     0     0
          wwn-0x6b8ca3a0f7ce210019f34ed0155b160f-part2  ONLINE       0     0     0

errors: No known data errors

 dedup: DDT entries 1018, size 129 on disk, 72 in core

bucket              allocated                       referenced
______   ______________________________   ______________________________
refcnt   blocks   LSIZE   PSIZE   DSIZE   blocks   LSIZE   PSIZE   DSIZE
------   ------   -----   -----   -----   ------   -----   -----   -----
     1     1018    127M    127M    127M     1018    127M    127M    127M
 Total     1018    127M    127M    127M     1018    127M    127M    127M


(2) added more data and observed that the DDT is filling up
(not showing the second DDT dump, since in both utilities it is produced by the same zpool_dump_ddt() call):

# zdb -DD pool1
DDT-sha256-zap-duplicate: 216 entries, size 277 on disk, 170 in core
DDT-sha256-zap-unique: 2657 entries, size 276 on disk, 155 in core

DDT histogram (aggregated over all DDTs):

bucket              allocated                       referenced
______   ______________________________   ______________________________
refcnt   blocks   LSIZE   PSIZE   DSIZE   blocks   LSIZE   PSIZE   DSIZE
------   ------   -----   -----   -----   ------   -----   -----   -----
     1    2.59K    316M    316M    316M    2.59K    316M    316M    316M
     2      216   25.5M   25.5M   25.5M      434   51.0M   51.0M   51.0M
 Total    2.81K    342M    342M    342M    3.02K    367M    367M    367M

dedup = 1.07, compress = 1.00, copies = 1.00, dedup * compress / copies = 1.07



# zpool status -D pool1
  pool: pool1
 state: ONLINE
  scan: none requested
config: 

        NAME                                            STATE     READ WRITE CKSUM
        pool1                                           ONLINE       0     0     0
          wwn-0x6b8ca3a0f7ce210019f34ed0155b160f-part2  ONLINE       0     0     0

errors: No known data errors

 dedup: DDT entries 2873, size 217 on disk, 116 in core


(3) snapshotted the previous state, and then deleted all data:

# zdb -DD pool1
DDT-sha256-zap-duplicate: 216 entries, size 1784 on disk, 2104 in core
DDT-sha256-zap-unique: 2657 entries, size 271 on disk, 155 in core

DDT histogram (aggregated over all DDTs):

bucket              allocated                       referenced          
______   ______________________________   ______________________________
refcnt   blocks   LSIZE   PSIZE   DSIZE   blocks   LSIZE   PSIZE   DSIZE
------   ------   -----   -----   -----   ------   -----   -----   -----
     1    2.59K    316M    316M    316M    2.59K    316M    316M    316M
     2      216   25.5M   25.5M   25.5M      434   51.0M   51.0M   51.0M
 Total    2.81K    342M    342M    342M    3.02K    367M    367M    367M

dedup = 1.07, compress = 1.00, copies = 1.00, dedup * compress / copies = 1.07


# zpool status -D pool1
  pool: pool1
 state: ONLINE
  scan: none requested
config:

        NAME                                            STATE     READ WRITE CKSUM
        pool1                                           ONLINE       0     0     0
          wwn-0x6b8ca3a0f7ce210019f34ed0155b160f-part2  ONLINE       0     0     0

errors: No known data errors

 dedup: DDT entries 2873, size 390 on disk, 302 in core


(4) wrote it back:


# zdb -DD pool1
DDT-sha256-zap-duplicate: 2873 entries, size 276 on disk, 158 in core

DDT histogram (aggregated over all DDTs):

bucket              allocated                       referenced          
______   ______________________________   ______________________________
refcnt   blocks   LSIZE   PSIZE   DSIZE   blocks   LSIZE   PSIZE   DSIZE
------   ------   -----   -----   -----   ------   -----   -----   -----
     2    2.59K    316M    316M    316M    5.19K    632M    632M    632M
     4      216   25.5M   25.5M   25.5M      868    102M    102M    102M
 Total    2.81K    342M    342M    342M    6.04K    734M    734M    734M

dedup = 2.15, compress = 1.00, copies = 1.00, dedup * compress / copies = 2.15


# zpool status -D pool1

...

 dedup: DDT entries 2873, size 378 on disk, 290 in core


(5) Finally, deleted all data, took another snapshot (of the empty space), and added the same data back again:

# zdb -DD pool1
DDT-sha256-zap-duplicate: 2873 entries, size 278 on disk, 158 in core

DDT histogram (aggregated over all DDTs):

bucket              allocated                       referenced          
______   ______________________________   ______________________________
refcnt   blocks   LSIZE   PSIZE   DSIZE   blocks   LSIZE   PSIZE   DSIZE
------   ------   -----   -----   -----   ------   -----   -----   -----
     2    2.59K    316M    316M    316M    5.19K    632M    632M    632M
     4      216   25.5M   25.5M   25.5M      868    102M    102M    102M
 Total    2.81K    342M    342M    342M    6.04K    734M    734M    734M

dedup = 2.15, compress = 1.00, copies = 1.00, dedup * compress / copies = 2.15


# zpool status -D pool1
  pool: pool1
 state: ONLINE
  scan: none requested
config:

        NAME                                            STATE     READ WRITE CKSUM
        pool1                                           ONLINE       0     0     0
          wwn-0x6b8ca3a0f7ce210019f34ed0155b160f-part2  ONLINE       0     0     0

errors: No known data errors

 dedup: DDT entries 2873, size 402 on disk, 302 in core


Please note that these reported numbers are not consistent with each other at all:

  • in the first case, for the same 1018 entries, the "in core" size is reported as either 156 (zdb) or 72 (zpool);
  • in the second case, 216 entries of size 170 and 2657 entries of size 155 (zdb) cannot average out to 2873 entries of size 116 (zpool): 216 + 2657 == 2873, but the average cannot be smaller than min(170, 155) -- see the quick check right after this list;
  • comparing (2) and (3) -- how come the "duplicate" table for the same 216 entries -- and basically the same data -- grew roughly tenfold?
DDT-sha256-zap-duplicate: 216 entries, size 277 on disk, 170 in core
DDT-sha256-zap-unique: 2657 entries, size 276 on disk, 155 in core
---
DDT-sha256-zap-duplicate: 216 entries, size 1784 on disk, 2104 in core
DDT-sha256-zap-unique: 2657 entries, size 271 on disk, 155 in core
  • for (4) and (5) the results are almost the same, although one may observe that the zdb statistics are more stable.
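A quick check of the second bullet -- a throwaway sketch using the per-entry "in core" figures that zdb reports in (2):

/* Weighted average of the per-entry "in core" sizes that zdb reports in (2). */
#include <stdio.h>

int main(void)
{
        unsigned long long dup_n  = 216,  dup_sz  = 170;  /* DDT-sha256-zap-duplicate */
        unsigned long long uniq_n = 2657, uniq_sz = 155;  /* DDT-sha256-zap-unique */

        printf("%llu entries, average in-core size %llu\n",
            dup_n + uniq_n,
            (dup_n * dup_sz + uniq_n * uniq_sz) / (dup_n + uniq_n));
        /* prints "2873 entries, average in-core size 156" --
         * nowhere near the 116 that zpool status -D reports for the same pool */
        return 0;
}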

gf-mse commented Mar 23, 2018

Looking at the zdb code at the first link, which essentially is

        error = ddt_object_info(ddt, type, class, &doi);
        error = ddt_object_count(ddt, type, class, &count);

        /* per-object totals: physical on-disk bytes, and filled blocks x block size as the in-core estimate */
        dspace = doi.doi_physical_blocks_512 << 9;
        mspace = doi.doi_fill_count * doi.doi_data_block_size;

        ddt_object_name(ddt, type, class, name);

        /* the per-entry "on disk" / "in core" figures are plain integer divisions of those totals */
        (void) printf("%s: %llu entries, size %llu on disk, %llu in core\n",
            name,
            (u_longlong_t)count,
            (u_longlong_t)(dspace / count),
            (u_longlong_t)(mspace / count));

it feels like zdb in fact collects per-object totals and then just divides them by the entry count, truncating the result with C integer division.
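If that reading is right, the printed per-entry value is a truncated quotient, so multiplying it back by the entry count only recovers a lower bound on the actual total. A small illustration (the total here is made up, not a real measurement):

/* Illustration of the truncation; mspace is a made-up total, not a real measurement. */
#include <stdio.h>

int main(void)
{
        unsigned long long count  = 1018;
        unsigned long long mspace = 159 * 1024;       /* pretend total in-core bytes */
        unsigned long long per    = mspace / count;   /* what would be printed: 159 */

        printf("printed per-entry size: %llu\n", per);
        printf("reconstructed total: %llu of actual %llu bytes\n",
            per * count, mspace);
        return 0;
}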

I started with the original (and old) instructions from Oracle, which basically say "multiply the number of DDT entries by 320". I assume that the value of 320 has changed since then, but I was hoping to get an idea of that change. (I have also tried to run zdb dedup simulations on our real data, as recommended by the article; in our case zdb reliably crashes on that data -- a different story -- so I resorted to this.)
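For what it's worth, here is how that 320-bytes-per-entry rule of thumb compares with the figures above for this small test pool (the 2873 entries from (2), zpool's reported 116, and the zdb weighted average of 156 computed earlier):

/* Compare the old 320-bytes-per-entry rule of thumb with the figures observed in (2). */
#include <stdio.h>

int main(void)
{
        unsigned long long entries = 2873;

        printf("320 B/entry rule of thumb: %llu KiB\n", (entries * 320) >> 10);  /* 897 KiB */
        printf("zpool 'in core' figure:    %llu KiB\n", (entries * 116) >> 10);  /* 325 KiB */
        printf("zdb weighted average:      %llu KiB\n", (entries * 156) >> 10);  /* 437 KiB */
        return 0;
}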

However, I still struggle to understand why the same data, which should produce basically the same DDT, can show a ten-fold difference in the reported statistics.


As for other considerations -- I understand that there are plenty. Saso Kiselkov seems to cover most of them in his OpenZFS conference talk -- see the first few minutes -- and I understand that one should consider not only memory/space requirements, but also performance and even data integrity.

However, from what I gather, understanding the memory footprint is critical for dedup users, since the "undo" options are painful or limited (basically, re-send all the streams with dedup=off).


As for #5182 -- I am putting very high hopes in it, and from what I see it just went through a review by Matt Ahrens (finally -- Don presented it some three years ago, I think), so it looks like it has started its way to trunk. (And I'm still not sure how one would restore the DDT if, e.g., the dedicated DDT storage fails, but that should probably be discussed in a different thread.)


gf-mse commented Mar 23, 2018

There are mixed reports, including some positive.

Was that 1 TB SSD the (only) primary storage -- or an L2ARC device?

Anyway, I would still be interested to know how to estimate how much memory we need -- both for RAM-only and L2ARC SSD configurations.


stale bot commented Aug 25, 2020

This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions.

stale bot added the Status: Stale label on Aug 25, 2020
stale bot closed this as completed on Nov 24, 2020