zdb -R :d decompress flag is doing everything wrong. zdb_read_block() #4984

Closed
JuliaVixen opened this issue Aug 18, 2016 · 3 comments
Labels
Type: Feature Feature request or new feature

Comments

JuliaVixen commented Aug 18, 2016

This might be a bug report for the illumos team, rather than zfsonlinux, but I'm sure that this report will eventually get to the right person.

I've got a couple of zpools with a single checksum error, on a single file on them, for whatever reason. And I'm trying to just dump a copy of the corrupt file out with zdb, so I can compare it with my good (from backup) copy to find out what changed. Curiously, zdb doesn't just have an option to dump out all the data contents of a file if given a znode or blkptr.

So, normally, the trick is to just look up the inode, I mean znode, for the file (ls -i), and get the DVAs of the indirect blocks.
zdb -ddddd pool 123456
(And some -bbbbbb too if you want to get more detail.)
Then you write a very short Perl script or shell script to chop out all of the L0 DVA[0]=<0:23afca2bc000:134000> parts, and feed them back into zdb -R, kind of like this:
zdb -R pool 0:23afca2bc000:134000:r > blocks/0_23afca2bc000_134000.bin
And then cat all the blocks back together, if you didn't just use ">>" instead.
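The "chop out the DVA and feed it back" step can be sketched in C as follows. This is only a minimal illustration: the helper name dva_to_zdb_cmd is hypothetical, and the input line format is assumed to match the "L0 DVA[0]=<vdev:offset:size>" fragment shown above, which may not match every zdb version's output.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: turn one line of zdb output containing
 * "L0 DVA[0]=<vdev:offset:size>" into a "zdb -R" command string.
 * Returns 0 on success, -1 if the line does not match or the
 * output buffer is too small.
 */
int
dva_to_zdb_cmd(const char *line, const char *pool, char *out, size_t outlen)
{
    unsigned long long vdev, offset, size;
    const char *p;
    int n;

    assert(line != NULL && pool != NULL && out != NULL);

    /* Only level-0 (data) blocks are wanted here. */
    if (strstr(line, "L0 ") == NULL)
        return -1;

    p = strstr(line, "DVA[0]=<");
    if (p == NULL)
        return -1;

    /* The three DVA fields are hexadecimal, separated by colons. */
    if (sscanf(p, "DVA[0]=<%llx:%llx:%llx>", &vdev, &offset, &size) != 3)
        return -1;

    /* ":r" asks zdb -R for the raw block, as in the command above. */
    n = snprintf(out, outlen, "zdb -R %s %llx:%llx:%llx:r",
        pool, vdev, offset, size);
    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```

A caller would read the zdb -ddddd output line by line, pass each line to this helper, and run the resulting commands in block order.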

BUT!

If you have compression turned on, these raw chunks of file will be compressed. Well, there's a "d" flag you can add, which should decompress the data, right? So...

I speculate that when you hand zdb just a raw DVA to read, it has no idea about its logical size, or physical size, or compression algorithm, or checksum, because all of that info is in the block pointer, and it doesn't know which block pointer is pointing at this DVA. All zdb knows is you gave it an address.

So, when you do zdb -R pool 0:d919ae8c000:1c000:dr, zdb will brute-force attempt to decompress that block with every single compression algorithm known to zfs, until one of the decompression functions returns without an error... AND! it also repeats that attempt for every candidate logical size (lsize), stepping down from SPA_MAXBLOCKSIZE in SPA_MINBLOCKSIZE steps. (This explains why CPU load goes up to 100% for twenty minutes to dump a few MB of data.)

Once upon a time, this would probably, eventually, give you a correctly decompressed block... And then came LZ4...

Actually, the problem isn't with LZ4, it's actually with the zero-run-length-encoder. Basically, any random data you hand to ZLE is always going to be decompressible without error. So this brute-force, try-every-decompression-function method will always return your block, ZLE decompressed, as soon as it tries that... which it will try before trying LZ4.
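To illustrate why a zero-run-length format cannot reject garbage, here is a simplified decoder in the spirit of ZLE. It is a sketch, not the ZFS source: the function name zle_like_decode and the literal-run threshold of 64 are assumptions for illustration, and the real implementation may apply extra length checks that are not reproduced here. The point is that the format has no magic number or checksum, so every control byte is valid.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed threshold: control values up to this mean "literal run". */
#define ZLE_LIKE_LITERAL_MAX 64

/*
 * Decode a zero-run-length stream. Each control byte b encodes a run
 * of (b + 1): if that is <= 64, the next (b + 1) source bytes are
 * copied verbatim; otherwise (b + 1 - 64) zero bytes are emitted.
 * Returns the number of bytes written to dst. No input is rejected.
 */
size_t
zle_like_decode(const unsigned char *src, size_t s_len,
    unsigned char *dst, size_t d_cap)
{
    size_t si = 0, di = 0;

    assert(src != NULL && dst != NULL);

    while (si < s_len && di < d_cap) {
        size_t len = 1 + (size_t)src[si++];

        if (len <= ZLE_LIKE_LITERAL_MAX) {
            /* Literal run, clamped to the buffers' bounds. */
            for (; len > 0 && si < s_len && di < d_cap; len--)
                dst[di++] = src[si++];
        } else {
            /* Zero run, clamped to the output capacity. */
            len -= ZLE_LIKE_LITERAL_MAX;
            for (; len > 0 && di < d_cap; len--)
                dst[di++] = 0;
        }
    }
    return di;
}
```

Feeding this function LZ4 output, or any other byte string, still produces some output without an error, which is why a "first decoder that succeeds" search can stop at ZLE.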

It goes in this order (from include/sys/zio.h):

enum zio_compress {
        ZIO_COMPRESS_INHERIT = 0,
        ZIO_COMPRESS_ON,
        ZIO_COMPRESS_OFF,
        ZIO_COMPRESS_LZJB,
        ZIO_COMPRESS_EMPTY,
        ZIO_COMPRESS_GZIP_1,
        ZIO_COMPRESS_GZIP_2,
        ZIO_COMPRESS_GZIP_3,
        ZIO_COMPRESS_GZIP_4,
        ZIO_COMPRESS_GZIP_5,
        ZIO_COMPRESS_GZIP_6,
        ZIO_COMPRESS_GZIP_7,
        ZIO_COMPRESS_GZIP_8,
        ZIO_COMPRESS_GZIP_9,
        ZIO_COMPRESS_ZLE,
        ZIO_COMPRESS_LZ4,
        ZIO_COMPRESS_FUNCTIONS
};

And the particular brute-force loop is this one from cmd/zdb/zdb.c zdb_read_block()

                enum zio_compress c;
                void *pbuf2 = umem_alloc(SPA_MAXBLOCKSIZE, UMEM_NOFAIL);
                void *lbuf2 = umem_alloc(SPA_MAXBLOCKSIZE, UMEM_NOFAIL);

                bcopy(pbuf, pbuf2, psize);

                VERIFY(random_get_pseudo_bytes((uint8_t *)pbuf + psize,
                    SPA_MAXBLOCKSIZE - psize) == 0);

                VERIFY(random_get_pseudo_bytes((uint8_t *)pbuf2 + psize,
                    SPA_MAXBLOCKSIZE - psize) == 0);

                for (lsize = SPA_MAXBLOCKSIZE; lsize > psize;
                    lsize -= SPA_MINBLOCKSIZE) {
                        for (c = 0; c < ZIO_COMPRESS_FUNCTIONS; c++) {
                                if (zio_decompress_data(c, pbuf, lbuf,
                                    psize, lsize) == 0 &&
                                    zio_decompress_data(c, pbuf2, lbuf2,
                                    psize, lsize) == 0 &&
                                    bcmp(lbuf, lbuf2, lsize) == 0)
                                        break;
                        }
                        if (c != ZIO_COMPRESS_FUNCTIONS)
                                break;
                        lsize -= SPA_MINBLOCKSIZE;
                }
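Reading the quoted loop as written, lsize is decremented both in the for header and at the end of the body, so only every other candidate size appears to be tried. The sketch below counts the candidates under both readings. The function name count_lsize_candidates and the small sizes in the usage note are illustrative assumptions, not values taken from zdb.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Count how many logical sizes a loop of the quoted shape would try.
 * max_size and step stand in for SPA_MAXBLOCKSIZE and SPA_MINBLOCKSIZE.
 * extra_decrement != 0 reproduces the second "lsize -= step" in the body.
 */
size_t
count_lsize_candidates(size_t max_size, size_t step, size_t psize,
    int extra_decrement)
{
    size_t lsize, tried = 0;

    /* Sizes must be step-aligned so the unsigned math cannot wrap. */
    assert(step > 0 && max_size % step == 0 && psize % step == 0);
    assert(psize >= step);

    for (lsize = max_size; lsize > psize; lsize -= step) {
        tried++;                /* one round of all algorithms */
        if (extra_decrement)
            lsize -= step;      /* the decrement in the loop body */
    }
    return tried;
}
```

With max_size 4096, step 512 and psize 512, the single-decrement loop visits 7 sizes and the double-decrement loop visits 4, so a block whose true logical size falls on a skipped value would never be tried.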

A month ago, I hacked zdb.c to just do LZ4 first, and that mostly worked -- the output block was much more correct. But, then I also needed to pass it the psize and lsize to really get the output block right, and by this point I'm thinking maybe it would be better to just modify zdb_read_block() to take a blkptr as input, and extract all this stuff from there. And then I went on a trip, and I haven't gotten back to working on this at all...
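The "do LZ4 first" idea can be sketched as a trial-order table built from the enum quoted earlier. This is a hypothetical illustration, not the change that was actually made to zdb.c: the function build_trial_order, the choice to move ZLE to the end, and the skip_zle flag are all assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* Local copy of the enum quoted in the post, for a standalone sketch. */
enum zio_compress {
    ZIO_COMPRESS_INHERIT = 0,
    ZIO_COMPRESS_ON,
    ZIO_COMPRESS_OFF,
    ZIO_COMPRESS_LZJB,
    ZIO_COMPRESS_EMPTY,
    ZIO_COMPRESS_GZIP_1,
    ZIO_COMPRESS_GZIP_2,
    ZIO_COMPRESS_GZIP_3,
    ZIO_COMPRESS_GZIP_4,
    ZIO_COMPRESS_GZIP_5,
    ZIO_COMPRESS_GZIP_6,
    ZIO_COMPRESS_GZIP_7,
    ZIO_COMPRESS_GZIP_8,
    ZIO_COMPRESS_GZIP_9,
    ZIO_COMPRESS_ZLE,
    ZIO_COMPRESS_LZ4,
    ZIO_COMPRESS_FUNCTIONS
};

/*
 * Fill order[] with the algorithms to try: LZ4 first, then the rest in
 * enum order, with ZLE last (or omitted when skip_zle is nonzero).
 * Returns the number of entries written.
 */
size_t
build_trial_order(enum zio_compress *order, size_t cap, int skip_zle)
{
    size_t n = 0;
    int c;

    assert(order != NULL && cap >= ZIO_COMPRESS_FUNCTIONS);

    order[n++] = ZIO_COMPRESS_LZ4;
    for (c = 0; c < ZIO_COMPRESS_FUNCTIONS; c++) {
        if (c == ZIO_COMPRESS_LZ4 || c == ZIO_COMPRESS_ZLE)
            continue;
        order[n++] = (enum zio_compress)c;
    }
    /* ZLE accepts almost anything, so it goes last if tried at all. */
    if (!skip_zle)
        order[n++] = ZIO_COMPRESS_ZLE;
    return n;
}
```

The inner loop would then iterate over order[0..n-1] instead of counting c up from 0. Reordering only reduces false matches; it does not remove the need for the real psize and lsize from the block pointer.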

There is an unimplemented "c" checksum check flag, which would also be super-duper useful for me in this situation, because then I can just find the individual block which is corrupt, and only need to examine just that... Of course, this is probably unimplemented, because you'll either need to pass the whole checksum to zdb on the command line... or get zdb to look it up from a user supplied blkptr.

@behlendorf behlendorf added the Type: Feature Feature request or new feature label Aug 19, 2016
@tuxoko
Contributor

tuxoko commented Aug 24, 2016

I also noticed the possible problem with ZLE when I was trying to figure out the 16MB/infinite loop issues (#4955/#4956). In master, though, lsize starts from psize, so ZLE is much less likely to succeed on pseudo-random data.

@loli10K loli10K mentioned this issue Feb 6, 2018
Nasf-Fan pushed a commit to Nasf-Fan/zfs that referenced this issue Feb 13, 2018
There are some issues in the zdb -R decompression implementation.

The first is that ZLE can easily decompress non-ZLE streams. So we add
ZDB_NO_ZLE env to make zdb skip ZLE.

The second is the random bytes appended to pabd, pbuf2 stuff. This serve
no purpose at all, those bytes shouldn't be read during decompression
anyway. Instead, we randomize lbuf2, so that we can make sure
decompression fill exactly to lsize by bcmp lbuf and lbuf2.

The last one is the condition to detect fail is wrong.

Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: loli10K <[email protected]>
Signed-off-by: Chunwei Chen <[email protected]>
Closes openzfs#7099
Closes openzfs#4984
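The "fill exactly to lsize" check described in the commit message can be sketched like this. It is a simplified illustration under stated assumptions: the commit randomizes lbuf2, whereas this sketch swaps in two fixed complementary patterns (0x00 and 0xFF) so the result is deterministic. The names decode_fn, decode_fills_exactly and demo_decode_copy are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical decoder signature: returns 0 on success. */
typedef int (*decode_fn)(const unsigned char *src, size_t s_len,
    unsigned char *dst, size_t d_len);

/*
 * Decode the same input into two buffers that start with different
 * contents. Any byte the decoder leaves untouched still differs
 * afterwards, so equal buffers imply all d_len bytes were written.
 * Returns 1 if the decoder succeeded and filled d_len bytes, else 0.
 */
int
decode_fills_exactly(decode_fn decode, const unsigned char *src,
    size_t s_len, unsigned char *out_a, unsigned char *out_b, size_t d_len)
{
    assert(decode != NULL && src != NULL);
    assert(out_a != NULL && out_b != NULL);

    memset(out_a, 0x00, d_len);
    memset(out_b, 0xFF, d_len);

    if (decode(src, s_len, out_a, d_len) != 0)
        return 0;
    if (decode(src, s_len, out_b, d_len) != 0)
        return 0;
    return memcmp(out_a, out_b, d_len) == 0;
}

/* Demo decoder: copies what it can and always reports success. */
int
demo_decode_copy(const unsigned char *src, size_t s_len,
    unsigned char *dst, size_t d_len)
{
    size_t n = s_len < d_len ? s_len : d_len;

    memcpy(dst, src, n);
    return 0;
}
```

With a 4-byte input, the demo decoder passes the check for d_len 4 and fails it for d_len 8, even though it returned success both times.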
tonyhutter pushed a commit to tonyhutter/zfs that referenced this issue Mar 7, 2018
This is a squashed patchset for zfs-0.7.7.  The individual commits are
in the tonyhutter:zfs-0.7.7-hutter branch.  I squashed the commits so
that buildbot wouldn't have to run against each one, and because
github/builbot seem to have a maximum limit of 30 commits they can
test from a PR.

- Linux 4.16 compat: get_disk_and_module() openzfs#7264
- Change checksum & IO delay ratelimit values openzfs#7252
- Increment zil_itx_needcopy_bytes properly openzfs#6988  openzfs#7176
- Fix some typos openzfs#7237
- Fix zpool(8) list example to match actual format openzfs#7244
- Add SMART self-test results to zpool status -c openzfs#7178
- Add scrub after resilver zed script openzfs#4662  openzfs#7086
- Fix free memory calculation on v3.14+ openzfs#7170
- Report duration and error in mmp_history entries openzfs#7190
- Do not initiate MMP writes while pool is suspended openzfs#7182
- Linux 4.16 compat: use correct *_dec_and_test() openzfs#7179  openzfs#7211
- Allow modprobe to fail when called within systemd openzfs#7174
- Add SMART attributes for SSD and NVMe openzfs#7183  openzfs#7193
- Correct count_uberblocks in mmp.kshlib openzfs#7191
- Fix config issues: frame size and headers openzfs#7169
- Clarify zinject(8) explanation of -e openzfs#7172
- OpenZFS 8857 - zio_remove_child() panic due to already destroyed
  parent zio openzfs#7168
- 'zfs receive' fails with "dataset is busy" openzfs#7129  openzfs#7154
- contrib/initramfs: add missing conf.d/zfs openzfs#7158
- mmp should use a fixed tag for spa_config locks openzfs#6530  openzfs#7155
- Handle zap_add() failures in mixed case mode openzfs#7011 openzfs#7054
- Fix zdb -ed on objset for exported pool openzfs#7099 openzfs#6464
- Fix zdb -E segfault openzfs#7099
- Fix zdb -R decompression openzfs#7099  openzfs#4984
- Fix racy assignment of zcb.zcb_haderrors openzfs#7099
- Fix zle_decompress out of bound access openzfs#7099
- Fix zdb -c traverse stop on damaged objset root openzfs#7099
- Linux 4.11 compat: avoid refcount_t name conflict openzfs#7148
- Linux 4.16 compat: inode_set_iversion() openzfs#7148
- OpenZFS 8966 - Source file zfs_acl.c, function zfs_aclset_common
  contains a use after end of the lifetime of a local variable openzfs#7141
- Remove deprecated zfs_arc_p_aggressive_disable openzfs#7135
- Fix default libdir for Debian/Ubuntu openzfs#7083  openzfs#7101
- Bug fix in qat_compress.c for vmalloc addr check openzfs#7125
- Fix systemd_ RPM macros usage on Debian-based distributions openzfs#7074
  openzfs#7100
- Emit an error message before MMP suspends pool openzfs#7048
- ZTS: Fix create-o_ashift test case openzfs#6924  openzfs#6977
- Fix --with-systemd on Debian-based distributions (openzfs#6963) openzfs#6591  openzfs#6963
- Remove vn_rename and vn_remove dependency openzfs/spl#648 openzfs#6753
- Add support for "--enable-code-coverage" option openzfs#6670
- Make "-fno-inline" compile option more accessible openzfs#6605
- Add configure option to enable gcov analysis openzfs#6642
- Implement --enable-debuginfo to force debuginfo openzfs#2734
- Make --enable-debug fail when given bogus args openzfs#2734

Signed-off-by: Tony Hutter <[email protected]>
Requires-spl: refs/pull/690/head
tonyhutter pushed a commit to tonyhutter/zfs that referenced this issue Mar 12, 2018
tonyhutter pushed a commit to tonyhutter/zfs that referenced this issue Mar 13, 2018
This is a squashed patchset for zfs-0.7.7.  The individual commits are
in the tonyhutter:zfs-0.7.7-hutter branch.  I squashed the commits so
that buildbot wouldn't have to run against each one, and because
github/builbot seem to have a maximum limit of 30 commits they can
test from a PR.

- Fix MMP write frequency for large pools openzfs#7205 openzfs#7289
- Handle zio_resume and mmp => off openzfs#7286
- Fix zfs-kmod builds when using rpm >= 4.14 openzfs#7284
- zdb and inuse tests don't pass with real disks openzfs#6939 openzfs#7261
- Take user namespaces into account in policy checks openzfs#6800 openzfs#7270
- Detect long config lock acquisition in mmp openzfs#7212
- Linux 4.16 compat: get_disk_and_module() openzfs#7264
- Change checksum & IO delay ratelimit values openzfs#7252
- Increment zil_itx_needcopy_bytes properly openzfs#6988 openzfs#7176
- Fix some typos openzfs#7237
- Fix zpool(8) list example to match actual format openzfs#7244
- Add SMART self-test results to zpool status -c openzfs#7178
- Add scrub after resilver zed script openzfs#4662 openzfs#7086
- Fix free memory calculation on v3.14+ openzfs#7170
- Report duration and error in mmp_history entries openzfs#7190
- Do not initiate MMP writes while pool is suspended openzfs#7182
- Linux 4.16 compat: use correct *_dec_and_test()
- Allow modprobe to fail when called within systemd openzfs#7174
- Add SMART attributes for SSD and NVMe openzfs#7183 openzfs#7193
- Correct count_uberblocks in mmp.kshlib openzfs#7191
- Fix config issues: frame size and headers openzfs#7169
- Clarify zinject(8) explanation of -e openzfs#7172
- OpenZFS 8857 - zio_remove_child() panic due to already destroyed parent zio openzfs#7168
- 'zfs receive' fails with "dataset is busy" openzfs#7129 openzfs#7154
- contrib/initramfs: add missing conf.d/zfs openzfs#7158
- mmp should use a fixed tag for spa_config locks openzfs#6530 openzfs#7155
- Handle zap_add() failures in mixed case mode openzfs#7011 openzfs#7054
- Fix zdb -ed on objset for exported pool openzfs#7099 openzfs#6464
- Fix zdb -E segfault openzfs#7099
- Fix zdb -R decompression openzfs#7099 openzfs#4984
- Fix racy assignment of zcb.zcb_haderrors openzfs#7099
- Fix zle_decompress out of bound access openzfs#7099
- Fix zdb -c traverse stop on damaged objset root openzfs#7099
- Linux 4.11 compat: avoid refcount_t name conflict openzfs#7148
- Linux 4.16 compat: inode_set_iversion() openzfs#7148
- OpenZFS 8966 - Source file zfs_acl.c, function zfs_aclset_common contains a use after end of the lifetime of a local variable openzfs#7141
- Remove deprecated zfs_arc_p_aggressive_disable openzfs#7135
- Fix default libdir for Debian/Ubuntu openzfs#7083 openzfs#7101
- Bug fix in qat_compress.c for vmalloc addr check openzfs#7125
- Fix systemd_ RPM macros usage on Debian-based distributions openzfs#7074 openzfs#7100
- Emit an error message before MMP suspends pool openzfs#7048
- ZTS: Fix create-o_ashift test case openzfs#6924 openzfs#6977
- Fix --with-systemd on Debian-based distributions (openzfs#6963) openzfs#6591 openzfs#6963
- Remove vn_rename and vn_remove dependency openzfs/spl#648 openzfs#6753
- Add support for "--enable-code-coverage" option openzfs#6670
- Make "-fno-inline" compile option more accessible openzfs#6605
- Add configure option to enable gcov analysis openzfs#6642
- Implement --enable-debuginfo to force debuginfo openzfs#2734
- Make --enable-debug fail when given bogus args openzfs#2734

Signed-off-by: Tony Hutter <[email protected]>
Requires-spl: refs/pull/690/head
tonyhutter pushed a commit to tonyhutter/zfs that referenced this issue Mar 13, 2018
This is a squashed patchset for zfs-0.7.7.  The individual commits are
in the tonyhutter:zfs-0.7.7-hutter branch.  I squashed the commits so
that buildbot wouldn't have to run against each one, and because
github/builbot seem to have a maximum limit of 30 commits they can
test from a PR.

- Fix MMP write frequency for large pools openzfs#7205 openzfs#7289
- Handle zio_resume and mmp => off openzfs#7286
- Fix zfs-kmod builds when using rpm >= 4.14 openzfs#7284
- zdb and inuse tests don't pass with real disks openzfs#6939 openzfs#7261
- Take user namespaces into account in policy checks openzfs#6800 openzfs#7270
- Detect long config lock acquisition in mmp openzfs#7212
- Linux 4.16 compat: get_disk_and_module() openzfs#7264
- Change checksum & IO delay ratelimit values openzfs#7252
- Increment zil_itx_needcopy_bytes properly openzfs#6988 openzfs#7176
- Fix some typos openzfs#7237
- Fix zpool(8) list example to match actual format openzfs#7244
- Add SMART self-test results to zpool status -c openzfs#7178
- Add scrub after resilver zed script openzfs#4662 openzfs#7086
- Fix free memory calculation on v3.14+ openzfs#7170
- Report duration and error in mmp_history entries openzfs#7190
- Do not initiate MMP writes while pool is suspended openzfs#7182
- Linux 4.16 compat: use correct *_dec_and_test()
- Allow modprobe to fail when called within systemd openzfs#7174
- Add SMART attributes for SSD and NVMe openzfs#7183 openzfs#7193
- Correct count_uberblocks in mmp.kshlib openzfs#7191
- Fix config issues: frame size and headers openzfs#7169
- Clarify zinject(8) explanation of -e openzfs#7172
- OpenZFS 8857 - zio_remove_child() panic due to already destroyed
  parent zio openzfs#7168
- 'zfs receive' fails with "dataset is busy" openzfs#7129 openzfs#7154
- contrib/initramfs: add missing conf.d/zfs openzfs#7158
- mmp should use a fixed tag for spa_config locks openzfs#6530 openzfs#7155
- Handle zap_add() failures in mixed case mode openzfs#7011 openzfs#7054
- Fix zdb -ed on objset for exported pool openzfs#7099 openzfs#6464
- Fix zdb -E segfault openzfs#7099
- Fix zdb -R decompression openzfs#7099 openzfs#4984
- Fix racy assignment of zcb.zcb_haderrors openzfs#7099
- Fix zle_decompress out of bound access openzfs#7099
- Fix zdb -c traverse stop on damaged objset root openzfs#7099
- Linux 4.11 compat: avoid refcount_t name conflict openzfs#7148
- Linux 4.16 compat: inode_set_iversion() openzfs#7148
- OpenZFS 8966 - Source file zfs_acl.c, function zfs_aclset_common
  contains a use after end of the lifetime of a local variable openzfs#7141
- Remove deprecated zfs_arc_p_aggressive_disable openzfs#7135
- Fix default libdir for Debian/Ubuntu openzfs#7083 openzfs#7101
- Bug fix in qat_compress.c for vmalloc addr check openzfs#7125
- Fix systemd_ RPM macros usage on Debian-based distributions openzfs#7074
  openzfs#7100
- Emit an error message before MMP suspends pool openzfs#7048
- ZTS: Fix create-o_ashift test case openzfs#6924 openzfs#6977
- Fix --with-systemd on Debian-based distributions (openzfs#6963) openzfs#6591 openzfs#6963
- Remove vn_rename and vn_remove dependency openzfs/spl#648 openzfs#6753
- Fix "--enable-code-coverage" debug build openzfs#6674
- Update codecov.yml openzfs#6669
- Add support for "--enable-code-coverage" option openzfs#6670
- Make "-fno-inline" compile option more accessible openzfs#6605
- Add configure option to enable gcov analysis openzfs#6642
- Implement --enable-debuginfo to force debuginfo openzfs#2734
- Make --enable-debug fail when given bogus args openzfs#2734

Signed-off-by: Tony Hutter <[email protected]>
Requires-spl: refs/pull/690/head
tonyhutter pushed a commit that referenced this issue Mar 19, 2018
There are some issues in the zdb -R decompression implementation.

The first is that ZLE can easily decompress non-ZLE streams. So we add
ZDB_NO_ZLE env to make zdb skip ZLE.

The second is the random bytes appended to pabd, pbuf2 stuff. This serve
no purpose at all, those bytes shouldn't be read during decompression
anyway. Instead, we randomize lbuf2, so that we can make sure
decompression fill exactly to lsize by bcmp lbuf and lbuf2.

The last one is the condition to detect fail is wrong.

Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: loli10K <[email protected]>
Signed-off-by: Chunwei Chen <[email protected]>
Closes #7099
Closes #4984
@roopa01

roopa01 commented Nov 12, 2020

I have the same issue when decompressing an LZ4-compressed data block. zdb does not decompress the data block and gives this message:
"ZLE decompression was selected. If you suspect the results are wrong,
try avoiding ZLE by setting and exporting ZDB_NO_ZLE="true"".
I have exported this variable, but the issue is not resolved. How can I resolve this issue? I am working on Ubuntu 20.04.
