
regression with "Add display of checksums to zdb -R" #9644

Closed
ikozhukhov opened this issue Nov 28, 2019 · 3 comments · Fixed by #9657
Labels
Type: Defect Incorrect behavior (e.g. crash, hang)

Comments

@ikozhukhov
Contributor

System information

Type Version/Name
Distribution Name DilOS
Distribution Version
Linux Kernel
Architecture
ZFS Version
SPL Version

Describe the problem you're observing

regression with "Add display of checksums to zdb -R" in 894f669

This commit introduced the test zdb_checksum.ksh, which fails on DilOS with:

root@zt1:/var/tmp/test_results/20191128T005140/zdb/zdb_checksum# cat merged
ASSERTION: Verify zdb -R generates the correct checksum.
SUCCESS: zpool create -f testpool mirror c0t600144F0A2C9046800005DDBE49D0001d0 c0t600144F0A2C9046800005DDBE4B70002d0 c0t600144F0A2C9046800005DDBE4C80003d0
SUCCESS: zfs create testpool/testfs
SUCCESS: zfs set mountpoint=/var/tmp/test_results/testdir testpool/testfs
NOTE: file /var/tmp/test_results/testdir/file1 has object number 2
NOTE: block 0 of /var/tmp/test_results/testdir/file1 has a DVA of 0:80009200:20000
NOTE: expecting cksum 52d2d2d28000:14b4de1e09694000:882840503b9b8000:ae2837af9cb4a000
/opt/zfs-tests/tests/functional/cli_root/zdb/zdb_checksum.ksh: line 60: 230219: Memory fault(coredump)
zdb -R failed to print the correct checksum

Parsing the core file, we can see:

root@zt1:/var/tmp/test_results/20191128T005140# mdb core
Loading modules: [ libumem.so.1 libc.so.1 libzpool.so.1 libnvpair.so.1 libavl.so.1 libcmdutils.so.1 ld.so.1 ]
> $C
fffffc7fffdfe000 libzpool.so.1`zio_checksum_compute+0x250(f812af0, 6, ef67a90, 20000)
fffffc7fffdfe170 zdb_read_block+0xcd9(fffffc7fffdfe8c8, f7a5000)
fffffc7fffdfe270 main+0x79a(4, fffffc7fffdfe2c8)
fffffc7fffdfe2a0 _start_crt+0x83()
fffffc7fffdfe2b0 _start+0x18()
> ::status
debugging core file of zdb (64-bit) from zt1
file: /usr/sbin/zdb
initial argv: zdb -R testpool 0:80009200:20000:c
threading model: native threads
status: process terminated by SIGSEGV (Segmentation Fault), addr=fffffc7fe6e9ab20
> f812af0::print zio_t
{
    io_bookmark = {
        zb_objset = 0xdeadbeefdeadbeef
        zb_object = 0xdeadbeefdeadbeef
        zb_level = 0xdeadbeefdeadbeef
        zb_blkid = 0xdeadbeefdeadbeef
    }
    io_prop = {
        zp_checksum = -0t559038737 (???)
        zp_compress = -0t559038737 (???)
        zp_type = -0t559038737 (???)
        zp_level = 0xef
        zp_copies = 0xbe
        zp_dedup = 0xdeadbeef (B_TRUE|0xffffffffdeadbeee)
        zp_dedup_verify = 0xdeadbeef (B_TRUE|0xffffffffdeadbeee)
        zp_nopwrite = 0xdeadbeef (B_TRUE|0xffffffffdeadbeee)
        zp_encrypt = 0xdeadbeef (B_TRUE|0xffffffffdeadbeee)
        zp_byteorder = 0xdeadbeef (B_TRUE|0xffffffffdeadbeee)
        zp_salt = [ 0xef, 0xbe, 0xad, 0xde, 0xef, 0xbe, 0xad, 0xde ]
        zp_iv = [ 0xef, 0xbe, 0xad, 0xde, 0xef, 0xbe, 0xad, 0xde, 0xef, 0xbe, 0xad, 0xde ]
        zp_mac = [ 0xef, 0xbe, 0xad, 0xde, 0xef, 0xbe, 0xad, 0xde, 0xef, 0xbe, 0xad, 0xde, 0xef, 0xbe, 0xad, 0xde ]
        zp_zpl_smallblk = 0xdeadbeef
    }
    io_type = -0t559038737 (???)
    io_child_type = -0t559038737 (???)
    io_trim_flags = 0xdeadbeef (ZIO_TRIM_SECURE|0xffffffffdeadbeee)
    io_cmd = 0xdeadbeef
    io_priority = -0t559038737 (???)
    io_reexecute = 0xef
    io_state = [ 0xbe, 0xad ]
    io_txg = 0xdeadbeefdeadbeef
    io_spa = 0xdeadbeefdeadbeef
    io_bp = 0xdeadbeefdeadbeef
    io_bp_override = 0xdeadbeefdeadbeef
    io_bp_copy = {
        blk_dva = [
            {
                dva_word = [ 0xdeadbeefdeadbeef, 0xdeadbeefdeadbeef ]
            },
            {
                dva_word = [ 0xdeadbeefdeadbeef, 0xdeadbeefdeadbeef ]
            },
            {
                dva_word = [ 0xdeadbeefdeadbeef, 0xdeadbeefdeadbeef ]
            },
        ]
        blk_prop = 0xdeadbeefdeadbeef
        blk_pad = [ 0xdeadbeefdeadbeef, 0xdeadbeefdeadbeef ]
        blk_phys_birth = 0xdeadbeefdeadbeef
        blk_birth = 0xdeadbeefdeadbeef
        blk_fill = 0xdeadbeefdeadbeef
        blk_cksum = {
            zc_word = [ 0xdeadbeefdeadbeef, 0xdeadbeefdeadbeef, 0xdeadbeefdeadbeef, 0xdeadbeefdeadbeef ]
        }
    }
    io_parent_list = {
        list_size = 0xdeadbeefdeadbeef
        list_offset = 0xdeadbeefdeadbeef
        list_head = {
            list_next = 0xdeadbeefdeadbeef
            list_prev = 0xdeadbeefdeadbeef
        }
    }
    io_child_list = {
        list_size = 0xdeadbeefdeadbeef
        list_offset = 0xdeadbeefdeadbeef
        list_head = {
            list_next = 0xdeadbeefdeadbeef
            list_prev = 0xdeadbeefdeadbeef
        }
    }
    io_logical = 0xdeadbeefdeadbeef
    io_transform_stack = 0xdeadbeefdeadbeef
    io_ready = 0xdeadbeefdeadbeef
    io_children_ready = 0xdeadbeefdeadbeef
    io_physdone = 0xdeadbeefdeadbeef
    io_done = 0xdeadbeefdeadbeef
    io_private = 0xdeadbeefdeadbeef
    io_prev_space_delta = 0xdeadbeefdeadbeef
    io_bp_orig = {
        blk_dva = [
            {
                dva_word = [ 0xdeadbeefdeadbeef, 0xdeadbeefdeadbeef ]
            },
            {
                dva_word = [ 0xdeadbeefdeadbeef, 0xdeadbeefdeadbeef ]
            },
            {
                dva_word = [ 0xdeadbeefdeadbeef, 0xdeadbeefdeadbeef ]
            },
        ]
        blk_prop = 0xdeadbeefdeadbeef
        blk_pad = [ 0xdeadbeefdeadbeef, 0xdeadbeefdeadbeef ]
        blk_phys_birth = 0xdeadbeefdeadbeef
        blk_birth = 0xdeadbeefdeadbeef
        blk_fill = 0xdeadbeefdeadbeef
        blk_cksum = {
            zc_word = [ 0xdeadbeefdeadbeef, 0xdeadbeefdeadbeef, 0xdeadbeefdeadbeef, 0xdeadbeefdeadbeef ]
        }
    }
    io_lsize = 0xdeadbeefdeadbeef
    io_abd = 0xdeadbeefdeadbeef
    io_orig_abd = 0xdeadbeefdeadbeef
    io_size = 0xdeadbeefdeadbeef
    io_orig_size = 0xdeadbeefdeadbeef
    io_vd = 0xdeadbeefdeadbeef
    io_vsd = 0xdeadbeefdeadbeef
    io_vsd_ops = 0xdeadbeefdeadbeef
    io_metaslab_class = 0xdeadbeefdeadbeef
    io_offset = 0xdeadbeefdeadbeef
    io_timestamp = 0xdeadbeefdeadbeef
    io_queued_timestamp = 0xdeadbeefdeadbeef
    io_target_timestamp = 0xdeadbeefdeadbeef
    io_dispatched = 0xdeadbeefdeadbeef
    io_delta = 0xdeadbeefdeadbeef
    io_delay = 0xdeadbeefdeadbeef
    io_queue_node = {
        avl_child = [ 0xdeadbeefdeadbeef, 0xdeadbeefdeadbeef ]
        avl_pcb = 0xdeadbeefdeadbeef
    }
    io_offset_node = {
        avl_child = [ 0xdeadbeefdeadbeef, 0xdeadbeefdeadbeef ]
        avl_pcb = 0xdeadbeefdeadbeef
    }
    io_alloc_node = {
        avl_child = [ 0xdeadbeefdeadbeef, 0xdeadbeefdeadbeef ]
        avl_pcb = 0xdeadbeefdeadbeef
    }
    io_alloc_list = {
        zal_list = {
            list_size = 0xdeadbeefdeadbeef
            list_offset = 0xdeadbeefdeadbeef
            list_head = {
                list_next = 0xdeadbeefdeadbeef
                list_prev = 0xdeadbeefdeadbeef
            }
        }
        zal_size = 0xdeadbeefdeadbeef
    }
    io_flags = -0t559038737 (???)
    io_stage = 0xdeadbeef (ZIO_STAGE_{OPEN|READ_BP_INIT|WRITE_BP_INIT|FREE_BP_INIT|WRITE_COMPRESS|ENCRYPT|CHECKSUM_GENERATE|DDT_READ_START|DDT_READ_DONE|DDT_WRITE|DDT_FREE|GANG_ASSEMBLE|DVA_THROTTLE|DVA_ALLOCATE|DVA_CLAIM|READY|VDEV_IO_DONE|CHECKSUM_VERIFY}|0xffffffffde000000)
    io_pipeline = 0xdeadbeef (ZIO_STAGE_{OPEN|READ_BP_INIT|WRITE_BP_INIT|FREE_BP_INIT|WRITE_COMPRESS|ENCRYPT|CHECKSUM_GENERATE|DDT_READ_START|DDT_READ_DONE|DDT_WRITE|DDT_FREE|GANG_ASSEMBLE|DVA_THROTTLE|DVA_ALLOCATE|DVA_CLAIM|READY|VDEV_IO_DONE|CHECKSUM_VERIFY}|0xffffffffde000000)
    io_orig_flags = -0t559038737 (???)
    io_orig_stage = 0xdeadbeef (ZIO_STAGE_{OPEN|READ_BP_INIT|WRITE_BP_INIT|FREE_BP_INIT|WRITE_COMPRESS|ENCRYPT|CHECKSUM_GENERATE|DDT_READ_START|DDT_READ_DONE|DDT_WRITE|DDT_FREE|GANG_ASSEMBLE|DVA_THROTTLE|DVA_ALLOCATE|DVA_CLAIM|READY|VDEV_IO_DONE|CHECKSUM_VERIFY}|0xffffffffde000000)
    io_orig_pipeline = 0xdeadbeef (ZIO_STAGE_{OPEN|READ_BP_INIT|WRITE_BP_INIT|FREE_BP_INIT|WRITE_COMPRESS|ENCRYPT|CHECKSUM_GENERATE|DDT_READ_START|DDT_READ_DONE|DDT_WRITE|DDT_FREE|GANG_ASSEMBLE|DVA_THROTTLE|DVA_ALLOCATE|DVA_CLAIM|READY|VDEV_IO_DONE|CHECKSUM_VERIFY}|0xffffffffde000000)
    io_pipeline_trace = 0xdeadbeef (ZIO_STAGE_{OPEN|READ_BP_INIT|WRITE_BP_INIT|FREE_BP_INIT|WRITE_COMPRESS|ENCRYPT|CHECKSUM_GENERATE|DDT_READ_START|DDT_READ_DONE|DDT_WRITE|DDT_FREE|GANG_ASSEMBLE|DVA_THROTTLE|DVA_ALLOCATE|DVA_CLAIM|READY|VDEV_IO_DONE|CHECKSUM_VERIFY}|0xffffffffde000000)
    io_error = 0xdeadbeef
    io_child_error = [ 0xdeadbeef, 0xdeadbeef, 0xdeadbeef, 0xdeadbeef ]
    io_children = [
        [ 0xdeadbeefdeadbeef, 0xdeadbeefdeadbeef ]
        [ 0xdeadbeefdeadbeef, 0xdeadbeefdeadbeef ]
        [ 0xdeadbeefdeadbeef, 0xdeadbeefdeadbeef ]
        [ 0xdeadbeefdeadbeef, 0xdeadbeefdeadbeef ]
    ]
    io_child_count = 0xdeadbeefdeadbeef
    io_phys_children = 0xdeadbeefdeadbeef
    io_parent_count = 0xdeadbeefdeadbeef
    io_stall = 0xdeadbeefdeadbeef
    io_gang_leader = 0xdeadbeefdeadbeef
    io_gang_tree = 0xdeadbeefdeadbeef
    io_executor = 0xdeadbeefdeadbeef
    io_waiter = 0xdeadbeefdeadbeef
    io_lock = {
        m_owner = 0xdeadbeefdeadbeef
        initialized = 0xdeadbeef (B_TRUE|0xffffffffdeadbeee)
        m_lock = {
            flags = {
                flag1 = 0xbeef
                flag2 = 0xad
                ceiling = 0xde
                mbcp_type_un = {
                    bcptype = 0xbeef
                    mtype_rcount = {
                        count_type1 = 0xef
                        count_type2 = 0xbe
                    }
                }
                magic = 0xdead
            }
            lock = {
                lock64 = {
                    pad = [ 0xef, 0xbe, 0xad, 0xde, 0xef, 0xbe, 0xad, 0xde ]
                }
                lock32 = {
                    ownerpid = 0xdeadbeef
                    lockword = 0xdeadbeef
                }
                owner64 = 0xdeadbeefdeadbeef
            }
            data = 0xdeadbeefdeadbeef
        }
    }
    io_cv = {
        flags = {
            flag = [ 0xef, 0xbe, 0xad, 0xde ]
            type = 0xbeef
            magic = 0xdead
        }
        data = 0xdeadbeefdeadbeef
    }
    io_allocator = 0xdeadbeef
    io_cksum_report = 0xdeadbeefdeadbeef
    io_ena = 0xdeadbeefdeadbeef
    io_zoneid = 0xdeadbeef
    io_tqent = {
        tqent_next = 0xdeadbeefdeadbeef
        tqent_prev = 0xdeadbeefdeadbeef
        tqent_func = 0xdeadbeefdeadbeef
        tqent_arg = 0xdeadbeefdeadbeef
        tqent_flags = 0xdeadbeefdeadbeef
    }
}
> $C
fffffc7fffdfe000 libzpool.so.1`zio_checksum_compute+0x250(f812af0, 6, ef67a90, 20000)
fffffc7fffdfe170 zdb_read_block+0xcd9(fffffc7fffdfe8c8, f7a5000)
fffffc7fffdfe270 main+0x79a(4, fffffc7fffdfe2c8)
fffffc7fffdfe2a0 _start_crt+0x83()
fffffc7fffdfe2b0 _start+0x18()
> f812af0::whatis
f812af0 is freed from zio_cache:
            ADDR          BUFADDR        TIMESTAMP           THREAD
                            CACHE          LASTLOG         CONTENTS
         f8169a0          f812af0      1ac9672cc8e                1
                           4d2028                0           44e230
                 libumem.so.1`umem_cache_free_debug+0xf9
                 libumem.so.1`umem_cache_free+0x47
                 libzpool.so.1`zio_destroy+0x5f
                 libzpool.so.1`zio_wait+0x1ac
                 zdb_read_block+0xcbc
                 main+0x79a
                 _start_crt+0x83
                 _start+0x18

>

Based on this info (the 0xdeadbeef fill throughout the zio is libumem's debug pattern for freed buffers), the first argument (zio_t *zio) had already been freed by the time it was passed to:

void
zio_checksum_compute(zio_t *zio, enum zio_checksum checksum,
    abd_t *abd, uint64_t size)
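The lifecycle bug can be sketched in a few lines of C. This is a minimal model, not actual ZFS code: zio_wait_model() and checksum_after_read_fixed() are hypothetical stand-ins that only mirror the relevant ownership rule, namely that zio_wait() destroys the zio it waits on, so that zio must never be reused afterwards.

```c
/* Minimal model of the zdb -R use-after-free (hypothetical simplification,
 * not actual ZFS code). */
#include <assert.h>
#include <stdlib.h>

typedef struct zio {
	int io_error;
} zio_t;

static zio_t *
zio_alloc(void)
{
	return (calloc(1, sizeof (zio_t)));
}

/*
 * Models zio_wait(): returns the I/O's error and destroys the zio.
 * After this call the caller's pointer is dangling.
 */
static int
zio_wait_model(zio_t *zio)
{
	int err = zio->io_error;
	free(zio);	/* zio must not be touched after this point */
	return (err);
}

/*
 * Broken pattern (what zdb_read_block did): pass the waited-on zio to
 * zio_checksum_compute().  Fixed pattern (what the fix does): allocate
 * a fresh zio solely for the checksum computation.
 */
int
checksum_after_read_fixed(void)
{
	zio_t *rzio = zio_alloc();
	int err = zio_wait_model(rzio);	/* rzio is now dangling */

	if (err != 0)
		return (err);

	zio_t *ckzio = zio_alloc();	/* fresh zio: valid memory */
	/* ... zio_checksum_compute(ckzio, ...) would go here ... */
	free(ckzio);
	return (0);
}
```

With umem debugging, the freed buffer is filled with 0xdeadbeef, which is exactly the pattern visible in every field of the zio_t printed above.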

Describe how to reproduce the problem

Run the ZTS and check the logs:

Test: /opt/zfs-tests/tests/functional/clean_mirror/setup.ksh (run as root) [00:00] [PASS]
Test: /opt/zfs-tests/tests/functional/clean_mirror/clean_mirror_001_pos.ksh (run as root) [00:03]
Test: /opt/zfs-tests/tests/functional/clean_mirror/clean_mirror_002_pos.ksh (run as root) [00:03]
Test: /opt/zfs-tests/tests/functional/clean_mirror/clean_mirror_003_pos.ksh (run as root) [00:12]
Test: /opt/zfs-tests/tests/functional/clean_mirror/clean_mirror_004_pos.ksh (run as root) [00:12]
Test: /opt/zfs-tests/tests/functional/clean_mirror/cleanup.ksh (run as root) [00:00] [PASS]
Test: /opt/zfs-tests/tests/functional/cli_root/zdb/zdb_001_neg.ksh (run as root) [01:18] [PASS]
Test: /opt/zfs-tests/tests/functional/cli_root/zdb/zdb_002_pos.ksh (run as root) [00:20] [PASS]
Test: /opt/zfs-tests/tests/functional/cli_root/zdb/zdb_003_pos.ksh (run as root) [00:05] [PASS]
Test: /opt/zfs-tests/tests/functional/cli_root/zdb/zdb_004_pos.ksh (run as root) [00:05] [PASS]
Test: /opt/zfs-tests/tests/functional/cli_root/zdb/zdb_005_pos.ksh (run as root) [00:07] [PASS]
Test: /opt/zfs-tests/tests/functional/cli_root/zdb/zdb_006_pos.ksh (run as root) [00:38] [PASS]
Test: /opt/zfs-tests/tests/functional/cli_root/zdb/zdb_checksum.ksh (run as root) [00:11] [FAIL]
Test: /opt/zfs-tests/tests/functional/cli_root/zfs/setup.ksh (run as root) [00:00] [PASS]
Test: /opt/zfs-tests/tests/functional/cli_root/zfs/zfs_001_neg.ksh (run as root) [00:03] [PASS]
Test: /opt/zfs-tests/tests/functional/cli_root/zfs/zfs_002_pos.ksh (run as root) [00:07] [FAIL]
Test: /opt/zfs-tests/tests/functional/cli_root/zfs/zfs_003_neg.ksh (run as root) [00:00] [PASS]
Test: /opt/zfs-tests/tests/functional/cli_root/zfs/cleanup.ksh (run as root) [00:00] [PASS]

Include any warning/errors/backtraces from the system logs

@ikozhukhov
Contributor Author

cc @behlendorf

@ikozhukhov
Contributor Author

cc @PaulZ-98

@PaulZ-98
Contributor

Thanks for reporting this. I will have to use another zio for zio_checksum_compute instead of the destroyed one.

PaulZ-98 added a commit to datto/zfs that referenced this issue Dec 2, 2019
The checksum display code of zdb_read_block uses a zio
to read in the block and then calls zio_checksum_compute.
Use a new zio in the call to zio_checksum_compute not the zio
from the read which has been destroyed by zio_wait.

Signed-off-by: Paul Zuchowski <[email protected]>
Fixes openzfs#9644
@behlendorf behlendorf added the Type: Defect Incorrect behavior (e.g. crash, hang) label Dec 2, 2019
PaulZ-98 added a commit to datto/zfs that referenced this issue Dec 2, 2019
PaulZ-98 added a commit to datto/zfs that referenced this issue Dec 2, 2019
behlendorf pushed a commit that referenced this issue Dec 3, 2019
The checksum display code of zdb_read_block uses a zio
to read in the block and then calls zio_checksum_compute.
Use a new zio in the call to zio_checksum_compute not the zio
from the read which has been destroyed by zio_wait.

Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Igor Kozhukhov <[email protected]>
Signed-off-by: Paul Zuchowski <[email protected]>
Closes #9644
Closes #9657
tonyhutter pushed a commit to tonyhutter/zfs that referenced this issue Dec 26, 2019
tonyhutter pushed a commit to tonyhutter/zfs that referenced this issue Dec 27, 2019
tonyhutter pushed a commit that referenced this issue Jan 23, 2020