ZFS hangs on kernel error: VERIFY3(0 == dmu_bonus_hold_by_dnode #12001

vicsn · 2021-05-05T08:42:48Z

System information

Type	Version/Name
Distribution Name	Debian
Distribution Version	Buster
Linux Kernel	5.10.0-0.bpo.4-amd64 / 4.19.0-16
Architecture	amd64
ZFS Version	2.0.3-1~bpo10+1
SPL Version	2.0.3-1~bpo10+1

Zpool properties:

              -o ashift=12
              -O checksum=sha256
              -O compression=lz4
              -O atime=off
              -O normalization=formD
              -O xattr=sa
              -O relatime=on

Example test machine properties:

CPU model name : AMD Ryzen Threadripper 2990WX 32-Core Processor
6 HDDs of model: Gigabyte Technology Co., Ltd. X399 AORUS PRO/X399 AORUS PRO-CF, BIOS F2g 05/08/2019

Describe the problem you're observing

During an incremental send using zfs send -I | receive -F, ZFS hangs. Any subsequent call to ZFS or e.g. opening any file on the ZFS filesystem, will hang indefinitely.

Note that we also experienced: #12014 - perhaps it is related.

Describe how to reproduce the problem

This is happening frequently for us, on multiple machines, all using HDDs. The machines act as backup targets so most activity is receiving snapshots all day. Other zfs related processes on the machines are: snapshots are taken and pruned multiple times per day and a scrub happens once a week. We use encrypted filesystems and a single zpool. Historically the error has triggered for us between 1 and 48 hours.

Total data usage is 2.12T, with zpool iostat output showing:

              capacity     operations     bandwidth 
pool        alloc   free   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
data        4.37T  10.2T     17    172   967K  5.46M

Current mitigation

Downgrading to zfs version 0.8.4-1~bpo10+1 seems to have resolved the issue - at the time of writing we had about a ~~week~~ month without ZFS hanging.

Include any warning/errors/backtraces from the system logs

The first error

|May  4 17:11:31 <host> kernel: [96617.381743] VERIFY3(0 == dmu_bonus_hold_by_dnode(dn, FTAG, &db, flags)) failed (0 == 5)                               
│May  4 17:11:31 <host> kernel: [96617.381882] PANIC at dmu_recv.c:1811:receive_object()                                                                
│May  4 17:11:31 <host> kernel: [96617.381990] Showing stack for process 97766                                                                          
│May  4 17:11:31 <host> kernel: [96617.382002] CPU: 55 PID: 97766 Comm: receive_writer Tainted: P           OE     4.19.0-16-amd64 #1 Debian 4.19.181-1 
│May  4 17:11:31 <host> kernel: [96617.382006] Hardware name: Gigabyte Technology Co., Ltd. X399 AORUS PRO/X399 AORUS PRO-CF, BIOS F2g 05/08/2019       
│May  4 17:11:31 <host> kernel: [96617.382008] Call Trace:                                                                                              
│May  4 17:11:31 <host> kernel: [96617.382058]  dump_stack+0x66/0x81                                                                                    
│May  4 17:11:31 <host> kernel: [96617.382098]  spl_panic+0xd3/0xfb [spl]                                                                               
│May  4 17:11:31 <host> kernel: [96617.382454]  ? dbuf_rele_and_unlock+0x30f/0x660 [zfs]                                                                
│May  4 17:11:31 <host> kernel: [96617.382562]  ? dbuf_read+0x1ca/0x520 [zfs]                                                                           
│May  4 17:11:31 <host> kernel: [96617.382642]  ? dnode_hold_impl+0x350/0xc20 [zfs]                                                                     
│May  4 17:11:31 <host> kernel: [96617.382652]  ? spl_kmem_cache_free+0xec/0x1c0 [spl]                                                                  
│May  4 17:11:31 <host> kernel: [96617.382728]  receive_object+0x923/0xc70 [zfs]                                                                        
│May  4 17:11:31 <host> kernel: [96617.382803]  receive_writer_thread+0x279/0xa00 [zfs]                                                                 
│May  4 17:11:31 <host> kernel: [96617.382817]  ? set_curr_task_fair+0x26/0x50                                                                          
│May  4 17:11:31 <host> kernel: [96617.382894]  ? receive_process_write_record+0x190/0x190 [zfs]                                                        
│May  4 17:11:31 <host> kernel: [96617.382901]  ? __thread_exit+0x20/0x20 [spl]                                                                         
│May  4 17:11:31 <host> kernel: [96617.382905]  ? thread_generic_wrapper+0x6f/0x80 [spl]                                                                
│May  4 17:11:31 <host> kernel: [96617.382910]  thread_generic_wrapper+0x6f/0x80 [spl]                                                                  
│May  4 17:11:31 <host> kernel: [96617.382920]  kthread+0x112/0x130                                                                                     
│May  4 17:11:31 <host> kernel: [96617.382926]  ? kthread_bind+0x30/0x30                                                                                
│May  4 17:11:31 <host> kernel: [96617.382935]  ret_from_fork+0x22/0x40

Subsequent errors

│May 4 17:16:05 <host> kernel: [96890.895937] hrtimer: interrupt took 74060 ns
│May 4 17:16:20 <host> kernel: [96906.140617] INFO: task txg_quiesce:1694 blocked for more than 120 seconds.
│May 4 17:16:20 <host> kernel: [96906.140714] Tainted: P OE 4.19.0-16-amd64 #1 Debian 4.19.181-1
│May 4 17:16:20 <host> kernel: [96906.140799] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
│May 4 17:16:20 <host> kernel: [96906.140884] txg_quiesce D 0 1694 2 0x80000000
│May 4 17:16:20 <host> kernel: [96906.140889] Call Trace:
│May 4 17:16:20 <host> kernel: [96906.140903] __schedule+0x29f/0x840
│May 4 17:16:20 <host> kernel: [96906.140908] schedule+0x28/0x80
│May 4 17:16:20 <host> kernel: [96906.140924] cv_wait_common+0xfb/0x130 [spl]
│May 4 17:16:20 <host> kernel: [96906.140935] ? finish_wait+0x80/0x80
│May 4 17:16:20 <host> kernel: [96906.141077] txg_quiesce_thread+0x2a9/0x3a0 [zfs]
│May 4 17:16:20 <host> kernel: [96906.141169] ? txg_sync_thread+0x480/0x480 [zfs]
│May 4 17:16:20 <host> kernel: [96906.141176] ? __thread_exit+0x20/0x20 [spl]
│May 4 17:16:20 <host> kernel: [96906.141182] thread_generic_wrapper+0x6f/0x80 [spl]
│May 4 17:16:20 <host> kernel: [96906.141187] kthread+0x112/0x130
│May 4 17:16:20 <host> kernel: [96906.141190] ? kthread_bind+0x30/0x30
│May 4 17:16:20 <host> kernel: [96906.141193] ret_from_fork+0x22/0x40
│May 4 17:16:20 <host> kernel: [96906.141783] INFO: task receive_writer:97766 blocked for more than 120 seconds. ┤
│May 4 17:16:20 <host> kernel: [96906.141865] Tainted: P OE 4.19.0-16-amd64 #1 Debian 4.19.181-1 ┤
│May 4 17:16:20 <host> kernel: [96906.141963] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. ┤
│May 4 17:16:20 <host> kernel: [96906.142065] receive_writer D 0 97766 2 0x80000000 ┤
│May 4 17:16:20 <host> kernel: [96906.142068] Call Trace: ┤
│May 4 17:16:20 <host> kernel: [96906.142074] __schedule+0x29f/0x840 ┤
│May 4 17:16:20 <host> kernel: [96906.142078] ? ret_from_fork+0x21/0x40 ┤
│May 4 17:16:20 <host> kernel: [96906.142083] schedule+0x28/0x80 ┤
│May 4 17:16:20 <host> kernel: [96906.142091] spl_panic+0xf9/0xfb [spl] ┤
│May 4 17:16:20 <host> kernel: [96906.142192] ? dbuf_rele_and_unlock+0x30f/0x660 [zfs] ┤
│May 4 17:16:20 <host> kernel: [96906.142239] ? dbuf_read+0x1ca/0x520 [zfs] │
│May 4 17:16:20 <host> kernel: [96906.142289] ? dnode_hold_impl+0x350/0xc20 [zfs] ┤
│May 4 17:16:20 <host> kernel: [96906.142295] ? spl_kmem_cache_free+0xec/0x1c0 [spl] ┤
│May 4 17:16:20 <host> kernel: [96906.142343] receive_object+0x923/0xc70 [zfs] ┤
│May 4 17:16:20 <host> kernel: [96906.142394] receive_writer_thread+0x279/0xa00 [zfs] ┤
│May 4 17:16:20 <host> kernel: [96906.142399] ? set_curr_task_fair+0x26/0x50 ┤
│May 4 17:16:20 <host> kernel: [96906.142448] ? receive_process_write_record+0x190/0x190 [zfs] ┤
│May 4 17:16:20 <host> kernel: [96906.142455] ? __thread_exit+0x20/0x20 [spl] ┤
│May 4 17:16:20 <host> kernel: [96906.142460] ? thread_generic_wrapper+0x6f/0x80 [spl] ┤
│May 4 17:16:20 <host> kernel: [96906.142465] thread_generic_wrapper+0x6f/0x80 [spl] ┤
│May 4 17:16:20 <host> kernel: [96906.142469] kthread+0x112/0x130 ┤
│May 4 17:16:20 <host> kernel: [96906.142471] ? kthread_bind+0x30/0x30 ┤
│May 4 17:16:20 <host> kernel: [96906.142473] ret_from_fork+0x22/0x40

EDIT May 11th 2021: downgrading ZFS to 0.8.4-1~bpo10+1 resolved the issue.
EDIT June 14th 2021: the issue has still been resolved on the downgraded ZFS version.
EDIT December 20th 2021: the issue is still there. Also with zfs-2.0.3-9 on 5.10.0-8-amd64

The text was updated successfully, but these errors were encountered:

stewartadam · 2021-11-21T00:00:37Z

Same issue here, ran into this while copying many snapshots over from one pool to another. Stack trace is similar, but a little different on ZFS 2.1.1:

Nov 17 23:53:57 localhost kernel: VERIFY3(0 == dmu_bonus_hold_by_dnode(dn, FTAG, &db, flags)) failed (0 == 5)
Nov 17 23:53:57 localhost kernel: PANIC at dmu_recv.c:1798:receive_object()
Nov 17 23:53:57 localhost kernel: Showing stack for process 1363685
Nov 17 23:53:57 localhost kernel: CPU: 2 PID: 1363685 Comm: receive_writer Tainted: P           OE     5.13.12-200.fc34.x86_64 #1
Nov 17 23:53:57 localhost kernel: Hardware name: Supermicro X10SLH-F/X10SLM+-F/X10SLH-F/X10SLM+-F, BIOS 3.2a 05/31/2019
Nov 17 23:53:57 localhost kernel: Call Trace:
Nov 17 23:53:57 localhost kernel: dump_stack+0x76/0x94
Nov 17 23:53:57 localhost kernel: spl_panic+0xd4/0xfc [spl]
Nov 17 23:53:57 localhost kernel: ? dbuf_rele_and_unlock+0x487/0x640 [zfs]
Nov 17 23:53:57 localhost kernel: ? dbuf_create+0x5c8/0x5f0 [zfs]
Nov 17 23:53:57 localhost kernel: ? __raw_callee_save___native_queued_spin_unlock+0x11/0x1e
Nov 17 23:53:57 localhost kernel: ? dnode_rele_and_unlock+0x64/0xc0 [zfs]
Nov 17 23:53:57 localhost kernel: receive_object+0xa59/0xc50 [zfs]
Nov 17 23:53:57 localhost kernel: receive_writer_thread+0x1ba/0xac0 [zfs]
Nov 17 23:53:57 localhost kernel: ? kfree+0x403/0x430
Nov 17 23:53:57 localhost kernel: ? thread_generic_wrapper+0x62/0x80 [spl]
Nov 17 23:53:57 localhost kernel: ? receive_read_payload_and_next_header+0x2f0/0x2f0 [zfs]
Nov 17 23:53:57 localhost kernel: ? thread_generic_wrapper+0x6f/0x80 [spl]
Nov 17 23:53:57 localhost kernel: thread_generic_wrapper+0x6f/0x80 [spl]
Nov 17 23:53:57 localhost kernel: ? __thread_exit+0x20/0x20 [spl]
Nov 17 23:53:57 localhost kernel: kthread+0x127/0x150
Nov 17 23:53:57 localhost kernel: ? set_kthread_struct+0x40/0x40
Nov 17 23:53:57 localhost kernel: ret_from_fork+0x22/0x30

It does seem random as after reboot I re-ran the exact same zfs send | zfs recv (with resume token) and it completed no issues.

whyamiroot · 2021-12-20T13:06:02Z

Any updates for this issue or #12785?

vicsn · 2022-01-07T08:31:36Z

Some debugging output after a recent crash

The issue occured again on january 6 14:33:34 - I base this on default log output which I think is UTC+1 by default. This would translate to timestamp 1641476014.

zfs_get_all.txt
zfs_dbgmsg.txt
zpool_history.txt

EDIT: logs were set to UTC + 1

vicsn · 2022-01-11T08:12:23Z

One more set of logs of the same event happening a day later at Jan 7 20:20:22 1641583222

dbgmsg_after_reboot.txt
zpool_history.txt
dbgmsg.txt

With the following zpool status -v output:

  pool: data
 state: ONLINE
status: One or more devices has experienced an error resulting in data
	corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
	entire pool from backup.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
  scan: scrub repaired 0B in 12:50:05 with 0 errors on Tue Jan  4 00:12:53 2022
config:

	NAME        STATE     READ WRITE CKSUM
	data        ONLINE       0     0     0
	  raidz2-0  ONLINE       0     0     0
	    sda     ONLINE       0     0     0
	    sdb     ONLINE       0     0     0
	    sdd     ONLINE       0     0     0
	    sde     ONLINE       0     0     0
	spares
	  sdc       AVAIL   

errors: Permanent errors have been detected in the following files:

        data/private/private/private/private/private/%recv:<0x0>

gamanakis · 2022-02-03T10:18:52Z

I am pretty sure the panic happens because dbuf_read() returns an EIO error in the codepath: receive_object() -> dmu_bonus_hold_by_dnode() -> dbuf_read().

Anyone who can replicate the bug:

set echo 512 | sudo tee /sys/module/zfs/parameters/zfs_flags
Trigger the bug
post a cat /proc/spl/kstat/zfs/dbgmsg

malfret · 2022-04-04T11:23:47Z

System information

Type	Version/Name
Distribution Name	Proxmox
Distribution Version	7.1.12
Linux Kernel	5.13.19-6-pve
Architecture	amd64
ZFS Version	2.1.4-pve1

zpool get all.txt

zfs send -c -I | ssh ... receive -Fe
echo 512 > /sys/module/zfs/parameters/zfs_flags

syslog.txt
dbgmsg.txt

Time syslog UTC+4

gamanakis · 2022-04-04T12:34:56Z

@malfret thank you for the logs. Based on the time the panic occurred I went to look for events in dbgmsg.txt around 1648983726. However the events logged into dbgmsg.txt are about 10 minutes later than the panic (starting at 1648984381).

If you can reproduce this, would you mind posting a dbgmsg (with echo 512 > /sys/module/zfs/parameters/zfs_flags set beforehand) right after the panic occurs?

malfret · 2022-04-04T12:44:50Z

At the time of panic, the file system was not working properly and the shell script was unable to save the log. The log was saved manually.

How to increase the log size so that the data can be saved?

malfret · 2022-04-04T12:50:36Z

If I change zfs_dbgmsg_maxsize - it will increase the size of the dbgmsg log?

echo $((128*1024*1024)) >> /sys/module/zfs/parameters/zfs_dbgmsg_maxsize

gamanakis · 2022-04-04T21:21:14Z

If I change zfs_dbgmsg_maxsize - it will increase the size of the dbgmsg log?

echo $((128*1024*1024)) >> /sys/module/zfs/parameters/zfs_dbgmsg_maxsize

Right, that may help.

malfret · 2022-04-06T02:03:26Z

2022-04-06_dbgmsg.log.zip
syslog 2022-04-06.txt

Time syslog UTC+4

gamanakis · 2022-05-04T19:02:12Z

The panic seems to happen because arc_untransform() returns an EIO error in the codepath: receive_object() -> dmu_bonus_hold_by_dnode() -> dbuf_read() -> arc_untransform().

And that happens because arc_buf_fill() returns an error ECKSUM (EBADE or 52 in linux).

rockhouse · 2022-11-29T18:42:46Z

@gamanakis We are still experiencing this bug on our production systems, any chance that this will be solved any time soon? We are running the latest version from debian stable 2.0.3-9. Could this be escalated maybe?

Any help is appreciated!

siilike · 2022-12-18T19:46:07Z

Also happened with 2.1.5-1~bpo11+1. Nothing logged to syslog.

kernel:[9838382.870025] VERIFY3(0 == dmu_bonus_hold_by_dnode(dn, FTAG, &db, flags)) failed (0 == 5)
kernel:[9838382.870083] PANIC at dmu_recv.c:1811:receive_object()

singingtelegram · 2022-12-24T03:36:34Z

Also occurred with 2.1.7 (with c8d2ab0 reverted, otherwise I would run into #14252).

Dec 23 12:30:18 jonathan-desktop kernel: VERIFY3(0 == dmu_bonus_hold_by_dnode(dn, FTAG, &db, flags)) failed (0 == 5)
Dec 23 12:30:18 jonathan-desktop kernel: PANIC at dmu_recv.c:1806:receive_object()
Dec 23 12:30:18 jonathan-desktop kernel: Showing stack for process 3273208
Dec 23 12:30:18 jonathan-desktop kernel: CPU: 5 PID: 3273208 Comm: receive_writer Tainted: P           OE      6.1.1-x64v3-xanmod1 #0~20221222.0006b63
Dec 23 12:30:18 jonathan-desktop kernel: Hardware name: To Be Filled By O.E.M. X570 Taichi/X570 Taichi, BIOS P5.00 10/19/2022
Dec 23 12:30:18 jonathan-desktop kernel: Call Trace:
Dec 23 12:30:18 jonathan-desktop kernel:  <TASK>
Dec 23 12:30:18 jonathan-desktop kernel:  dump_stack_lvl+0x44/0x5c
Dec 23 12:30:18 jonathan-desktop kernel:  spl_panic+0xcb/0xe3 [spl]
Dec 23 12:30:18 jonathan-desktop kernel:  ? dbuf_rele_and_unlock+0x5ad/0x730 [zfs]
Dec 23 12:30:18 jonathan-desktop kernel:  ? dbuf_cons+0xa2/0xc0 [zfs]
Dec 23 12:30:18 jonathan-desktop kernel:  ? spl_kmem_cache_alloc+0x9f/0x740 [spl]
Dec 23 12:30:18 jonathan-desktop kernel:  ? dbuf_read+0x10a/0x5e0 [zfs]
Dec 23 12:30:18 jonathan-desktop kernel:  ? aggsum_add+0x16a/0x180 [zfs]
Dec 23 12:30:18 jonathan-desktop kernel:  ? dnode_rele_and_unlock+0x54/0xe0 [zfs]
Dec 23 12:30:18 jonathan-desktop kernel:  receive_object+0xb06/0xd30 [zfs]
Dec 23 12:30:18 jonathan-desktop kernel:  ? dnode_rele_and_unlock+0x54/0xe0 [zfs]
Dec 23 12:30:18 jonathan-desktop kernel:  ? dmu_object_info+0x4f/0x80 [zfs]
Dec 23 12:30:18 jonathan-desktop kernel:  receive_writer_thread+0x1b2/0xa90 [zfs]
Dec 23 12:30:18 jonathan-desktop kernel:  ? set_next_task_fair+0x66/0x90
Dec 23 12:30:18 jonathan-desktop kernel:  ? start_cfs_bandwidth.part.0+0x50/0x50
Dec 23 12:30:18 jonathan-desktop kernel:  ? set_user_nice.part.0+0x11b/0x330
Dec 23 12:30:18 jonathan-desktop kernel:  ? receive_process_write_record+0x160/0x160 [zfs]
Dec 23 12:30:18 jonathan-desktop kernel:  ? __thread_exit+0x10/0x10 [spl]
Dec 23 12:30:18 jonathan-desktop kernel:  thread_generic_wrapper+0x55/0x60 [spl]
Dec 23 12:30:18 jonathan-desktop kernel:  kthread+0xd5/0x100
Dec 23 12:30:18 jonathan-desktop kernel:  ? kthread_complete_and_exit+0x20/0x20
Dec 23 12:30:18 jonathan-desktop kernel:  ret_from_fork+0x22/0x30
Dec 23 12:30:18 jonathan-desktop kernel:  </TASK>

lnagel · 2023-04-23T04:21:10Z

[1687794.507453] VERIFY3(0 == dmu_bonus_hold_by_dnode(dn, FTAG, &db, flags)) failed (0 == 5)
[1687794.507479] PANIC at dmu_recv.c:1806:receive_object()
[1687794.507493] Showing stack for process 2882666
[1687794.507505] CPU: 5 PID: 2882666 Comm: receive_writer Tainted: P      D    OE     5.19.0-38-generic #39~22.04.1-Ubuntu
[1687794.507529] Hardware name: Intel(R) Client Systems NUC8i7BEH/NUC8BEB, BIOS BECFL357.86A.0090.2022.0916.1942 09/16/2022
[1687794.507547] Call Trace:
[1687794.507557]  <TASK>
[1687794.507571]  show_stack+0x52/0x69
[1687794.507598]  dump_stack_lvl+0x49/0x6d
[1687794.507629]  dump_stack+0x10/0x18
[1687794.507647]  spl_dumpstack+0x29/0x35 [spl]
[1687794.507691]  spl_panic+0xd1/0xe9 [spl]
[1687794.507732]  ? arc_space_consume+0x54/0x130 [zfs]
[1687794.508027]  ? dbuf_create+0x5c1/0x5f0 [zfs]
[1687794.508306]  ? dbuf_read+0x11b/0x630 [zfs]
[1687794.508587]  ? dmu_bonus_hold_by_dnode+0x15b/0x1b0 [zfs]
[1687794.508860]  receive_object+0xae1/0xd40 [zfs]
[1687794.509186]  ? __slab_free+0x31/0x340
[1687794.509211]  ? spl_kmem_free+0x32/0x40 [spl]
[1687794.509244]  ? kfree+0x30f/0x330
[1687794.509259]  ? mutex_lock+0x13/0x50
[1687794.509277]  receive_writer_thread+0x1ce/0xb50 [zfs]
[1687794.509560]  ? set_next_task_fair+0x70/0xb0
[1687794.509578]  ? receive_process_write_record+0x1a0/0x1a0 [zfs]
[1687794.509861]  ? spl_kmem_free+0x32/0x40 [spl]
[1687794.509894]  ? kfree+0x30f/0x330
[1687794.509909]  ? receive_process_write_record+0x1a0/0x1a0 [zfs]
[1687794.510191]  ? __thread_exit+0x20/0x20 [spl]
[1687794.510225]  thread_generic_wrapper+0x61/0x80 [spl]
[1687794.510256]  ? thread_generic_wrapper+0x61/0x80 [spl]
[1687794.510288]  kthread+0xeb/0x120
[1687794.510303]  ? kthread_complete_and_exit+0x20/0x20
[1687794.510321]  ret_from_fork+0x1f/0x30
[1687794.510343]  </TASK>

singingtelegram · 2023-04-23T09:04:33Z

You might want to update to the latest ZFS release. jonathonf's PPA hasn't been updated in quite a while now. What I did was that I grabbed the zfs-linux source tarball from the lunar repo and compiled it with dpkg-buildpackage.

…

On Sat, Apr 22, 2023 at 9:21 PM Lenno Nagel ***@***.***> wrote: [1687794.507453] VERIFY3(0 == dmu_bonus_hold_by_dnode(dn, FTAG, &db, flags)) failed (0 == 5) [1687794.507479] PANIC at dmu_recv.c:1806:receive_object() [1687794.507493] Showing stack for process 2882666 [1687794.507505] CPU: 5 PID: 2882666 Comm: receive_writer Tainted: P D OE 5.19.0-38-generic #39~22.04.1-Ubuntu [1687794.507529] Hardware name: Intel(R) Client Systems NUC8i7BEH/NUC8BEB, BIOS BECFL357.86A.0090.2022.0916.1942 09/16/2022 [1687794.507547] Call Trace: [1687794.507557] <TASK> [1687794.507571] show_stack+0x52/0x69 [1687794.507598] dump_stack_lvl+0x49/0x6d [1687794.507629] dump_stack+0x10/0x18 [1687794.507647] spl_dumpstack+0x29/0x35 [spl] [1687794.507691] spl_panic+0xd1/0xe9 [spl] [1687794.507732] ? arc_space_consume+0x54/0x130 [zfs] [1687794.508027] ? dbuf_create+0x5c1/0x5f0 [zfs] [1687794.508306] ? dbuf_read+0x11b/0x630 [zfs] [1687794.508587] ? dmu_bonus_hold_by_dnode+0x15b/0x1b0 [zfs] [1687794.508860] receive_object+0xae1/0xd40 [zfs] [1687794.509186] ? __slab_free+0x31/0x340 [1687794.509211] ? spl_kmem_free+0x32/0x40 [spl] [1687794.509244] ? kfree+0x30f/0x330 [1687794.509259] ? mutex_lock+0x13/0x50 [1687794.509277] receive_writer_thread+0x1ce/0xb50 [zfs] [1687794.509560] ? set_next_task_fair+0x70/0xb0 [1687794.509578] ? receive_process_write_record+0x1a0/0x1a0 [zfs] [1687794.509861] ? spl_kmem_free+0x32/0x40 [spl] [1687794.509894] ? kfree+0x30f/0x330 [1687794.509909] ? receive_process_write_record+0x1a0/0x1a0 [zfs] [1687794.510191] ? __thread_exit+0x20/0x20 [spl] [1687794.510225] thread_generic_wrapper+0x61/0x80 [spl] [1687794.510256] ? thread_generic_wrapper+0x61/0x80 [spl] [1687794.510288] kthread+0xeb/0x120 [1687794.510303] ? kthread_complete_and_exit+0x20/0x20 [1687794.510321] ret_from_fork+0x1f/0x30 [1687794.510343] </TASK> — Reply to this email directly, view it on GitHub <#12001 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/ABI35LW7C3C2JF5BL4ICZVLXCSU4DANCNFSM44EHQVOA> . You are receiving this because you commented.Message ID: ***@***.***>

lnagel · 2023-12-24T14:40:53Z

Using 2.2.0 now

[   46.799495] ZFS: Loaded module v2.2.0-0ubuntu1~23.10, ZFS pool version 5000, ZFS filesystem version 5

The error seems to be the same

[984462.595691] VERIFY0(0 == dmu_bonus_hold_by_dnode(dn, FTAG, &db, flags)) failed (0 == 5)
[984462.595700] PANIC at dmu_recv.c:2093:receive_object()
[984462.595703] Showing stack for process 44582
[984462.595704] CPU: 4 PID: 44582 Comm: receive_writer Tainted: P           O       6.5.0-14-generic #14-Ubuntu
[984462.595706] Hardware name: Intel(R) Client Systems NUC8i7BEH/NUC8BEB, BIOS BECFL357.86A.0090.2022.0916.1942 09/16/2022
[984462.595708] Call Trace:
[984462.595710]  <TASK>
[984462.595713]  dump_stack_lvl+0x48/0x70
[984462.595722]  dump_stack+0x10/0x20
[984462.595724]  spl_dumpstack+0x29/0x40 [spl]
[984462.595742]  spl_panic+0xfc/0x120 [spl]
[984462.595757]  receive_object+0x9ce/0xaa0 [zfs]
[984462.596131]  ? __list_del_entry+0x1d/0x30 [zfs]
[984462.596427]  ? list_del+0xd/0x30 [zfs]
[984462.596674]  receive_process_record+0x84/0x2b0 [zfs]
[984462.596963]  receive_writer_thread+0xb7/0x1a0 [zfs]
[984462.597246]  ? __pfx_receive_writer_thread+0x10/0x10 [zfs]
[984462.597543]  ? __pfx_thread_generic_wrapper+0x10/0x10 [spl]
[984462.597563]  thread_generic_wrapper+0x5c/0x70 [spl]
[984462.597583]  kthread+0xef/0x120
[984462.597587]  ? __pfx_kthread+0x10/0x10
[984462.597590]  ret_from_fork+0x44/0x70
[984462.597594]  ? __pfx_kthread+0x10/0x10
[984462.597597]  ret_from_fork_asm+0x1b/0x30
[984462.597601]  </TASK>

vicsn added Status: Triage Needed New issue which needs to be triaged Type: Defect Incorrect behavior (e.g. crash, hang) labels May 5, 2021

scratchings mentioned this issue Nov 8, 2021

PANIC at dmu_recv.c on receiving snapshot to encrypted file system #12732

Open

darrenfreeman mentioned this issue Nov 23, 2021

Kernel panic on incremental send/recv between two encrypted datasets on the same pool, dest is using zstd-19 #12785

Closed

behlendorf added Component: Encryption "native encryption" feature Component: Send/Recv "zfs send/recv" feature and removed Status: Triage Needed New issue which needs to be triaged labels Jan 4, 2022

singingtelegram mentioned this issue Dec 24, 2022

zfs-2.1.7 zfs recv panic in dsl_dataset_deactivate_feature_impl #14252

Closed

This was referenced Feb 18, 2023

ZFS corruption related to snapshots post-2.0.x upgrade #12014

Open

permanent errors (ereport.fs.zfs.authentication) reported after syncoid snapshot/send workload #11688

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ZFS hangs on kernel error: VERIFY3(0 == dmu_bonus_hold_by_dnode #12001

ZFS hangs on kernel error: VERIFY3(0 == dmu_bonus_hold_by_dnode #12001

vicsn commented May 5, 2021 •

edited

Loading

stewartadam commented Nov 21, 2021 •

edited

Loading

whyamiroot commented Dec 20, 2021

vicsn commented Jan 7, 2022 •

edited

Loading

vicsn commented Jan 11, 2022 •

edited

Loading

gamanakis commented Feb 3, 2022

malfret commented Apr 4, 2022

gamanakis commented Apr 4, 2022

malfret commented Apr 4, 2022

malfret commented Apr 4, 2022 •

edited

Loading

gamanakis commented Apr 4, 2022

malfret commented Apr 6, 2022

gamanakis commented May 4, 2022 •

edited

Loading

rockhouse commented Nov 29, 2022

siilike commented Dec 18, 2022

singingtelegram commented Dec 24, 2022

lnagel commented Apr 23, 2023

singingtelegram commented Apr 23, 2023 via email

lnagel commented Dec 24, 2023

ZFS hangs on kernel error: VERIFY3(0 == dmu_bonus_hold_by_dnode #12001

ZFS hangs on kernel error: VERIFY3(0 == dmu_bonus_hold_by_dnode #12001

Comments

vicsn commented May 5, 2021 • edited Loading

System information

Describe the problem you're observing

Describe how to reproduce the problem

Current mitigation

Include any warning/errors/backtraces from the system logs

stewartadam commented Nov 21, 2021 • edited Loading

whyamiroot commented Dec 20, 2021

vicsn commented Jan 7, 2022 • edited Loading

vicsn commented Jan 11, 2022 • edited Loading

gamanakis commented Feb 3, 2022

malfret commented Apr 4, 2022

System information

gamanakis commented Apr 4, 2022

malfret commented Apr 4, 2022

malfret commented Apr 4, 2022 • edited Loading

gamanakis commented Apr 4, 2022

malfret commented Apr 6, 2022

gamanakis commented May 4, 2022 • edited Loading

rockhouse commented Nov 29, 2022

siilike commented Dec 18, 2022

singingtelegram commented Dec 24, 2022

lnagel commented Apr 23, 2023

singingtelegram commented Apr 23, 2023 via email

lnagel commented Dec 24, 2023

vicsn commented May 5, 2021 •

edited

Loading

stewartadam commented Nov 21, 2021 •

edited

Loading

vicsn commented Jan 7, 2022 •

edited

Loading

vicsn commented Jan 11, 2022 •

edited

Loading

malfret commented Apr 4, 2022 •

edited

Loading

gamanakis commented May 4, 2022 •

edited

Loading