Kernel panic on incremental send/recv between two encrypted datasets on the same pool, dest is using zstd-19 #12785

Closed
darrenfreeman opened this issue Nov 23, 2021 · 28 comments
Labels
Component: Encryption ("native encryption" feature), Component: Send/Recv ("zfs send/recv" feature), Status: Stale (no recent activity for issue), Type: Defect (incorrect behavior, e.g. crash, hang)

Comments

@darrenfreeman

System information

Type                  Version/Name
Distribution Name     debian
Distribution Version  11.1
Kernel Version        5.7.0-1-amd64
Architecture          x86-64, AMD Opteron 6272 (8 nodes, 64 cores), 128 GiB RAM
OpenZFS Version       zfs-2.0.3-9 zfs-kmod-2.0.3-9

Describe the problem you're observing

The zfs send | zfs recv pipeline froze, and an attempt to zfs set mountpoint= on the source dataset also hung, so the system had to be rebooted.

Upon reboot, a scrub appears to have started by itself.

  pool: rpool
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
  scan: scrub in progress since Tue Nov 23 08:00:01 2021
        13.9T scanned at 2.50G/s, 13.3T issued at 0B/s, 15.9T total
        0B repaired, 83.48% done, no estimated completion time
config:

        NAME         STATE     READ WRITE CKSUM
        rpool        ONLINE       0     0     0
          raidz2-0   ONLINE       0     0     0
            rpool0   ONLINE       0     0     0
            rpool1   ONLINE       0     0     0
            rpool2   ONLINE       0     0     0
            rpool3   ONLINE       0     0     0
            rpool4   ONLINE       0     0     0
            rpool5   ONLINE       0     0     0
        logs
          rpool-log  ONLINE       0     0     0
        cache
          sdg5       ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

        <0xc6ca>:<0x0>

Describe how to reproduce the problem

I don't think this will reproduce, but it's what I was doing at the time:

zfs send rpool/sub/source@end -I start | pv | zfs recv -u rpool/sub/dest

Source and destination datasets are encrypted under the same encryption root. The only difference should be that the source has compression=lz4 while the dest has compression=zstd-19. CPU usage was around 75% on a system that is otherwise at 1%; I attribute this to zstd compression. This is not the first dataset on this pool that I have successfully recompressed using this technique, and I had also done many TB of raw send/recv of encrypted datasets between pools, without rebooting, before the step that failed.
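
For reference, the overall recompression workflow looks roughly like this; dataset and snapshot names are placeholders, and this is a sketch of the technique rather than my exact command history:

# Seed the destination with a full (non-raw) send, setting the new compression on the receive side:
zfs send rpool/sub/source@start | pv | zfs recv -u -o compression=zstd-19 rpool/sub/dest

# Then send the remaining snapshots incrementally -- this is the step that hung:
zfs send -I start rpool/sub/source@end | pv | zfs recv -u rpool/sub/dest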

Include any warning/errors/backtraces from the system logs

Nov 23 21:14:24 host kernel: VERIFY3(0 == dmu_bonus_hold_by_dnode(dn, FTAG, &db, flags)) failed (0 == 5)
Nov 23 21:14:24 host kernel: PANIC at dmu_recv.c:1811:receive_object()
Nov 23 21:14:24 host kernel: Showing stack for process 3588403
Nov 23 21:14:24 host kernel: CPU: 20 PID: 3588403 Comm: receive_writer Tainted: P           OE     5.7.0-1-amd64 #1 Debian 5.7.6-1
Nov 23 21:14:24 host kernel: Hardware name: SGI.COM H2106-G7/H8QG6-F, BIOS 080016     06/24/2014
Nov 23 21:14:24 host kernel: Call Trace:
Nov 23 21:14:24 host kernel:  dump_stack+0x66/0x90
Nov 23 21:14:24 host kernel:  spl_panic+0xd4/0xfc [spl]
Nov 23 21:14:24 host kernel:  ? dbuf_rele_and_unlock+0x30c/0x660 [zfs]
Nov 23 21:14:24 host kernel:  ? dbuf_read+0x1cc/0x510 [zfs]
Nov 23 21:14:24 host kernel:  ? zfs_zevent_post+0x1f0/0x2a0 [zfs]
Nov 23 21:14:24 host kernel:  receive_object+0x95d/0xca0 [zfs]
Nov 23 21:14:24 host kernel:  ? dnode_rele_and_unlock+0x5c/0xc0 [zfs]
Nov 23 21:14:24 host kernel:  ? _cond_resched+0x15/0x30
Nov 23 21:14:24 host kernel:  receive_writer_thread+0x1d9/0x9f0 [zfs]
Nov 23 21:14:24 host kernel:  ? receive_process_write_record+0x190/0x190 [zfs]
Nov 23 21:14:24 host kernel:  ? thread_generic_wrapper+0x6f/0x80 [spl]
Nov 23 21:14:24 host kernel:  thread_generic_wrapper+0x6f/0x80 [spl]
Nov 23 21:14:24 host kernel:  kthread+0xf9/0x130
Nov 23 21:14:24 host kernel:  ? __thread_exit+0x20/0x20 [spl]
Nov 23 21:14:24 host kernel:  ? kthread_park+0x90/0x90
Nov 23 21:14:24 host kernel:  ret_from_fork+0x22/0x40

Subsequently, sanoid attempted to snapshot an unrelated dataset. I'm not sure whether it completed, but I suspect not.

And then the inevitable time-out:

Nov 23 21:17:23 host kernel: INFO: task txg_quiesce:1075648 blocked for more than 120 seconds.
Nov 23 21:17:23 host kernel:       Tainted: P           OE     5.7.0-1-amd64 #1 Debian 5.7.6-1
Nov 23 21:17:23 host kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Nov 23 21:17:23 host kernel: txg_quiesce     D    0 1075648      2 0x80004000
Nov 23 21:17:23 host kernel: Call Trace:
Nov 23 21:17:23 host kernel:  __schedule+0x2da/0x770
Nov 23 21:17:23 host kernel:  schedule+0x4a/0xb0
Nov 23 21:17:23 host kernel:  cv_wait_common+0xfb/0x130 [spl]
Nov 23 21:17:23 host kernel:  ? finish_wait+0x80/0x80
Nov 23 21:17:23 host kernel:  txg_quiesce_thread+0x2b1/0x3a0 [zfs]
Nov 23 21:17:23 host kernel:  ? txg_sync_thread+0x4a0/0x4a0 [zfs]
Nov 23 21:17:23 host kernel:  thread_generic_wrapper+0x6f/0x80 [spl]
Nov 23 21:17:23 host kernel:  kthread+0xf9/0x130
Nov 23 21:17:23 host kernel:  ? __thread_exit+0x20/0x20 [spl]
Nov 23 21:17:23 host kernel:  ? kthread_park+0x90/0x90
Nov 23 21:17:23 host kernel:  ret_from_fork+0x22/0x40
Nov 23 21:17:23 host kernel: INFO: task receive_writer:3588403 blocked for more than 120 seconds.
Nov 23 21:17:23 host kernel:       Tainted: P           OE     5.7.0-1-amd64 #1 Debian 5.7.6-1
Nov 23 21:17:23 host kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Nov 23 21:17:23 host kernel: receive_writer  D    0 3588403      2 0x80004000
Nov 23 21:17:23 host kernel: Call Trace:
Nov 23 21:17:23 host kernel:  __schedule+0x2da/0x770
Nov 23 21:17:23 host kernel:  schedule+0x4a/0xb0
Nov 23 21:17:23 host kernel:  spl_panic+0xfa/0xfc [spl]
Nov 23 21:17:23 host kernel:  ? dbuf_rele_and_unlock+0x30c/0x660 [zfs]
Nov 23 21:17:23 host kernel:  ? dbuf_read+0x1cc/0x510 [zfs]
Nov 23 21:17:23 host kernel:  ? zfs_zevent_post+0x1f0/0x2a0 [zfs]
Nov 23 21:17:23 host kernel:  receive_object+0x95d/0xca0 [zfs]
Nov 23 21:17:23 host kernel:  ? dnode_rele_and_unlock+0x5c/0xc0 [zfs]
Nov 23 21:17:23 host kernel:  ? _cond_resched+0x15/0x30
Nov 23 21:17:23 host kernel:  receive_writer_thread+0x1d9/0x9f0 [zfs]
Nov 23 21:17:23 host kernel:  ? receive_process_write_record+0x190/0x190 [zfs]
Nov 23 21:17:23 host kernel:  ? thread_generic_wrapper+0x6f/0x80 [spl]
Nov 23 21:17:23 host kernel:  thread_generic_wrapper+0x6f/0x80 [spl]
Nov 23 21:17:23 host kernel:  kthread+0xf9/0x130
Nov 23 21:17:23 host kernel:  ? __thread_exit+0x20/0x20 [spl]
Nov 23 21:17:23 host kernel:  ? kthread_park+0x90/0x90
Nov 23 21:17:23 host kernel:  ret_from_fork+0x22/0x40

(Omitted three more similar timeouts.)

@darrenfreeman added the Type: Defect label on Nov 23, 2021
@darrenfreeman
Author

Likely related to #12001, #12270.

Maybe related to #12014, #12594, #12732.

@darrenfreeman
Author

I allowed this scrub to complete normally, then sent the remaining snapshots to the destination dataset. The "permanent errors" were still listed against the pool.

Then I performed a second scrub, and after that, no permanent errors are listed. I have not rebooted or exported/imported the pool in the meantime.

  pool: rpool
 state: ONLINE
  scan: scrub repaired 0B in 11:55:37 with 0 errors on Wed Nov 24 14:00:59 2021
config:

        NAME         STATE     READ WRITE CKSUM
        rpool        ONLINE       0     0     0
          raidz2-0   ONLINE       0     0     0
            rpool0   ONLINE       0     0     0
            rpool1   ONLINE       0     0     0
            rpool2   ONLINE       0     0     0
            rpool3   ONLINE       0     0     0
            rpool4   ONLINE       0     0     0
            rpool5   ONLINE       0     0     0
        logs
          rpool-log  ONLINE       0     0     0
        cache
          sdg5       ONLINE       0     0     0

errors: No known data errors

This "permanent errors", "data corruption", and "Otherwise restore the entire pool from backup." messaging really needs to change. It's rather frightening, when the actual fix appears to be to scrub the pool not once, but twice.
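
In case it helps anyone else, the overall sequence amounts to roughly this (zpool wait is optional; polling zpool status works too):

zpool scrub rpool
zpool wait -t scrub rpool   # first scrub completes; the error is still listed
zpool scrub rpool
zpool wait -t scrub rpool   # after the second scrub the error entry is gone
zpool status -v rpool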

@darrenfreeman
Author

Well, it happened again, under similar conditions:

zfs send rpool/src@snapshot | pv | zfs recv -u -o recordsize=1M -o compression=zstd-19 rpool/dest

The output from pv indicates that 21.3 GB of the stream had been processed so far. I noticed some extra things in the system log that could possibly be involved in triggering this issue:

Nov 24 22:18:37 host systemd[1]: sanoid.service: Succeeded.
Nov 24 22:18:37 host systemd[1]: Finished Snapshot ZFS filesystems.
Nov 24 22:18:37 host systemd[1]: sanoid.service: Consumed 42.905s CPU time.
Nov 24 22:18:38 host systemd[1]: Starting Prune ZFS snapshots...
Nov 24 22:18:39 host sanoid[1613707]: INFO: pruning snapshots...
Nov 24 22:18:39 host sanoid[1613707]: INFO: pruning other/sub/first@autosnap_2021-11-24_08:15:16_frequently ...
Nov 24 22:18:39 host sanoid[1613707]: INFO: removing destroyed snapshots from cache.
Nov 24 22:18:39 host sanoid[1613707]: INFO: pruning other/sub/second@autosnap_2021-11-24_08:15:16_frequently ...
Nov 24 22:18:41 host sanoid[1613707]: INFO: removing destroyed snapshots from cache.
Nov 24 22:18:41 host sanoid[1613707]: INFO: pruning other/third@autosnap_2021-11-24_08:15:16_frequently ...
Nov 24 22:18:43 host sanoid[1613707]: INFO: removing destroyed snapshots from cache.
Nov 24 22:18:43 host sanoid[1613707]: INFO: pruning other/another/fourth@autosnap_2021-11-24_10:45:20_frequently ...
Nov 24 22:18:43 host sanoid[1613707]: INFO: deferring pruning of other/another/fourth@autosnap_2021-11-24_10:45:20_frequently - other/another/p>
Nov 24 22:18:43 host sanoid[1613707]: INFO: pruning rpool/another/fifth@autosnap_2021-11-24_10:15:05_frequently ...
Nov 24 22:19:02 host CRON[1615373]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
Nov 24 22:19:02 host CRON[1615411]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
Nov 24 22:19:02 host CRON[1615373]: pam_unix(cron:session): session closed for user root
Nov 24 22:20:00 host sanoid[1613707]: INFO: removing destroyed snapshots from cache.
Nov 24 22:20:00 host sanoid[1613707]: INFO: pruning other/sub/sixth@autosnap_2021-11-24_08:15:16_frequently ...
Nov 24 22:20:00 host kernel: VERIFY3(0 == dmu_bonus_hold_by_dnode(dn, FTAG, &db, flags)) failed (0 == 5)
Nov 24 22:20:00 host kernel: PANIC at dmu_recv.c:1811:receive_object()
Nov 24 22:20:00 host kernel: Showing stack for process 1622663
Nov 24 22:20:01 host kernel: CPU: 56 PID: 1622663 Comm: receive_writer Tainted: P           OE     5.7.0-1-amd64 #1 Debian 5.7.6-1
Nov 24 22:20:01 host kernel: Hardware name: SGI.COM H2106-G7/H8QG6-F, BIOS 080016     06/24/2014
Nov 24 22:20:01 host kernel: Call Trace:
Nov 24 22:20:01 host kernel:  dump_stack+0x66/0x90
Nov 24 22:20:01 host kernel:  spl_panic+0xd4/0xfc [spl]
Nov 24 22:20:01 host kernel:  ? dbuf_rele_and_unlock+0x30c/0x660 [zfs]
Nov 24 22:20:01 host kernel:  ? dbuf_read+0x1cc/0x510 [zfs]
Nov 24 22:20:01 host kernel:  ? zfs_zevent_post+0x1f0/0x2a0 [zfs]
Nov 24 22:20:01 host kernel:  receive_object+0x95d/0xca0 [zfs]
Nov 24 22:20:01 host kernel:  ? dnode_rele_and_unlock+0x5c/0xc0 [zfs]
Nov 24 22:20:01 host kernel:  ? _cond_resched+0x15/0x30
Nov 24 22:20:01 host kernel:  receive_writer_thread+0x1d9/0x9f0 [zfs]
Nov 24 22:20:01 host kernel:  ? receive_process_write_record+0x190/0x190 [zfs]
Nov 24 22:20:01 host kernel:  ? thread_generic_wrapper+0x6f/0x80 [spl]
Nov 24 22:20:01 host kernel:  thread_generic_wrapper+0x6f/0x80 [spl]
Nov 24 22:20:01 host kernel:  kthread+0xf9/0x130
Nov 24 22:20:01 host kernel:  ? __thread_exit+0x20/0x20 [spl]
Nov 24 22:20:01 host kernel:  ? kthread_park+0x90/0x90
Nov 24 22:20:01 host kernel:  ret_from_fork+0x22/0x40
Nov 24 22:20:01 host CRON[1623091]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
Nov 24 22:20:01 host sanoid[1613707]: INFO: removing destroyed snapshots from cache.
Nov 24 22:20:00 host systemd[1]: Starting system activity accounting tool...
Nov 24 22:20:01 host CRON[1623092]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
Nov 24 22:20:00 host systemd[1]: sysstat-collect.service: Succeeded.
Nov 24 22:20:01 host CRON[1623091]: pam_unix(cron:session): session closed for user root
Nov 24 22:20:00 host systemd[1]: Finished system activity accounting tool.
Nov 24 22:20:01 host systemd[1]: sanoid-prune.service: Succeeded.
Nov 24 22:20:01 host systemd[1]: Finished Prune ZFS snapshots.
Nov 24 22:20:01 host systemd[1]: sanoid-prune.service: Consumed 2.941s CPU time.

Note that sanoid never touches the src/dest datasets, but it does interact with the pool they are on. Syncoid is also sending snapshots to this pool every few minutes.

I'm not sure whether a reboot was required: I was able to mount the source dataset and look at its properties. But the dest was stuck mid-receive, and the receive process couldn't be killed. I then tried to unmount the source dataset, that command froze as well, and so I decided to reboot.

Prior to actually rebooting, I took a zpool status -v:

  pool: rpool
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
  scan: scrub repaired 0B in 11:55:37 with 0 errors on Wed Nov 24 14:00:59 2021
config:

        NAME         STATE     READ WRITE CKSUM
        rpool        ONLINE       0     0     0
          raidz2-0   ONLINE       0     0     0
            rpool0   ONLINE       0     0     0
            rpool1   ONLINE       0     0     0
            rpool2   ONLINE       0     0     0
            rpool3   ONLINE       0     0     0
            rpool4   ONLINE       0     0     0
            rpool5   ONLINE       0     0     0
        logs
          rpool-log  ONLINE       0     0     0
        cache
          sdg5       ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

        rpool/another/seventh/%recv:<0x0>

The dataset mentioned on the last line above is receiving via syncoid, on minutes ending in 1 or 6, via the following cron invocation. (It's currently the only instance of syncoid being run anywhere.)

syncoid --no-sync-snap --create-bookmark other/another/fourth rpool/another/seventh

It's likely not a big delta at all, and there's every chance that this is what some of the CRON log messages are about.

@darrenfreeman
Author

Upon reboot, the pool status shows the same as above. A scrub has not begun, and the same error is listed, against rpool/another/seventh/%recv:<0x0>. The corresponding dataset had a receive_resume_token.

I then tried zfs receive -A rpool/another/seventh and got the error message:

'rpool/another/seventh' does not have any resumable receive state to abort.

I checked again, and the token was gone. I don't know whether syncoid did this, but as the key for rpool was not loaded, syncoid was unable to send new snapshots.
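
For reference, this is roughly how the leftover resume state can be checked for and discarded (a value of "-" means there is nothing to abort):

zfs get -H -o value receive_resume_token rpool/another/seventh
# if a token is shown, the partial receive state can be discarded with:
zfs receive -A rpool/another/seventh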

The end of zpool status -v now reads:

errors: Permanent errors have been detected in the following files:

        <0x39e2>:<0x0>

I loaded the keys and manually invoked syncoid. It ran fine, and the snapshots have appeared on rpool/another/seventh. The anonymous error still appears in zpool status, so I shall scrub twice and see what happens.

@rincebrain added the Component: Encryption and Component: Send/Recv labels on Nov 25, 2021
@darrenfreeman
Author

Once again, after the second scrub, the error is no longer shown. I haven't rebooted, and I have been sending other datasets to this pool in the meantime.

(Two comments from @PrivatePuffin and @darrenfreeman have been minimized.)

@darrenfreeman
Author

darrenfreeman commented Nov 26, 2021

FWIW, I've diffed at least a TB of data between snapshots mounted side by side, lz4 and zstd-19. So far no differences found, no data loss that I'm aware of.
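
The comparison was along these lines, using the hidden .zfs/snapshot directories on the mounted datasets (mountpoints and snapshot name are placeholders):

diff -r /rpool/sub/source/.zfs/snapshot/end /rpool/sub/dest/.zfs/snapshot/end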

(Several further comments from @PrivatePuffin, @darrenfreeman, and @robszy have been minimized.)

@robszy

robszy commented Nov 28, 2021

Back to the issue. No one is going to answer the questions @darrenfreeman is asking, because this issue needs real investigation, and as far as I can tell there is not much interest in solving send/receive issues that also involve encryption. So most probably this issue will never be resolved, unless you try to resolve it yourself :), since as far as I understand there is no support for it.

@darrenfreeman
Author

I am investigating without looking at any code. My current test is to recompress 3 TB after dropping the L2ARC and SLOG and stopping sanoid/syncoid. Many would find that a reasonable compromise if it succeeds, since those can all be added back afterwards.
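
Concretely, the preparation looks something like this; the device names come from the zpool status output above, and the systemd unit name is my assumption about how sanoid is scheduled here:

zpool remove rpool sdg5        # drop the cache (L2ARC) device
zpool remove rpool rpool-log   # drop the log (SLOG) device
systemctl stop sanoid.timer    # pause automatic snapshotting/pruning
# syncoid runs from cron here, so its crontab entry is commented out instead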

@robszy

robszy commented Nov 28, 2021

Maybe someone can shed some light on what this assert means, because it is returning EIO:

VERIFY3(0 == dmu_bonus_hold_by_dnode(dn, FTAG, &db, flags)) failed (0 == 5)

Could that be the source of the trouble?

(Further comments from @BrainSlayer, @darrenfreeman, and @robszy have been minimized.)

@ahrens
Member

ahrens commented Nov 29, 2021

The symptoms seem identical to #12001 and #12270, which don't use zstd. So I don't see any evidence that this is specific to ZSTD.

I'm going to mark the zstd-specific comments as off-topic. Polite suggestions for how to improve ZSTD are welcome in other (perhaps new) issues.

Similarly, I don't think that memory usage or overall speed/performance is at issue here; I've marked those comments as off-topic as well.

It seems probable that the EIO (5) here is not due to hardware failure, but rather something going wrong with the zfs receive, which needs to be investigated.

@darrenfreeman
Author

The second hang, above, occurred while a snapshot was being received via syncoid, and that snapshot was the one showing the "permanent" error in its partially received state. I have just transferred a single large snapshot without a hang; syncoid was not running, and the send/recv was not incremental. Now going to reinstate sanoid and syncoid.

@ahrens
Member

ahrens commented Nov 29, 2021

There are a few places that can return EIO in this path.

Can you try reproducing with SET_ERROR tracing enabled ($ echo 512 >/sys/module/zfs/parameters/zfs_flags; see __set_error()), and then gather zfs_dbgmsg after reproducing (cat /proc/spl/kstat/zfs/dbgmsg, or by using sdb on the crash dump)?

It might also be helpful to see zpool history -i (at least the last few hundred lines that might be relevant to this dataset).

Let's also get all the properties involved: zfs get all pool/source pool/dest.
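
Put together, the data collection would look something like this (dataset names are placeholders):

echo 512 > /sys/module/zfs/parameters/zfs_flags        # enable SET_ERROR tracing
# ...reproduce the hang, then capture:
cat /proc/spl/kstat/zfs/dbgmsg > dbgmsg.txt
zpool history -i rpool | tail -n 500 > zpool-history.txt
zfs get all rpool/sub/source rpool/sub/dest > props.txt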

@stale

stale bot commented Dec 3, 2022

This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions.

The stale bot added the Status: Stale label on Dec 3, 2022
The stale bot closed this as completed on Mar 18, 2023