Raw send on encrypted datasets does not work when copying snapshots back #10523

felixdoerre · 2020-07-01T18:05:46Z

Describe the problem you're observing

$ sudo zfs create -o encryption=on -o keyformat=passphrase testpool/source
Enter passphrase:
Re-enter passphrase:
$ sudo zfs snap testpool/source@base
$ sudo zfs snap testpool/source@s1
$ sudo zfs umount testpool/source
$ sudo zfs send -w testpool/source@base | sudo zfs receive testpool/target
$ sudo zfs send -w -i @base testpool/source@s1 | sudo zfs receive testpool/target
$ sudo zfs destroy testpool/source@s1
$ sudo zfs send -w -i @base testpool/target@s1 | sudo zfs receive testpool/source
$ sudo zfs mount testpool/source
filesystem 'testpool/source' can not be mounted: Input/output error
cannot mount 'testpool/source': Invalid argument
$

With --enable-debug the example already crashes at the zfs receive testpool/source. This produces the following kernel log:

[ 2516.835681] VERIFY3(0 == os->os_flags & OBJSET_FLAG_USERACCOUNTING_COMPLETE) failed (0 == 1)
[ 2516.835819] PANIC at dsl_crypt.c:2019:dsl_crypto_recv_raw_objset_check()
[ 2516.835889] Showing stack for process 305588
[ 2516.835897] CPU: 0 PID: 305588 Comm: zfs Tainted: P           OE     5.4.0-33-generic #37-Ubuntu
[ 2516.835898] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 2516.835899] Call Trace:
[ 2516.835980]  dump_stack+0x6d/0x9a
[ 2516.836043]  spl_dumpstack+0x29/0x2b [spl]
[ 2516.836049]  spl_panic+0xd4/0xfc [spl]
[ 2516.836425]  ? dnode_rele_and_unlock+0x6c/0x150 [zfs]
[ 2516.836468]  ? dmu_buf_get_user+0x17/0x30 [zfs]
[ 2516.836474]  ? tsd_hash_search.isra.0+0x71/0xa0 [spl]
[ 2516.836480]  ? tsd_get+0x49/0x90 [spl]
[ 2516.836535]  ? rrn_find+0x29/0x50 [zfs]
[ 2516.836592]  ? rrw_held+0x79/0xe0 [zfs]
[ 2516.836602]  ? _cond_resched+0x19/0x30
[ 2516.836605]  ? mutex_lock+0x13/0x40
[ 2516.836650]  ? dmu_objset_from_ds+0xbb/0x280 [zfs]
[ 2516.836699]  dsl_crypto_recv_key_check+0x5b4/0x6e0 [zfs]
[ 2516.836755]  ? zfs_refcount_add_many+0xab/0x110 [zfs]
[ 2516.836810]  dsl_sync_task_common+0x13d/0x2a0 [zfs]
[ 2516.836859]  ? dsl_crypto_recv_raw_key_check+0x430/0x430 [zfs]
[ 2516.836907]  ? dsl_crypto_recv_raw_key_sync+0x440/0x440 [zfs]
[ 2516.836956]  ? dsl_crypto_recv_raw_key_check+0x430/0x430 [zfs]
[ 2516.837004]  ? dsl_crypto_recv_raw_key_sync+0x440/0x440 [zfs]
[ 2516.837059]  dsl_sync_task+0x1a/0x20 [zfs]
[ 2516.837108]  dsl_crypto_recv_raw+0x4f/0x70 [zfs]
[ 2516.837154]  dmu_recv_stream+0x22d/0x11c0 [zfs]
[ 2516.837208]  ? dsl_sync_task+0x1a/0x20 [zfs]
[ 2516.837271]  zfs_ioc_recv_impl+0x90e/0x10e0 [zfs]
[ 2516.837337]  zfs_ioc_recv_new+0x2ab/0x330 [zfs]
[ 2516.837342]  ? spl_kvmalloc+0xc2/0xd0 [spl]
[ 2516.837347]  ? spl_kmem_alloc_impl+0x125/0x130 [spl]
[ 2516.837353]  ? spl_vmem_alloc+0x2c/0x60 [spl]
[ 2516.837388]  ? nv_alloc_sleep_spl+0x1f/0x30 [znvpair]
[ 2516.837393]  ? nv_mem_zalloc.isra.0+0x2f/0x40 [znvpair]
[ 2516.837397]  ? nvlist_xalloc.part.0+0x50/0xb0 [znvpair]
[ 2516.837460]  zfsdev_ioctl_common+0x43e/0x900 [zfs]
[ 2516.837523]  zfsdev_ioctl+0x55/0xe0 [zfs]
[ 2516.837580]  do_vfs_ioctl+0x407/0x670
[ 2516.837607]  ? do_user_addr_fault+0x216/0x450
[ 2516.837610]  ksys_ioctl+0x67/0x90
[ 2516.837611]  __x64_sys_ioctl+0x1a/0x20
[ 2516.837625]  do_syscall_64+0x57/0x190
[ 2516.837628]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 2516.837635] RIP: 0033:0x7f5ac4ce737b
[ 2516.837638] Code: 0f 1e fa 48 8b 05 15 3b 0d 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d e5 3a 0d 00 f7 d8 64 89 01 48
[ 2516.837639] RSP: 002b:00007ffe255d0988 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[ 2516.837645] RAX: ffffffffffffffda RBX: 0000000000005a46 RCX: 00007f5ac4ce737b
[ 2516.837645] RDX: 00007ffe255d09b0 RSI: 0000000000005a46 RDI: 0000000000000005
[ 2516.837646] RBP: 00007ffe255d3fa0 R08: 00005564d654b040 R09: 00007f5ac4dbbd00
[ 2516.837647] R10: 00005564d652a010 R11: 0000000000000246 R12: 00007ffe255d4000
[ 2516.837648] R13: 00007ffe255d09b0 R14: 0000000000005a46 R15: 00005564d6537630

The text was updated successfully, but these errors were encountered:

felixdoerre · 2020-07-03T07:25:20Z

I have disabled the ASSERT that caused the panic to be able to understand where the EIO in release mode comes from. I have enabled the ZFS_DEBUG_SET_ERROR bit in /sys/module/zfs/parameters/zfs_flags, reproduced the problem and got this log excerpt:

593760513   zap_leaf.c:426:zap_leaf_lookup(): error 2
1593760513   zap_leaf.c:426:zap_leaf_lookup(): error 2
1593760513   dsl_prop.c:149:dsl_prop_get_dd(): error 2
1593760513   zap_micro.c:985:zap_lookup_impl(): error 2
1593760513   zap_micro.c:985:zap_lookup_impl(): error 2
1593760513   zap_micro.c:985:zap_lookup_impl(): error 2
1593760513   dsl_prop.c:149:dsl_prop_get_dd(): error 2
1593760513   zap_leaf.c:510:zap_entry_read_name(): error 75
1593760513   dsl_crypt.c:2711:spa_do_crypt_objset_mac_abd(): error 52
1593760513   arc.c:2148:arc_untransform(): error 5
1593760515   zap_micro.c:985:zap_lookup_impl(): error 2
1593760515   zap.c:770:fzap_checksize(): error 22
1593760515   zap_leaf.c:510:zap_entry_read_name(): error 75
1593760515   zap_micro.c:985:zap_lookup_impl(): error 2
1593760515   metaslab.c:3816:metaslab_flush(): flushing: txg 297, spa testpool, vdev_id 0, ms_id 2, unflushed_allocs 66560, unflushed_frees 23552, appended 208 bytes
1593760515   zap_leaf.c:510:zap_entry_read_name(): error 75
1593760515   dbuf.c:2835:dbuf_findbp(): error 2
1593760520   zap_micro.c:985:zap_lookup_impl(): error 2
1593760520   zap.c:770:fzap_checksize(): error 22

So the EIO is most likely triggered by the ECKSUM here:

zfs/module/zfs/dsl_crypt.c

Line 2711 in ae7b167

return (SET_ERROR(ECKSUM));

which gets translated into an EIO here:

zfs/module/zfs/arc.c

Line 2160 in 3c42c9e

ret = SET_ERROR(EIO);

From the 2 mac checks it is the "local" one that fails: bcmp(local_mac, osp->os_local_mac, ZIO_OBJSET_MAC_LEN) != 0

IslandLife · 2020-07-08T15:58:06Z

I have also seen this. This problem has existed in the master branch from some months now. For example zfs 0.8.0-699 (build from 20 March 2020) -796 (28 May 2020), -873 (build from 25th June 2020) and zfs 0.8.0-897 ( build from 8th Jul 2020 ) After an attempt to mount the re-constructed filesystem fails a zfs status -v immediately identifies the re-constructed
filesystem as suffering from a corruption - even though the disk devices themselves have a zero error count.

errors: Permanent errors have been detected in the following files:

        testpool/source:<0x0>

Unfortunately the defect introduced is NOT detected by a zpool scrub! A scrub after the receive and before the mount attempt will not show a problem.

zfs create -o encryption=on -o keyformat=passphrase test/source
Enter passphrase: 
Re-enter passphrase: 
zfs snap test/source@base
zfs umount test/source
zfs snap test/source@s1
zfs send -w test/source@base | zfs receive test/target 
zfs send -w -i @base test/source@s1 | zfs receive test/target 
zfs destroy test/source@s1
zfs rollback test/source@base
zfs send -w -i @base test/target@s1| zfs receive test/source 
zpool scrub test
zpool wait -t scrub test
zpool status -v test
  pool: test
 state: ONLINE
  scan: scrub repaired 0B in 0 days 00:00:00 with 0 errors on Thu Jul  9 10:47:48 2020
config:

        NAME                        STATE     READ WRITE CKSUM
        test                        ONLINE       0     0     0
          /localhome/zfsadmin/tank  ONLINE       0     0     0

errors: No known data errors
zfs mount test/source
filesystem 'test/source' can not be mounted: Input/output error
cannot mount 'test/source': Invalid argument
ERROR: bad status 1 for zfs mount test/source
zpool status -v test
  pool: test
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: https://zfsonlinux.org/msg/ZFS-8000-8A
  scan: scrub repaired 0B in 0 days 00:00:00 with 0 errors on Thu Jul  9 10:47:48 2020
config:

        NAME                        STATE     READ WRITE CKSUM
        test                        ONLINE       0     0     0
          /localhome/zfsadmin/tank  ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

        test/source:<0x0>

Rolling back the faulty snapshot update (to @base) and re-starting scrub the corruption disappears as might be expected.
The only work around is then to rebuild sending a full snapshot and then the incrementals. Over a slow network this is impractical.

IslandLife · 2020-07-09T09:54:07Z

The observation that zpool scrub will not detect the problem is particularly worrying. While one can quickly test a filesystem dataset via a mount command .. what about when you can't use mount? For example when the dataset is a volume?

felixdoerre · 2020-07-09T10:22:19Z

I believe that the reported "corrpution" is not actually a corruption (that's why scrub does not find any problems). The problem is reported, because the mac-mismatch reports an EIO which is probably interpreted by levels higher up as a data corruption problem. The real cause seems to be, that useraccounting-data is present where it is not expected, and the empty object is calculated to have a different mac (

zfs/module/os/linux/zfs/zio_crypt.c

Line 1212 in 10fa254

bzero(local_mac, ZIO_OBJSET_MAC_LEN);

) that the actually existing one.

IslandLife · 2020-07-09T12:30:58Z

I did a small test using a volume rather than a filesystem. The rebuilt volume appears under /dev/zvol and the contents
are indeed identical. zpool status -v never finds a problem and neither does zpool scrub.

behlendorf · 2020-10-12T21:47:59Z

@felixdoerre after talking this issue over with @tcaputi we've got a good idea how to go about fixing this. As you mentioned in an earlier comment the core issue here is that the user space accounting is present when it's not expected to be. This leads to the subsequent mount failure due a checksum error when verifying the local mac. What we can do this handle this specific case it to clear the OBJSET_FLAG_USERACCOUNTING_COMPLETE and reset the local mac. This should then allow the user accounting to be correctly updated on first mount using the normal upgrade process. @tcaputi is going to put together a patch, in the next week or so.

When sending raw encrypted datasets the user space accounting is present when it's not expected to be. This leads to the subsequent mount failure due a checksum error when verifying the local mac. Fix this by clearing the OBJSET_FLAG_USERACCOUNTING_COMPLETE and reset the local mac. This allows the user accounting to be correctly updated on first mount using the normal upgrade process. Closes openzfs#10523. Signed-off-by: George Amanakis <[email protected]>

gamanakis · 2020-11-19T19:26:08Z

I bumped recently into this issue. Given the excellent discussion/debug done here, I decided to give it a try.

When sending raw encrypted datasets the user space accounting is present when it's not expected to be. This leads to the subsequent mount failure due a checksum error when verifying the local mac. Fix this by clearing the OBJSET_FLAG_USERACCOUNTING_COMPLETE and reset the local mac. This allows the user accounting to be correctly updated on first mount using the normal upgrade process. Closes openzfs#10523. Signed-off-by: George Amanakis <[email protected]>

When sending raw encrypted datasets the user space accounting is present when it's not expected to be. This leads to the subsequent mount failure due a checksum error when verifying the local mac. Fix this by clearing the OBJSET_FLAG_USERACCOUNTING_COMPLETE and reset the local mac. This allows the user accounting to be correctly updated on first mount using the normal upgrade process. Reviewed-By: Brian Behlendorf <[email protected]> Reviewed-By: Tom Caputi <[email protected]> Signed-off-by: George Amanakis <[email protected]> Closes #10523 Closes #11221

When sending raw encrypted datasets the user space accounting is present when it's not expected to be. This leads to the subsequent mount failure due a checksum error when verifying the local mac. Fix this by clearing the OBJSET_FLAG_USERACCOUNTING_COMPLETE and reset the local mac. This allows the user accounting to be correctly updated on first mount using the normal upgrade process. Reviewed-By: Brian Behlendorf <[email protected]> Reviewed-By: Tom Caputi <[email protected]> Signed-off-by: George Amanakis <[email protected]> Closes openzfs#10523 Closes openzfs#11221

rincebrain · 2021-10-06T15:16:40Z

Since 6217656 reverted d1d4769 5 months ago and has not been fixed again, maybe this should be reopened? (Someone just came by IRC running into this, and I was surprised that it got fixed, reverted, and never corrected again.)

behlendorf · 2021-10-06T17:47:28Z

Good point, I'm reopening this.

Da-kota · 2021-11-12T19:35:46Z

I believe that the reported "corrpution" is not actually a corruption (that's why scrub does not find any problems). The problem is reported, because the mac-mismatch reports an EIO which is probably interpreted by levels higher up as a data corruption problem. The real cause seems to be, that useraccounting-data is present where it is not expected, and the empty object is calculated to have a different mac (

zfs/module/os/linux/zfs/zio_crypt.c

Line 1212 in 10fa254

bzero(local_mac, ZIO_OBJSET_MAC_LEN);

) that the actually existing one.

It smells like a design flaw. The dataset seems to be containing surplus metadata (in this context I mean everything else than that which is strictly necessary to mount/access the data) upon which on some higher than physical I/O level is decided to not access the data and report an error which is incorrectly escalated/handled. If this really IS a design flaw, I hope that a well coordinated effort will be able to carefully update the design, which will pay off hugely in future development. ZFS should be easily capable of using recursive incremental snapshots of (possibly) encrypted datasets that can be sent/received in raw form to safely and efficiently synchronize sets bi-directionally.

Best regards.

Blackclaws · 2022-01-12T15:32:21Z

Definitely something that needs to be looked into. I just ran into this issue when testing the received encrypted snapshots on our backup system. I'm wondering right now whether zfs encryption is even usable with raw sends in this form. Is there any sort of ETA on when a fix might be available?

rincebrain · 2022-01-12T15:49:50Z

I would not recommend making any plans that assume raw send works reliably on any particular timetable, given the amount of time people seem willing to invest in fixing it.

…

On Wed, Jan 12, 2022 at 10:32 AM Felix Winterhalter < ***@***.***> wrote: Definitely something that needs to be looked into. I just ran into this issue when testing the received encrypted snapshots on our backup system. I'm wondering right now whether zfs encryption is even usable with raw sends in this form. Is there any sort of ETA on when a fix might be available? — Reply to this email directly, view it on GitHub <#10523 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AABUI7J2OFLVBPUXVRCPV6DUVWNJDANCNFSM4ONXDKLA> . Triage notifications on the go with GitHub Mobile for iOS <https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675> or Android <https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub>. You are receiving this because you commented.Message ID: ***@***.***>

Raw receiving a snapshot back to the originating dataset is currently impossible because of user accounting being present in the originating dataset. One solution would be resetting user accounting when raw receiving on the receiving dataset. However, to recalculate it we would have to dirty all dnodes, which may not be preferable on big datasets. Instead, we rely on the os_phys flag OBJSET_FLAG_USERACCOUNTING_COMPLETE to indicate that user accounting is incomplete when raw receiving. Thus, on the next mount of the receiving dataset the local mac protecting user accounting is zeroed out. The flag is then cleared when user accounting of the raw received snapshot is calculated. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: George Amanakis <[email protected]> Closes #12981 Closes #10523 Closes #11221 Closes #11294 Closes #12594 Issue #11300

Raw receiving a snapshot back to the originating dataset is currently impossible because of user accounting being present in the originating dataset. One solution would be resetting user accounting when raw receiving on the receiving dataset. However, to recalculate it we would have to dirty all dnodes, which may not be preferable on big datasets. Instead, we rely on the os_phys flag OBJSET_FLAG_USERACCOUNTING_COMPLETE to indicate that user accounting is incomplete when raw receiving. Thus, on the next mount of the receiving dataset the local mac protecting user accounting is zeroed out. The flag is then cleared when user accounting of the raw received snapshot is calculated. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: George Amanakis <[email protected]> Closes openzfs#12981 Closes openzfs#10523 Closes openzfs#11221 Closes openzfs#11294 Closes openzfs#12594 Issue openzfs#11300

behlendorf added Component: Encryption "native encryption" feature Type: Defect Incorrect behavior (e.g. crash, hang) Component: Send/Recv "zfs send/recv" feature labels Jul 1, 2020

felixdoerre changed the title ~~Raw send does not work when copying snapshots back~~ Raw send on encrypted datasets does not work when copying snapshots back Jul 1, 2020

gamanakis mentioned this issue Nov 19, 2020

Fix raw sends on encrypted datasets when copying back snapshots #11221

Merged

12 tasks

behlendorf closed this as completed in #11221 Dec 4, 2020

gamanakis mentioned this issue Jun 24, 2021

[WIP] Reset user accounting when raw receiving a snapshot #11300

Closed

13 tasks

digitalsignalperson mentioned this issue Sep 29, 2021

Sometimes raw send on encrypted datasets does not work when copying snapshots back #12594

Closed

behlendorf reopened this Oct 6, 2021

gamanakis mentioned this issue Jan 16, 2022

Introduce a flag to skip comparing the local mac when raw sending #12981

Merged

13 tasks

behlendorf closed this as completed in #12981 Jan 21, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Raw send on encrypted datasets does not work when copying snapshots back #10523

Raw send on encrypted datasets does not work when copying snapshots back #10523

felixdoerre commented Jul 1, 2020

felixdoerre commented Jul 3, 2020 •

edited

Loading

IslandLife commented Jul 8, 2020 •

edited

Loading

IslandLife commented Jul 9, 2020

felixdoerre commented Jul 9, 2020

IslandLife commented Jul 9, 2020

behlendorf commented Oct 12, 2020

gamanakis commented Nov 19, 2020 •

edited

Loading

rincebrain commented Oct 6, 2021 •

edited

Loading

behlendorf commented Oct 6, 2021

Da-kota commented Nov 12, 2021 •

edited

Loading

Blackclaws commented Jan 12, 2022

rincebrain commented Jan 12, 2022 via email

Raw send on encrypted datasets does not work when copying snapshots back #10523

Raw send on encrypted datasets does not work when copying snapshots back #10523

Comments

felixdoerre commented Jul 1, 2020

Describe the problem you're observing

felixdoerre commented Jul 3, 2020 • edited Loading

IslandLife commented Jul 8, 2020 • edited Loading

IslandLife commented Jul 9, 2020

felixdoerre commented Jul 9, 2020

IslandLife commented Jul 9, 2020

behlendorf commented Oct 12, 2020

gamanakis commented Nov 19, 2020 • edited Loading

rincebrain commented Oct 6, 2021 • edited Loading

behlendorf commented Oct 6, 2021

Da-kota commented Nov 12, 2021 • edited Loading

Blackclaws commented Jan 12, 2022

rincebrain commented Jan 12, 2022 via email

felixdoerre commented Jul 3, 2020 •

edited

Loading

IslandLife commented Jul 8, 2020 •

edited

Loading

gamanakis commented Nov 19, 2020 •

edited

Loading

rincebrain commented Oct 6, 2021 •

edited

Loading

Da-kota commented Nov 12, 2021 •

edited

Loading