[WIP] Reset user accounting when raw receiving a snapshot #11300

gamanakis · 2020-12-07T11:47:37Z

Motivation and Context

Description

When raw receiving a snapshot we should reset user accounting (to be recalculated upon loading the key and mounting it). When raw receiving a snapshot the local_mac is set to 0, as it's not portable. By resetting the dnode_phys_structures related to user accounting we now force zeroing out the local_mac upon mounting. Otherwise we will calculate a local_mac != 0 which prevents the dataset from being mounted.

How Has This Been Tested?

Types of changes

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Performance enhancement (non-breaking change which improves efficiency)
Code cleanup (non-breaking change which makes code smaller or more readable)
Breaking change (fix or feature that would cause existing functionality to change)
Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
Documentation (a change to man pages or other documentation)

Checklist:

My code follows the OpenZFS code style requirements.
I have updated the documentation accordingly.
I have read the contributing document.
I have added tests to cover my changes.
I have run the ZFS Test Suite with this change applied.
All commit messages are properly formatted and contain Signed-off-by.

gamanakis · 2020-12-07T11:50:18Z

@tcaputi, @behlendorf would you mind looking over this?

behlendorf

Thanks makes sense to me. Thanks for tackling this so quickly.

module/zfs/dsl_crypt.c

gamanakis · 2020-12-14T18:08:02Z

I had a brief discussion with @tcaputi . He pointed out that we would like to compare the local_mac also in corner cases where user accounting but not userobj accounting is enabled. We decided that following a different strategy may be more appropriate: start from the code base of 2.0.0 (ie revert d1d4769) and make sure that whenever we raw-receive an encrypted dataset we zero-out user accounting and userobj accouting. That way we should cover all corner-cases.

gamanakis · 2020-12-22T21:38:25Z

I am working on how to implement the new approach. In zio_crypt.c in 2.0.0 we have:

	if ((datalen >= OBJSET_PHYS_SIZE_V3 &&
	    osp->os_userused_dnode.dn_type == DMU_OT_NONE &&
	    osp->os_groupused_dnode.dn_type == DMU_OT_NONE &&
	    osp->os_projectused_dnode.dn_type == DMU_OT_NONE) ||
	    (datalen >= OBJSET_PHYS_SIZE_V2 &&
	    osp->os_userused_dnode.dn_type == DMU_OT_NONE &&
	    osp->os_groupused_dnode.dn_type == DMU_OT_NONE) ||
	    (datalen <= OBJSET_PHYS_SIZE_V1)) {
		bzero(local_mac, ZIO_OBJSET_MAC_LEN);
		return (0);
	}

This essentially means we need to zero-out the dnode_phys_t in the objset of the dataset which is being received (along with any os_flags signaling that user accounting is present). We can do this in dsl_crypt.c right after the local_mac of the received dataset is set to zero (https://github.com/openzfs/zfs/blob/zfs-2.0-release/module/zfs/dsl_crypt.c#L2108). This approach actually works and the raw-received dataset can now be mounted.

However I fail to see why zfs userspace on the received dataset does not return any output, which essentially means that userspace metadata are not updated, not even after a re-mount of the dataset. Any suggestions?

gamanakis · 2021-01-06T22:38:14Z

Due to the previous problem I followed a new approach:
I just defined a new os_flag which is set when we raw receive a dataset (right after the local_mac of that dataset is set to 0) and results in skipping comparing the local_mac when mounting. After the first mount this new flag is cleared when user(obj)accounting is updated and thus it should not interfere with any corner case.

Issue #11294 should be resolved that way as the newly defined flag defaults to 0 (unset) and previously encrypted datasets using intermediate encryption versions would be normally mounted by taking into account the local_mac.

behlendorf

I definitely prefer this approach, it's nice that it only applies to the raw recv case and relies on all of the existing logic. @tcaputi would you mind taking another look at this.

gamanakis · 2021-02-01T23:10:37Z

@behlendorf thank you for taking a look! I had a brief discussion with @tcaputi today. He would prefer to avoid introducing a new flag and instead zero out the dnode_phys_t structures. The problem is that although the dataset can be mounted this way, the userspace metadata on the receiving dataset are not updated upon remounting. I am looking into the code to see why this is happening and how we could make it work.

gamanakis · 2021-02-03T17:17:56Z

@behlendorf @tcaputi Zero-ing out the dnode_phys_t this way poses 2 problems:

Upon destroying the receiving dataset (zfs destroy -r test/source) we have a panic:

[  271.714041] VERIFY3(0 == ba.ba_phys->bt_bytes) failed (0 == 3072)
[  271.715414] PANIC at bptree.c:295:bptree_iterate()
[  271.716503] Showing stack for process 3756
[  271.716510] CPU: 9 PID: 3756 Comm: txg_sync Tainted: P           OE     5.4.30-1-vfio-debug #1
[  271.716512] Hardware name: Gigabyte Technology Co., Ltd. Z390 AORUS ELITE/Z390 AORUS ELITE-CF, BIOS F10g 09/16/202
0
[  271.716514] Call Trace:
[  271.716528]  dump_stack+0x66/0x90
[  271.716560]  spl_panic+0xef/0x117 [spl]
[  271.716818]  ? queued_spin_unlock+0x5/0x10 [zfs]
[  271.717040]  ? __raw_spin_unlock+0x5/0x10 [zfs]
[  271.717260]  bptree_iterate+0x423/0x4f0 [zfs]
[  271.717525]  ? dsl_scan_obsolete_block_cb+0xc0/0xc0 [zfs]
[  271.717781]  dsl_process_async_destroys+0xaa/0x730 [zfs]
[  271.718021]  dsl_scan_sync+0x1e1/0xb00 [zfs]
[  271.718307]  spa_sync_iterate_to_convergence+0x139/0x310 [zfs]
[  271.718572]  spa_sync+0x38d/0x870 [zfs]
[  271.718848]  txg_sync_thread+0x2f7/0x450 [zfs]
[  271.719113]  ? txg_dispatch_callbacks+0xf0/0xf0 [zfs]
[  271.719153]  thread_generic_wrapper+0x78/0xb0 [spl]
[  271.719166]  kthread+0xfb/0x130
[  271.719190]  ? IS_ERR+0x10/0x10 [spl]
[  271.719195]  ? kthread_park+0x90/0x90
[  271.719201]  ret_from_fork+0x35/0x40

zfs userspace on receiving dataset returns no output.

% sudo zfs userspace test/source
%

Is there a recommended way to zero-out the dnode_phys_t?

tcaputi · 2021-02-03T19:25:11Z

Ah ok. so the problem here is that you haven't frreed the blocks that the dnode is pointing to. The assert is telling you that it detected that the destroy didn't destroy everything it was supposed to. @behlendorf im a little rusty on the exact function, but he should just have to free the dnode's data before zeroing it out correct?

For the second problem, did you mount the dataset before checking the space?

gamanakis · 2021-02-03T19:36:15Z

For the second problem, did you mount the dataset before checking the space?

Yes, exported the whole pool, imported again, reentered keys, and mounted the dataset.

gamanakis · 2021-02-13T22:30:33Z

3c3e60c: The above panic is unresolved, zfs userspace test/source after receiving back the raw snapshot returns no output, even after remounting the dataset. There is still a leak of dnode_t structures upon unloading the zfs module.

gamanakis · 2021-02-19T20:16:08Z

6b51856: managed to reset the dnode_phys_t, now fails when issuing a zio_encrypt() and cannot lookup the key mapping:

[ +10.506741] VERIFY3(0 == spa_do_crypt_objset_mac_abd(B_TRUE, spa, dsobj, zio->io_abd, psize, BP_SHOULD_BYTESWAP(bp))) failed (0 == 2)
[  +0.000999] PANIC at zio.c:4127:zio_encrypt()
[  +0.000355] Showing stack for process 953
[  +0.000002] CPU: 9 PID: 953 Comm: z_wr_iss Tainted: P           OE     5.4.30-1-vfio-debug #1
[  +0.000001] Call Trace:
[  +0.000006]  dump_stack+0x66/0x90
[  +0.000010]  spl_panic+0xef/0x117 [spl]
[  +0.000105]  zio_encrypt+0x711/0x830 [zfs]
[  +0.000007]  ? spl_kmem_cache_free+0x11b/0x1b0 [spl]
[  +0.000078]  zio_execute+0xea/0x220 [zfs]
[  +0.000007]  taskq_thread+0x254/0x4b0 [spl]
[  +0.000005]  ? wake_up_q+0x60/0x60
[  +0.000077]  ? zio_execute_stack_check.constprop.0+0x10/0x10 [zfs]
[  +0.000002]  kthread+0xfb/0x130
[  +0.000007]  ? param_set_taskq_kick+0xf0/0xf0 [spl]
[  +0.000001]  ? kthread_park+0x90/0x90
[  +0.000003]  ret_from_fork+0x35/0x40

gamanakis · 2021-02-22T10:17:00Z

Printing out the address of the failing zio and inspecting with sdb we get:

sdb> echo 0xffff95d5b881a220 | cast zio_t * | deref
(zio_t){
        .io_bookmark = (zbookmark_phys_t){
                .zb_objset = (uint64_t)538,
                .zb_object = (uint64_t)0,
                .zb_level = (int64_t)-1,
                .zb_blkid = (uint64_t)0,
        },
        .io_prop = (zio_prop_t){
                .zp_checksum = (enum zio_checksum)ZIO_CHECKSUM_FLETCHER_4,
                .zp_compress = (enum zio_compress)ZIO_COMPRESS_EMPTY,
                .zp_type = (dmu_object_type_t)DMU_OT_NONE,
                .zp_level = (uint8_t)11,
                .zp_copies = (uint8_t)0,
                .zp_dedup = (boolean_t)512,
                .zp_dedup_verify = (boolean_t)B_FALSE,
                .zp_nopwrite = (boolean_t)B_FALSE,
                .zp_encrypt = (boolean_t)B_FALSE,
                .zp_byteorder = (boolean_t)B_TRUE,
                .zp_salt = (uint8_t [8]){ 1 },
                .zp_iv = (uint8_t [12]){},
                .zp_mac = (uint8_t [16]){},
                .zp_zpl_smallblk = (uint32_t)0,
        },

I suspect this may be due to one of the objects (DMU_USERUSED_OBJECT and DMU_GROUPUSED_OBJECT) we zeroed out leading to the failure of looking up its key mapping.

gamanakis · 2021-03-02T20:14:36Z

511ca43: Solved the problem looking up the key by generating a new key mapping. Now the panic when destroying the source dataset is resolved. Looking to see why zfs userspace test/source still doesn't generate output.

9132142: cleanup

gamanakis · 2021-03-06T23:39:05Z

efba406: added some debugging functions, upon writing new files in the dataset (that raw received the incremental snapshot) zfs userspace returns a meaningful output. Meaning the corresponding objects are not created/updated in the right order after mounting the dataset.

gamanakis · 2021-03-08T20:18:35Z

2d70177: If we print out the dn_type of os_userused_dnode we get:

[  +2.136665] NOTICE: create obj
[  +0.076103] NOTICE: upgrade, dn_type_uu: 39
[  +4.300333] NOTICE: create obj
[  +0.052383] NOTICE: upgrade, dn_type_uu: 39
[  +0.551302] --SENDING BACK--
[  +0.138830] ---MOUNTING---
[  +0.039028] NOTICE: upgrade, dn_type_uu: 0
[  +0.003418] NOTICE: create obj
[  +0.000614] ---ZFS USERSPACE---

Meaning that in the case of the dataset which raw-received the snapshot the zeroed out dnodes are upgraded before they are populated/created.

gamanakis · 2021-03-18T21:48:49Z

I thought that we could zero out the os_flags of the dataset receiving the raw-snapshot in dmu_recv_end_sync() to force recalculation of user accounting. Zeroing out those flags can be validated by missing objects (-1, -2 and -3) with zdb -dd. After mounting the dataset user accounting is recalculated and can be seen by zdb -dd as present objects (-1, -2 and -3).

However despite all this, zfs userspace still returns no output in this PR. Notably, if we create a new file in the dataset, only the new file appears in zfs userspace.

Signed-off-by: George Amanakis <[email protected]>

jumbi77 · 2021-07-23T12:55:07Z

@gamanakis Was it intented to close this? If there is a lack of review maybe just ping behlendorf and mmaybee politely again?

Raw receiving a snapshot back to the originating dataset is currently impossible because of user accounting being present in the originating dataset. One solution would be resetting user accounting when raw receiving on the receiving dataset. However, to recalculate it we would have to dirty all dnodes, which may not be preferable on big datasets. Instead, we rely on the os_phys flag OBJSET_FLAG_USERACCOUNTING_COMPLETE to indicate that user accounting is incomplete when raw receiving. Thus, on the next mount of the receiving dataset the local mac protecting user accounting is zeroed out. The flag is then cleared when user accounting of the raw received snapshot is calculated. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: George Amanakis <[email protected]> Closes #12981 Closes #10523 Closes #11221 Closes #11294 Closes #12594 Issue #11300

Raw receiving a snapshot back to the originating dataset is currently impossible because of user accounting being present in the originating dataset. One solution would be resetting user accounting when raw receiving on the receiving dataset. However, to recalculate it we would have to dirty all dnodes, which may not be preferable on big datasets. Instead, we rely on the os_phys flag OBJSET_FLAG_USERACCOUNTING_COMPLETE to indicate that user accounting is incomplete when raw receiving. Thus, on the next mount of the receiving dataset the local mac protecting user accounting is zeroed out. The flag is then cleared when user accounting of the raw received snapshot is calculated. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: George Amanakis <[email protected]> Closes openzfs#12981 Closes openzfs#10523 Closes openzfs#11221 Closes openzfs#11294 Closes openzfs#12594 Issue openzfs#11300

behlendorf added the Status: Code Review Needed Ready for review and testing label Dec 7, 2020

behlendorf reviewed Dec 7, 2020

View reviewed changes

module/zfs/dsl_crypt.c Outdated Show resolved Hide resolved

gamanakis force-pushed the raw_sends2 branch 2 times, most recently from 493e4c6 to 6ebe413 Compare December 8, 2020 20:15

behlendorf approved these changes Dec 8, 2020

View reviewed changes

behlendorf added Status: Accepted Ready to integrate (reviewed, tested) and removed Status: Code Review Needed Ready for review and testing labels Dec 9, 2020

behlendorf added Status: Work in Progress Not yet ready for general review and removed Status: Accepted Ready to integrate (reviewed, tested) labels Dec 15, 2020

gamanakis marked this pull request as draft December 24, 2020 07:57

gamanakis changed the title ~~Do not compare the local_mac if user accounting has not been completed~~ [WIP] Do not compare the local_mac if user accounting has not been completed Dec 24, 2020

gamanakis force-pushed the raw_sends2 branch from 6ebe413 to 6cda55e Compare January 6, 2021 22:36

gamanakis force-pushed the raw_sends2 branch from 6cda55e to 368e678 Compare January 7, 2021 12:35

behlendorf approved these changes Feb 1, 2021

View reviewed changes

gamanakis force-pushed the raw_sends2 branch from 368e678 to 090e9b4 Compare February 3, 2021 08:16

gamanakis mentioned this pull request Feb 7, 2021

HEAD issues mounting encrypted datasets/corrupting metadata #11294

Closed

gamanakis force-pushed the raw_sends2 branch from 090e9b4 to 3c3e60c Compare February 13, 2021 22:25

gamanakis force-pushed the raw_sends2 branch from 3c3e60c to f4512d2 Compare February 18, 2021 20:49

behlendorf self-requested a review February 19, 2021 00:01

gamanakis force-pushed the raw_sends2 branch from 6b51856 to d785104 Compare February 22, 2021 10:22

gamanakis force-pushed the raw_sends2 branch from 511ca43 to 9132142 Compare March 3, 2021 09:32

gamanakis force-pushed the raw_sends2 branch from efba406 to 2d70177 Compare March 8, 2021 20:16

behlendorf mentioned this pull request Apr 30, 2021

Input/Output Error when sending an encrypted incremental dataset back to it's source #11983

Closed

AttilaFueloep mentioned this pull request May 11, 2021

corrupted pool with 2.1.0-rc5 #12025

Closed

ahrens assigned mmaybee Jun 5, 2021

Reset dnode_phys_t structures, generate new key mapping

43e5d76

Signed-off-by: George Amanakis <[email protected]>

gamanakis force-pushed the raw_sends2 branch from 2d70177 to 43e5d76 Compare June 24, 2021 10:17

gamanakis changed the title ~~[WIP] Do not compare the local_mac if user accounting has not been completed~~ [WIP] Reset user accounting when raw receiving a snapshot Jun 24, 2021

gamanakis closed this Jul 23, 2021

gamanakis mentioned this pull request Sep 30, 2021

Sometimes raw send on encrypted datasets does not work when copying snapshots back #12594

Closed

gamanakis mentioned this pull request Jan 16, 2022

Introduce a flag to skip comparing the local mac when raw sending #12981

Merged

13 tasks

rincebrain mentioned this pull request Nov 1, 2022

I/O error on mounting encrypt fs after upgrading #13709

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[WIP] Reset user accounting when raw receiving a snapshot #11300

[WIP] Reset user accounting when raw receiving a snapshot #11300

gamanakis commented Dec 7, 2020 •

edited

Loading

gamanakis commented Dec 7, 2020

behlendorf left a comment

gamanakis commented Dec 14, 2020

gamanakis commented Dec 22, 2020

gamanakis commented Jan 6, 2021 •

edited

Loading

behlendorf left a comment

gamanakis commented Feb 1, 2021 •

edited

Loading

gamanakis commented Feb 3, 2021 •

edited

Loading

tcaputi commented Feb 3, 2021

gamanakis commented Feb 3, 2021 •

edited

Loading

gamanakis commented Feb 13, 2021 •

edited

Loading

gamanakis commented Feb 19, 2021

gamanakis commented Feb 22, 2021 •

edited

Loading

gamanakis commented Mar 2, 2021 •

edited

Loading

gamanakis commented Mar 6, 2021 •

edited

Loading

gamanakis commented Mar 8, 2021

gamanakis commented Mar 18, 2021

jumbi77 commented Jul 23, 2021

[WIP] Reset user accounting when raw receiving a snapshot #11300

[WIP] Reset user accounting when raw receiving a snapshot #11300

Conversation

gamanakis commented Dec 7, 2020 • edited Loading

Motivation and Context

Description

How Has This Been Tested?

Types of changes

Checklist:

gamanakis commented Dec 7, 2020

behlendorf left a comment

Choose a reason for hiding this comment

gamanakis commented Dec 14, 2020

gamanakis commented Dec 22, 2020

gamanakis commented Jan 6, 2021 • edited Loading

behlendorf left a comment

Choose a reason for hiding this comment

gamanakis commented Feb 1, 2021 • edited Loading

gamanakis commented Feb 3, 2021 • edited Loading

tcaputi commented Feb 3, 2021

gamanakis commented Feb 3, 2021 • edited Loading

gamanakis commented Feb 13, 2021 • edited Loading

gamanakis commented Feb 19, 2021

gamanakis commented Feb 22, 2021 • edited Loading

gamanakis commented Mar 2, 2021 • edited Loading

gamanakis commented Mar 6, 2021 • edited Loading

gamanakis commented Mar 8, 2021

gamanakis commented Mar 18, 2021

jumbi77 commented Jul 23, 2021

gamanakis commented Dec 7, 2020 •

edited

Loading

gamanakis commented Jan 6, 2021 •

edited

Loading

gamanakis commented Feb 1, 2021 •

edited

Loading

gamanakis commented Feb 3, 2021 •

edited

Loading

gamanakis commented Feb 3, 2021 •

edited

Loading

gamanakis commented Feb 13, 2021 •

edited

Loading

gamanakis commented Feb 22, 2021 •

edited

Loading

gamanakis commented Mar 2, 2021 •

edited

Loading

gamanakis commented Mar 6, 2021 •

edited

Loading