Repair encryption hierarchy of 'send -Rw | recv -d' datasets that do not bring their encryption root #12000
I think this is a combination of a documentation shortfall and an opportunity to protect users by testing IVs when receiving. On the documentation side, the Encryption section of the ZFS manpage should stress that critical encryption information is stored within the encryptionroot, and that destroying all copies of an encryptionroot will cause data loss. On the protection side, perhaps zfs receive should warn or fail whenever an encryptionroot is discarded via the '-d' argument to zfs receive. Additionally, zfs receive could compare the IV of existing encryptionroots to the received encryptionroot and ensure they are the same if encryptionroot inheritance is being preserved.
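Until such a safeguard exists, a manual check after a receive can at least show where the encryption root ended up; a minimal sketch, assuming hypothetical pool and dataset names:

```sh
# After `zfs send -Rw ... | zfs receive -d backuppool`, verify that the
# received datasets point at the encryption root you expect and that
# their keys can actually be loaded.
zfs get -r encryptionroot,keystatus backuppool
```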
the handling of
That would be an option that could have helped me assess the situation better, prior to discarding the source pool. Additionally it now comes to my mind that we also need a mechanism to back up and restore encryption headers, IVs and keys, as we would do with any LUKS device as well. Wasn't this something the kind of binary @tcaputi brought up could have helped with?
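For comparison, the LUKS analogue of such a backup would be something along these lines; a sketch only, with a hypothetical device path:

```sh
# Back up the LUKS header (key slots, salts, cipher parameters) so the
# device can still be unlocked if the on-disk header gets damaged.
cryptsetup luksHeaderBackup /dev/sdX --header-backup-file sdX-luks-header.img
```

A later comment in this thread shows how the equivalent ZFS key material can at least be dumped from a raw send stream.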
I would guess that a
Just started using ZoL with native encryption and think I have hit the same or a similar bug (related to #6624 as well).
All good at this point. Everything works as expected. Now, do an incremental send:
Again, all good. Now unmount/mount:
Yikes! This appears to have corrupted 10 TB of backup filesystems. I've been trying to recover from this but no luck so far. I may have also uncovered another bug in trying to recover from this.
I've tested this on (all x86_64):
I think you've got the problem mostly identified. There's clearly a documentation shortfall, and some opportunities for code to protect users, as well as some tooling to repair/import some of this data in disaster recovery situations. In your current situation, you may be able to recover by rolling back your receive dataset to eliminate all of the snapshots made after the
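A rollback of that kind would look roughly as follows; this is a sketch with hypothetical names, and `-r` destroys every snapshot on the target newer than the one rolled back to:

```sh
# Roll the received dataset back to the last snapshot known to be good,
# discarding the snapshots received after it.
zfs rollback -r backuppool/received@last-good
```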
So it sounds to me like it's OK to have child datasets with a different IV, but the tooling should absolutely not be allowing
After running
Unfortunately this does not work. Another thing I noticed: while keys are loaded I tried this several times and get the same result: the password is no longer accepted. Then I tried doing
This happens even when I destroy the corrupted child dataset, so it appears that not only is this corrupting the replicated child datasets, it's also corrupting the parent dataset as well.
Yeah, this was shockingly easy to step into and pretty scary considering it appears to work just fine until that first reboot or unmount/mount!
I am not a ZFS expert, but I would guess that that is the case. I believe that
@almereyda What you are observing should've been fixed by #9309, which was merged two years ago. Unfortunately it was never included in a 0.8.x release; this looks like an oversight to me. It is included in the 2.0.0 release though. And indeed I can't reproduce this on a somewhat current master.

Regarding your inaccessible data, let's summarize key handling for a better understanding of what's happening here. Every dataset has its master key, which is shared with all snapshots and clones of that dataset. This is a random key generated on dataset creation and it never changes. It is stored encrypted on disk. The encryption keys which are used to encrypt the data are derived from that master key. The on-disk master key is encrypted with a wrapping key, which in turn is either passed in verbatim or derived from a passphrase.

You are now in a situation where you can't decrypt the master key of the received datasets.

Please note that there's a bit of speculation in the above since I have no zfs 0.8.x lying around to reproduce with. If you're desperate enough I can give you some commands to check my assumptions.
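The parts of that key hierarchy which are visible from userland can be inspected through the standard properties; a minimal sketch, assuming a hypothetical dataset name:

```sh
# encryptionroot tells you which dataset's wrapping key unlocks this one;
# keystatus shows whether that wrapping key is currently loaded.
zfs get -r encryption,encryptionroot,keyformat,keylocation,keystatus,pbkdf2iters pool/dataset
```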
@AttilaFueloep That's interesting backstory. However, nowhere in @almereyda's listed commands is
Well, this is in the reproducer.
I think you are referring to the issue @brenc reported here. That one is different since it involves changing the
Ahh yes. We may have two different problems that ended up in the same issue. They could be related, or they could not be. Perhaps we can spin off the @brenc problem into a new issue, and close @AttilaFueloep's as resolved in 2.0?
Thank you for the detailed explanation @AttilaFueloep! Especially the details on the relation between the salts, the IVs and the wrapping keys helped me a lot in understanding the mechanics involved here. The original encryption root is not available anymore, but the mirrors of its descendants are. I would like to test these commands to check your assumptions, @AttilaFueloep. If we were to fabricate said binary, or an extension to the
even better, and we might all benefit from this at some point. I'm not able to tell if #9309 already addresses any of the technicalities involved here. Just to confirm that the issue persists (with a little help from #4553 (comment)) on a dangling copy of the locked-up datasets (despite the key being present and known):
As a takeaway from @secabeen's other comment
my strategy for sure now will be to keep a zpool's native dataset unmountable and unencrypted, and have its capital-letter descendants work as unmountable encryption roots from now on (see the layout sketch at the end of this comment). This also opens up the possibility to use different keys for each, which is nice. A strategy and way to recursively and incrementally replicate (un)encrypted datasets across machines into (1) new encryptionroots or (2) raw mirrors of the original encryption hierarchy, where the encryption keys are not loaded in the same place, but eventually at the same time, is left open for me to discover. Previously non-raw send/receive worked well with encrypted datasets between pools on the same machine, and we're new to doing so between different ones. I guess here we would be dealing with accommodations for the cases of:
To cite the current documentation on https://openzfs.github.io/openzfs-docs/man/8/zfs-recv.8.html (emphasis by me):
Now that's a bummer if the dataset with the name of the pool is incidentally also a required intermediate file system, acting as an encryption root. If here we do not aspire to extend the (online) documentation as suggested above, or seek an implementation to extract and implant IV salts, it is fine with me to close the discussion as won't fix (anymore), and to fork off the related case into its separate issue. Else, would there be any non-destructive way for me to check if the IVs are still present in the datasets, and if or how to use

Edit: To second this thought, it's a pity the OpenZFS 2.x release line hasn't been backported to Ubuntu 20.04 LTS, which is still their currently supported long-term release, and will be for another while, despite them bundling the package consciously and even advocating for it for a long time [¹](https://ubuntu.com/blog/zfs-licensing-and-linux), even well before they started making it a dependency for ZSYS.
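A minimal sketch of the layout described above (an unencrypted, unmountable pool root dataset with separately keyed encryption roots below it); pool, disk and dataset names are hypothetical:

```sh
# The pool's root dataset stays unencrypted and is never mounted.
zpool create -O canmount=off tank mirror sda sdb

# Each capital-letter child is its own encryption root with its own key.
zfs create -o encryption=on -o keyformat=passphrase -o keylocation=prompt \
    -o canmount=off tank/SECURE

# Descendants inherit encryption; their encryptionroot is tank/SECURE.
zfs create tank/SECURE/data
```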
A minor clarification. In this context an IV (initialization vector, sometimes also called a nonce, i.e. number used once) is a block of random data which is used to make sure that the same key and plaintext always produce a different ciphertext. It is generated randomly for each block to encrypt. To decrypt the block you need both the key and the IV. The IV used to encrypt the master key is stored on disk alongside the encrypted master key. The wrapping key is generated from the passphrase. A salt is a similar concept but applies to the passphrase. Again, you need both the passphrase and the salt to generate the wrapping key.
Never mind, I could dig out an old enough zfs installation and reproduce the issue. Unfortunately I don't see a way to recover the lost salt, so I'm afraid your data isn't recoverable. By destroying the originating pool you lost the wrapping key since you lost the salt.
Yes, they are addressed. If receiving a raw zstream with a missing encryption root, the topmost received dataset is made the new encryption root of itself and the datasets below it. #9309 just fixed a bug which broke this process in the case of
Although currently there is no way to restore key information, you can dump it in plain text by doing:

```sh
zfs snapshot data/set@snap
zfs send -w data/set@snap | zstreamdump | sed -n -e '/crypt_keydata/,/end crypt/p; /END/q'
```

To make an encrypted dataset the encryption root of itself and all descendant datasets, do the following:

```sh
zfs change-key -o keyformat=passphrase -o keylocation=prompt data/set
```

There is no need for keyformat and keylocation to differ from the ones of the current encryption root, but the keys must be loaded.
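For the opposite direction, `zfs change-key -i` makes a dataset inherit its parent's encryption root again; a sketch with the same hypothetical names, assuming the relevant keys are loaded:

```sh
# data/set/child will report encryptionroot=data/set afterwards.
zfs change-key -i data/set/child
zfs get encryptionroot data/set/child
```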
You could ask Canonical to include #9309 in their next zfs update. It should apply cleanly to the 0.8 tree. As far as I know they do backport certain ZFS changes to their (LTS) distro tree.
Although this isn't my issue, I'd say that's the way to go.
@brenc I can reproduce your issue on a somewhat current master. In short, the problem is that the incremental receive overwrites the master key. Please see #12000 (comment) for the terminology used above. Off the top of my head I can't come up with a simple fix; I have to think more about it. This definitely is a different problem and deserves its own issue.
Done #12614
Luckily for me I was able to just start over and replicate the entire dataset from the encryption root up. Hopefully there aren't too many people out there doing what I did who haven't rebooted / remounted in months... Thanks for the detailed info. I appreciate knowing more about how this stuff works.
Thanks, I'll continue over there.
Fully agree; sadly this is way too easy to trigger and can go undetected for a long time.
I am not sure if this is related, but I cannot unlock one of my child datasets.
I also cannot create or delete files in the root of my parent. In the working dataset and in subfolders I have no issue:
Would you like to open a new follow-up issue, to triage your case
independently of the specifics detailed out here? Then we can more cleanly
find suitable answers to your questions.
Anyway I'm trying to find some initial answers from what you provided:
It seems entirely expected that you cannot remove the ZFS dataset with rm, in
case it is mounted at that path.
Can you show us the outputs of the following commands?
df -h /mnt/secure2
df -h /mnt/secure2/nextcloud2
They will show us which dataset/volume is currently mounted there.
Also some details about your dataset could help in debugging the issue.
zfs get all /mnt/secure2
zfs get all /mnt/secure2/nextcloud2
Then I can only think of the case that you might have conducted a raw send
and receive of the nextcloud2 dataset, creating another secure2
dataset that uses a different IV from its origin.
I suggest everything else we discuss in your follow up issue.
…On Mon, 10 Oct 2022 at 14:06, Alexander Weps wrote:

I am not sure if this is related.
But I cannot unlock one of my child datasets.

```
sudo zfs load-key secure2
Key load error: Key already loaded for 'secure2'.
sudo zfs load-key secure2/nextcloud2
Key load error: Keys must be loaded for encryption root of 'secure2/nextcloud2' (secure2).
```

I also cannot create or delete files in the root of my parent. In working
dataset and in subfolders I have no issue:

```
[/mnt/secure2]# ls -la
total 35
drwxr-xr-x 5 root root     5 Sep 26 16:01 .
drwxr-xr-x 6 root root     6 Oct 10 06:04 ..
drwxr-xr-x 6 root root     6 Oct 10 13:47 charts
drwxr-xr-x 2 root root     2 Sep 26 16:01 ix-applications
drwxrwx--- 3 root www-data 4 Oct 10 13:53 nextcloud2
[/mnt/secure2]# touch test
touch: setting times of 'test': No such file or directory
[/mnt/secure2]# rm -rf nextcloud2
rm: cannot remove 'nextcloud2': Operation not permitted
```
@almereyda I fixed the issue for my nextcloud2 dataset, but every new dataset has the same issue. I think the issue may be related to the fact that I cannot write/remove any files and directories in the root of my parent dataset.
So, my issue is currently with my parent dataset and how to repair it, so that my new datasets are created without issues and I can write files into it. I can create a separate issue for this.
I have made a related issue: #14011
This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions.
As the original post was solved with #9309, we can close here, and open questions continue in
Is this really fixed? I encountered this with OpenZFS v2.1
As of bb61cc3, the patch that was communicated to solve this behaviour has been available since ZFS v2.x.x. Could you provide a small write-up of how you reproduced this error in your environment? I'm happy to reopen here if we find more evidence that makes the case.
Yeah, I actually hit this again. The sender is an x86_64 system running zfs v2.2.99-365_g8f2f6cd2a and kernel v6.1.81, while the receiver is an arm64 (rpi4) system running zfs v2.2.3 and kernel v6.6.12 (Arch Linux ARM). Reproducer:
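(The exact reproducer commands did not survive in this copy of the thread; the general pattern under discussion, written out with hypothetical names, is a recursive raw replication whose streams are received with `-d`:)

```sh
# Initial raw, recursive replication; -d strips the source pool name on receive.
zfs send -Rw tank/data@snap1 | ssh rpi4 zfs receive -d backup

# Later raw incremental of the same tree.
zfs send -Rw -i @snap1 tank/data@snap2 | ssh rpi4 zfs receive -d backup
```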
I guess I'd need to send the updates as unencrypted streams somehow. Any hints?
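One possible direction, sketched with hypothetical names: a non-raw send requires the key to be loaded on the sender and lets the destination re-encrypt the data under its own encryption root. Note, however, that raw and non-raw streams generally cannot be mixed on the same target dataset, so this would likely mean starting a fresh full replication rather than continuing the existing one.

```sh
# Keys must be loaded on the sending side for a non-raw send.
zfs load-key tank/data

# Full non-raw send into a new, already-encrypted destination;
# -x encryption keeps the destination's own encryption settings.
zfs send tank/data@now | ssh rpi4 zfs receive -x encryption backup/data-plain
```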
System information
Describe the problem you're observing
Following up from #6624 and #9273, it happens that a certain Long-Term Support release from Ubuntu that natively ships with ZFS 0.8.1 (meanwhile updated to 0.8.3) can still produce datasets with a broken encryption hierarchy, even though all appears to be well at first.
In #6624 (comment) @tcaputi speaks of:
Is it possible to apply this workaround in a way that restores the ability to mount the datasets? For this case, let's consider that the source is not available anymore for retransmission. This is due to a juggling of datasets for downsizing a pool in a mirrored setup, where the source has been replaced by a smaller version of itself.
Two things are odd here:
- `encryptionroot` is a read-only value. How is it possible to re-set it?

Describe how to reproduce the problem
Messages from the logs
Additional information
suggests never to encrypt the primary dataset of a pool (thus the pool itself), but to create encrypted datasets as malleable encryption roots below it
This is also discussed in:
Interestingly, running `zfs load-key -r pool1` will not try to load the keys for the datasets for which keys are available (via their encryption root `pool1`), yet still they remain unmountable. Also, some of the descendant encryption roots that Docker apparently created are decryptable, but others aren't. This gives hope, next to the key not being rejected for the other datasets, that all USERDATA is still available and recoverable. Some datasets decrypt, others don't, especially from Docker.
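A minimal way to cross-check which datasets actually have their key available and where their encryption root points; a sketch reusing the `pool1` name from above:

```sh
zfs load-key -r pool1
# keystatus should read "available" for everything that can be decrypted.
zfs get -r keystatus,encryptionroot,keyformat pool1
zfs mount -a
```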
Further mentions of similar behaviours are:
- `-d`
- `-F` and also replicates encryption roots distinctively
- `change-key -i`
It is not possible to set a new keylocation for dependent datasets in the encryption hierarchy.
Following the advice from #6847 (comment) also doesn't help in this case, where the encryption root has been replaced:
The pool itself does not report any data errors:
Might it be possible to reintroduce a new encryption root for those datasets that currently don't mount, possibly by decomposing and reconstructing the pool into two and back again, or are there any other known workarounds I am not aware of yet?
Thank you for your kind help, our users will appreciate it.
Reflection and conclusion
Reading the linked references brings up their reasoning and questioning again:
Which other good practices are known to avoid the side effects and edge cases we produce here, other than creating independently encrypted datasets under the root dataset of the pool that act as encryption roots for their descendants, and leaving that root dataset itself unencrypted?
When trying to do `send | recv -x encryption`, the command output complains about a missing raw flag, while we have migrated datasets from unencrypted pools into encrypted ones before. Now this looks even more puzzling:

Is the source key actually available and loaded, or is it not? From the given output of the commands, it is no longer possible for me to tell.
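One way to answer that question directly is to query the key status on the source side; a sketch reusing the `pool1` name from this report:

```sh
# keystatus is "available" only for datasets whose wrapping key is loaded.
zfs get -r keystatus,keylocation pool1
```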
Reading through zfs-receive.8 makes me guess that we now probably have different initialization vectors (IVs) for the AEAD cipher in the `pool1` dataset and its descendants, which is why recreating the encrypted zpool with the same passphrase does not mean it can act as an encryption root for its newly retrieved children.