Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

zfs: load keys for encrypted datasets during pool import #1384

Merged
merged 3 commits into from
Nov 22, 2024

Conversation

cyphar
Copy link
Member

@cyphar cyphar commented Nov 16, 2024

If a user has set up their own zpools and given them to us to manage,
it's possible they've configured ZFS-native encryption. For the most
part, this works completely transparently to us. However, because we
manually do zpool-import and zpool-export during startup and shutdown of
Incus, ZFS datasets with keys will have their keys unloaded during
shutdown and then the keys are not automatically loaded on startup. This
results in containers being unable to start on startup because all IOs
are blocked indefinitely until the dataset keys are loaded manually by
the admin -- even if the admin has configured automatic key loading on
their system!

The simplest solution would be to pass -l to zfs-import (which causes
ZFS to auto-import all keys for all datasets in the pool). However, it
is slightly nicer to do a separate zfs-load-key so that we can unmount
the pool if the key import fails (zfs-import will leave the pool
imported but without keys loaded).

If the user has configured keylocation=prompt (or otherwise
misconfigured the encryption settings for their pool), the command will
fail and the pool import will fail loudly (rather than silently failing
in the form of an imported pool that is not usable as a filesystem).

Signed-off-by: Aleksa Sarai [email protected]

@cyphar
Copy link
Member Author

cyphar commented Nov 16, 2024

I haven't added tests yet because I'm not sure how you would like me to do that -- should we have just one specific test for this behaviour or run all of the ZFS-related tests on both an encrypted and unencrypted dataset?

Also let me know if you think just doing zpool import -l is good enough. Every time I update Incus (or restart my server) I seem to hit this particular issue. I haven't tested this on my machine yet, just posting this to get a review of the general idea and whether you'd like a different solution.

Copy link
Member

@stgraber stgraber left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Logic looks good to me.

For testing, the easiest would be a system tests which only runs under INCUS_BACKEND=zfs and proceeds to manually create a test loop backed test zpool with encryption, then restarts the test Incus instance to make sure it gets unlocked properly.

internal/server/storage/drivers/driver_zfs.go Outdated Show resolved Hide resolved
internal/server/storage/drivers/driver_zfs_utils.go Outdated Show resolved Hide resolved
internal/server/storage/drivers/driver_zfs_utils.go Outdated Show resolved Hide resolved
internal/server/storage/drivers/driver_zfs_utils.go Outdated Show resolved Hide resolved
internal/server/storage/drivers/driver_zfs_utils.go Outdated Show resolved Hide resolved
internal/server/storage/drivers/driver_zfs_utils.go Outdated Show resolved Hide resolved
internal/server/storage/drivers/driver_zfs_utils.go Outdated Show resolved Hide resolved
internal/server/storage/drivers/driver_zfs_utils.go Outdated Show resolved Hide resolved
@stgraber
Copy link
Member

zpool -l may be easier for this, assuming it behaves correctly with /dev/null as stdin so it doesn't actually end up blocking and instead fails the import.

@cyphar
Copy link
Member Author

cyphar commented Nov 19, 2024

@stgraber Ah, so you would prefer if the import failed? I assumed you'd prefer a warning (hence all of the logic for figuring out which datasets have non-file:// keylocations). That would make this patch much simpler. Let me test it out...

@stgraber
Copy link
Member

A storage pool failing to start should have it go into a failed state that gets retried so we should be okay with it failing to mount.

@cyphar
Copy link
Member Author

cyphar commented Nov 20, 2024

@stgraber Okay, zpool import -l will return an error if stdin is /dev/null and we have keylocation=prompt. However, the pool will still be imported (just without the keys loaded), so I guess we should export the pool in that case? Or should we leave it in the half-imported state?

EDIT: I think doing load-key separately (but without any of the extra handling) would be nicer because we could do d.Unmount in the error case more cleanly.

@cyphar cyphar marked this pull request as ready for review November 21, 2024 17:36
@cyphar cyphar force-pushed the zfs-loadkeys branch 3 times, most recently from df712a0 to ec1b82d Compare November 21, 2024 18:04
@cyphar cyphar requested a review from stgraber November 21, 2024 18:04
@cyphar cyphar force-pushed the zfs-loadkeys branch 3 times, most recently from a533cb2 to 93f8d3a Compare November 22, 2024 04:29
stgraber and others added 3 commits November 22, 2024 06:52
If a user has set up their own zpools and given them to us to manage,
it's possible they've configured ZFS-native encryption. For the most
part, this works completely transparently to us. However, because we
manually do zpool-import and zpool-export during startup and shutdown of
Incus, ZFS datasets with keys will have their keys unloaded during
shutdown and then the keys are not automatically loaded on startup. This
results in containers being unable to start on startup because all IOs
are blocked indefinitely until the dataset keys are loaded manually by
the admin -- even if the admin has configured automatic key loading on
their system!

The simplest solution would be to pass -l to zfs-import (which causes
ZFS to auto-import all keys for all datasets in the pool). However, it
is slightly nicer to do a separate zfs-load-key so that we can unmount
the pool if the key import fails (zfs-import will leave the pool
imported but without keys loaded).

If the user has configured keylocation=prompt (or otherwise
misconfigured the encryption settings for their pool), the command will
fail and the pool import will fail loudly (rather than silently failing
in the form of an imported pool that is not usable as a filesystem).

Signed-off-by: Aleksa Sarai <[email protected]>
This primarily ensures that we correctly load the dataset keys when
importing a pool, and if we can't import the dataset that the storage
pool is correctly tagged as UNAVAILABLE.

Signed-off-by: Aleksa Sarai <[email protected]>
@stgraber stgraber enabled auto-merge November 22, 2024 07:00
@stgraber stgraber merged commit d55c315 into lxc:main Nov 22, 2024
29 of 30 checks passed
@cyphar cyphar deleted the zfs-loadkeys branch November 22, 2024 10:18
tmeijn pushed a commit to tmeijn/dotfiles that referenced this pull request Dec 13, 2024
This MR contains the following updates:

| Package | Update | Change |
|---|---|---|
| [lxc/incus](https://github.com/lxc/incus) | minor | `v6.7.0` -> `v6.8.0` |

MR created with the help of [el-capitano/tools/renovate-bot](https://gitlab.com/el-capitano/tools/renovate-bot).

**Proposed changes to behavior should be submitted there as MRs.**

---

### Release Notes

<details>
<summary>lxc/incus (lxc/incus)</summary>

### [`v6.8.0`](https://github.com/lxc/incus/releases/tag/v6.8.0): Incus 6.8

[Compare Source](lxc/incus@v6.7.0...v6.8.0)

#### What's Changed

-   exec: Consume websocket pings for stderr by [@&#8203;stefanor](https://github.com/stefanor) in lxc/incus#1380
-   incus-simplestreams: Add prune command by [@&#8203;presztak](https://github.com/presztak) in lxc/incus#1381
-   internal/instance: Fix validation of volatile.cpu.nodes by [@&#8203;stgraber](https://github.com/stgraber) in lxc/incus#1394
-   Add a function to clone map and use it where appropriate by [@&#8203;montag451](https://github.com/montag451) in lxc/incus#1397
-   cgo/process_utils: fix 32bit builds by [@&#8203;brauner](https://github.com/brauner) in lxc/incus#1398
-   Start using goimports by [@&#8203;stgraber](https://github.com/stgraber) in lxc/incus#1399
-   instance/config: Mark user keys as live updatable by [@&#8203;stgraber](https://github.com/stgraber) in lxc/incus#1404
-   incus/internal/server/instance/drivers/: Fix incorrect Vars file mapping in edk2 driver by [@&#8203;cmspam](https://github.com/cmspam) in lxc/incus#1406
-   zfs: load keys for encrypted datasets during pool import by [@&#8203;cyphar](https://github.com/cyphar) in lxc/incus#1384
-   incusd/instance: Lock image access by [@&#8203;stgraber](https://github.com/stgraber) in lxc/incus#1408
-   incus/image: Make use of server-side alias handling by [@&#8203;stgraber](https://github.com/stgraber) in lxc/incus#1409
-   incusd/cluster: Validate cluster HTTPS address on join too by [@&#8203;stgraber](https://github.com/stgraber) in lxc/incus#1411
-   Remove metadata info from space usage calculation by [@&#8203;presztak](https://github.com/presztak) in lxc/incus#1417
-   Add ability to set the initial owner of a custom volume by [@&#8203;presztak](https://github.com/presztak) in lxc/incus#1415
-   Allow local live-migration between storage pools by [@&#8203;presztak](https://github.com/presztak) in lxc/incus#1410
-   incus: Add aliases completion by [@&#8203;montag451](https://github.com/montag451) in lxc/incus#1385
-   golangci: Add local prefixes for goimports by [@&#8203;breml](https://github.com/breml) in lxc/incus#1401
-   client: invalidate simple streams cache by [@&#8203;breml](https://github.com/breml) in lxc/incus#1424
-   incusd/instances_post: Fix cluster internal migrations by [@&#8203;stgraber](https://github.com/stgraber) in lxc/incus#1427
-   Fix DHCP client keeping container up by [@&#8203;stgraber](https://github.com/stgraber) in lxc/incus#1430
-   Add support for VGA console screenshots by [@&#8203;breml](https://github.com/breml) in lxc/incus#1431
-   Add --reuse to incus image import by [@&#8203;presztak](https://github.com/presztak) in lxc/incus#1428
-   Fix random ETag values due to map ordering by [@&#8203;stgraber](https://github.com/stgraber) in lxc/incus#1432
-   incusd/task: Fix wait group logic (more entries than running tasks) by [@&#8203;stgraber](https://github.com/stgraber) in lxc/incus#1433
-   Allow setting aliases during raw image upload by [@&#8203;stgraber](https://github.com/stgraber) in lxc/incus#1434
-   Fixes an issue when copying a custom volume using the `--refresh` flag by [@&#8203;presztak](https://github.com/presztak) in lxc/incus#1437
-   Openfga improvements by [@&#8203;stgraber](https://github.com/stgraber) in lxc/incus#1435
-   doc/instance/properties: Add missing instance properties by [@&#8203;stgraber](https://github.com/stgraber) in lxc/incus#1439
-   incusd/daemon_storage: Ensure corect symlinks for images/backups by [@&#8203;stgraber](https://github.com/stgraber) in lxc/incus#1441
-   incusd/storage/lvm: Handle newer LVM by [@&#8203;stgraber](https://github.com/stgraber) in lxc/incus#1442
-   Tweak rendering of manpage in doc by [@&#8203;stgraber](https://github.com/stgraber) in lxc/incus#1443
-   incusd/storage/lvm: Require 512-bytes physical block size for VM images by [@&#8203;stgraber](https://github.com/stgraber) in lxc/incus#1444
-   incusd: Fill ExpiryDate and remove LastUsedDate in volumeSnapshotToProtobuf by [@&#8203;presztak](https://github.com/presztak) in lxc/incus#1448
-   incusd/device/tpm: Wait for swtpm to be ready by [@&#8203;stgraber](https://github.com/stgraber) in lxc/incus#1447
-   incus: Improve completion for `file push` and `file pull` by [@&#8203;montag451](https://github.com/montag451) in lxc/incus#1445
-   incusd/auth/tls: Restrict config access to non-admin by [@&#8203;stgraber](https://github.com/stgraber) in lxc/incus#1451
-   incusd/storage: Handle default disk size in GetInstanceUsage by [@&#8203;stgraber](https://github.com/stgraber) in lxc/incus#1452
-   incus: Improve completion for some file sub-commmands by [@&#8203;montag451](https://github.com/montag451) in lxc/incus#1453
-   incus: Fix completion for `profile copy` by [@&#8203;montag451](https://github.com/montag451) in lxc/incus#1454
-   incus: Add completion for `image alias` subcommands  by [@&#8203;montag451](https://github.com/montag451) in lxc/incus#1457
-   doc/installing: Update Fedora instructions by [@&#8203;stgraber](https://github.com/stgraber) in lxc/incus#1456
-   Fix gap in validation of pre-existing certificates when switching to PKI mode by [@&#8203;stgraber](https://github.com/stgraber) in lxc/incus#1458
-   doc/network_forwards: Split configuration into own table by [@&#8203;stgraber](https://github.com/stgraber) in lxc/incus#1460
-   chore: Happy path on the left, early return by [@&#8203;breml](https://github.com/breml) in lxc/incus#1461
-   incus: Fix completion for `image alias create` by [@&#8203;montag451](https://github.com/montag451) in lxc/incus#1459
-   incus/top: Ignore CPU idle time by [@&#8203;stgraber](https://github.com/stgraber) in lxc/incus#1462
-   incus: Display the alias expansion when execution of an alias fails by [@&#8203;montag451](https://github.com/montag451) in lxc/incus#1464
-   lint: disallow restricted licenses in go-licenses by [@&#8203;breml](https://github.com/breml) in lxc/incus#1466
-   chore: code structure, Go identifier shaddowing by [@&#8203;breml](https://github.com/breml) in lxc/incus#1465
-   incus: Fix alias arguments handling by [@&#8203;montag451](https://github.com/montag451) in lxc/incus#1463
-   incus/file/push Use SFTP client instead of file API by [@&#8203;HassanAlsamahi](https://github.com/HassanAlsamahi) in lxc/incus#1468
-   Fix TPM fd leaks and OpenFGA patching issue by [@&#8203;stgraber](https://github.com/stgraber) in lxc/incus#1469
-   Clarify device override syntax by [@&#8203;stgraber](https://github.com/stgraber) in lxc/incus#1471
-   incusd/auth/openfga: refresh model before applying patches by [@&#8203;stgraber](https://github.com/stgraber) in lxc/incus#1472
-   Add authorization scriptlet by [@&#8203;bensmrs](https://github.com/bensmrs) in lxc/incus#1412
-   doc: add openSUSE installation instructions by [@&#8203;cyphar](https://github.com/cyphar) in lxc/incus#1475
-   OCI image debugging improvements by [@&#8203;danbiagini](https://github.com/danbiagini) in lxc/incus#1478
-   Add function checks to scriptlet validation by [@&#8203;bensmrs](https://github.com/bensmrs) in lxc/incus#1484
-   incus/project: Fix handling of default (unset) project in `get-current` by [@&#8203;irhndt](https://github.com/irhndt) in lxc/incus#1476
-   Translations update from Hosted Weblate by [@&#8203;weblate](https://github.com/weblate) in lxc/incus#1492
-   Add `--force` flag to the console command by [@&#8203;presztak](https://github.com/presztak) in lxc/incus#1491
-   Accept io.Writer in RenderTable by [@&#8203;breml](https://github.com/breml) in lxc/incus#1490
-   doc/network_bridge: Fix missing escaping around variable by [@&#8203;irhndt](https://github.com/irhndt) in lxc/incus#1493
-   incusd/cluster: Skip project restrictions during join by [@&#8203;stgraber](https://github.com/stgraber) in lxc/incus#1497
-   incusd/instance/lxc: Skip instances without idmap allocation yet by [@&#8203;stgraber](https://github.com/stgraber) in lxc/incus#1495
-   incusd/storage/drivers/common: Truncate/Discard ahead of sparse write by [@&#8203;stgraber](https://github.com/stgraber) in lxc/incus#1496
-   Add AskPassword/AskPasswordOnce to Asker by [@&#8203;breml](https://github.com/breml) in lxc/incus#1499
-   Add additional check to Cancel method for ConsoleShow operation by [@&#8203;presztak](https://github.com/presztak) in lxc/incus#1500
-   Improve console disconnections by [@&#8203;stgraber](https://github.com/stgraber) in lxc/incus#1501
-   Fix duplicate OVN load-balancer entries by [@&#8203;stgraber](https://github.com/stgraber) in lxc/incus#1502
-   Improve SFTP performance by [@&#8203;stgraber](https://github.com/stgraber) in lxc/incus#1503
-   incusd/instance_post: Expand profiles in scriptlet context by [@&#8203;stgraber](https://github.com/stgraber) in lxc/incus#1504

#### New Contributors

-   [@&#8203;stefanor](https://github.com/stefanor) made their first contribution in lxc/incus#1380
-   [@&#8203;brauner](https://github.com/brauner) made their first contribution in lxc/incus#1398
-   [@&#8203;cyphar](https://github.com/cyphar) made their first contribution in lxc/incus#1384
-   [@&#8203;breml](https://github.com/breml) made their first contribution in lxc/incus#1401
-   [@&#8203;danbiagini](https://github.com/danbiagini) made their first contribution in lxc/incus#1478
-   [@&#8203;irhndt](https://github.com/irhndt) made their first contribution in lxc/incus#1476

**Full Changelog**: lxc/incus@v6.7.0...v6.8.0

</details>

---

### Configuration

📅 **Schedule**: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).

🚦 **Automerge**: Disabled by config. Please merge this manually once you are satisfied.

♻ **Rebasing**: Whenever MR becomes conflicted, or you tick the rebase/retry checkbox.

🔕 **Ignore**: Close this MR and you won't be reminded about this update again.

---

 - [ ] <!-- rebase-check -->If you want to rebase/retry this MR, check this box

---

This MR has been generated by [Renovate Bot](https://github.com/renovatebot/renovate).
<!--renovate-debug:eyJjcmVhdGVkSW5WZXIiOiIzOS42Mi42IiwidXBkYXRlZEluVmVyIjoiMzkuNjIuNiIsInRhcmdldEJyYW5jaCI6Im1haW4iLCJsYWJlbHMiOlsiUmVub3ZhdGUgQm90Il19-->
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

2 participants