Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce a KeyManager #2990

Merged
merged 5 commits into from
May 8, 2023
Merged

Introduce a KeyManager #2990

merged 5 commits into from
May 8, 2023

Conversation

andrewjstone
Copy link
Contributor

The KeyManager will be used by sled-agent to retrieve rack secrets via an implementation of a SecretRetriever and generate derived keys.

Currently the only keys produced are those necessary for U.2 drive encryption, and follow the derivation scheme in section 4 of RFD 301.

For initail implementation, input key material (IKM) will be retrieved from a SecretRetriever backed by a sled local mechanism. Future implementations of SecretRetriever will use the trust quorum rack secret retrieved from the bootstore.

@andrewjstone andrewjstone requested review from jgallagher and smklein May 2, 2023 22:54
The `KeyManager` will be used by sled-agent to retrieve rack secrets via
an implementation of a `SecretRetriever` and generate derived keys.

Currently the only keys produced are those necessary for U.2 drive
encryption, and follow the derivation scheme in section 4 of RFD 301.

For initail implementation, input key material (IKM) will be retrieved
from a `SecretRetriever` backed by a sled local mechanism. Future
implementations of `SecretRetriever` will use the trust quorum rack
secret retrieved from the bootstore.
/// upon construction. We sparate the two types, because a `Secret` must contain
/// `Zeroizable` data, and a `Box<[u8; 32]>` is not zeroizable on its own.
#[derive(Zeroize, ZeroizeOnDrop)]
pub struct Ikm(pub Box<[u8; 32]>);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume the omission of Debug impls is intentional. Would it be useful to implement Debug manually in a way that does not reveal any data? I can think of a couple benefits but am not sure it's completely worth it:

  1. types containing any of these values can derive Debug
  2. it's very obvious that the Debug impl is special to avoid leaking information

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's definitely intentional. I'm also not sure why you'd want a Debug impl here. This is a secret and we don't want to log it by accident. The secret is further protected in the Secret type below.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So one other thing here. The secrets themselves are meant to be transient, and shouldn't be stored in any longer lived data structure. The IKM gets created once and put in a secret, where it is fed into Hkdf and discarded.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I just default to assuming everything should impl Debug, because it's really annoying when I try to derive(Debug) and discover some field I have doesn't. In this case there's good reason not to, which is fine - maybe just add a comment of the "WARNING DO NOT IMPLEMENT DEBUG" variety for future maintainers? My proposal to manually implement Debug was to avoid that but maybe it's correct here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a good point. I'll just derive it to avoid the comment and problems :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a good point. I'll just derive it to avoid the comment and problems :)

I'll impl it, not derive. Sigh.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 6abfb00

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually removed the Debug impl again and instead made this type private in 79cf577

key-manager/src/lib.rs Outdated Show resolved Hide resolved
key-manager/src/lib.rs Outdated Show resolved Hide resolved
Comment on lines +18 to +19
/// This should never be used directly, and always wrapped in a `Secret<Ikm>`
/// upon construction. We sparate the two types, because a `Secret` must contain
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is true, why expose this type as pub? Why not just define a pub type alias to the secret, and keep this private?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great call. This should not be pub at all. Neither should Aes256GcmDiskEncryptionKey. I made both of them private and added accessors to their versioned counterparts which are the types a user sees.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 79cf577

}

/// Derived Disk Encryption key
#[derive(Zeroize, ZeroizeOnDrop, Default)]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dumb question, but why should this derive default? Seems potentially dangerous to make it "easy" to hand out invalid keys

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to allocate the memory first so we can write into it. The only other alternatives are to have our own
Aes256GcmDiskEncryptionKey::new() which does the same thing or do the following inline: let mut key = Aes256GcmDiskEncryptionKey(Box::new([0u8; 32]));

Neither one seems really better than default to me. And in the latest version of the code I made the type private. Since we only construct it in one place, this isn't that dangerous.

Comment on lines +183 to +186
/// This is useful when a new entity is being encrypted and there is no need
/// for a reconfiguration. When an entity is already encrypted, and needs to
/// be decrypted, the user should instead call the [`SecretRetriever::get`].
async fn get_latest(&self) -> Result<VersionedIkm, SecretRetrieverError>;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of curiosity, why isn't this returning an epoch? (... which could potentially be passed to Self::get?)

What happens if this is called while a reconfiguration is happening?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

VersionedIkm contains the epoch.

What happens if this is called when reconfiguration is happening is the same as when it isn't. The latest secret will be returned as that's what the caller asked for. In general this should only be used for configuring a new device and so the latest is what the user actually wants. They have no use for older secrets.

In general, we can't deal with reconfiguration entirely within this part of the code. Even the call to get is only optimistic. There's an inherent race condition where a reconfiguration can be started after the caller gets the values of secrets back. It's the responsibility of the system supplying the input key material to ensure that that is fine and keys can be rotated one after another for any still available epochs (up to a policy limit). In trust quorum, where we expect to use this, the sled-agent will get informed of a new configuration and trigger a rekey again.

andrewjstone added a commit that referenced this pull request May 8, 2023
Encryption of U.2 devices relies on key generated by the `KeyManager`
introduced in #2990. The `KeyManager` is parameterized by a
`SecretRetriever`, which provides input key material (IKM) that the
`KeyManager` can use to derive keys. In this commit we introduce a
`LocalSecretRetriever` which provides secrets local to a sled. Right
now the secret is hardcoded, but we intend to either store random
secrets on the M.2s or retrieve them from VPD data in the short
term. Longer term, we will replace the `LocalSecretRetriever` with a
`TrustQuorumSecretRetriever` which will provide a rack secret recomputed
from trust quorum key shares.

The `KeyManger` has been  extended to run in an tokio task and process
requests for from different "requesters". Each requester only has the
ability to request keys that it requires to do its job. Currently there
is a single requester, `StorageKeyRequester`, that is owned by the
`StorageManager` which loans it out to `sled_hardware::Disk` to request
keys on demand.

`sled_hardware::Disk` requests keys for both creation of encrypted
datasets and mounting of existing encrypted datasets. In order to
actually allow ZFS to use the generated keys, it creates keyfiles inside
a RAMDisk. Attempts are made to limit the lifetime of these files as
well as accidental leakage of secrets.

Changes have also been made to `illumos_utils::Zfs` to support
encrytpion. To support future secret reconfiguration with trust quorum
we store an additional `epoch` property on each encrypted dataset, which
allows the `KeyManager` to retrieve the right vesion of the secret from
a `SecretRetriever`.
@andrewjstone andrewjstone merged commit 21b31c8 into main May 8, 2023
@andrewjstone andrewjstone deleted the key-manager branch May 8, 2023 18:47
andrewjstone added a commit that referenced this pull request May 8, 2023
Encryption of U.2 devices relies on key generated by the `KeyManager`
introduced in #2990. The `KeyManager` is parameterized by a
`SecretRetriever`, which provides input key material (IKM) that the
`KeyManager` can use to derive keys. In this commit we introduce a
`LocalSecretRetriever` which provides secrets local to a sled. Right
now the secret is hardcoded, but we intend to either store random
secrets on the M.2s or retrieve them from VPD data in the short
term. Longer term, we will replace the `LocalSecretRetriever` with a
`TrustQuorumSecretRetriever` which will provide a rack secret recomputed
from trust quorum key shares.

The `KeyManger` has been  extended to run in an tokio task and process
requests for from different "requesters". Each requester only has the
ability to request keys that it requires to do its job. Currently there
is a single requester, `StorageKeyRequester`, that is owned by the
`StorageManager` which loans it out to `sled_hardware::Disk` to request
keys on demand.

`sled_hardware::Disk` requests keys for both creation of encrypted
datasets and mounting of existing encrypted datasets. In order to
actually allow ZFS to use the generated keys, it creates keyfiles inside
a RAMDisk. Attempts are made to limit the lifetime of these files as
well as accidental leakage of secrets.

Changes have also been made to `illumos_utils::Zfs` to support
encrytpion. To support future secret reconfiguration with trust quorum
we store an additional `epoch` property on each encrypted dataset, which
allows the `KeyManager` to retrieve the right vesion of the secret from
a `SecretRetriever`.
andrewjstone added a commit that referenced this pull request May 8, 2023
Encryption of U.2 devices relies on key generated by the `KeyManager`
introduced in #2990. The `KeyManager` is parameterized by a
`SecretRetriever`, which provides input key material (IKM) that the
`KeyManager` can use to derive keys. In this commit we introduce a
`LocalSecretRetriever` which provides secrets local to a sled. Right
now the secret is hardcoded, but we intend to either store random
secrets on the M.2s or retrieve them from VPD data in the short
term. Longer term, we will replace the `LocalSecretRetriever` with a
`TrustQuorumSecretRetriever` which will provide a rack secret recomputed
from trust quorum key shares.

The `KeyManger` has been  extended to run in an tokio task and process
requests for from different "requesters". Each requester only has the
ability to request keys that it requires to do its job. Currently there
is a single requester, `StorageKeyRequester`, that is owned by the
`StorageManager` which loans it out to `sled_hardware::Disk` to request
keys on demand.

`sled_hardware::Disk` requests keys for both creation of encrypted
datasets and mounting of existing encrypted datasets. In order to
actually allow ZFS to use the generated keys, it creates keyfiles inside
a RAMDisk. Attempts are made to limit the lifetime of these files as
well as accidental leakage of secrets.

Changes have also been made to `illumos_utils::Zfs` to support
encrytpion. To support future secret reconfiguration with trust quorum
we store an additional `epoch` property on each encrypted dataset, which
allows the `KeyManager` to retrieve the right vesion of the secret from
a `SecretRetriever`.
andrewjstone added a commit that referenced this pull request Jun 1, 2023
Encryption of U.2 devices relies on key generated by the `KeyManager`
introduced in #2990. The
`KeyManager` is parameterized by a
`SecretRetriever`, which provides input key material (IKM) that the
`KeyManager` can use to derive keys. In this commit we introduce a
`LocalSecretRetriever` which provides secrets local to a sled. Right
now the secret is hardcoded, but we intend to either store random
secrets on the M.2s or retrieve them from VPD data in the short
term. Longer term, we will replace the `LocalSecretRetriever` with a
`TrustQuorumSecretRetriever` which will provide a rack secret recomputed
from trust quorum key shares.

The `KeyManger` has been  extended to run in an tokio task and process
requests for from different "requesters". Each requester only has the
ability to request keys that it requires to do its job. Currently there
is a single requester, `StorageKeyRequester`, that is owned by the
`StorageManager` which loans it out to `sled_hardware::Disk` to request
keys on demand.

`sled_hardware::Disk` requests keys for both creation of encrypted
datasets and mounting of existing encrypted datasets. In order to
actually allow ZFS to use the generated keys, it creates keyfiles inside
a RAMDisk. Attempts are made to limit the lifetime of these files as
well as accidental leakage of secrets.

Changes have also been made to `illumos_utils::Zfs` to support
encryption. To support future secret reconfiguration with trust quorum
we store an additional `epoch` property on each encrypted dataset, which
allows the `KeyManager` to retrieve the right vesion of the secret from
a `SecretRetriever`.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants