Encrypted blob store reuse DEK #53352
Conversation
In addition to what has been recently discussed over at #50846, this PR contains the following two important changes. I am confident that they are good choices, but I still want to raise them for discussion before this PR is ready for review (integration tests are a drag):
@tvernum Could you please think about these two issues, and let me know your thoughts?
Here is my justification for putting the actual AES key in the keystore. The chief reason is that it simplifies the code. It's not a great saving, but this is a sensitive area of the code. To support textual passwords, more code has to be added, in addition to what has been described above. I am willing to change my mind and adopt the textual password again if others believe it to be a very important convenience feature, but for now I would continue with the repository secret key implementation if there are no strong objections against it. CC @tvernum (the "others" in the above actually refers to you 😃 ). @nachogiljaldo do you foresee any difference one way or the other for encrypted snapshots in Cloud? I believe in your case, textual or binary, repository secrets must be generated randomly either way, so it shouldn't matter.
IIUC this shouldn't make a big difference on our end.
Let me summarise to make sure I'm understanding you correctly.
At that point I'm a bit lost. That said, progress over perfection. If it is substantially easier to deliver this with raw AES keys in the keystore, then let's do that.
Thank you for thinking through it @tvernum !
That's correct. I glanced over it, but this is indeed the core of the issue. Next, overall I think there are two ways to look at it: either there is a single salt value, stored in the repository on the storage service, OR there are several salt values and the encrypted blobs must point to the one that was used in their case. My involved description above described the second one, because I think (but I'm not sure) that it's the simplest one to implement.
This discussion got me thinking some more, and I tried to formalize the problem.
That being said, I think you're right and there is enough to be able to obtain the KEK from a textual password, using the random "name" of the DEK as a nonce. I want to try to sketch the code for it, to convince myself first, and then I'll post an update.
@tvernum I've reverted back to using a password as a repository secret. It turns out we can use the DEK Id, which is generated randomly, as a nonce to compute the KEK from the repository password. This way, every DEK is wrapped by a different KEK. Generating a KEK from the password is expensive, but given the strategy to reuse DEKs, it should amortize to the point of not being relevant in practical use cases. The changes required to move from the raw key to the password are all encapsulated in the 406a722 commit (it's not a big deal, we can switch back if we have second thoughts). From a security standpoint, weak passwords are vulnerable to brute force, because snapshots contain blobs with known plaintext, and PBKDF2, being a glorified hash, effectively turns the encrypted known plaintext into a salted hash. It's easy to see an example of this if you look into the code at how the blob name of the wrapped DEK is generated: the random DEK Id is used as the salt for the PBKDF2 function that generates the KEK, then the KEK is used to "wrap" a known plaintext. In essence, AES keys are "vulnerable" to the same thing, but brute force is not a concern when referring to 32 random bytes. Therefore textual passwords are just as secure as AES keys if they are 32 bytes long and generated randomly. I'm still leaning toward raw AES keys, if not for avoiding the PBKDF2 code shenanigans, then at least for the implicit requirement that the repository secret must be unguessable (i.e. random) and long. If however we stick with passwords, maybe we should rename it to
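The derivation described above can be sketched as follows. This is only an illustration of the scheme, not the PR's code: the PBKDF2 variant, iteration count, and key length below are assumptions, and the DEK ids are hypothetical fixed strings instead of random bytes.

```java
import javax.crypto.SecretKey;
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.PBEKeySpec;
import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class KekFromPasswordDemo {

    // Sketch of deriving a per-DEK KEK from the repository password, using the
    // (randomly generated) DEK id as the PBKDF2 salt. Iteration count, PRF and
    // key length are illustrative, not the PR's actual values.
    static SecretKey deriveKek(char[] password, byte[] dekId) throws Exception {
        SecretKeyFactory factory = SecretKeyFactory.getInstance("PBKDF2WithHmacSHA512");
        PBEKeySpec spec = new PBEKeySpec(password, dekId, 10_000, 256);
        byte[] keyBytes = factory.generateSecret(spec).getEncoded();
        return new SecretKeySpec(keyBytes, "AES");
    }

    public static void main(String[] args) throws Exception {
        char[] password = "repository-password".toCharArray();
        byte[] dekId1 = "dek-id-0001".getBytes(StandardCharsets.UTF_8);
        byte[] dekId2 = "dek-id-0002".getBytes(StandardCharsets.UTF_8);

        // The same password and DEK id always reproduce the same KEK ...
        boolean reproducible = Arrays.equals(
            deriveKek(password, dekId1).getEncoded(),
            deriveKek(password, dekId1).getEncoded());
        // ... while a different DEK id yields a different KEK, so every DEK is
        // wrapped by its own KEK even though there is a single password.
        boolean distinct = !Arrays.equals(
            deriveKek(password, dekId1).getEncoded(),
            deriveKek(password, dekId2).getEncoded());

        System.out.println("same DEK id reproduces KEK: " + reproducible);
        System.out.println("different DEK id yields different KEK: " + distinct);
    }
}
```

The determinism is what makes KEKs safe to never store: any node holding the password can re-derive the KEK for a given DEK id on demand.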
Just some (now partly outdated) comments that we mostly already covered on another channel just now
protected final String createRepository(final String name, final Settings settings) {
final boolean verify = randomBoolean();
protected final String createRepository(final String name) {
return createRepository(name, Settings.builder().put(repositorySettings()).put(repositorySettings(name)).build(), randomBoolean());
Why do we want to randomly turn off verification in the general case? It seems that takes away from our chances to catch some IO issues here and there? Maybe we should make this always verify and selectively turn off verification where we really need it?
I honored the original behavior, but you're right, it should be `true` unless specified otherwise. I've made this change (verification `true` instead of `randomBoolean()`).
@@ -81,13 +83,15 @@ protected Settings repositorySettings() {
return Settings.builder().put("compress", randomBoolean()).build();
}

protected final String createRepository(final String name) {
return createRepository(name, repositorySettings());
protected Settings repositorySettings(String repositoryName) {
NIT: Maybe call this `additionalRepositorySettings`? Or better yet, in the sense of simplicity, maybe just add the `repositoryName` parameter to the existing `repositorySettings()` method? That way we don't have to complicate the repo creation code.
> add the `repositoryName` parameter to the existing `repositorySettings()` method
Done, thanks for the suggestion!
internalCluster().getDataOrMasterNodeInstances(RepositoriesService.class).forEach(repositories -> {
RepositoryMissingException e = expectThrows(RepositoryMissingException.class, () -> repositories.repository(name));
assertThat(e.getMessage(), containsString("missing"));
assertThat(e.getMessage(), containsString(name));
I think you can just assert on `e.repository()` here and delete the `"missing"` assertion; that part really is implied by the exception type :)
Or you could just not assert on the message at all IMO.
> Or you could just not assert on the message at all IMO
Done, thanks again!
assertThat(e.getMessage(), containsString(name));
});

return name;
You never use the return here and it seems pointless since it's just the method input?
I've removed the return value; I was trying to be consistent with the return value of the `createRepository` method.
@@ -175,7 +192,7 @@ public void testList() throws IOException {
BlobMetaData blobMetaData = blobs.get(generated.getKey());
assertThat(generated.getKey(), blobMetaData, CoreMatchers.notNullValue());
assertThat(blobMetaData.name(), CoreMatchers.equalTo(generated.getKey()));
assertThat(blobMetaData.length(), CoreMatchers.equalTo(generated.getValue()));
assertThat(blobMetaData.length(), CoreMatchers.equalTo(blobLengthFromContentLength(generated.getValue())));
NIT: Maybe it's easier to follow this logic if you just add a method `assertBlobSize(BlobMeta meta, long contentLength)` and then use it here and override it in the encrypted tests? (I think it saves a little complication and it's what we do for a bunch of other repo-related assertions.)
return DEKCache.computeIfAbsent(DEKId, ignored -> loadDEK(DEKId));
} catch (ExecutionException e) {
// some exception types are to be expected
if (e.getCause() instanceof IOException) {
Hmm, I'm not a big fan of this kind of unwrapping. We're losing part of the stacktrace here and it may be painful to understand (test-)failures because of it?
I used it because the `loadDEK` method is careful to throw `IOException`s when storage has problems and `RepositoryException` when there are encryption problems (wrong password or tampered metadata). I've learned that `IOException`s in `readBlob` can move the repository into a corrupted state, and if we wrap the `IOException` for the DEK load (which happens before reading the actual blob), that would amount to a change of behavior in those cases.
Moreover, I don't believe the `ExecutionException` wrapping adds useful information.
I have no concerns with that. We may need to think about tooling at some point, but we can ship a first release that relies on external tools to generate keys. I am starting to feel more and more as though cutting out unnecessary complexity is a valuable step to take on this project, so if raw keys do that, then it's probably the right call to make.
Pinging @elastic/es-security (:Security/Security)
@original-brownbear @tvernum this is ready for review proper. Thank you @original-brownbear for the first review round. I've acted on all your suggestions. Unfortunately code shifted around a bit since I've applied
@albertzaharovits @tvernum any objections to simply opening a clean PR for this (including the feature branch changes) against
@original-brownbear Thanks for the prompt feedback! The feature branch is clean, it follows master with a few days of lag, and it does not contain previous approaches (I've not merged any of those). I'd prefer to keep maintaining the feature branch because it gives me the liberty of adding incremental features, most notably password rotation (but maybe others, such as password min-strength requirements and an implicit repository password name). Some integration tests might also come in a follow-up as "incremental features" (although I don't have any specific examples right now). Yet this PR is big, and does not look "incremental", because it contains the entirety of the code for the functional
Thanks @albertzaharovits I gave it a more thorough read now, looks really close + good! Thanks! Didn't find any big issues just a few small + quick points :)
Thank you for the feedback @original-brownbear. I have addressed all your suggestions; this is ready for another round.
Thanks for all the iterations on this @albertzaharovits ! => LGTM, I really like the approach now :) => let's merge it to the feature branch IMO!
I've had these comments sitting for a couple of weeks, and I really should have submitted them a while ago. More coming.
long packetIvCounter = ivBuffer.getLong(Integer.BYTES);
if (packetIvNonce != nonce) {
throw new IOException("Packet nonce mismatch. Expecting [" + nonce + "], but got [" + packetIvNonce + "].");
}
Can you clarify why we don't validate the per-packet nonce anymore?
Removing the explicit nonce check was necessary in order to support using the same DEK for multiple encrypted blobs.
Previously, every blob had its own associated metadata blob, where we stored the nonce, which was explicitly verified here, during decryption. But now there is no place where we can associate metadata for every blob (and this is by design, because associating something to every blob incurs at least one API call and it's difficult to update).
There is no security problem if we're not explicitly checking the nonce, because the nonce is part of every packet's IV and is hence validated when the packet's authn tag is validated, implicitly by the GCM algorithm.
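As a sanity check of that claim, here is a small sketch (not the PR's code; the IV layout of a 4-byte nonce followed by an 8-byte counter mirrors the `ivBuffer` reads quoted above, everything else is assumed) showing that a wrong nonce in the IV makes GCM tag verification fail on its own, with no explicit comparison needed:

```java
import javax.crypto.AEADBadTagException;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class GcmNonceDemo {

    // Hypothetical per-packet IV: a 4-byte nonce followed by an 8-byte packet
    // counter, 12 bytes total (the standard GCM IV size).
    static byte[] packetIv(int nonce, long counter) {
        return ByteBuffer.allocate(Integer.BYTES + Long.BYTES).putInt(nonce).putLong(counter).array();
    }

    public static void main(String[] args) throws Exception {
        KeyGenerator keyGen = KeyGenerator.getInstance("AES");
        keyGen.init(256);
        SecretKey dek = keyGen.generateKey();

        Cipher enc = Cipher.getInstance("AES/GCM/NoPadding");
        enc.init(Cipher.ENCRYPT_MODE, dek, new GCMParameterSpec(128, packetIv(42, 0)));
        byte[] ciphertext = enc.doFinal("packet payload".getBytes(StandardCharsets.UTF_8));

        // Decrypting with the correct IV succeeds ...
        Cipher dec = Cipher.getInstance("AES/GCM/NoPadding");
        dec.init(Cipher.DECRYPT_MODE, dek, new GCMParameterSpec(128, packetIv(42, 0)));
        System.out.println(new String(dec.doFinal(ciphertext), StandardCharsets.UTF_8));

        // ... but a wrong nonce in the IV fails the GCM tag check, so the
        // nonce is authenticated implicitly by the algorithm itself.
        try {
            dec.init(Cipher.DECRYPT_MODE, dek, new GCMParameterSpec(128, packetIv(43, 0)));
            dec.doFinal(ciphertext);
            System.out.println("unexpected: wrong nonce accepted");
        } catch (AEADBadTagException e) {
            System.out.println("nonce mismatch detected by GCM tag check");
        }
    }
}
```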
That makes sense.
Is there still a reason why we store it in the stream then?
I made it through everything other than the crypto. I'll review that next.
AtomicReference<SingleUseKey> keyCurrentlyInUse
) {
final Object lock = new Object();
return () -> {
This feels like a complicated way to produce a `Supplier` of a set of keys with a sequential nonce. Does it really need the complexity of `AtomicReference` over a `synchronized` method?
It feels like the general concept could be written quite simply with a class that extends `Supplier`, has an internal counter and is synchronized on `get` (or uses an `AtomicInteger` for the counter).
I presume there is a reason why you took this approach instead, but the code doesn't tell me why that is.
> It feels like the general concept could be written quite simply with a class that extends `Supplier`, has an internal counter and is synchronized on `get` (or uses an `AtomicInteger` for the counter).
This is pretty much what the code here does; here it is without production ornaments:
return () -> {
    final SingleUseKey nonceAndKey = keyCurrentlyInUse.getAndUpdate(
        prev -> prev.nonce < MAX_NONCE ? new SingleUseKey(prev.keyId, prev.key, prev.nonce + 1) : EXPIRED_KEY
    );
    if (nonceAndKey.nonce < MAX_NONCE) {
        return nonceAndKey;
    } else {
        synchronized (lock) {
            if (keyCurrentlyInUse.get().nonce == MAX_NONCE) {
                final Tuple<BytesReference, SecretKey> newKey = keyGenerator.get();
                keyCurrentlyInUse.set(new SingleUseKey(newKey.v1(), newKey.v2(), MIN_NONCE));
            }
        }
        // (the production code retries the getAndUpdate at this point)
    }
};
Instead of extending `Supplier`, this uses the fact that `Supplier` is a `FunctionalInterface`. The internal counter is the nonce itself. Instead of synchronizing on `get`, this uses an `AtomicReference` as an `AtomicInteger`. The remaining code generates a new key and resets the counter. And this last part is where the preference for `AtomicReference` over `AtomicInteger` shows off, because it allows atomically setting the key and the nonce together.
I have added a few more comments. I hope this all makes more sense now.
Do you think I should change something here?
It makes sense. I think I was missing the fact that you need to generate a new base key when the nonce rolls over, but I do think this code is unnecessarily difficult to follow.
That's a combination of trying too hard to do it all in a lambda and trying to keep the locks really small. Is that necessary?
Couldn't we have:
return new CheckedSupplier<SingleUseKey, T>() {
    private Tuple<BytesReference, SecretKey> key = null;
    private int nonce = 0;

    public synchronized SingleUseKey get() throws T {
        if (key == null || nonce == MAX_NONCE) {
            key = keyGenerator.get();
            nonce = MIN_NONCE;
        } else {
            nonce++;
        }
        return new SingleUseKey(key.v1(), key.v2(), nonce);
    }
};
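For reference, this simpler shape does behave as described. The following is a self-contained, runnable variant of the sketch (hedged: it drops the checked-exception generics and the `Tuple` key-id, and uses a hypothetical tiny `MAX_NONCE` so the key roll-over is visible in a short demo):

```java
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import java.util.function.Supplier;

public class SingleUseKeySupplierDemo {
    static final int MIN_NONCE = 0;
    static final int MAX_NONCE = 2; // tiny bound, purely for the demo

    record SingleUseKey(SecretKey key, int nonce) {}

    // Simplified supplier: synchronized get(), an internal counter, and a
    // fresh base key generated when the nonce rolls over.
    static Supplier<SingleUseKey> singleUseKeySupplier(Supplier<SecretKey> keyGenerator) {
        return new Supplier<>() {
            private SecretKey key = null;
            private int nonce = 0;

            @Override
            public synchronized SingleUseKey get() {
                if (key == null || nonce == MAX_NONCE) {
                    key = keyGenerator.get();
                    nonce = MIN_NONCE;
                } else {
                    nonce++;
                }
                return new SingleUseKey(key, nonce);
            }
        };
    }

    public static void main(String[] args) throws Exception {
        KeyGenerator keyGen = KeyGenerator.getInstance("AES");
        keyGen.init(256);
        Supplier<SingleUseKey> supplier = singleUseKeySupplier(keyGen::generateKey);

        SingleUseKey a = supplier.get(); // fresh key, nonce 0
        SingleUseKey b = supplier.get(); // same key, nonce 1
        SingleUseKey c = supplier.get(); // same key, nonce 2 (== MAX_NONCE)
        SingleUseKey d = supplier.get(); // nonce exhausted -> new key, nonce 0

        System.out.println("nonces: " + a.nonce() + " " + b.nonce() + " " + c.nonce() + " " + d.nonce());
        System.out.println("key reused while nonces last: " + (a.key() == c.key()));
        System.out.println("new key after roll-over: " + (a.key() != d.key()));
    }
}
```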
@tvernum I have acted on all review points.
@albertzaharovits I see some change to the way the KEK is established. Originally it looks like this was a 256-bit AES key, and now it's:
Can you provide more detail about where the KEK comes from? How is it generated? How is it stored? I think the origin and security of the KEK itself is a key component of this feature.
@choobinejad I've created a sketch diagram in #41910 (comment) to try to explain it. Sorry it took so long. The KEK is generated from the password using a PBKDF2 algorithm, where the salt is the id of the DEK that the KEK is used to wrap; so you've got a 1-1 relationship between DEKs and KEKs. KEKs are never stored; they exist only in memory, as a proxy that allows using the same password to encrypt DEKs.
Thanks @albertzaharovits, that info and the sketch are really useful!
EDITED 20.03.20: the repository secret is a textual password, not a binary AES key.
Follows the approach described in #50846 (comment)
This builds upon the data encryption streams from #49896 to create an encrypted snapshot repository.
The repository encryption works with the following existing repository types: FS, Azure, S3, GCS (possibly works with HDFS and URL, but these are not tested).
The encrypted repository is protected by a password stored in every node's keystore. The repository keys (KEK, key encryption key) are generated from the password using the PBKDF2 function, and are used to encrypt (using the AES Wrap algorithm) other symmetric keys (referred to as DEK, data encryption keys), which are themselves used to encrypt the blobs of the regular snapshot.

The platinum license is required to snapshot to the encrypted repository, but no license is required to list or restore already encrypted snapshots.
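The KEK/DEK relationship above can be sketched with the JDK's `AESWrap` cipher. This is illustrative only: the 256-bit key sizes match the description, but the KEK here is generated randomly instead of being derived from the password, and nothing else is taken from the PR.

```java
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;

public class DekWrapDemo {
    public static void main(String[] args) throws Exception {
        KeyGenerator keyGen = KeyGenerator.getInstance("AES");
        keyGen.init(256);
        SecretKey kek = keyGen.generateKey(); // in the PR this is derived from the password
        SecretKey dek = keyGen.generateKey(); // the key that actually encrypts blob data

        // Wrap the DEK under the KEK (RFC 3394 key wrapping); the wrapped
        // bytes are what gets stored in the repository, never the raw DEK.
        Cipher wrapper = Cipher.getInstance("AESWrap");
        wrapper.init(Cipher.WRAP_MODE, kek);
        byte[] wrappedDek = wrapper.wrap(dek);
        System.out.println("wrapped DEK length: " + wrappedDek.length); // 32-byte key -> 40 bytes

        // Unwrapping with the same KEK restores the original DEK.
        Cipher unwrapper = Cipher.getInstance("AESWrap");
        unwrapper.init(Cipher.UNWRAP_MODE, kek);
        SecretKey restored = (SecretKey) unwrapper.unwrap(wrappedDek, "AES", Cipher.SECRET_KEY);
        System.out.println("round-trip ok: " + restored.equals(dek));
    }
}
```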
Example of how to use the Encrypted FS Repository:

- build the distribution (e.g. `./gradlew :distribution:archives:no-jdk-darwin-tar:assemble`)
- add `path.repo: ["/tmp/repo"]` to the `elasticsearch.yml` conf file (on all the cluster nodes)
- set `xpack.license.self_generated.type: trial` in the `elasticsearch.yml` file
- generate the repository password inside the `elasticsearch.keystore`, on every cluster's node, e.g. for the `test_enc_pass` repository password name:

Relates #49896
Relates #41910
Obsoletes #50846 #48221