-
Notifications
You must be signed in to change notification settings - Fork 8.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for the encryption key rotation to the encrypted saved objects. #72420
Conversation
cb5a066
to
a0a1344
Compare
a0a1344
to
b9d07de
Compare
b9d07de
to
79c45cb
Compare
@@ -37,7 +53,8 @@ export function createConfig$(context: PluginInitializerContext) { | |||
} | |||
|
|||
return { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
note: the config object we return here is supposed to be a convenient wrapper around config related stuff used by plugin internally, so it's not required to reflect raw config shape exactly, so I reshaped it slightly.
// Keeps track of object IDs that have been processed already. | ||
const processedObjectIDs = new Set<string>(); | ||
|
||
// Until we get scroll/search_after support in Saved Objects client we have to retrieve as much objects as allowed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
note: in an absence of scroll/search_after support that's the only reasonable workaround I could come up with so far. Happy to hear other ideas, critique, objects, concerns! I tried to explain the approach in details in this comment, but let me know if something isn't clear.
}); | ||
|
||
// Keeps track of object IDs that have been processed already. | ||
const processedObjectIDs = new Set<string>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
note: even though this set can get quite large, I believe it shouldn't become a problem memory wise since we store only IDs....
Hey @legrego! I believe PR is ready for the preliminary feedback whenever you have time (no tests yet). I tried to outline all the details in the description and additional comments, but basically there are two parts that I'd like to get your opinion on:
|
Thanks @azasypkin! I'll start taking a look tomorrow at the latest, so that you can have some preliminary feedback for next week |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Better late than never (sorry!), but here's some preliminary feedback for you. I did not have a chance to run this yet, but will do so as part of the next review round. Looks great so far! I appreciate all of the thought & work that went into this. You made it look easy 😄
} | ||
|
||
export function defineRoutes(params: RouteDefinitionParams) { | ||
if (params.config.keyRotation.decryptionOnlyKeys.length > 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rather than conditionally defining this route, what do you think about returning a 400
or similar, with an error message that describes why we can't fulfill the request?
Perhaps something like:
Kibana is not configured to support encryption key rotation. Update
kibana.yml
to includexpack.encryptedSavedObjects.keyRotation.decryptionOnlyKeys
to rotate your encryption keys.
I could see the conditional definition being hard to diagnose if there are multiple instances behind a load-balancer, and only one of them is misconfigured. That would result in intermittent 404
responses from this endpoint
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could see the conditional definition being hard to diagnose if there are multiple instances behind a load-balancer, and only one of them is misconfigured. That would result in intermittent 404 responses from this endpoint
++, didn't like that as well, but didn't have a solution yet. Sounds like a good proposal, let me try to do that.
x-pack/plugins/encrypted_saved_objects/server/crypto/encrypted_saved_objects_service.ts
Show resolved
Hide resolved
} catch (err) { | ||
// Remember the error thrown when we tried to decrypt with the primary key. | ||
if (!decryptionError) { | ||
decryptionError = err; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment as above: Should we check to make sure that this is an EncryptionError
, and not some other unexpected failure condition?
|
||
// This list of cryptos consists of the primary crypto that is used for both encryption and | ||
// decryption, and the optional secondary cryptos that are used for decryption only. | ||
const cryptos = [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't love this, but given its usage on the encryption service, it feels tolerable. I worry about consumers having to know that the first entry can be used for encryption, and all other keys can only be used for decryption.
What do you think about creating a "patched" version of the "decryption only" crypto instances to throw an error if we ever attempt to call one of the encryption functions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't love this, but given its usage on the encryption service, it feels tolerable. I worry about consumers having to know that the first entry can be used for encryption, and all other keys can only be used for decryption.
I see, we can try to split this constructor argument into two separate ones and join inside constructor so that for consumers it will be something like primaryCrypto
and optional decryptionOnlyCryptos
?
What do you think about creating a "patched" version of the "decryption only" crypto instances to throw an error if we ever attempt to call one of the encryption functions?
You mean wrap/patch them inside of the constructor? If so, yeah I can try to do that and we'll see how that looks like.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You mean wrap/patch them inside of the constructor? If so, yeah I can try to do that and we'll see how that looks like.
Yeah something like that is what I had in mind.
I see, we can try to split this constructor argument into two separate ones and join inside constructor so that for consumers it will be something like primaryCrypto and optional decryptionOnlyCryptos?
I'd be happy with either of these approaches -- don't feel like you have to implement both. Whichever you feel is better is good with me 👍
}); | ||
} catch (err) { | ||
logger.error(err); | ||
return response.internalError(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given this is an administrative endpoint, it might make sense to return some information about the failure. Are you aware of failure scenarios that would expose information we don't want leaked?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you aware of failure scenarios that would expose information we don't want leaked?
Can't think of any. Agree, we can return custom
error here instead that would wrap the original error. Will do.
schema.string({ minLength: 32, defaultValue: 'a'.repeat(32) }) | ||
), | ||
keyRotation: schema.object({ | ||
decryptionOnlyKeys: schema.arrayOf(schema.string({ minLength: 32 }), { defaultValue: [] }), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't forget to add this to kibana_docker
, and allow on ESS once this merges.
/** | ||
* Indicates whether decryption should only be performed using secondary decryption-only keys. | ||
*/ | ||
useDecryptionOnlyKeys?: boolean; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I think reversing this would make it easier to understand:
useDecryptionOnlyKeys?: boolean; | |
includePrimaryEncryptionKey?: boolean; |
or if you didn't want to reverse it, something like:
useDecryptionOnlyKeys?: boolean; | |
omitPrimaryEncryptionKey?: boolean; |
or (my least favorite suggestion)
useDecryptionOnlyKeys?: boolean; | |
onlyUseDecryptionOnlyKeys?: boolean; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
omitPrimaryEncryptionKey?: boolean;
Sounds good 👍
…n route, prevent multiple concurrent rotation operation, limit max number of batches during rotation and more, add tests.
Hey @legrego, I believe I've handled all the comments you left and PR is ready for review whenever you have time. Thanks! |
@@ -0,0 +1,186 @@ | |||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
note: Essentially I have 6 Saved Objects here: 2 x "single-namespace", 2 x "multiple-namespaces", 2 x "hidden". Three of them are encrypted with 'a'.repeat(32)
encryption key and three with 'b'.repeat(32)
.
@@ -437,15 +437,15 @@ export default function ({ getService }: FtrProviderContext) { | |||
}:${id}`; | |||
} | |||
|
|||
afterEach(async () => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
note: we need this hook just for basic tests, not for migration and key rotation tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Docker config change LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks and works great! Just a couple of nits/questions for you, but otherwise I think we're good to go!
} | ||
} | ||
|
||
this.options.logger.debug( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given the infrequency of this operation, we can likely make this informational
this.options.logger.debug( | |
this.options.logger.info( |
|
||
const result = { total: 0, successful: 0, failed: 0 }; | ||
if (registeredSavedObjectTypes.length === 0) { | ||
this.options.logger.debug( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given the infrequency of this operation, we can likely make this informational
this.options.logger.debug( | |
this.options.logger.info( |
return result; | ||
} | ||
|
||
this.options.logger.debug( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given the infrequency of this operation, we can likely make this informational
this.options.logger.debug( | |
this.options.logger.info( |
let rotationInProgress = false; | ||
router.post( | ||
{ | ||
path: '/api/encrypted_saved_objects/rotate_key', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: We've tried to prefix our non-REST "RPC-style" calls with an underscore:
path: '/api/encrypted_saved_objects/rotate_key', | |
path: '/api/encrypted_saved_objects/_rotate_key', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, yeah, good point, thanks!
}, | ||
}, | ||
async (context, request, response) => { | ||
if (config.keyRotation.decryptionOnlyKeys.length === 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👏
let routeHandler: RequestHandler<any, any, any>; | ||
let routeConfig: RouteConfig<any, any, any, any>; | ||
beforeEach(() => { | ||
const [rotateRouteConfig, rotateRouteHandler] = router.post.mock.calls.find( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ohh I like this approach to locating routes a lot better than the positional lookups we've done in the past
let routeConfig: RouteConfig<any, any, any, any>; | ||
beforeEach(() => { | ||
const [rotateRouteConfig, rotateRouteHandler] = router.post.mock.calls.find( | ||
([{ path }]) => path === '/api/encrypted_saved_objects/rotate_key' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
([{ path }]) => path === '/api/encrypted_saved_objects/rotate_key' | |
([{ path }]) => path === '/api/encrypted_saved_objects/_rotate_key' |
options: { body: { total: 3, successful: 6, failed: 0 } }, | ||
}); | ||
|
||
// And consequent requests resolve properly too. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
super-nit
// And consequent requests resolve properly too. | |
// And subsequent requests resolve properly too. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤦 thanks!
([, , , hiddenSavedObject]) => !hiddenSavedObject | ||
)) { | ||
const url = hidden | ||
? `/api/saved_objects/${type}/${id}` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you intentionally filtering out hidden saved objects in this test? url
is always resolving to the same value here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ugh, just oversight, was initially thinking that there is some trick to force this endpoint to return hidden types as well and wanted to preserve this logic. Will remove, thanks!
@@ -529,5 +537,104 @@ export default function ({ getService }: FtrProviderContext) { | |||
}); | |||
}); | |||
}); | |||
|
|||
describe('key rotation', () => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add a test that verifies that a user with just the kibana_admin
role can't invoke the endpoint?
The api access check we have now is relying in a subtle implementation detail, so it'd be good to verify this continues to work going forward.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a great idea, let me try to figure out how to do that.
…e proper naming convention for the rotate route URL and more.
Hmm, have no idea why type check out of blue started to complain like that .... Looking. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM on green CI. Nice job!
💚 Build SucceededMetrics [docs]distributable file count
History
To update your PR or re-run it, just comment with: |
7.x/7.10.0: 07e2720 |
Hey @elastic/kibana-qa, Just wanted to ping you to check if you can include few test cases that are related to the new functionality introduced in this PR. We can probably join it with the test scenarios you may have for Alerting already (cc @elastic/kibana-alerting-services for better and real examples). We basically want to test that:
POST http://localhost:5601/api/encrypted_saved_objects/_rotate_key
Authorization: XXXX
Accept: application/json
Content-Type: application/json
kbn-xsrf: true
Thanks! |
@azasypkin I looked into this from an automation side, I let Lee know there were start/stop functions for the ES and KBN servers in our Kibana test package. Then, I spoke to Spencer to get more details to see if this can be used to do this type of testing and he referred me that this test coverage can be added as part of our jest integration tests, since it supports starting and stopping Elasticsearch and Kibana servers, it needs a few additions to support reusing the configuration, but looks possible. However, the jest integration tests are currently broken, but @spalger and @tylersmalley are working to get this fixed soon, we will know more once that is done. Thanks. |
@azasypkin are there user-facing docs for this functionality? If so could you pls post a link here. |
Thanks for looking into this! Currently we cover major parts of this functionality with unit and API integration tests, but having ability to cover entire flow would be great. At this point we want to raise awareness for this new flow even if we cannot automate its testing entirely.
Alerting Team mentions encryption functionality in the docs and planning to document key rotation in the scope of this issue. We also have a public RFC that gives a more general overview of this functionality. |
I have a couple of concerns here;
I'm thinking that if Kibana has to always be restarted to use a new key, we should just document to follow one consistent process like;
|
@azasypkin am I correct that this would be the process we'd recommend?
the PR and RFC cover all the details, but I think @LeeDr is correct it's not clear what our recommended steps are to users. On cloud what is the expectation? I don't believe we expose cc @bmcconaghy who is updating the alerting docs for 7.10 |
Referenced cloud issue |
@peterschretlen your understanding is correct. We have a number of documentation changes to make which we were planning on addressing once we got through feature freeze. This was going to be my primary focus over the next couple of weeks, as we have a number of exciting(?) changes that we need to educate our users about.
That's correct - as of now, we do not allow the encryption key to be set by deployment administrators. If they wish to use a custom encryption key, then they can contact support to have that done. @azasypkin was discussing this with Cloud earlier today, and he is drafting instructions for support/cloud to follow whenever the encryption key needs to be rotated. |
Summary
In this PR we'll introduce rotation support for the encryption keys used by Encrypted Saved Objects plugin (ESO).
Today if we cannot decrypt attributes (e.g. when encryption key changes) we either fail operation (when
getDecryptedAsInternalUser()
is used) or strip encrypted attributes from the response and record decryption error (when standard Saved Objects (SO) APIs are used for attributes that should be automatically decrypted).Having a dedicated error type allows consumers to gracefully handle such errors (e.g. prompt user to re-enter data that needs to be encrypted). This approach works reasonably well in some scenarios, but it may be a very troublesome if we have to deal with lots of SOs. Moreover we'd like to recommend to periodically rotate encryption keys even if they aren't leaked, but before we do so we need to provide a way of seamless migration of the existing encrypted SOs to a new encryption key.
There are two possible scenarios here:
Encryption key is lost
In this scenario encrypted portion of the existing SOs will be lost completely and the only way to recover from this state is manual intervention described above and hence ESO consumers should continue supporting this scenario.
Note: in some cases leaked, but known encryption key can be treated as the lost key, for example when there is a risk that encrypted data was tampered with and it's risky to continue relying on it.
Encryption key is known, but needs to be rotated
In this scenario a new encryption key (primary encryption key) will be generated and Kibana will use it to encrypt new or updated SOs. Kibana will still need to know the old encryption key to be able to decrypt existing data, but this key will no longer be used to encrypt any of the new or existing SOs. It's also should be possible to have multiple old decryption-only keys.
Having multiple decryption keys at the same time brings one problem though: we need to figure out which key to use to decrypt specific SO. If our encryption keys could have a unique ID that we would store together with the encrypted data (we cannot use encryption key hash for that for obvious reasons) we could know for sure which key to use, but we don't have such functionality right now and it may not be the easiest one to manage through
yml
configuration anyway.The other solution is to try available existing decryption keys one by one to decrypt data (always starting from the primary encryption key). That's the easiest solution, but as you'll see in the
Explorations
section below it comes with the cost.So when the key rotation is required configuration could look like this:
Depending on the organization policies
decryptionOnlyKeys
array should be eventually cleaned up and removed from thekibana.yml
.One important thing to note here is that being able to decrypt with the old encryption key at run time can also be helpful in scenarios when SOs were created by two different Kibana instances that use different encryption keys. That's likely to happen only during initial Elastic Stack setup stage though:
But since it's expensive to try multiple decryption keys to decrypt SO encrypted with the old key whenever it's accessed we should also provide a way to migrate all SOs to a new encryption key at once.
For example we could introduce a new
kibana.yml
option that would tell ESO to re-encrypt everything it knows about with the new key on start and don't usedecryptionOnlyKeys
after that (or keep using if decryption fails, not a big deal). But I don't like that it's not really convenient and would require restart. Assuming we generally want to have a more UI-driven user friendly configuration we may want to introduce UI for something like this in the future and hence we'll need an API endpoint that would support such UI.That's why I'm leaning towards introducing a protected API endpoint that would perform bulk re-encryption on-demand/by:
This way admins would have more control on when and how to perform re-encryption and we can respond with more useful stat as well.
Explorations
With the current WIP I tried to gather some data regarding the performance impact of multiple decryption attempts. The difference isn't definitely not catastrophic when you request SOs separately, but it may add up in bulk operations and search/find.
It's worth mentioning that we decrypt in bulk operations only when SO type registered attributes with
dangerouslyExposeValue
and to best of my knowledge nobody is using this API yet. Here are my explorations nevertheless:Input
I ingested 1000 of simple SOs with a single attribute that needs to be decrypted during bulk operations (
publicPropertyStoredEncrypted
):I'm using standard
_find
operation to fetch all of these 1000 SOs at once and measure the time it takes Kibana to process (don't pay too much attention to the absolute values since it's a Kibana in a dev mode with a verbose logging enabled, the difference is what matters here). So the basicautocannon
snippet I used looks like this (100 requests in total with 10 simulatenous connections):Results
Successful decryption with the primary key
Failed decryption with the primary key following successful with the old key
So with two decryption attempts ESO is twice as slow as with a single attempt. I didn't dig into
@elastic/node-crypto
to understand why the difference is that large yet, but that's what we have right now.RFC: #72828
Fixes: #56889
Affected by: #77961
Release note:
xpack.encryptedSavedObjects.encryptionKey
can now be rotated without losing access to existing encrypted Saved Objects (alerts, actions etc.). Old key(s) can be moved toxpack.encryptedSavedObjects.keyRotation.decryptionOnlyKeys
to be used only to decrypt existing objects while new or updated objects will be encrypted using new primary encryption key. Administrators can also use dedicated API endpoint/api/encrypted_saved_objects/_rotate_key
to trigger re-encryption of all existing objects with a new primary key so that old keys can be safely disposed.