Encrypted saved objects encryption key gets generated by default #56448

Closed
mikecote opened this issue Jan 30, 2020 · 13 comments
Labels
discuss Team:ResponseOps Label for the ResponseOps team (formerly the Cases and Alerting teams) Team:Security Team focused on: Auth, Users, Roles, Spaces, Audit Logging, and more!

Comments

@mikecote
Contributor

Definitions

ESO = Encrypted Saved Objects

Problem

Alerting is built on top of ESO, and SIEM uses alerts for its detection engine. We have a blocking issue for 7.6 where the detection engine stops working after Kibana restarts, because the encryption key is reset.

This discuss issue focuses on the data loss that happens on alerts when administrators don't set up their installation properly. Generating an encryption key also means losing data on restart, which users of alerting need to be aware of.

Generating encryption keys doesn't communicate to administrators that the following won't work:

  • Running multiple instances of Kibana
  • CRUD on alerts after restarting Kibana (administrators don't realize alerts use ESO)

While also not having any warnings in the following scenarios:

  • No logs on cloud to indicate there will be data loss when xpack.encrypted_saved_objects.encryptionKey is not set
  • dev mode due to a static encryptionKey being used

Options

We're exploring options to prevent users from creating alerts in such scenarios to avoid losing their data as well as exploring a way to provide SIEM the tools they need to prevent users from setting up the detection engine from the UI. Some of the options we're exploring so far are:

  1. Disable the alert APIs whenever ESO is running with a generated encryption key. Expose a property or function that returns a boolean indicating whether the API is disabled in this scenario (to be used by SIEM). There is currently no way to find this out; supporting it would require some code changes in the ESO plugin.

  2. Prevent CRUD on any ESO whenever a generated encryption key is being used (in other words, removing generated encryption keys). This option would automatically work for our alerting and actions plugin and seems to be a better approach by preventing the user from creating data that will be lost on restart.

  3. Some other fresh idea, maybe I'll have something better tomorrow 🙂

cc @peterschretlen

@mikecote mikecote added blocker discuss Team:Security Team focused on: Auth, Users, Roles, Spaces, Audit Logging, and more! v7.6.0 Team:ResponseOps Label for the ResponseOps team (formerly the Cases and Alerting teams) labels Jan 30, 2020
@elasticmachine
Contributor

Pinging @elastic/kibana-alerting-services (Team:Alerting Services)

@elasticmachine
Contributor

Pinging @elastic/kibana-security (Team:Security)

@mikecote
Contributor Author

cc @FrankHassanabad

@kobelb
Contributor

kobelb commented Jan 30, 2020

Encryption keys being invalid can occur for a number of reasons:

  • There's an automatically generated encryption key and Kibana restarted
  • There's a HA deployment of Kibana and the encryption key isn't synchronized between all the instances
  • The user intentionally changed the encryption key because the original encryption key was accidentally leaked
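
As an illustration of the first case in the list above, here is a minimal sketch of why a generated key breaks stored data after a restart. Everything in it is hypothetical: `sealValue`, `unsealValue`, and `resolveBootKey` are toy stand-ins, not the ESO plugin's API, and the "encryption" only models the fact that decrypting with a different key fails.

```typescript
// Toy stand-in for authenticated encryption. NOT real cryptography:
// it only models "decrypting with a different key fails".
function keyFingerprint(key: string): number {
  let h = 0;
  for (let i = 0; i < key.length; i++) {
    h = (h * 31 + key.charCodeAt(i)) >>> 0;
  }
  return h;
}

interface SealedValue {
  fingerprint: number; // identifies the key that sealed the value
  payload: string; // left readable here; a real cipher would hide it
}

function sealValue(plain: string, key: string): SealedValue {
  return { fingerprint: keyFingerprint(key), payload: plain };
}

function unsealValue(sealed: SealedValue, key: string): string {
  if (sealed.fingerprint !== keyFingerprint(key)) {
    throw new Error('Unable to decrypt: encryption key does not match');
  }
  return sealed.payload;
}

// Hypothetical key resolution: use the configured key if there is one,
// otherwise generate a fresh key on every process start.
let generatedKeyCounter = 0;
function resolveBootKey(configuredKey?: string): { key: string; generated: boolean } {
  if (configuredKey !== undefined) {
    return { key: configuredKey, generated: false };
  }
  generatedKeyCounter += 1;
  return { key: 'generated-key-' + generatedKeyCounter, generated: true };
}

// First boot without a configured key: an API key is sealed with a generated key.
const firstBoot = resolveBootKey();
const storedApiKey = sealValue('api-key-for-alert', firstBoot.key);

// "Restart": a new key is generated, so the stored value can no longer be read.
const secondBoot = resolveBootKey();
let decryptFailedAfterRestart = false;
try {
  unsealValue(storedApiKey, secondBoot.key);
} catch (e) {
  decryptFailedAfterRestart = true;
}
```

The same failure shows up in an HA deployment if two instances resolve different keys, which is why a warning about generated keys alone does not cover every case.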

I do agree that we should improve the UX when there is an automatically generated encryption key and do a better job at warning the user. However, I don't think we should rely upon this warning as the sole solution to encryption keys being invalid, as there are other situations where this can occur.

Also, I'm not sure I'm following the part about "data loss". Granted, alerts will not be able to run during this time period, and it will require user intervention to "re-enable" them and provide a new API Key. Is this what you're referring to?

For what it's worth, I provided the following commit to @XavierM which exposes whether or not the encryption key was randomly generated kobelb@b77c5c9

@mikecote
Contributor Author

mikecote commented Jan 31, 2020

After giving it some more thought and going through @kobelb's feedback, I'm thinking we need to solve a few different scenarios while trying to warn the user as early as possible of potential consequences.

Not all of this can or should be done for 7.6 but creating GitHub issues for the agreed upon solutions will be a start.

Scenario 1: Encryption key is generated and user does a CRUD on an alert or action

This is where we should do as much as we can in UX to avoid the user getting into scenario 2, 3 or 4.

Some options where each require some changes to the ESO plugin:

1. Show a warning message in the UI

If they get past it, they can fall into scenario 2, 3 or 4 but at least they have been warned within alerting.

2. Prevent the user from doing a CRUD at the alert API level with some UI / UX

This prevents users getting into scenario 2, 3 or 4 from a generated encryption key. They would either have to change it themselves or not synchronize the keys between their deployments to get to scenario 2, 3 or 4.

The UX for this would be to also show a warning from option 1 but prevent the user from continuing.

3. Prevent CRUD on ESO when no encryption key is provided

This has the same notes as option 2 but implemented at a lower level (ESO plugin instead of alerting plugin).

Scenario 2: Encryption key has changed and user does a CRUD on an alert or action

The problem we currently have at this layer is that some APIs (update, update API key, delete) load the decrypted alert before doing anything else. As soon as the decryption fails, the entire API call fails.

Some options to solve this:

1. Isolate background activity

The only reason we're loading the decrypted saved object is because we need to invalidate the API key. We could push this within try/catches to prevent the request from failing due to this background cleanup activity.

In some scenarios we still need to load the alert saved object before continuing. In this scenario, we would have two loads on the saved object.
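
A minimal sketch of the "isolate background activity" idea, assuming an in-memory store and hypothetical helper names (`readDecryptedApiKey`, `deleteAlertDoc`); it is not the alerting plugin's code. The decrypting read is only needed for API key invalidation, so it is wrapped in a try/catch and the delete proceeds even when decryption fails.

```typescript
interface StoredAlertDoc {
  id: string;
  apiKey: string; // imagined to be stored encrypted
}

// In-memory stand-ins for the saved objects store and the security plugin.
const alertDocs: Record<string, StoredAlertDoc | undefined> = {
  'alert-1': { id: 'alert-1', apiKey: 'key-abc' },
};
const invalidatedKeys: string[] = [];
const cleanupWarnings: string[] = [];

// Hypothetical decrypting read; throws when the encryption key has changed.
function readDecryptedApiKey(doc: StoredAlertDoc, keyStillValid: boolean): string {
  if (!keyStillValid) {
    throw new Error('Unable to decrypt attribute "apiKey"');
  }
  return doc.apiKey;
}

function deleteAlertDoc(id: string, keyStillValid: boolean): boolean {
  const doc = alertDocs[id];
  if (doc === undefined) {
    return false;
  }
  // Best-effort cleanup: a decryption failure must not fail the request.
  try {
    invalidatedKeys.push(readDecryptedApiKey(doc, keyStillValid));
  } catch (e) {
    cleanupWarnings.push('Skipped API key invalidation for ' + id);
  }
  delete alertDocs[id];
  return true;
}
```

The trade-off in this sketch is that an API key that cannot be decrypted is never invalidated, so the skipped cleanup is recorded rather than silently dropped.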

Scenario 3: Encryption key has changed and alert is running

Alerts will not be able to run until users manually fix them by calling an alert API to generate a new API key.

Some options to solve this:

1. UX and make sure retry logic is solid

The only option I can think of to solve this is to make sure the alerts recover after a user fixes the objects with broken encryption. The UX for this would be within the management screen, having a list of alerts with a status column showing "Error" for alerts that fail to run. The users would then be able to run the alert immediately after update or we could do it for them.
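
To make the status column and recovery flow concrete, here is a rough sketch under stated assumptions: `AlertListRow`, `runAlertOnce`, and `updateApiKeyAndRun` are hypothetical names, and `apiKeySealedWith` is a toy marker for whichever key encrypted the alert's API key.

```typescript
type AlertRunStatus = 'ok' | 'error';

interface AlertListRow {
  id: string;
  apiKeySealedWith: string; // toy marker for the key that encrypted the API key
  runStatus: AlertRunStatus;
  lastError: string | null;
}

// One execution attempt: a key mismatch surfaces as an "error" status
// instead of the alert silently never running.
function runAlertOnce(row: AlertListRow, currentKey: string): AlertListRow {
  if (row.apiKeySealedWith !== currentKey) {
    return {
      id: row.id,
      apiKeySealedWith: row.apiKeySealedWith,
      runStatus: 'error',
      lastError: 'Unable to decrypt API key',
    };
  }
  return {
    id: row.id,
    apiKeySealedWith: row.apiKeySealedWith,
    runStatus: 'ok',
    lastError: null,
  };
}

// The user fixes the alert by generating a new API key (sealed with the
// current encryption key); the alert is then run immediately on their behalf.
function updateApiKeyAndRun(row: AlertListRow, currentKey: string): AlertListRow {
  const repaired: AlertListRow = {
    id: row.id,
    apiKeySealedWith: currentKey,
    runStatus: row.runStatus,
    lastError: row.lastError,
  };
  return runAlertOnce(repaired, currentKey);
}
```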

Scenario 4: Encryption key has changed and action is running

Actions are encrypted saved objects, with secrets being an encrypted attribute. We also queue the execution of an action within an action_task_params ESO, which also stores the API key to execute the action with. The action saved objects will have to be re-set by the user, but for action_task_params, hmm? There is also the problem that most actions don't have retry logic: they try once and that is it.

Some options to solve this:

1. UX and make sure retry logic is solid

Since we can capture the failure of decrypting the action saved object, we could enforce re-attempts so it works when the user fixes the secrets (this could be complex).

I'm still not sure how we solve the action_task_params objects. The only UX I could think of is a task manager UI or activity log as the only way to let the users know pending action executions failed / are failing.

@FrankHassanabad
Contributor

I think data loss is meant generically here @kobelb. For example, if someone was using ESO to encrypt something such as PII data and they ended up with a randomly generated key, everything will work up to the point where they restart, and then they will have to figure out how to re-enter all the data again.

We don't use it for that at the moment, just for the API keys to be encrypted.

But I can imagine that since most SIEM data eventually contains comments and information that could be of a PII or sensitive nature we might end up with a requirement later to encrypt more fields of timeline data, case data, etc ... We don't though at the moment.

@pmuellr
Member

pmuellr commented Jan 31, 2020

Ya, re: data loss, my read is the same as Frank's. I think the only example we have today (beyond API keys) is action secrets. For the email action, that would be the userid / password of the mail service they're using. Our pagerduty, servicenow, slack and webhook actions all have secrets as well. When the encryption key changes, customers will lose access to these, and have to re-enter them to allow them to run again.

@kobelb
Contributor

kobelb commented Jan 31, 2020

Gotcha, thanks for the clarification regarding data-loss @FrankHassanabad and @pmuellr, I forgot that we also had secrets for third-party services in the actions.

@peterschretlen
Contributor

peterschretlen commented Jan 31, 2020

We can try to warn and discourage in the UI, but the problem is that we'll have a lot of alerting and actions consumers (i.e. many UIs to display warnings in), and the UI is not the only way to create alerts. We will have API users too, for SRE and devops use cases and likely others. The warnings need to get to all of these places.

So I think we:

  1. Fail at the alert/action API level (or ESO level) when these saved objects are created or edited and no key is specified (option 1.3 that @mikecote outlined).
  2. OR use a default encryption key, for example xpack_encryptedSavedObjects_default_encryptionKey. And we can display warnings at the API level that a default key is being used and this is not recommended. For those providing a UI they can pass those warnings on to users. Basically the same as our dev experience, but not using 'a'.repeat(32) as a default value.

looks like beats management uses the approach of a default key:

I favor a default encryption key. It is not ideal, but neither is the frustration of being set up to fail because of a missing config. Failing before you create an alert is only slightly less frustrating than failing after a restart of Kibana.

For users experimenting in a non-prod environment, a default key allows quick setup that will work across restarts. We'd need to add warnings to our API responses that can be displayed in the UI, as well as to our logs every time the default key is used, and make it obvious and annoying enough that it will be hard to ignore.
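
A minimal sketch of the default-key approach with warnings surfaced both in API responses and in logs. The constant reuses the example value proposed above; `resolveConfiguredKey`, `respondWithWarnings`, the `warnings` field, and the `serverLog` array are all assumptions made for illustration.

```typescript
// Example default value taken from the proposal above.
const DEFAULT_ESO_KEY = 'xpack_encryptedSavedObjects_default_encryptionKey';
const DEFAULT_KEY_WARNING =
  'A default encryption key is in use. Set xpack.encrypted_saved_objects.encryptionKey ' +
  'before relying on alerts or actions.';

interface ResolvedKey {
  key: string;
  usingDefault: boolean;
}

interface ApiResultWithWarnings<T> {
  body: T;
  warnings: string[];
}

// Stand-in for the server log.
const serverLog: string[] = [];

function resolveConfiguredKey(configured?: string): ResolvedKey {
  if (configured !== undefined && configured.length > 0) {
    return { key: configured, usingDefault: false };
  }
  return { key: DEFAULT_ESO_KEY, usingDefault: true };
}

// Attach the warning to every response and log it every time the default key
// is used, so that UI consumers and API users both see it.
function respondWithWarnings<T>(body: T, resolved: ResolvedKey): ApiResultWithWarnings<T> {
  const warnings: string[] = [];
  if (resolved.usingDefault) {
    warnings.push(DEFAULT_KEY_WARNING);
    serverLog.push('[warning] ' + DEFAULT_KEY_WARNING);
  }
  return { body: body, warnings: warnings };
}
```

A consumer providing its own UI could pass the `warnings` array straight through to the user, which addresses the concern that warnings need to reach every entry point.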

@peterschretlen
Contributor

I think we have 2 other problems too:

  1. We don't provide any way to swap an encryptionKey. Unlike reporting and session cookies which may cause a minor inconvenience if you have to rotate the key, with ESOs you're stuck unless you're willing to throw away your alerts and actions. I think we'll need some way to migrate data from using one key to another. The answer can't be "sorry you need to set up all your alerts and actions again"

  2. We have no docs on xpack.encrypted_saved_objects.encryptionKey. We should probably add an "Alerts and actions settings" section under https://www.elastic.co/guide/en/kibana/current/settings.html? Or have somewhere people can google "xpack.encrypted_saved_objects.encryptionKey" and get an explanation.

@jportner
Contributor

  1. We have no docs on xpack.encrypted_saved_objects.encryptionKey. We should probably add an "Alerts and actions settings" section under https://www.elastic.co/guide/en/kibana/current/settings.html? Or have somewhere people can google "xpack.encrypted_saved_objects.encryptionKey" and get an explanation.

There was an issue opened for that just today: #55380
CC @gchaps

@peterschretlen
Contributor

  1. We don't provide any way to swap an encryptionKey.

I'll open this as a separate issue, we won't be able to address it here. Going back to the original encrypted saved objects RFC, key rotation was briefly discussed and could be used as a starting point: #33740 (comment)

  • having a primary key, and list of secondary keys (used for decryption only)
  • having a privileged API that can re-encrypt existing data ( decrypt only objects using secondary keys, and re-encrypt with the primary key)
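
The two bullets above could be sketched roughly as follows. This is a toy model, not a design: `sealedWith` stands in for real ciphertext bound to a key, and `EncryptionKeyRing`, `openWithRing`, and `reencryptWithPrimary` are hypothetical names.

```typescript
// Toy model: "sealedWith" records which key encrypted the object.
// A real implementation would store ciphertext, not a readable payload.
interface SealedObject {
  id: string;
  sealedWith: string;
  payload: string;
}

interface EncryptionKeyRing {
  primary: string; // used for all new encryption
  decryptionOnly: string[]; // older keys, never used to encrypt
}

function openWithRing(obj: SealedObject, ring: EncryptionKeyRing): string {
  const candidates = [ring.primary].concat(ring.decryptionOnly);
  if (candidates.indexOf(obj.sealedWith) === -1) {
    throw new Error('No key in the ring can decrypt object ' + obj.id);
  }
  return obj.payload;
}

// Privileged re-encryption pass: only objects sealed with a secondary key are
// decrypted and re-sealed with the primary key. Objects no key can open are
// reported and left untouched.
function reencryptWithPrimary(objects: SealedObject[], ring: EncryptionKeyRing) {
  const result: SealedObject[] = [];
  const undecryptable: string[] = [];
  let rotated = 0;
  objects.forEach((obj) => {
    if (obj.sealedWith === ring.primary) {
      result.push(obj);
      return;
    }
    if (ring.decryptionOnly.indexOf(obj.sealedWith) === -1) {
      undecryptable.push(obj.id);
      result.push(obj);
      return;
    }
    const plain = openWithRing(obj, ring);
    result.push({ id: obj.id, sealedWith: ring.primary, payload: plain });
    rotated += 1;
  });
  return { objects: result, rotated: rotated, undecryptable: undecryptable };
}
```

Once such a pass reports nothing left on a secondary key, that key could be dropped from the ring.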

@mikecote
Contributor Author

I'm going to close this issue. With the variable usingEphemeralEncryptionKey from the ESO plugin, this is sufficient for us to disable alerting and actions APIs in such scenarios. An issue is also opened for key rotation (#56889) and we will go over other scenarios individually.
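
For readers landing here later, a rough sketch of how an API handler might use that flag. Only the name `usingEphemeralEncryptionKey` comes from this thread; the `EsoSetupContract` shape, the `handleCreateAlert` function, and the status codes are assumptions for illustration.

```typescript
// Assumed shape: only the flag name is taken from the thread.
interface EsoSetupContract {
  usingEphemeralEncryptionKey: boolean;
}

interface ApiReply {
  statusCode: number;
  message: string;
}

// Hypothetical handler: refuse to create alerts when the encryption key
// was generated at startup, since the alert would break on restart.
function handleCreateAlert(eso: EsoSetupContract, alertName: string): ApiReply {
  if (eso.usingEphemeralEncryptionKey) {
    return {
      statusCode: 400,
      message:
        'Unable to create alert: set xpack.encrypted_saved_objects.encryptionKey ' +
        'so that encrypted data survives a restart.',
    };
  }
  return { statusCode: 200, message: 'Created alert ' + alertName };
}
```

A UI consumer such as SIEM could read the same flag up front and hide or disable its setup flow instead of waiting for the API to reject the request.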

Thanks for everyone's input 🙏

@kobelb kobelb added the needs-team Issues missing a team label label Jan 31, 2022
@botelastic botelastic bot removed the needs-team Issues missing a team label label Jan 31, 2022
7 participants