
Advanced API Key invalidation #53868

Closed
mikecote opened this issue Dec 31, 2019 · 11 comments · Fixed by #82211
Labels: discuss · Feature:Alerting · R&D · Team:ResponseOps

Comments

@mikecote
Contributor

Once the simple invalidation of API keys is resolved (#45144), there will still be some cases where errors may happen; those should be solved in this issue.

The extra cases are:

  • If the underlying alert task is running while the API key is invalidated, the task may throw an error if it needed that key (authentication failure, invalid key, etc.)
  • The same applies to actions that are running while the API key is invalidated
  • If actions are scheduled but not executed before the API key is invalidated, they will throw an error if they depended on that key
  • If an action is in a retry state and depended on that key, it will keep failing
@mikecote mikecote added Feature:Alerting and Team:ResponseOps labels Dec 31, 2019
@elasticmachine
Contributor

Pinging @elastic/kibana-alerting-services (Team:Alerting Services)

@mikecote
Contributor Author

mikecote commented Apr 22, 2020

Some quick notes:

To solve this, we may have to create an API key from the alert's API key (similar to a token) and delete it after the alert / action finishes execution. We may also have to create a new API key for each action scheduled from an alert, so that when the API key is deleted we know we're really done with it.

The mechanism to create an API key from the alert's API key may have to be re-attempted (with a maximum number of retries) in the scenario where the API key got invalidated between reading the alert and creating a sub-API key. This scenario is pretty common in the API integration tests and we've had to work around it a lot.
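A minimal sketch of that retry loop, assuming creation fails with an error when the parent key was invalidated mid-flight; the helper names (`readAlertApiKey`, `createSubApiKey`) are illustrative, not real Kibana APIs:

```ts
interface ApiKey {
  id: string;
  apiKey: string;
}

const MAX_RETRIES = 3;

async function createSubApiKeyWithRetry(
  alertId: string,
  readAlertApiKey: (alertId: string) => Promise<ApiKey>,
  createSubApiKey: (parent: ApiKey) => Promise<ApiKey>
): Promise<ApiKey> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= MAX_RETRIES; attempt++) {
    // Re-read the alert on every attempt: the key may have been rotated
    // between reading the alert and creating the sub-key.
    const parentKey = await readAlertApiKey(alertId);
    try {
      return await createSubApiKey(parentKey);
    } catch (e) {
      // Assume the failure was an invalidated key; retry from a fresh read.
      lastError = e;
    }
  }
  throw lastError;
}
```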

@mikecote
Contributor Author

mikecote commented May 5, 2020

Turns out using API keys to create sub-API keys doesn't work when using Kibana to create them on behalf of other users. This leaves a few alternatives worth thinking about, but I believe we may hold off until this becomes a more severe issue.

The scenario that best describes the issue: an alert or action takes a while to run, the alert gets updated, and the API key no longer works halfway through.

1. Queue the API key for invalidation

The way I'm thinking, it would involve the following changes from a high level:

  • Store the API key id unencrypted and the key encrypted in action_task_params, instead of storing them together and encrypted as they are today
  • Have the alerting plugin define a saved object type used to queue invalidation of API keys (with the API key id (unencrypted), alert id, etc.)
  • Have alerting create an invalidation object for each API key that needs to be invalidated, instead of invalidating it immediately
  • Run a single recurring task (or maybe one per key) that checks on a regular interval for any API key ids that no longer exist in action_task_params and whose alert task is "idle" (see the sketch after this list)

This one will require a saved object migration on action_task_params, or some extra code to handle both scenarios. It doesn't require external dependencies and solves all 4 scenarios in the description without having to create more API keys or other objects.
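A rough sketch of the queue-for-invalidation shape; the saved object attributes and all function names (`findPendingInvalidations`, `apiKeyIdInUse`, etc.) are illustrative assumptions, not a final design:

```ts
interface ApiKeyPendingInvalidation {
  apiKeyId: string; // stored unencrypted so the cleanup task can query it
  alertId: string;
  createdAt: string;
}

// Recurring cleanup task, run by Task Manager on an interval.
async function runInvalidationTask(deps: {
  findPendingInvalidations: () => Promise<ApiKeyPendingInvalidation[]>;
  // Still referenced by action_task_params, or the alert task isn't idle?
  apiKeyIdInUse: (apiKeyId: string) => Promise<boolean>;
  invalidateApiKey: (apiKeyId: string) => Promise<void>;
  deletePendingInvalidation: (apiKeyId: string) => Promise<void>;
}) {
  for (const pending of await deps.findPendingInvalidations()) {
    // Only invalidate once nothing references the key any more.
    if (!(await deps.apiKeyIdInUse(pending.apiKeyId))) {
      await deps.invalidateApiKey(pending.apiKeyId);
      await deps.deletePendingInvalidation(pending.apiKeyId);
    }
  }
}
```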

2. Look into allowing this scenario (API keys creating sub API keys)?

Allow API keys to create another API key. This would require external dependencies and research to see whether it's worth pursuing.

3. Look into using a token service?

I'm not too familiar with it, but it requires the manage_token cluster privilege, which Kibana probably doesn't have.

https://www.elastic.co/guide/en/elasticsearch/reference/master/token-authentication-services.html

4. Leave functionality as is

Probably the option for now, until we need to solve it for a scenario like the one above or others. I will mark the issue as blocked / discuss for now.

@kobelb
Contributor

kobelb commented May 5, 2020

The scenario that best describes the issue: an alert or action takes a while to run, the alert gets updated, and the API key no longer works halfway through.

Does the API key no longer work because it was explicitly invalidated by the user? If so, we can only assume the intent is to prevent it from being used for authentication, so having it fail mid-way through processing the alert makes sense.

Users are able to invalidate API Keys using Elasticsearch APIs, and we can't prevent users from directly invalidating API Keys using those APIs and force them to be queued until they're no longer used.

If we tried to circumvent this by cloning the API Key, the clone could just be invalidated also.

@mikecote
Contributor Author

mikecote commented May 5, 2020

Does the API key no longer work because it was explicitly invalidated by the user?

That is definitely one use case which we won't be able to protect ourselves from as you mention.

The other scenario is more about when users update alerts. We immediately create a new API key and invalidate the old one (here: https://github.com/elastic/kibana/blob/master/x-pack/plugins/alerting/server/alerts_client.ts#L300). If the alert was running, or had pending / running actions, all of those could fail due to referencing an API key that the update just invalidated.

The alert would recover from this scenario at the next execution / interval. Some actions may fail to run and never retry, because they'd forever reference the old API key.
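To make the race concrete, here is a simplified pseudo-flow of the update path described above; the helper names are illustrative, not the actual alerts_client code:

```ts
async function updateAlertApiKey(
  deps: {
    createApiKey: () => Promise<{ id: string; apiKey: string }>;
    saveAlert: (apiKey: { id: string; apiKey: string }) => Promise<void>;
    invalidateApiKey: (apiKeyId: string) => Promise<void>;
  },
  oldApiKeyId: string
) {
  const newKey = await deps.createApiKey();
  await deps.saveAlert(newKey);
  // Invalidating immediately is the problem: an in-flight alert execution,
  // or an action scheduled with the old key, may still hold a reference to
  // oldApiKeyId and will fail when it tries to authenticate.
  await deps.invalidateApiKey(oldApiKeyId);
}
```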

@kobelb
Contributor

kobelb commented May 5, 2020

The other scenario is more about when users update alerts. We immediately create a new API key and invalidate the old one (here: https://github.com/elastic/kibana/blob/master/x-pack/plugins/alerting/server/alerts_client.ts#L300). If the alert was running, or had pending / running actions, all of those could fail due to referencing an API key that the update just invalidated.

Gotcha, that makes sense. Thanks for explaining. 🤔

@kobelb
Contributor

kobelb commented May 5, 2020

  1. Queue the API key for invalidation

This is an interesting proposal. It seems somewhat complicated, but feasible.

  2. Look into allowing this scenario (API keys creating sub API keys)?

I've proposed adding the ability to allow API Keys to optionally clone themselves to enable other features in the past. We ended up not going with this approach, but it's worth exploring.

  3. Look into using a token service?

The Get token API is primarily consumed by Kibana's auth providers internally to support features like SAML, OIDC, Kerberos, etc. Unfortunately, it's not possible to use an API key to get a token.

  5. Prevent the user from updating Alerts while they're running

This is somewhat limiting, and it could be a potentially frustrating UX. However, we could use optimistic concurrency control to make sure we can update the API key that will be used before invalidating the old one, ensuring the alert didn't start running during the process.
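A rough sketch of what that OCC flow could look like; `version` stands in for whatever concurrency token the saved-objects layer exposes, and all helper names are assumptions:

```ts
class VersionConflictError extends Error {}

async function rotateApiKeyWithOcc(deps: {
  getAlert: () => Promise<{ apiKeyId: string; version: string }>;
  createApiKey: () => Promise<string>;
  // Conditional update: throws VersionConflictError if the alert changed
  // (e.g. an execution started) since `expectedVersion` was read.
  updateAlertApiKey: (newKeyId: string, expectedVersion: string) => Promise<void>;
  invalidateApiKey: (apiKeyId: string) => Promise<void>;
}) {
  const alert = await deps.getAlert();
  const newKeyId = await deps.createApiKey();
  try {
    await deps.updateAlertApiKey(newKeyId, alert.version);
  } catch (e) {
    if (e instanceof VersionConflictError) {
      // The alert was touched in between; discard the key we just created
      // and let the caller retry from a fresh read.
      await deps.invalidateApiKey(newKeyId);
    }
    throw e;
  }
  // Only invalidate the old key once the conditional update succeeded.
  await deps.invalidateApiKey(alert.apiKeyId);
}
```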

@mikecote
Contributor Author

  5. Prevent the user from updating Alerts while they're running

This is somewhat limiting, and it could be a potentially frustrating UX. However, we could use optimistic concurrency control to make sure we can update the API key that will be used before invalidating the old one, ensuring the alert didn't start running during the process.

This sounds interesting. How would the OCC work? Would it be with the task in Task Manager?

@mikecote
Contributor Author

mikecote commented Jul 9, 2020

I created an issue within Elasticsearch repo to explore option 2.

elastic/elasticsearch#59304

@mikecote mikecote mentioned this issue Aug 11, 2020
@mikecote
Contributor Author

mikecote commented Sep 1, 2020

I created an issue within Elasticsearch repo to explore option 2.

elastic/elasticsearch#59304

We'll have to come up with a different approach based on timing. I'm thinking of a new option where we just delete the API key at a future time by using task manager. The API key ID for deletion can be stored in a saved object, and either a single task manager task or one task per API key would execute the cleanup.

If anything were to fail after this fixed deletion time, we could consider it the same scenario as a user manually deleting the API key, and leave the functionality as is for now.
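A hedged sketch of that "delete at a future time" idea: persist the key id in a saved object, then schedule a one-shot Task Manager task to clean it up later. The SO attributes, task type, and delay are illustrative assumptions:

```ts
const REMOVAL_DELAY_MS = 60 * 60 * 1000; // e.g. one hour

interface MarkApiKeyForDeletionDeps {
  saveApiKeyIdToInvalidate: (attrs: { apiKeyId: string; createdAt: string }) => Promise<void>;
  scheduleTask: (task: { taskType: string; runAt: Date; params: Record<string, unknown> }) => Promise<void>;
}

async function markApiKeyForDeletion(deps: MarkApiKeyForDeletionDeps, apiKeyId: string) {
  // Record the key id so a cleanup task can find it even after a restart.
  await deps.saveApiKeyIdToInvalidate({ apiKeyId, createdAt: new Date().toISOString() });
  // One-shot task that fires after the fixed delay and invalidates the key.
  await deps.scheduleTask({
    taskType: 'alerting:invalidate_api_key', // hypothetical task type
    runAt: new Date(Date.now() + REMOVAL_DELAY_MS),
    params: { apiKeyId },
  });
}
```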

@mikecote mikecote added R&D and removed blocked labels Sep 8, 2020
@YulNaumenko YulNaumenko self-assigned this Sep 14, 2020
@YulNaumenko
Contributor

After discussion we decided to investigate two approaches:

  1. Use a saved object (SO) to store the API key IDs that should be deleted, and add a configuration option to set the execution interval of a Task Manager (TM) task which reads the data from this SO and removes the keys marked for deletion.
  2. Create a task for each API key which deletes it after a configurable waiting time has passed.

As a result, come up with pros and cons for both (see the sketch below).
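For illustration, a sketch of the configuration surface the two approaches imply; the config keys and defaults are hypothetical, not the shipped configuration:

```ts
interface ApiKeyInvalidationConfig {
  invalidateApiKeysTask: {
    // Approach 1: how often the single TM task scans the SO for keys
    // marked for deletion.
    interval: string; // e.g. '5m'
    // Approach 2 (or a hybrid): how long a key must sit in the SO before
    // a task is allowed to remove it.
    removalDelay: string; // e.g. '1h'
  };
}

const defaults: ApiKeyInvalidationConfig = {
  invalidateApiKeysTask: { interval: '5m', removalDelay: '1h' },
};
```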
