[Ingest Manager] Improve agent unenrollment #67409
Comments
Pinging @elastic/ingest-management (Team:Ingest Management)
Would like your feedback on this, @blakerouse @michalpristas.
@nchaulet Is it possible that "graceful" and "force" are the same thing with different timeout values?
@ph The way I see it, "force" unenrollment is the end of the graceful one. Scenario 1: graceful unenrollment. Scenario 2: force unenrollment (for example, compromised tokens).
OK, I think I agree with you on this. In the case of scenario 1, we do a clean shutdown triggered by a specific action. For scenario 2, the only thing the Agent will receive is a 401, so we probably need a mix of retries (maybe it's transient, e.g. a badly configured proxy); after X retries we put the Agent in halt mode, uninstall Endpoint, and try to reconnect? Concerning scenario 1, it is possible that we never receive an ack from the Agent, so we should also have a timeout period after which the key is invalidated.
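A minimal TypeScript sketch of the "retry X times, then halt" idea for scenario 2. It assumes the Agent can tell a 401 apart from other check-in failures. `UnauthorizedTracker`, `maxRetries`, and the mode names are hypothetical. What the Agent does once halted (uninstalling Endpoint, trying to reconnect) is left to the caller.

```typescript
// Hypothetical sketch: classify check-in results and decide when to halt.
type CheckinOutcome = "ok" | "unauthorized" | "other_error";
type AgentMode = "running" | "retrying" | "halted";

class UnauthorizedTracker {
  private consecutive401s = 0;
  private mode: AgentMode = "running";
  private readonly maxRetries: number;

  constructor(maxRetries: number) {
    this.maxRetries = maxRetries;
  }

  // Record one check-in result and return the resulting mode.
  record(outcome: CheckinOutcome): AgentMode {
    if (this.mode === "halted") {
      // Once halted, stay halted; the caller decides when to reconnect.
      return this.mode;
    }
    if (outcome === "unauthorized") {
      // A few 401s may be transient (e.g. a misconfigured proxy),
      // so only halt once the threshold is reached.
      this.consecutive401s += 1;
      this.mode = this.consecutive401s >= this.maxRetries ? "halted" : "retrying";
    } else if (outcome === "ok") {
      // A successful check-in resets the counter.
      this.consecutive401s = 0;
      this.mode = "running";
    }
    // "other_error" neither counts as a 401 nor resets the counter.
    return this.mode;
  }
}
```

Treating non-401 errors as neutral is one possible choice: a network outage then neither halts the Agent nor clears its count of consecutive 401s.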
Yes, we should have a timeout, or we could have a new status for the agent and allow users to force unenroll if the graceful shutdown did not work.
I think we should have a state machine on the Agent with defined transitions between states. But I think the behavior should be automatic: let's assume that you gracefully unenroll an agent and leave Kibana. It's possible that you never come back, and it's possible that you can't find the agent that you "gracefully unenrolled". I think it's fair to expect the end result to be an invalidated key. Also, maybe force unenroll should be possible while in the unenrolling state; this is probably a bad word to describe it?
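One way to pin down such a state machine, sketched in TypeScript under the assumption of three states and four events. All names are hypothetical, not the actual Fleet or Agent model. Every path ends in `unenrolled`, where the API key would be invalidated, and force unenroll is accepted from the unenrolling state as suggested above.

```typescript
// Hypothetical states and events for agent unenrollment.
type AgentState = "online" | "unenrolling" | "unenrolled";
type AgentEvent = "unenroll_requested" | "unenroll_acked" | "unenroll_timeout" | "force_unenroll";

// Allowed transitions; any pair not listed here is invalid.
const transitions: Record<AgentState, Partial<Record<AgentEvent, AgentState>>> = {
  online: {
    unenroll_requested: "unenrolling", // graceful: send the action, wait for ack
    force_unenroll: "unenrolled", // immediate: invalidate keys right away
  },
  unenrolling: {
    unenroll_acked: "unenrolled", // agent cleaned up and acked
    unenroll_timeout: "unenrolled", // no ack in time: invalidate keys anyway
    force_unenroll: "unenrolled", // admin gave up waiting
  },
  unenrolled: {}, // terminal state
};

function transition(state: AgentState, event: AgentEvent): AgentState {
  const next = transitions[state][event];
  if (next === undefined) {
    throw new Error(`invalid transition: ${event} in state ${state}`);
  }
  return next;
}
```

A lookup table keeps the allowed transitions in one place, so an unexpected event (for example, an ack for an agent that is already unenrolled) surfaces as an error instead of silently changing state.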
Side note on the grace period: we probably need to make it visible, because it means something went wrong with the Agent-Fleet communication. We might have an agent whose processes were removed but whose ack failed, or an agent whose uninstall failed, leaving processes still running, and which failed to report the error (and that error will never be reported, because the token is revoked). For this, we need to make sure the admin knows that this agent is misbehaving and needs manual resolution.
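A sketch of how Fleet might classify an unenrollment attempt so the UI can flag agents that need manual resolution. It assumes Fleet stores when unenrollment was requested, when (if ever) the agent acked, and an optional error reported alongside the ack; `UnenrollAttempt`, `classify`, and `gracePeriodMs` are hypothetical names. From Fleet's side, both failure modes described above look the same: no ack before the grace period ends.

```typescript
// Hypothetical record of what Fleet knows about one unenrollment attempt.
interface UnenrollAttempt {
  agentId: string;
  requestedAt: number; // ms timestamp when unenrollment was requested
  ackedAt?: number; // set when the agent acked the unenroll action
  reportedError?: string; // assumed: error the agent sent along with its ack
}

type UnenrollOutcome =
  | { kind: "pending" }
  | { kind: "clean" }
  | { kind: "needs_attention"; reason: string };

function classify(attempt: UnenrollAttempt, now: number, gracePeriodMs: number): UnenrollOutcome {
  if (attempt.ackedAt !== undefined) {
    return attempt.reportedError === undefined
      ? { kind: "clean" }
      : { kind: "needs_attention", reason: `agent acked with error: ${attempt.reportedError}` };
  }
  if (now - attempt.requestedAt < gracePeriodMs) {
    // Still inside the grace period: keep waiting for the ack.
    return { kind: "pending" };
  }
  // No ack in time: processes may or may not have been removed, and once the
  // key is revoked the agent can no longer report what happened.
  return { kind: "needs_attention", reason: "no ack within grace period; processes may still be running" };
}
```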
I like the idea here, @michalpristas; not sure we can implement it for 7.9. This is something we should find a way to expose in the UI. It also exposes the need for a better-defined "state" for the agent, a formal state machine. I think we could link it with the work @blakerouse has done adding the degraded state.
I'm with you on the state machine, @ph.
Description
Work in progress
Currently, agent unenrollment is done as follows:
We can improve this process with a graceful unenrollment that ensures the unenrollment completed correctly.
We should also provide a way to do an immediate unenrollment that invalidates API keys without a graceful shutdown.
Possible implementations
We can send a new action with a new action type, `UNENROLL`. The agent can do everything it needs to (uninstalling Endpoint, sending last events), and when it acks the action we invalidate the API keys and change the agent status. We probably need a background job that, after a defined amount of time, cleans up agents that did not ack the `UNENROLL` action. (Is it possible to do this in Kibana?) I suggest we introduce a new `UNENROLL` action in Fleet.
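The proposed flow, sketched as an in-memory TypeScript model: queue `UNENROLL`, revoke keys on ack, and have a periodic sweep unenroll agents that never acked. `UnenrollService`, `invalidateApiKeys`, and `timeoutMs` are hypothetical; persistence is omitted, and in Kibana the sweep would still need a scheduler (a Kibana Task Manager task is one plausible option).

```typescript
type AgentStatus = "online" | "unenrolling" | "unenrolled";

interface AgentRecord {
  id: string;
  status: AgentStatus;
  unenrollRequestedAt?: number; // ms timestamp when UNENROLL was queued
}

interface AgentAction {
  agentId: string;
  type: "UNENROLL";
  createdAt: number;
}

class UnenrollService {
  private agents: Record<string, AgentRecord> = {};
  private actions: AgentAction[] = [];
  // Hypothetical hook standing in for whatever actually revokes the keys.
  private readonly invalidateApiKeys: (agentId: string) => void;

  constructor(invalidateApiKeys: (agentId: string) => void) {
    this.invalidateApiKeys = invalidateApiKeys;
  }

  addAgent(id: string): void {
    this.agents[id] = { id, status: "online" };
  }

  // Queue an UNENROLL action. Keys stay valid so the agent can still check in,
  // clean up (uninstall Endpoint, send last events) and ack.
  requestUnenroll(agentId: string, now: number): void {
    const agent = this.lookup(agentId);
    agent.status = "unenrolling";
    agent.unenrollRequestedAt = now;
    this.actions.push({ agentId, type: "UNENROLL", createdAt: now });
  }

  // The agent acked UNENROLL: revoke its keys; stray or duplicate acks are ignored.
  handleAck(agentId: string): void {
    const agent = this.lookup(agentId);
    if (agent.status === "unenrolling") {
      this.finish(agent);
    }
  }

  // Background job: unenroll agents that did not ack within timeoutMs.
  // Returns the ids of the agents it cleaned up.
  sweep(now: number, timeoutMs: number): string[] {
    const cleaned: string[] = [];
    Object.keys(this.agents).forEach((id) => {
      const agent = this.agents[id];
      if (
        agent.status === "unenrolling" &&
        agent.unenrollRequestedAt !== undefined &&
        now - agent.unenrollRequestedAt >= timeoutMs
      ) {
        this.finish(agent);
        cleaned.push(id);
      }
    });
    return cleaned;
  }

  statusOf(agentId: string): AgentStatus {
    return this.lookup(agentId).status;
  }

  pendingActionCount(): number {
    return this.actions.length;
  }

  private finish(agent: AgentRecord): void {
    this.invalidateApiKeys(agent.id);
    agent.status = "unenrolled";
    // The UNENROLL action is no longer pending once the agent is unenrolled.
    this.actions = this.actions.filter((a) => a.agentId !== agent.id);
  }

  private lookup(agentId: string): AgentRecord {
    const agent = this.agents[agentId];
    if (agent === undefined) {
      throw new Error(`unknown agent: ${agentId}`);
    }
    return agent;
  }
}
```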
We probably want a means to force unenroll an agent (invalidating API keys directly, without sending the action to the agent).
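A sketch of force unenroll with hypothetical names (`Agent`, `forceUnenroll`, `invalidateApiKeys`): the keys are revoked immediately, any pending actions are dropped, and there is no round-trip to the agent. As written, it works from any state, including while a graceful unenrollment is still pending, and does nothing if the agent is already unenrolled.

```typescript
type AgentStatus = "online" | "unenrolling" | "unenrolled";

interface Agent {
  id: string;
  status: AgentStatus;
  pendingActions: string[]; // e.g. a queued "UNENROLL" action
}

// Force unenroll: revoke the agent's API keys right away instead of waiting
// for the agent to ack an UNENROLL action.
function forceUnenroll(agent: Agent, invalidateApiKeys: (agentId: string) => void): Agent {
  if (agent.status === "unenrolled") {
    // Already unenrolled: nothing left to revoke.
    return agent;
  }
  invalidateApiKeys(agent.id);
  // Queued actions can never be delivered once the keys are gone.
  return { ...agent, status: "unenrolled", pendingActions: [] };
}
```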