[Fleet] Agent Policy should have an option to automatically unenroll INACTIVE agents #179399
Comments
Pinging @elastic/fleet (Team:Fleet) |
Pinging @elastic/fleet (Feature:Fleet) |
@nimarezainia thanks for filing this. I think the old "unenroll timeout" feature worked better for ephemeral agents, while the newer "inactivity timeout" feature works substantially better for more permanent agents (e.g. employee laptops). Overall, providing an opt-in value for automatic unenrollment on certain policies is the way to go. I don't think a hardcoded 24h cutoff is the right approach, though. We should essentially replicate the old unenrollment timeout value as a per-policy setting, provided as a number in seconds. For context, we still have the UI for the policy-level unenrollment setting, but it's deprecated and ignored unless Fleet Server is running on a version prior to 8.7.0. I'd recommend adding a new timeout setting below "Inactivity timeout", described as the "Inactive agent unenrollment timeout".
We may also want to add some clarity to, or reword, the deprecated "Unenrollment timeout" form field to avoid confusion. We should also include detailed documentation for this value and its intended use. |
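To make the proposal above concrete, here is a minimal sketch of how an opt-in, per-policy timeout in seconds could gate automatic unenrollment. Every name in it (`PolicySettings`, `inactiveUnenrollTimeoutSeconds`, `AgentSummary`, `shouldAutoUnenroll`) is hypothetical and not taken from the Fleet codebase. It also assumes the timeout is measured from the moment the agent became INACTIVE, which this thread does not specify.

```typescript
// Hypothetical shapes for illustration only; these are not Fleet's actual types.
type AgentStatus = "online" | "offline" | "inactive" | "unenrolled";

interface PolicySettings {
  // Opt-in: undefined or a non-positive value means the feature is off.
  inactiveUnenrollTimeoutSeconds?: number;
}

interface AgentSummary {
  id: string;
  status: AgentStatus;
  // Epoch milliseconds at which the agent was marked inactive (assumed field).
  inactiveSinceMs?: number;
}

// Returns true only for INACTIVE agents whose policy opted in and whose
// time spent inactive has reached the configured timeout.
function shouldAutoUnenroll(
  agent: AgentSummary,
  policy: PolicySettings,
  nowMs: number
): boolean {
  const timeout = policy.inactiveUnenrollTimeoutSeconds;
  if (timeout === undefined || timeout <= 0) {
    return false; // off by default
  }
  if (agent.status !== "inactive" || agent.inactiveSinceMs === undefined) {
    return false; // never touch agents that are not already inactive
  }
  return nowMs - agent.inactiveSinceMs >= timeout * 1000;
}
```

Treating a missing or zero value as "disabled" keeps the behavior off for existing policies, which matches the opt-in intent discussed here.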
@kpollich Happy to have it as a configurable element - off by default. ++ to Documentation obviously. Will add as a task. |
We don't currently remove documents, even for unenrolled agents. I think with auto unenroll the documents should also stay, in the unenrolled state. I also think we should remove the deprecated Unenrollment timeout field from the UI; it hasn't been supported for a long time and it might cause confusion. Regarding the implementation, automatically moving agents from the inactive to the unenrolled state is not trivial: manual unenrollment is currently an action, which would have to be triggered automatically so that the action doc is updated and the API keys are revoked. |
We should allow for cancellation between when agents are going to be unenrolled and when we actually unenroll them |
Yes please.
This is actually a concern from the users, particularly in ephemeral environments. These documents could add up for INACTIVE agents and frankly take up space and that is what the customer is paying for. @juliaElastic is it possible to remove these documents for unenrolled agents? |
added this to the requirement. |
It is possible, though at least we should add some delay until deleting them for debug/visibility purposes. We could do something like move from inactive to unenrolled when the inactive unenrollment timeout is reached, and then have another task clean up unenrolled agents after some time. |
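A rough sketch of that two-stage flow, in case it helps the discussion: one check moves an agent from inactive to unenrolled once the inactive unenrollment timeout is reached, and a second check removes unenrolled agents only after a further delay kept for debugging and visibility. The types, field names, and the `nextStep` function are all made up for illustration and do not reflect how Fleet stores or processes agents.

```typescript
// Hypothetical model for illustration; not Fleet's real data structures.
type Stage = "inactive" | "unenrolled";

interface AgentRecord {
  id: string;
  stage: Stage;
  // Epoch milliseconds at which the agent entered its current stage.
  stageEnteredMs: number;
}

interface CleanupConfig {
  // How long an agent may stay inactive before it is unenrolled.
  unenrollAfterMs: number;
  // How long an unenrolled agent's documents are kept before deletion.
  deleteAfterMs: number;
}

type Step = { kind: "keep" } | { kind: "unenroll" } | { kind: "delete" };

// Decides the next step for one agent. Each stage has its own timer, so
// documents are never deleted in the same pass that unenrolls the agent.
function nextStep(agent: AgentRecord, cfg: CleanupConfig, nowMs: number): Step {
  const elapsed = nowMs - agent.stageEnteredMs;
  if (agent.stage === "inactive") {
    return elapsed >= cfg.unenrollAfterMs ? { kind: "unenroll" } : { kind: "keep" };
  }
  return elapsed >= cfg.deleteAfterMs ? { kind: "delete" } : { kind: "keep" };
}
```

Keeping the two timers separate means the unenroll step and the document cleanup can run as independent tasks, as suggested above.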
sure. I would say that the unenrolled agents should be cleaned up in this manner at all times. Modified the description to say this. |
@juliaElastic @nimarezainia
UI
EDIT - To be done in separate tickets
|
Should this be configurable maybe in the kibana config? it's kind of a breaking change |
Maybe we can move this to a separate ticket and discuss it there? I think it can be done separately anyway; this ticket is already quite big. |
Good summary! I think only the cancellation behavior is not defined too well, maybe we should move that to a separate ticket as well? |
I agree, that's not too clear to me as well and it will require some investigation to define how it should be done. Also we don't have any UX for it yet. |
.fleet* indices related to agents that were already unenrolled #189506
Created two follow up tickets:
I'm also updating the ticket description to reflect the previous discussion. |
@criamico could this not be a text box accepting a timeout value? |
@nimarezainia here's a screenshot of how it will look: |
Would it make sense to put this behind a feature flag until the cancellation work in #189508 is done? I don't think it makes sense to allow users to opt into this behavior without some means of cancellation before agents are actually unenrolled. |
I'm not sure we need to have all the pieces of this puzzle together before the feature can be used. I'm also thinking about the cleaning up of the dot indices. The offline agents clog up the UI, so this timeout would help the user who is concerned with that. Cancellation is needed but I think can be treated as a follow on enhancement. |
Hi Team, We have created 05 testcases under Testmo for this feature under Fleet test suite at links:
Please let us know if any other scenario needs to be added from our end. Thanks! |
Hi Team, We have executed 05 testcases under the Feature test run for the 8.16.0 release at the link: Status: PASS: 05 Build details: As the testing is completed on this feature, we are marking this as QA:Validated. Please let us know if anything else is required from our end. |
In reference to the state machine in the docs HERE, we already have mechanisms by which an agent goes OFFLINE and then, after a user-configurable amount of time, is considered INACTIVE and removed from the default view. Users can filter to the INACTIVE view to see these agents and determine whether further analysis is required.
In an INACTIVE state the API keys allocated to the agent are still valid, in case that agent becomes active again. After analysis, the user can "select all" and unenroll these agents, at which point they are removed from view and all API keys are relinquished.
In environments with ephemeral agents, where VMs/Containers are continuously provisioned and de-provisioned, this approach may lead to many agents in the INACTIVE state consuming API keys. In these environments we should provide the users an opt-in option to automatically unenroll and clean up the agents in the INACTIVE view.
Task list
unenroll action if the timeout is reached. This should only apply to agents that are already in the INACTIVE state.
Unenrollment timeout field from the UI to avoid confusion
Follow up
.fleet* indices related to agents that were already unenrolled #189506
Original description