7.11.0/7.12.0 upgrade migrations take very long to complete or timeout due to huge number of saved objects #91869
Comments
Pinging @elastic/fleet (Team:Fleet)
@nchaulet How often are these fleet-agent-events created? What actions trigger these?
Every time an agent checks in with a new status change, so that could be a lot.
We should try the migration from 7.11 to 7.12 to see whether the v2 migrations fix this or not.
I had a support case open for this (7.10 to 7.11 migration failing) before finding this thread. Even with the batch size set to 1000, it took several hours for the migration to complete. I assumed it was hanging after waiting an hour, so I cleaned up the migration indexes and restarted, but just waiting it out turned out to be the answer. Our Kibana upgrade/migrations had never taken longer than a few minutes, but this is the first upgrade since we added a small Fleet deployment to the environment.
@reighnman Can you share the output of the saved object type aggregation below?
(If you have already deleted the fleet-agent-events and finished the upgrade, you would have to run the aggregation over an older index, e.g. the previous .kibana_N index.)
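A minimal sketch of the kind of request meant here, assuming the default .kibana alias and the top-level type field that stores the saved object type; the aggregation name saved_object_type matches the bucket path quoted in the reply below:

```
GET .kibana/_search
{
  "size": 0,
  "aggs": {
    "saved_object_type": {
      "terms": { "field": "type", "size": 50 }
    }
  }
}
```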
This is with the agent running on 8 servers for about a month using the Windows integration (7.10); the counts came from $result.aggregations.saved_object_type.buckets. Looking at .kibana_5 and the current .kibana_6 post-upgrade, the results are about the same.
Issue
Some users have had migrations take very long to complete (> an hour) or time out due to a huge number of fleet-agent-events, action_task_params, or task documents (> 100k documents). To check how many of these documents you have, run the following aggregation:
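A minimal sketch of such an aggregation, assuming the default .kibana index and the top-level type field that holds each saved object's type:

```
GET .kibana/_search
{
  "size": 0,
  "aggs": {
    "saved_object_type": {
      "terms": { "field": "type", "size": 50 }
    }
  }
}
```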
fleet-agent-events
These saved objects are no longer used by the fleet plugin and can safely be deleted.
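A deletion could be done with the _delete_by_query API, filtering on the saved object type. This is a minimal sketch assuming the default .kibana index, not necessarily the exact command from the original issue; back up or snapshot the index before running destructive queries:

```
POST .kibana/_delete_by_query?conflicts=proceed
{
  "query": {
    "term": { "type": "fleet-agent-events" }
  }
}
```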
action_task_params
Deleting these could potentially cause scheduled tasks that have not yet run to fail once.
task
Before deleting failed tasks, it's useful to understand which actions might be triggering the high number of failed tasks by running an aggregation over the failed task documents, for example:
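A minimal sketch, assuming the task saved objects live in the .kibana_task_manager index and expose task.status and task.taskType fields (index and field names are assumptions, not taken verbatim from the original issue):

```
GET .kibana_task_manager/_search
{
  "size": 0,
  "query": {
    "bool": {
      "filter": [
        { "term": { "type": "task" } },
        { "term": { "task.status": "failed" } }
      ]
    }
  },
  "aggs": {
    "failed_task_types": {
      "terms": { "field": "task.taskType", "size": 50 }
    }
  }
}
```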