Measure impact of enabling UA on upgrade failures #148765
Comments
Pinging @elastic/kibana-core (Team:Core)
I validated this by looking at the service constructor logs for upgrades with [...] Looking at the code, Kibana assumes both conditions have a "critical" level, but this is only true for the disk space watermarks; the shard limit creates a "warning" level deprecation: https://github.com/elastic/elasticsearch/blob/main/x-pack/plugin/deprecation/src/test/java/org/elasticsearch/xpack/deprecation/ClusterDeprecationChecksTests.java#L76 We should fix this bug by adopting the new shard limit indicator in the health API (#153051) and then repeat the validation.
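To illustrate the mismatch described in this comment, here is a minimal sketch of why a check that only looks at "critical" deprecations would miss a shard limit problem reported at "warning" level. The types, function names, and sample messages are hypothetical and are not taken from the Kibana or Elasticsearch code bases.

```typescript
// Hypothetical shapes for illustration only; not the real Kibana or Elasticsearch types.
type DeprecationLevel = "warning" | "critical";

interface ClusterDeprecation {
  level: DeprecationLevel;
  message: string;
}

// Counts only "critical" entries, mirroring the assumption that both
// conditions are reported at the "critical" level.
function countCritical(deprecations: ClusterDeprecation[]): number {
  return deprecations.filter((d) => d.level === "critical").length;
}

// Counts entries whose level is in the given list, so that a
// "warning" level entry can be included as well.
function countAtLevels(
  deprecations: ClusterDeprecation[],
  levels: DeprecationLevel[]
): number {
  return deprecations.filter((d) => levels.includes(d.level)).length;
}

// Illustrative sample: one entry per condition, with made-up messages.
const sample: ClusterDeprecation[] = [
  { level: "critical", message: "disk space watermark (illustrative)" },
  { level: "warning", message: "shard limit (illustrative)" },
];
```

With this sample, `countCritical(sample)` sees only the disk watermark entry, while `countAtLevels(sample, ["critical", "warning"])` sees both.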
I've analysed upgrades from 8.10 to 8.x in the last 3 months. We have not had any [...] If upgrades fail due to insufficient disk space, migrations would fail because of an unavailable shards error, but there could be other causes of this too. We continue to see many unavailable shards errors, but in all analysed failures these came from indices that existed before the upgrade. So, e.g., a user upgrades from 8.10 to 8.14 and we re-use the [...] It's quite hard to establish the cause of an unassigned shard once it has been resolved, but in none of the cases I analysed did ES report high disk watermark warnings. So I'm reasonably confident that we achieved that outcome too.
Issue scope:
Determining success metrics: How do we know we’ve achieved the outcomes here?
Using Stack Telemetry, I created https://telemetry-v2-staging.elastic.dev/s/kibana-core/app/dashboards#/view/5ebf55d0-64d6-11ed-b77a-bd29ecb21612?_g=(filters%3A!()%2CrefreshInterval%3A(pause%3A!t%2Cvalue%3A0)%2Ctime%3A(from%3A'2022-03-31T22%3A00%3A00.000Z'%2Cto%3Anow) on staging with the data available.
Assess the impact on 8.6+ upgrades: create dashboards + show results
Add missing measurements: EBT + snapshot telemetry
Recommendations and follow-ups: What's next?