Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix delayed status API updates in alerting and task_manager #101778

Merged
merged 3 commits into from
Jun 14, 2021

Conversation

joshdover
Copy link
Contributor

@joshdover joshdover commented Jun 9, 2021

Summary

Minimal fix for #101049 (for safe backporting)
Holistic fix will be added by finishing #77965

The Alerting plugin's status Observable had two issues:

  • The custom status was being set after the setup lifecycle
  • The first status emission was delayed by 5 minutes which would cause the entire dependency tree status updates to be delayed.

This PR fixes the alerting and task_manager status Observables to be registered during setup and ensures that there is an initial status emitted (unavailable) while also making sure the first health check isn't delayed by 5 minutes.

Release note

An issue was resolved where Kibana's status API was reporting as 503 Service Unavailable due to a slow status check in Kibana Alerting.

Checklist

Delete any items that are not applicable to this PR.

For maintainers

@joshdover joshdover added release_note:fix Team:Core Core services & architecture: plugins, logging, config, saved objects, http, ES client, i18n, etc v7.13.2 v7.14.0 v8.0.0 labels Jun 9, 2021
@joshdover joshdover changed the title Fix delayed status API updates Fix delayed status API updates in alerting and task_manager Jun 12, 2021
@joshdover joshdover force-pushed the fx-status-delayed branch from 4f67d14 to 07e00a7 Compare June 12, 2021 11:23
Comment on lines +21 to +29
// Wait for status to become green
let status;
const start = Date.now();
do {
const resp = await supertest.get('/api/status');
status = resp.status;
// Stop polling once status stabilizes OR once 40s has passed
} while (status !== 200 && Date.now() - start < 40_000);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This kind of logic is going to be more and more necessary in some tests. We should either:

  1. Create a centralized helper service for this like kibana.waitForAvailable(); OR
  2. Change how @kbn/test waits for Kibana to start. Currently it waits for a log message that the the server is ready. We could either add logic for polling the API or add a log message for when Kibana actually turns to available ("green") for the first time.

Copy link
Contributor

@mshustov mshustov Jun 13, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could either add logic for polling the API or add a log message for when Kibana actually turns to available ("green") for the first time.

IIRC @pgayvallet has encountered this problem before. do you remember if there were any problems with switching to this logic?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could either add logic for polling the API or add a log message for when Kibana actually turns to available ("green") for the first time

This is what I tried in #92568. Got reverted because it added a very significant time to the FTR CI tests. But maybe now that this PR fixes these delays, it would be better?

Copy link
Contributor Author

@joshdover joshdover Jun 14, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll explore this as a follow up and see how much it increases test time. I want to keep this PR clean enough to backport to 7.13.3

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Opened a new PR for this along with an additional 5s improvement on the initial available: #102108

@joshdover joshdover marked this pull request as ready for review June 12, 2021 14:51
@joshdover joshdover requested a review from a team as a code owner June 12, 2021 14:51
@elasticmachine
Copy link
Contributor

Pinging @elastic/kibana-core (Team:Core)

@joshdover joshdover force-pushed the fx-status-delayed branch from b2f624b to aeb03ad Compare June 14, 2021 10:52
@kibanamachine
Copy link
Contributor

💚 Build Succeeded

Metrics [docs]

✅ unchanged

History

  • 💔 Build #130768 failed b2f624b739bf7f5f9706e75de21d4a0d974ab809
  • 💔 Build #130758 failed 07e00a71840b3d36dde1c99974fc15b4da029799
  • 💔 Build #130157 failed 4f67d14eeaa17212fa60ca8f9a2ce02897b033fa

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

@joshdover joshdover added the auto-backport Deprecated - use backport:version if exact versions are needed label Jun 14, 2021
@joshdover joshdover enabled auto-merge (squash) June 14, 2021 13:55
Copy link
Contributor

@ymao1 ymao1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thanks for fixing!

@kibanamachine
Copy link
Contributor

💚 Backport successful

Status Branch Result
7.13
7.x

The backport PRs will be merged automatically after passing CI.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
auto-backport Deprecated - use backport:version if exact versions are needed release_note:fix Team:Core Core services & architecture: plugins, logging, config, saved objects, http, ES client, i18n, etc v7.13.3 v7.14.0 v8.0.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants