
Add support for /api/status before Kibana completes startup #79012

Merged
merged 29 commits into from
Apr 26, 2021

Conversation

joshdover
Contributor

@joshdover joshdover commented Sep 30, 2020

Summary

  • Add the /api/status endpoint to the "not ready server" that is served before Kibana starts
  • Make this endpoint return a 503 until all services are either degraded or available
  • Expose detailed information on the status API when migrations are waiting for other nodes to complete:
  {
    "id": "core:[email protected]",
    "message": "SavedObjects service is waiting for other nodes to complete the migration If no other Kibana instance is attempting migrations, you can get past this message by deleting index .kibana_2 and restarting Kibana.",
    "since": "2020-09-30T19:30:43.268Z",
    "state": "red",
    "icon": "danger",
    "uiColor": "danger"
  },
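As an illustration of the 503 rule described in the summary, here is a minimal sketch. It assumes a simplified status model; the type names, the level names, and the `statusHttpCode` function are hypothetical and are not taken from this PR's code.

```typescript
// Hypothetical, simplified model of per-service status (not Kibana's real types).
type ServiceLevel = 'available' | 'degraded' | 'unavailable' | 'critical';

interface ServiceStatus {
  id: string;
  level: ServiceLevel;
}

// Returns 200 only when every service is at least "degraded"; otherwise 503.
function statusHttpCode(services: ServiceStatus[]): number {
  const allUsable = services.every(
    (s) => s.level === 'available' || s.level === 'degraded'
  );
  return allUsable ? 200 : 503;
}

// Example inputs: one service still waiting on a migration, then all usable.
const duringMigration: ServiceStatus[] = [
  { id: 'core:elasticsearch', level: 'available' },
  { id: 'core:savedObjects', level: 'unavailable' },
];
const afterStartup: ServiceStatus[] = [
  { id: 'core:elasticsearch', level: 'available' },
  { id: 'core:savedObjects', level: 'degraded' },
];
```

Under these assumptions, `statusHttpCode(duringMigration)` yields 503 and `statusHttpCode(afterStartup)` yields 200, so a monitor polling the endpoint can tell "still starting" apart from "usable".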

Checklist

Delete any items that are not applicable to this PR.

For maintainers

@joshdover
Contributor Author

I haven't updated any unit tests yet, so this is sure to break some things right now.

src/core/server/http/http_service.ts
src/core/server/http/http_server.ts
this.notReadyServer.route({
this.notReadyServer = new HttpServer(this.logger, 'NotReady');
const notReadySetup = await this.notReadyServer.setup(config);
notReadySetup.server.route({
Contributor


Let's use newly added API instead of hapi one?

Contributor Author


I was going to do that, but since we don't support some of the HTTP methods right now, I left it like this so that it covers any request.
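To make the "covers any request" point concrete, here is a rough sketch of a catch-all handler written as a plain function rather than a hapi route. The function name, the response shape, and the exact message text are assumptions for illustration only.

```typescript
// Hypothetical response shape for this sketch.
interface SimpleResponse {
  statusCode: number;
  body: string;
}

// Assumed placeholder text; the real server's wording may differ.
const NOT_READY_MESSAGE = 'Kibana server is not ready yet';

// The method is an arbitrary string, so verbs that a typed router might not
// support (PATCH, OPTIONS, and so on) still fall through to the 503 branch.
function handleNotReadyRequest(
  method: string,
  path: string,
  statusPayload: SimpleResponse
): SimpleResponse {
  if (method.toUpperCase() === 'GET' && path === '/api/status') {
    return statusPayload;
  }
  return { statusCode: 503, body: NOT_READY_MESSAGE };
}
```

Any method and path other than `GET /api/status` gets the same 503 placeholder, which is the behavior a wildcard route gives without enumerating methods.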

src/core/server/status/legacy_status.ts
src/core/server/status/status_service.ts
@legrego
Member

legrego commented Jan 26, 2021

@joshdover if you want to bring this back up-to-date with master, I can pull this down and try to work through some of the test failures that we discussed today.

@joshdover
Contributor Author

@legrego Sure thing, I should be able to get to this tomorrow, if that's alright. I'm also thinking that we may need to do some refactoring for #89287. For one, this should probably be called something other than the "notReadyServer", maybe "initializationServer"?

I also think we may need to formalize an API on the InternalHttpServiceSetup for the interactive setup process. I think it will be best if we don't initialize any plugins before we configure ES. This would make it more foolproof that we don't break any plugin code. Note that plugins are constructed during Server.setup and not during Server.prototype.constructor, so we should be able to block on the interactive setup before we call pluginsServer.setup.

@legrego
Member

legrego commented Apr 7, 2021

@elasticmachine merge upstream

@legrego
Member

legrego commented Apr 8, 2021

@joshdover I brought this PR back up to date with master, started addressing the first round of feedback from @mshustov, and fixed the test failures we discussed.

In short, there were three main reasons for the test failures:

  1. The bypassErrorFormatting was being applied too broadly. I updated the options for the custom response handler to allow consumers to decide if error formatting should be bypassed: 78cd007 (#79012)
  2. The status endpoint returns a 503 now if any service is unavailable. This caused the KbnClient to fail in those cases, so I introduced a mechanism for client callers to indicate that specific error codes were acceptable: e7f355b (#79012)
  3. Now that the NotReady server is a proper HttpServer instance, it ends up logging a message indicating that it has started up. This message was being improperly interpreted by the FTR, so the FTR thought that the real slim shady Kibana server was up and running. I updated this check to test for the Kibana server specifically: 8791a24 (#79012)
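The second fix above (letting client callers declare that specific error codes are acceptable) might look roughly like the following. This is a sketch only: `checkResponse`, `HttpResult`, and the `acceptedErrorCodes` parameter are hypothetical names and do not reflect the actual KbnClient API.

```typescript
// Hypothetical minimal response shape.
interface HttpResult {
  status: number;
  body: string;
}

// Passes 2xx responses through, plus any status the caller explicitly accepts;
// every other status is raised as an error.
function checkResponse(
  res: HttpResult,
  acceptedErrorCodes: number[] = []
): HttpResult {
  const isSuccess = res.status >= 200 && res.status < 300;
  if (isSuccess || acceptedErrorCodes.indexOf(res.status) !== -1) {
    return res;
  }
  throw new Error('Unexpected status code: ' + res.status);
}

// A status poller can opt in to 503 while the server is still starting.
const starting: HttpResult = { status: 503, body: '{"status":"starting"}' };
const tolerated = checkResponse(starting, [503]);
```

Keeping the allowance per call, rather than global, means other callers still fail fast on a 503.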

@legrego legrego added release_note:skip Skip the PR/issue when compiling release notes Team:Core Core services & architecture: plugins, logging, config, saved objects, http, ES client, i18n, etc v7.14.0 v8.0.0 labels Apr 22, 2021
@legrego legrego marked this pull request as ready for review April 22, 2021 15:48
@legrego legrego requested review from a team as code owners April 22, 2021 15:48
@elasticmachine
Contributor

Pinging @elastic/kibana-core (Team:Core)

Member

@jbudz jbudz left a comment


Is there any expectation for supporting basic auth (from the browser) before startup completes? The login page is unavailable, and api/status will return a 401.

Have we put any thought into whether this is a breaking change, and documenting best practices for status monitoring? I'm okay with addressing this separately.

Changes LGTM otherwise.

@legrego
Member

legrego commented Apr 22, 2021

> Is there any expectation for supporting basic auth (from the browser) before startup completes? The login page is unavailable, and api/status will return a 401.

@jbudz great question -- the NotReady server can't support basic auth at this point, unless we were to hardcode credentials into kibana.yml or similar to use. It does make me wonder if we should only register the /api/status route on the NotReady server if Kibana has been configured to explicitly allow anonymous access to the status endpoint. @joshdover what are your thoughts on that?
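A sketch of that conditional registration idea, under stated assumptions: the `StatusConfig` shape is modeled on an "allow anonymous access to the status endpoint" setting, and both the function name and the `'/*'` catch-all notation are hypothetical.

```typescript
// Hypothetical config shape for this sketch.
interface StatusConfig {
  allowAnonymous: boolean;
}

// Returns the paths the not-ready server would register, in order.
// The status route is included only when anonymous access is explicitly allowed.
function notReadyServerPaths(config: StatusConfig): string[] {
  const paths: string[] = [];
  if (config.allowAnonymous) {
    paths.push('/api/status');
  }
  // Hypothetical catch-all that answers everything else with "not ready".
  paths.push('/*');
  return paths;
}
```

With `allowAnonymous: false`, requests to `/api/status` would fall through to the catch-all and receive the generic not-ready response instead of status details.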

> Have we put any thought into whether this is a breaking change, and documenting best practices for status monitoring? I'm okay with addressing this separately.

I personally hadn't considered it, so thank you for raising this. Our docs only make a passing reference to /api/status, so in that regard I wouldn't consider this breaking.

That said, I don't know how other stack components consume this endpoint.

@joshdover
Contributor Author

> It does make me wonder if we should only register the /api/status route on the NotReady server if Kibana has been configured to explicitly allow anonymous access to the status endpoint. @joshdover what are your thoughts on that?

Yeah, that would make some sense, though it doesn't feel great to have this only sometimes available. We'd likely replace this with a more limited health check endpoint that's safe to be public at some point (#46984).

@legrego
Member

legrego commented Apr 23, 2021

@elasticmachine merge upstream

@legrego
Member

legrego commented Apr 23, 2021

@mshustov would you mind reviewing one last time to get the requested codeowner approval from kibana-core?

@legrego
Member

legrego commented Apr 26, 2021

@elasticmachine merge upstream

@mshustov
Contributor

@legrego I don't know why GH didn't count my approval #79012 (review) (probably because I approved while the PR was still a draft). Approved again.

@kibanamachine
Contributor

💚 Build Succeeded

Metrics [docs]

Unknown metric groups

API count

id before after diff
core 2180 2182 +2


To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

@legrego legrego merged commit 48523e5 into elastic:master Apr 26, 2021
kibanamachine added a commit to kibanamachine/kibana that referenced this pull request Apr 26, 2021
@kibanamachine
Contributor

💚 Backport successful

Status Branch Result
7.x

This backport PR will be merged automatically after passing CI.

Labels
auto-backport Deprecated - use backport:version if exact versions are needed release_note:skip Skip the PR/issue when compiling release notes Team:Core Core services & architecture: plugins, logging, config, saved objects, http, ES client, i18n, etc v7.14.0 v8.0.0

6 participants