Log errors when polling for the ES version #97787

Closed
afharo opened this issue Apr 21, 2021 · 3 comments · Fixed by #100005
Labels
discuss · enhancement (New value added to drive a business result) · Team:Core (Core services & architecture: plugins, logging, config, saved objects, http, ES client, i18n, etc)

Comments

@afharo
Member

afharo commented Apr 21, 2021

In pollEsNodesVersion, we are requesting the version every healthCheckInterval (defaults to 2.5s).

I think we should log when ES returns an error in this piece of logic. This check can halt Kibana when a connection error to ES occurs: during startup it blocks the SavedObjects migrations (and with them the rest of the Kibana startup process), and during normal operation the status goes RED and many services that depend on it fail. I'd say logging these with log.error would help with troubleshooting.

What do you think?

Scope:

  • Add logging to ES errors before the first successful connection in the ES status check
  • Logging should be throttled (at most once every 5 minutes) so as not to flood the logs
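
A minimal sketch of the throttling described in the scope above, in plain TypeScript. The helper name `createThrottledLogger`, its parameters, and the injectable `now` clock are all hypothetical and exist only for illustration; this is not the implementation that landed in Kibana.

```typescript
// Hypothetical helper (not a Kibana API): forwards a message to `log`
// at most once per `intervalMs` and silently drops the rest.
type LogFn = (message: string) => void;

function createThrottledLogger(
  log: LogFn,
  intervalMs: number,
  now: () => number = Date.now // injectable clock, assumed here to ease testing
): LogFn {
  let lastLoggedAt: number | undefined;
  return (message: string) => {
    const current = now();
    // Log the very first error, then only once the interval has elapsed.
    if (lastLoggedAt === undefined || current - lastLoggedAt >= intervalMs) {
      lastLoggedAt = current;
      log(message);
    }
  };
}

// Usage sketch: with a 2.5s health check, only one error per 5 minutes is logged.
const FIVE_MINUTES_MS = 5 * 60 * 1000;
const logEsError = createThrottledLogger(
  (message) => console.error(message),
  FIVE_MINUTES_MS
);
logEsError('Unable to retrieve version information from Elasticsearch nodes.');
logEsError('Unable to retrieve version information from Elasticsearch nodes.'); // dropped
```

In the polling code, a helper like this could be called from the error handler of the version request, so that a long ES outage produces one log line per interval instead of one per poll.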
@afharo afharo added discuss Team:Core Core services & architecture: plugins, logging, config, saved objects, http, ES client, i18n, etc triage_needed labels Apr 21, 2021
@elasticmachine
Contributor

Pinging @elastic/kibana-core (Team:Core)

@pgayvallet
Contributor

pgayvallet commented Apr 22, 2021

This means that, if the responses from ES take longer than that, we'll pile up requests, potentially killing Kibana due to an OOM.

I'm not sure that's true: we're using an exhaustMap operator, which is supposed to ignore further projections until the current projection has completed (but I'm no RxJS expert).

exhaustMap
Returns an Observable that emits items based on applying a function that you supply to each item emitted by the source Observable, where that function returns an (so-called “inner”) Observable. When it projects a source value to an Observable, the output Observable begins emitting the items emitted by that projected Observable. However, exhaustMap ignores every new projected Observable if the previous projected Observable has not yet completed. Once that one completes, it will accept and flatten the next projected Observable and repeat this process.

return timer(0, healthCheckInterval).pipe(
  exhaustMap(() => {
    return from(
      internalClient.nodes.info<NodesInfo>({
        filter_path: ['nodes.*.version', 'nodes.*.http.publish_address', 'nodes.*.ip'],
      })
    ).pipe(
      map(({ body }) => body),
      catchError((_err) => {
        return of({ nodes: {} });
      })
    );
  }),
  map((nodesInfo: NodesInfo) =>
    mapNodesVersionCompatibility(nodesInfo, kibanaVersion, ignoreVersionMismatch)
  ),
  distinctUntilChanged(compareNodes) // Only emit if there are new nodes or versions
);
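
To make the "ignore while busy" behaviour concrete, here is a rough plain-TypeScript analog of what exhaustMap does for the poller above. It does not use RxJS; `createExhaustingPoller`, `tick`, and `ignoredTicks` are hypothetical names invented for this sketch, and each `tick` call stands in for one emission of the timer.

```typescript
// Hypothetical analog of exhaustMap (not RxJS, not Kibana code):
// a tick that arrives while a request is still in flight is ignored,
// so slow responses never cause requests to pile up.
function createExhaustingPoller<T>(
  request: () => Promise<T>,
  onResult: (result: T) => void
): { tick: () => Promise<void>; ignoredTicks: () => number } {
  let inFlight = false;
  let ignored = 0;
  return {
    async tick(): Promise<void> {
      if (inFlight) {
        ignored++; // previous request has not completed: drop this tick
        return;
      }
      inFlight = true;
      try {
        onResult(await request());
      } finally {
        inFlight = false; // accept the next tick once this request settles
      }
    },
    ignoredTicks: () => ignored,
  };
}

// Usage sketch: three ticks arrive while the first request is still pending.
let requestsStarted = 0;
const poller = createExhaustingPoller<string>(
  () => {
    requestsStarted++;
    return new Promise<string>((resolve) => setTimeout(() => resolve('7.13.0'), 1000));
  },
  (version) => console.log(`ES version: ${version}`)
);
void poller.tick();
void poller.tick();
void poller.tick();
console.log(requestsStarted, poller.ignoredTicks());
```

Only one request is started for the three ticks. The dropped ticks are counted here purely for illustration; exhaustMap itself just discards them.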

@afharo
Member Author

afharo commented Apr 22, 2021

You're right! The exhaustMap holds any other emissions, so we're not accumulating requests.

However, I ran a local test with the healthCheckInterval set to 100ms and mocking ES to hold the requests for 10x that, and while I don't see new requests coming in, the memory grows (and releases after timeouts). Probably the timer's emitted values are enqueued? 🤷

In any case, I think the log entry would be very welcome. I'll update the description. Thanks for pointing that out, @pgayvallet.
