Improve Kibana logging for OOM crashes #109602
Comments
Pinging @elastic/kibana-core (Team:Core)
In this specific case it seems like the first batch is already causing the OOM, but when there are many batches memory could also keep on growing because we add […]. #109540 will reduce the memory consumption by redacting the […]
Confirmed that Node OOM logs are present in the docker logs, but not being indexed on ESS
I don't think there's any trivial way to catch an OOM, as those are non-recoverable errors that don't simply bubble up. Looking at https://github.com/blueconic/node-oom-heapdump's implementation, it requires either C bindings or monitoring the GC. Either way, it may be very tedious to plug it into our logging system.
Yeah, I don't think we can intercept it and log it, but these are logged to […]
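Since the fatal OOM itself can't be intercepted from JavaScript, one approximation of the "monitoring the GC" idea above is to poll V8's heap statistics and warn while the process can still log. This is only a sketch, not node-oom-heapdump's implementation and not an existing Kibana feature; the threshold and interval are arbitrary illustration values.

```ts
// Sketch only: watch heap usage and warn before V8 aborts with an OOM.
// Assumes a plain Node.js process; `log` stands in for whatever logger
// is available (Kibana's logging system, console, etc.).
import { getHeapStatistics } from 'v8';

const WARN_RATIO = 0.9; // illustrative threshold, not a Kibana setting
const CHECK_INTERVAL_MS = 5_000;

function checkHeap(log: (msg: string) => void = console.warn) {
  const { used_heap_size, heap_size_limit } = getHeapStatistics();
  const ratio = used_heap_size / heap_size_limit;
  if (ratio >= WARN_RATIO) {
    const limitMb = Math.round(heap_size_limit / 1024 / 1024);
    log(
      `Heap usage at ${(ratio * 100).toFixed(1)}% of the ${limitMb}MB limit ` +
        `(max_old_space_size); an out-of-memory crash is likely.`
    );
  }
}

// unref() keeps the timer from holding the process open during shutdown.
setInterval(checkHeap, CHECK_INTERVAL_MS).unref();
```

Polling like this can only warn shortly before the crash; actually capturing a heap dump at the moment of the OOM, as node-oom-heapdump does, still requires native bindings.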
I assigned @rudolf as discussed yesterday since he is in contact with Cloud to ingest these logs.
This has been fixed upstream (downstream?) in https://github.com/elastic/cloud/issues/88114
Summary
Given: an ESS instance with 1 GB RAM migrates from v7.13.2 to v7.14.0. The Docker image crashes without any logs.
According to the logs, Kibana started the SO migration and suddenly bootstrapped again.
The deployment has been re-configured to run a Kibana instance with 2 GB RAM. After that, the migration failed with a [undefined]: Response Error message. This might be an indicator of the 413 payload too large error we investigated in #107288.
The migration has been fixed by reducing savedObjects.batchSize from 1000 to 200.
According to the proxy logs, some of the responses of the /_search ES endpoint during the migration were 400 MB. This might have caused Kibana (or the Docker container) with 1 GB RAM to fail with an OOM exception. Note that max_old_space_size for Node.js on an ESS instance with 1 GB RAM is set to 800 MB.
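To make the reported numbers concrete, here is a rough back-of-envelope estimate. The ~0.4 MB per document is derived from the ~400 MB responses seen with a batch size of 1000; the 2x factor for parsed JS objects living alongside the raw response body is an assumption, not a measurement.

```ts
// Rough, illustrative estimate of how much heap a single migration batch
// can consume. avgDocMb comes from ~400MB / 1000 docs reported in this
// issue; parseOverhead is an assumed factor, not a measured value.
function estimateBatchHeapMb(batchSize: number, avgDocMb = 0.4, parseOverhead = 2): number {
  return batchSize * avgDocMb * parseOverhead;
}

console.log(estimateBatchHeapMb(1000)); // ~800MB -> right at the 800MB max_old_space_size of a 1 GB instance
console.log(estimateBatchHeapMb(200)); // ~160MB -> well within the limit
```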
Impact and Concerns
We need to investigate how Kibana behaves when it hits an OOM exception in the ESS environment so we can improve error logging. Otherwise, it's extremely hard for users to investigate such problems without any actionable logs.
Acceptance criteria
Users can get feedback from Kibana whenever it fails in the ESS environment due to OOM problems.
Users can find recommendations (in the logs or documentation) about alleviating the OOM problem (see the sketch below).
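A hypothetical sketch of what such feedback could look like: logging the configured heap limit at startup together with an actionable hint. The logHeapLimitHint name and the logger shape are made up for illustration; only max_old_space_size and savedObjects.batchSize come from this issue.

```ts
// Sketch: log the configured heap limit at startup together with an
// actionable hint, so an operator reviewing the logs after a crash has
// something to act on. `logger` is a stand-in for Kibana's logger.
import { getHeapStatistics } from 'v8';

export function logHeapLimitHint(logger: { info: (msg: string) => void } = console) {
  const limitMb = Math.round(getHeapStatistics().heap_size_limit / 1024 / 1024);
  logger.info(
    `Node.js heap limit (max_old_space_size) is ${limitMb}MB. ` +
      `If saved object migrations crash with an out-of-memory error, ` +
      `consider lowering savedObjects.batchSize or increasing the instance's RAM.`
  );
}
```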