Journal logging doesn't reconnect if journald gets restarted #233

Closed
krobertson opened this issue May 26, 2017 · 2 comments · Fixed by #279

@krobertson

We hit this issue twice today on a production system. We haven't identified the root cause yet, but this appears to be an ancillary issue.

The host was running out of memory, likely because of journald. Journald attempted to compress and rotate logs but was unable to allocate the memory, so it died and was restarted. However, we're also using the docker journald log driver along with journald's syslog transport. After the restart, all docker containers failed to continue logging, with EPIPE errors on stdout/stderr.

Looking at the journal code, it connects to journald on init, but it has no error handling that would reconnect if journald is restarted.

The log output we had was:

May 26th 2017, 05:34:03.204    System journal (/var/log/journal/) is 1.7G, max 2.6G, 931.8M free.
May 26th 2017, 05:34:04.220    Failed to initialize XZ encoder: code 5
May 26th 2017, 05:34:04.231    systemd-journald.service: Main process exited, code=dumped, status=6/ABRT
May 26th 2017, 05:34:04.232    Failed to compress (unnamed temporary file): Invalid argument
May 26th 2017, 05:34:04.242    Detected coredump of the journal daemon or PID 1, diverted to /var/lib/systemd/coredump/core.systemd-journal.0.9b634f7d87464833a67ca9124f25ab86.14979.1495802022000000.
May 26th 2017, 05:34:04.242    systemd-journald.service: Unit entered failed state.
May 26th 2017, 05:34:04.245    systemd-journald.service: Service has no hold-off time, scheduling restart.
May 26th 2017, 05:34:04.245    systemd-journald.service: Failed with result 'core-dump'.
May 26th 2017, 05:34:04.259    Stopped Flush Journal to Persistent Storage.
May 26th 2017, 05:34:04.259    Stopped Journal Service.
May 26th 2017, 05:34:04.259    Stopping Flush Journal to Persistent Storage...
May 26th 2017, 05:34:04.268    Starting Journal Service...

After that, we only get huge error spikes to New Relic with EPIPE.

@lucab
Contributor

lucab commented May 29, 2017

See #218 for reference.

@ssgreg

ssgreg commented Aug 30, 2017

Guys, please check the implementation: https://github.com/ssgreg/journald
