CPU usage on dockerd 100% #246

Open
runningman84 opened this issue Dec 13, 2016 · 47 comments

@runningman84

If I start logspout-logstash using this docker-compose file, the CPU usage spikes to 100% and stays there...

  logspout:
    image: bekt/logspout-logstash
    mem_limit: 256m
    restart: on-failure
    depends_on:
      - logstash
    environment:
      ROUTE_URIS: logstash://logstash:5000
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock

before starting logspout

 7015 root      20   0 1423204  80956  29440 S 106.3  1.3   1:25.72 dockerd       

after starting logspout

 7015 root      20   0 1423204  80972  29440 S   0.3  1.3   1:58.57 dockerd  
@garceri

garceri commented Jan 26, 2017

Experiencing the same issue here:

$logspout --version
v3.2-dev-custom

$docker version
Client:
Version: 1.12.3
API version: 1.24
Go version: go1.6.3
Git commit: 6b644ec
Built: Wed Oct 26 23:26:11 2016
OS/Arch: linux/amd64

Server:
Version: 1.12.3
API version: 1.24
Go version: go1.6.3
Git commit: 6b644ec
Built: Wed Oct 26 23:26:11 2016
OS/Arch: linux/amd64

[root@ip-XXXXXXX rancher]# docker info
Containers: 56
Running: 44
Paused: 0
Stopped: 12
Images: 89
Server Version: 1.12.3
Storage Driver: overlay
Backing Filesystem: extfs
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local rancher-nfs
Network: null bridge overlay host
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Security Options: seccomp
Kernel Version: 4.4.24-rancher
Operating System: RancherOS v0.7.1
OSType: linux
Architecture: x86_64
CPUs: 2
Total Memory: 14.94 GiB
Name: ip-XXXXXXXXXXX.ec2.internal
ID: DVSV:NLW5:VD5U:UOYY:FIOB:4RTH:7NGQ:GIBO:KKTR:VPYO:J6A2:4SZE
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Insecure Registries:
127.0.0.0/8

Logspout container logs with DEBUG enabled show a constant stream of:

2017/01/26 17:37:44 pump.pumpLogs(): 495069bafc93 stopped
2017/01/26 17:37:45 pump.pumpLogs(): 495069bafc93 started
2017/01/26 17:37:45 pump.pumpLogs(): 25635c95f804 stopped
2017/01/26 17:37:45 pump.pumpLogs(): 25635c95f804 started
2017/01/26 17:37:46 pump.pumpLogs(): 495069bafc93 stopped
2017/01/26 17:37:46 pump.pumpLogs(): 495069bafc93 started
2017/01/26 17:37:46 pump.pumpLogs(): a27d4bfb3a41 stopped
2017/01/26 17:37:46 pump.pumpLogs(): a27d4bfb3a41 started
2017/01/26 17:37:46 pump.pumpLogs(): 32e734dcf238 stopped
2017/01/26 17:37:46 pump.pumpLogs(): 32e734dcf238 started
2017/01/26 17:37:47 pump.pumpLogs(): 495069bafc93 stopped
2017/01/26 17:37:47 pump.pumpLogs(): 495069bafc93 started
2017/01/26 17:37:48 pump.pumpLogs(): 495069bafc93 stopped
2017/01/26 17:37:48 pump.pumpLogs(): a27d4bfb3a41 stopped
2017/01/26 17:37:48 pump.pumpLogs(): 495069bafc93 started
2017/01/26 17:37:48 pump.pumpLogs(): a27d4bfb3a41 started

@mattatcha
Member

@pythianarceri Are you able to provide a docker-compose.yml I can use to reproduce the issue?

@garceri

garceri commented Jan 26, 2017

We use a custom-built image compiled with the logstash output module:

logspout:
  image: pythiant9shared/logspout-logstash
  environment:
    DEBUG: '1'
    ROUTE_URIS: logstash://logstash:5000
  stdin_open: true
  external_links:
  - logstash/logstash-collector:logstash
  volumes:
  - /var/run/docker.sock:/var/run/docker.sock
  tty: true
  labels:
    io.rancher.scheduler.global: 'true'
    io.rancher.container.hostname_override: container_name

@garceri

garceri commented Jan 31, 2017

Any luck getting this fixed, or at least identifying the issue? Is it normal for the pumps to be started/stopped all the time?
We are running under RancherOS 1.3.3, in case that matters.

@mattdodge

Same thing here. I end up seeing a lot of these messages in /var/log/docker.log

time="2017-03-13T21:36:26.491710944Z" level=error msg="Error streaming logs: invalid character '\\x00' looking for beginning of value"
time="2017-03-13T21:36:26.514911587Z" level=error msg="Error streaming logs: invalid character '\\x00' looking for beginning of value"
time="2017-03-13T21:36:26.547530212Z" level=error msg="Error streaming logs: invalid character '\\x00' looking for beginning of value"

It's happened a couple of times. Sometimes the "invalid character" is different, but the gist of the log message is the same. We see dockerd at around 100% CPU and a constant stream of those errors.

@byronwilliams

Hi all,

We've managed to reproduce the same issue as @pythianarceri reported on Jan 26. To add some detail: running docker logs -f {ContainerID} does not actually follow the logs; it exits straight away. That is consistent with pumpLogs starting and stopping, since the HTTP connection is being closed for some reason.

This is reproducible on the latest master commit (7c22df4) as well as 513b74c and 8b98313.

Our container runs a few processes inside, all of which have PYTHONUNBUFFERED=1.

Whilst I consider this to be a bug in our container not writing to stdout and stderr correctly, it also makes our docker nodes perform really badly when the containers are scheduled.

Could we add some kind of counter when the pump starts, so that if it restarts {n} times within a second it backs off?

# docker version
Client:
 Version:      17.03.0-ce
 API version:  1.26
 Go version:   go1.7.4
 Git commit:   60ccb22
 Built:        Mon Mar 13 18:07:51 UTC 2017
 OS/Arch:      linux/arm

Server:
 Version:      17.03.0-ce
 API version:  1.26 (minimum version 1.12)
 Go version:   go1.7.4
 Git commit:   60ccb22
 Built:        Mon Mar 13 18:07:51 UTC 2017
 OS/Arch:      linux/arm
 Experimental: false

@woutor

woutor commented May 3, 2017

Same issue here. Reverted to the v3.1 docker image, and logging and CPU usage are back to normal.

@sebgl

sebgl commented May 9, 2017

Same issue; we also reverted to the v3.1 docker image.

We are running Docker version 17.04.0-ce, build 4845c56, and deploying our logspout container with the new swarm mode ('docker stack deploy'). We're not using logstash; we forward logs directly to a remote syslog.

Here is the content of our docker-compose.yml file:

version: '3'

services:
  logspout:
    image: gliderlabs/logspout
    environment:
      SYSLOG_STRUCTURED_DATA: "- $SYSLOG_CONF"
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
    ports:
      - 10888:80
    command: syslog+tls://$LOGS_ENDPOINT
    deploy:
      mode: global

@gaieges

gaieges commented May 31, 2017

I have the same problem, using the most recent master image (ef55cfcceda7) and version v3.3-dev on both docker 17.04.0-ce and 1.13.1.

It's fairly difficult to reproduce, and it seems to occur only after a period of time - I'm not sure whether it's a result of complex log messages (or potentially unexpected charsets), or just the volume of messages shipped off.

While it's at high CPU usage, I have also seen a number of weird messages like the ones mattdodge described.

@woutor @sebgl after you reverted back to version 3.1 - did you see the problem pop up again?

@woutor

woutor commented May 31, 2017

@gaieges: Nope, with 3.1 the CPU usage remains normal.

@gdurr

gdurr commented Jun 7, 2017

Same issue here with Docker 17.05.0-ce and logspout 3.3. We can't go back to 3.1 because we need the label filtering feature.

@avoinea

avoinea commented Jun 7, 2017

Same issue here. Reverted to the v3.1 docker image, and logging and CPU usage are back to normal.

@michaelshobbs
Member

I'm using v3.3 in production with papertrail. Is there anybody experiencing this issue that is NOT using logstash? I'm thinking the problem might be in that module...

@gaieges

gaieges commented Jun 7, 2017

@michaelshobbs I don't use the logstash module, I just use the tcp+syslog output.

@michaelshobbs
Member

Interesting... anybody have any thoughts on this one?

@SqiSch

SqiSch commented Jun 16, 2017

Same issue here, but I cannot confirm the high CPU load - only a lot of log lines in /var/log/syslog, and the pump.pumpLogs() messages when debugging is enabled.

I am also using the logstash adapter.

.....
165+02:00" level=error msg="Error streaming logs: invalid character 'l' after object key:value pair"
Jun 16 11:55:18 xyz docker[11404]: time="2017-06-16T11:55:18.307515574+02:00" level=error msg="Error streaming logs: invalid character 'l' after object key:value pair"
Jun 16 11:55:18 xyz docker[11404]: time="2017-06-16T11:55:18.432245150+02:00" level=error msg="Error streaming logs: invalid character 'l' after object key:value pair"
....

@mrbobbytables

We encountered the same issue today, randomly, on one of our hosts. Unfortunately I do not have any more info than what others have reported.
Docker 17.03.1-ce
RancherOS v1.0.2
logspout configured to send to syslog+tcp

@sfrique

sfrique commented Jun 28, 2017

Hello folks,

I was having the same problem:

Jun 28 14:07:20 xxx docker[7686]: time="2017-06-28T14:07:20.892703572Z" level=error msg="Error streaming logs: invalid character 'l' after object key:value pair"
Jun 28 14:07:20 xxx docker[7686]: time="2017-06-28T14:07:20.894972987Z" level=error msg="Error streaming logs: invalid character 'l' after object key:value pair"

It turned out to be a corrupt json-file.log; it happened when the disk filled up, so the json-file.log got truncated.
I had something like this in my json-file.log after the disk space issue was fixed and docker was running again.

{"log":"    2017/06/28 07:46:00 [INFO] consul: adding WAN ser{"log":"    2017/06/28 07:46:00 [INFO] consul: adding WAN server xxx.dc1 (Addr: 10.240.0.21:8300) (DC: dc1)\n","stream":"stdout","time":"2017-06-28T07:46:00.5738544Z"}

So it isn't valid JSON, and it breaks docker logs; somehow it makes logspout enter a loop and makes docker print those lines.

If you run docker logs $container on the container with the corrupt json-file.log, you will get one line of the error.

So look for corrupted or badly formatted json-file.log files.

I just removed that line and everything was working normally again.

Hope that helps.

@gaieges

gaieges commented Jun 28, 2017

On @sfrique's note - I wonder if using the journald log driver (where possible) instead of json-file would help with this?

@alhafoudh

@gaieges I tried with journald, same problem

@pauln

pauln commented Jul 21, 2017

We're seeing the same thing - 100% CPU usage when running logspout (pumping to logstash). Enabling logspout's debug logs showed rapid repetition of log entries such as:

2017/07/20 23:57:20 pump.pumpLogs(): fb6033455fd0 stopped
2017/07/20 23:57:20 pump.pumpLogs(): fb6033455fd0 started

Initially, we were seeing this on most of our hosts, with two container IDs showing up on each. The containers in question were Rancher's metadata and DNS containers. We upgraded Rancher from 1.6.2 to 1.6.5 and upgraded all the system services, and logspout now seems to run happily on most of our hosts. However, we're still getting the stopped/started log spam (and the corresponding CPU load) on our DB hosts - with only a single container ID flapping: mongodb.

@garceri

garceri commented Jul 21, 2017

The scenario described by @pauln is exactly what we are experiencing (we haven't had a chance to upgrade to 1.6.5 yet). Out of curiosity, @pauln, what docker version are you running on your hosts?

@pauln

pauln commented Jul 23, 2017

@garceri, we're currently running Docker 17.03 CE, though we're looking at upgrading to 17.06 CE in the near future.

@robcza

robcza commented Jul 26, 2017

My thoughts on this issue:

  • it is quite serious, because it can affect the whole host and the performance of other containers
  • according to other people's reports, it does not affect older logspout versions -> it seems it could be prevented at the logspout level. What is the difference between the versions?
  • malformed logs seem to be the trouble. How do they emerge? Could it be prevented at the docker daemon level?
  • can I help by gathering any data while this happens? Running with DEBUG: true shows just a constant stream of pump start/stop:
2017/07/26 07:09:48 pump.pumpLogs(): 21c90bb7ac5c stopped
2017/07/26 07:09:48 pump.pumpLogs(): 21c90bb7ac5c started
2017/07/26 07:09:48 pump.pumpLogs(): 21c90bb7ac5c stopped
2017/07/26 07:09:48 pump.pumpLogs(): 21c90bb7ac5c started

@michaelshobbs
Member

@robcza thanks for the additional details. Can you provide a full debug log, from start to stop, of a logspout container that has entered this state? Looking for any other clues here.

To your question about the difference between versions: we tag each release, so you can browse the history by comparing tags. Additionally, we have a CHANGELOG.md with convenient links to compare each version.

@ko-christ

Same for me on docker 17.03.1-ce, using the image bekt/logspout-logstash:latest.

I have managed to capture a full debug log since the logspout container was created.
There is a constant start/stop stream for c21bde2e91e7.
I have had this issue on other machines as well, with various services.

# docker logs elk-logspout 
2017/07/28 10:50:57 setting allowTTY to: false
# logspout v3.3-dev-custom by gliderlabs
# adapters: raw udp tcp logstash
# options : debug:True persist:/mnt/routes
# jobs    : http[]:80 pump routes
# routes  :
#   ADAPTER	ADDRESS		CONTAINERS	SOURCES	OPTIONS
#   logstash+tcp  prod00:1111      map[]
2017/07/28 10:50:57 pump.Run(): using inactivity timeout:  0s
2017/07/28 10:50:57 pump.pumpLogs(): 4dd58c94144a ignored: environ ignore
2017/07/28 10:50:57 pump.pumpLogs(): 741c807703d4 started
2017/07/28 10:50:57 pump.pumpLogs(): 03c007defa61 started
2017/07/28 10:50:57 pump.pumpLogs(): 0398b9b5542e started
2017/07/28 10:50:57 pump.pumpLogs(): 0b1b04fd17a9 started
2017/07/28 10:50:57 pump.pumpLogs(): 18b4f33a6b1d started
2017/07/28 10:50:57 pump.pumpLogs(): 18cd5d3b88cb started
2017/07/28 10:50:57 pump.pumpLogs(): dfebc3118ce3 started
2017/07/28 10:50:57 pump.pumpLogs(): e12708a548c4 started
2017/07/28 10:50:57 pump.pumpLogs(): 78423dfc9805 started
2017/07/28 10:50:57 pump.pumpLogs(): 872c498a0c2f started
2017/07/28 10:50:57 pump.pumpLogs(): 2f78048ee356 started
2017/07/28 10:50:57 pump.pumpLogs(): bcedcc0cd5d2 started
2017/07/28 10:50:57 pump.pumpLogs(): 0b8e0146d61c started
2017/07/28 10:50:57 pump.pumpLogs(): a1c7966d4ad3 started
2017/07/28 10:50:57 pump.pumpLogs(): d49788286a32 started
2017/07/28 10:50:57 pump.pumpLogs(): 41c5e05d0019 started
2017/07/28 10:50:57 pump.pumpLogs(): ce7030af1846 started
2017/07/28 10:50:57 pump.pumpLogs(): 78cff52b2cc8 started
2017/07/28 10:50:57 pump.pumpLogs(): c5b69f63a3e8 started
2017/07/28 10:50:57 pump.pumpLogs(): 52ab8eb7e542 started
2017/07/28 10:50:57 pump.pumpLogs(): f26b009798f4 started
2017/07/28 10:50:57 pump.pumpLogs(): 7b1f7d90859a started
2017/07/28 10:50:57 pump.pumpLogs(): a5b864e479b7 started
2017/07/28 10:50:57 pump.pumpLogs(): 08581c7c3829 started
2017/07/28 10:50:57 pump.pumpLogs(): 9e4497a0dcaf started
2017/07/28 10:50:57 pump.pumpLogs(): 227ca66d7f83 started
2017/07/28 10:50:57 pump.pumpLogs(): 6f6b327b7b9e started
2017/07/28 10:50:57 pump.pumpLogs(): 846ba10f0ee7 started
2017/07/28 10:50:57 pump.pumpLogs(): e672bc2d0fe2 started
2017/07/28 10:50:57 pump.pumpLogs(): f1d48ee0211f started
2017/07/28 10:50:57 pump.pumpLogs(): f8fdca2aff57 started
2017/07/28 10:50:57 pump.pumpLogs(): 5856ed01b7ff started
2017/07/28 10:50:57 pump.pumpLogs(): 57e77510b9de started
2017/07/28 10:50:57 pump.pumpLogs(): a05333be7c0f started
2017/07/28 10:50:57 pump.pumpLogs(): dcab95f7ca2f started
2017/07/28 10:50:57 pump.pumpLogs(): f28130fa353a started
2017/07/28 10:50:57 pump.pumpLogs(): 464d4d0b24bd started
2017/07/28 10:50:57 pump.pumpLogs(): 139fccfc578f started
2017/07/28 10:50:57 pump.pumpLogs(): 341f311f43c0 started
2017/07/28 10:50:57 pump.pumpLogs(): f08a73de23ab started
2017/07/28 10:50:57 pump.pumpLogs(): 7e88c606f51b started
2017/07/28 10:50:57 pump.pumpLogs(): c21bde2e91e7 started
2017/07/28 10:50:57 pump.pumpLogs(): 52fe6826666a started
2017/07/28 10:51:02 pump.pumpLogs(): c21bde2e91e7 stopped
2017/07/28 10:51:02 pump.pumpLogs(): c21bde2e91e7 started
2017/07/28 10:51:06 pump.pumpLogs(): c21bde2e91e7 stopped
2017/07/28 10:51:06 pump.pumpLogs(): c21bde2e91e7 started
2017/07/28 10:51:09 pump.pumpLogs(): c21bde2e91e7 stopped
2017/07/28 10:51:09 pump.pumpLogs(): c21bde2e91e7 started
2017/07/28 10:51:11 pump.pumpLogs(): c21bde2e91e7 stopped
2017/07/28 10:51:11 pump.pumpLogs(): c21bde2e91e7 started
2017/07/28 10:51:14 pump.pumpLogs(): c21bde2e91e7 stopped
2017/07/28 10:51:14 pump.pumpLogs(): c21bde2e91e7 started

@michaelshobbs
Member

Thanks @ko-christ for the extra detail! Next time this happens, it would be great to see if there's anything odd about the output of the container with the "thrashing" log pump (i.e. docker logs c21bde2e91e7 in this particular case).

Same request to others who have noticed the stop/start thrashing symptom.

@shahar-davidson

Hi all,
We've just encountered this issue recently (logspout starts and CPU jumps to ~100%).
Here are my insights after investigating this for a couple of hours:
In our environment, this was caused by malformed JSON output in the container logs (when using json-file logging).
What apparently happened is that the machine ran out of space (we had a clear indication of that) and log lines could not be written properly - so we got JSON log lines that were cut off in the middle.
When that happens, docker will print the following whenever someone tries to fetch the logs (even a simple docker logs command will reproduce it):

"Error streaming logs: invalid character 'l' after object key:value pair"

Apparently, docker will print this error and stop returning log lines, even if there were many more lines after the corrupt JSON line.
On our machine we had about a dozen services constantly logging, and this JSON line corruption was found in most of them.
It seems that docker encounters this and emits an error ( https://github.com/moby/moby/blob/8af4db6f002ac907b6ef8610b237879dfcaa5b7a/daemon/logs.go#L109 ), yet logspout immediately starts requesting logs again and, again, docker bombards the docker service logs with the above error message (tens per second).

Possible workarounds for such an issue (I haven't verified these options yet):

  1. Setting the BACKLOG env var and restarting services (see the sketch after this list)
  2. Stopping the service and moving/truncating the existing log files so that they start off clean.
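
For what it's worth, a minimal sketch of option 1, assuming (as the name suggests) that BACKLOG=false makes logspout skip the existing backlog and only stream new log lines; the route URI here is just an example:

docker run -d --name logspout \
  -e BACKLOG=false \
  -v /var/run/docker.sock:/var/run/docker.sock \
  gliderlabs/logspout \
  syslog+tcp://logs.example.com:514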

So this actually seems to be a docker issue. The best way to deal with it, IMHO, is to have docker detect such cases, roll the log file, and put the corrupt log file aside (while emitting a meaningful warning about it).

One thing that puzzles me is why reverting to v3.1 "fixes" this issue (I haven't tried reverting to v3.1 yet; I'm using v3.3).

@pythiancastaneda

Hello,
I am having the same issue. The Docker log is flooded with messages like this:

time="2017-08-30T22:30:31.922624624Z" level=error msg="Error streaming logs: invalid character '\\x00' looking for beginning of value" 
time="2017-08-30T22:30:31.977376555Z" level=error msg="Error streaming logs: invalid character '\\x00' looking for beginning of value" 
time="2017-08-30T22:30:32.172320949Z" level=error msg="Error streaming logs: invalid character '\\x00' looking for beginning of value" 
time="2017-08-30T22:30:32.570219033Z" level=error msg="Error streaming logs: invalid character '\\x00' looking for beginning of value" 
time="2017-08-30T22:30:32.662642497Z" level=error msg="Error streaming logs: invalid character '\\x00' looking for beginning of value" 
time="2017-08-30T22:30:32.860254549Z" level=error msg="Error streaming logs: invalid character '\\x00' looking for beginning of value" 

And the docker logs for bekt/logspout-logstash:master are:

2017/08/30 22:18:33 pump.pumpLogs(): 9972da45bf76 started
2017/08/30 22:18:33 pump.pumpLogs(): f6fddb18f835 started
2017/08/30 22:18:34 pump.pumpLogs(): 9972da45bf76 stopped
2017/08/30 22:18:34 pump.pumpLogs(): 9ab5e8cc0c36 stopped
2017/08/30 22:18:34 pump.pumpLogs(): 9972da45bf76 started
2017/08/30 22:18:34 pump.pumpLogs(): 9ab5e8cc0c36 started
2017/08/30 22:18:34 pump.Run() event: 42ad09d40141 network:connect
2017/08/30 22:18:34 pump.Run() event: orient-backu volume:mount
2017/08/30 22:18:34 pump.pumpLogs(): f6fddb18f835 stopped
2017/08/30 22:18:35 pump.pumpLogs(): f6fddb18f835 started
2017/08/30 22:18:35 pump.Run() event: orient-datab volume:mount
2017/08/30 22:18:35 pump.pumpLogs(): 9972da45bf76 stopped
2017/08/30 22:18:35 pump.Run() event: orient-log volume:mount
2017/08/30 22:18:35 pump.pumpLogs(): 9972da45bf76 started
2017/08/30 22:18:35 pump.Run() event: rancher-cni volume:mount
2017/08/30 22:18:35 pump.Run() event: efe51a4c5ed0 volume:mount
2017/08/30 22:18:36 pump.pumpLogs(): f6fddb18f835 stopped
2017/08/30 22:18:36 pump.pumpLogs(): 9972da45bf76 stopped
2017/08/30 22:18:36 pump.pumpLogs(): 9ab5e8cc0c36 stopped
2017/08/30 22:18:36 pump.Run() event: 58a80e107fc1 start
2017/08/30 22:18:36 pump.pumpLogs(): 9ab5e8cc0c36 started
2017/08/30 22:18:36 pump.pumpLogs(): 9972da45bf76 started
2017/08/30 22:18:36 pump.pumpLogs(): f6fddb18f835 started
2017/08/30 22:18:36 pump.pumpLogs(): 58a80e107fc1 started
2017/08/30 22:18:36 pump.pumpLogs(): 9972da45bf76 stopped
2017/08/30 22:18:36 pump.pumpLogs(): 9972da45bf76 started
2017/08/30 22:18:36 pump.pumpLogs(): 9ab5e8cc0c36 stopped
2017/08/30 22:18:36 pump.pumpLogs(): 9ab5e8cc0c36 started
2017/08/30 22:18:37 pump.pumpLogs(): 9972da45bf76 stopped
2017/08/30 22:18:37 pump.pumpLogs(): 9972da45bf76 started
2017/08/30 22:18:37 pump.pumpLogs(): f6fddb18f835 stopped
2017/08/30 22:18:37 pump.pumpLogs(): f6fddb18f835 started
2017/08/30 22:18:37 pump.pumpLogs(): 9ab5e8cc0c36 stopped
2017/08/30 22:18:37 pump.pumpLogs(): 9ab5e8cc0c36 started
2017/08/30 22:18:37 pump.pumpLogs(): 9972da45bf76 stopped
2017/08/30 22:18:37 pump.pumpLogs(): 9972da45bf76 started
2017/08/30 22:18:38 pump.pumpLogs(): be68d546188d stopped
2017/08/30 22:18:38 pump.pumpLogs(): be68d546188d started
2017/08/30 22:18:38 pump.pumpLogs(): f6fddb18f835 stopped
2017/08/30 22:18:38 pump.pumpLogs(): f6fddb18f835 started
2017/08/30 22:18:38 pump.pumpLogs(): 9972da45bf76 stopped
2017/08/30 22:18:38 pump.pumpLogs(): 9972da45bf76 started
2017/08/30 22:18:38 pump.pumpLogs(): 9ab5e8cc0c36 stopped
2017/08/30 22:18:38 pump.pumpLogs(): 9ab5e8cc0c36 started
2017/08/30 22:18:39 pump.pumpLogs(): 9972da45bf76 stopped
2017/08/30 22:18:39 pump.pumpLogs(): 9972da45bf76 started
2017/08/30 22:18:39 pump.pumpLogs(): 9ab5e8cc0c36 stopped
2017/08/30 22:18:39 pump.pumpLogs(): 9ab5e8cc0c36 started
2017/08/30 22:18:39 pump.pumpLogs(): f6fddb18f835 stopped
2017/08/30 22:18:39 pump.pumpLogs(): 9972da45bf76 stopped
2017/08/30 22:18:39 pump.pumpLogs(): f6fddb18f835 started
2017/08/30 22:18:39 pump.pumpLogs(): 9972da45bf76 started
2017/08/30 22:18:40 pump.pumpLogs(): 9ab5e8cc0c36 stopped
2017/08/30 22:18:40 pump.pumpLogs(): 9972da45bf76 stopped
2017/08/30 22:18:40 pump.pumpLogs(): 9972da45bf76 started
2017/08/30 22:18:40 pump.pumpLogs(): 9ab5e8cc0c36 started
2017/08/30 22:18:40 pump.pumpLogs(): f6fddb18f835 stopped
2017/08/30 22:18:40 pump.pumpLogs(): f6fddb18f835 started

I have trimmed some of the docker json logs and restarted the containers, but it is not helping; once the logspout container starts, it makes dockerd eat almost all of the CPU.

Thanks in advance!

@michaelshobbs
Member

@mattatcha thoughts on this given the new info?

@sfrique

sfrique commented Aug 31, 2017

@pythiancastaneda To figure out which container is giving you trouble:

# tail -f /var/log/syslog | grep docker
In a second terminal:
# docker logs <container>   (for each container you have)

That way you will see in the logs:

time="2017-08-30T22:30:32.860254549Z" level=error msg="Error streaming logs: invalid character '\\x00' looking for beginning of value" 

And then you can remove the log file or the bad line.

Hope that helps.

I recently had the same problem; I did the steps above and it fixed the issue.

@stevecalvert mentioned this issue Sep 1, 2017
@leriel

leriel commented Sep 15, 2017

Thanks for the update @stevecalvert! In my case the issue was the same: corrupt container logs. For me, 3.2.2 was also logging an additional error; posting it just in case.

logspout_1  | 2017/09/15 17:42:43 pump.pumpLogs(): 0e1c957edf2d started
logspout_1  | 2017/09/15 17:42:43 pump.pumpLogs(): 0e1c957edf2d stopped with error: Unrecognized input header: 3
logspout_1  | 2017/09/15 17:42:43 pump.pumpLogs(): 0e1c957edf2d started
logspout_1  | 2017/09/15 17:42:43 pump.pumpLogs(): 0e1c957edf2d stopped with error: Unrecognized input header: 3

The only way for me to fix the issue, as for others, was to remove and recreate the container.

@konvergence

konvergence commented Sep 20, 2017

@leriel
I get the same issue with logspout 3.2.2 and the gelf plugin.

But I'm not sure I understand the workaround...

Which container do you recreate? Logspout, or the containers with corrupted json-file logs?

Is there any way to correct the logspout code?

@konvergence

Don't know if there is any relationship...
but I get this message if I try to tail one of the *-json.log files reported by logspout in debug mode:

tail: inotify cannot be used, reverting to polling: Too many open files
This seems to be a default limit on Ubuntu? Could that explain some of the corrupted json-file logs?
see https://nefaria.com/2014/08/tail-inotify-resources-exhausted/
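
If it is the inotify limit, a quick way to check and raise it (the numbers below are only example values; persist them in /etc/sysctl.conf if they help):

# current limits
sysctl fs.inotify.max_user_instances fs.inotify.max_user_watches
# raise them temporarily (example values)
sudo sysctl -w fs.inotify.max_user_instances=512
sudo sysctl -w fs.inotify.max_user_watches=524288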

@konvergence

Another piece of info...
When I purge all the json-file logs that logspout reports as "stopped" and restart it, CPU usage on dockerd stays flat.

Here is my code:

for containerid in $(docker logs $(docker ps -a -q --filter "label=io.rancher.stack_service.name=graylog2/logspout") 2>&1 | grep stopped | cut -d " " -f 4 | sort -u); do
    ls -l /var/lib/docker/containers/${containerid}*/${containerid}*-json.log
    > /var/lib/docker/containers/${containerid}*/${containerid}*-json.log
done

@konvergence

One more piece of information:
I use docker engine 17.03.2-ce. Last week we had a power shutdown on the nodes.
In my case, I think that was the cause of the corrupted json-file logs (Error streaming logs: invalid character 'l' after object key:value pair).
Do you know if this issue can be solved by a new docker-ce release?

@konvergence

Here is some news...
but I don't know if there is a relationship.

I got all the container IDs that logspout reports as stopped:
docker logs $(docker ps -a -q --filter "label=io.rancher.stack_service.name=graylog/logspout") 2>&1| grep stopped | cut -d " " -f 4 | sort -u

c062de60016b
c4d6b098968c
f8f18bba6203

I did indeed find '\x00' in each corrupted file, and the files are big:


for containerid in $(docker ps -a -q); do
    grep -Pa '\x00' /var/lib/docker/containers/${containerid}*/${containerid}*-json.log
    if [ $? -eq 0 ]; then
        echo ${containerid}
        ls -lh /var/lib/docker/containers/${containerid}*/${containerid}*-json.log
    fi
done


{"log":"Creating volume folders...\n","stream":"stdout","time":"2017-09-16T15:50:34.379859731Z"}
c4d6b098968c
-rw-r----- 1 root root 564M Sep 21 12:11 /var/lib/docker/containers/c4d6b098968c475a82ed7ff34cc4e50344f097155f10d0d9d07528437b4417e7/c4d6b098968c475a82ed7ff34cc4e50344f097155f10d0d9d07528437b4417e7-json.log
{"log":"Creating volume folders...\n","stream":"stdout","time":"2017-09-16T15:50:33.076131673Z"}
c062de60016b
-rw-r----- 1 root root 478M Sep 21 12:11 /var/lib/docker/containers/c062de60016bd13faeebceb0301c121b35a0dfbc620048584a3d06c3d0449761/c062de60016bd13faeebceb0301c121b35a0dfbc620048584a3d06c3d0449761-json.log
{"log":"Creating volume folders...\n","stream":"stdout","time":"2017-09-16T15:50:47.996912867Z"}
f8f18bba6203
-rw-r----- 1 root root 806M Sep 21 12:11 /var/lib/docker/containers/f8f18bba62032e188dd75f76b9cc061936e804995289d7406b89d43e02d7cc36/f8f18bba62032e188dd75f76b9cc061936e804995289d7406b89d43e02d7cc36-json.log

@pierreozoux

pierreozoux commented Oct 19, 2017

We experienced the same issue yesterday.
We don't use logspout, but this is the closest issue we found, with this kind of error in the logs:

Jun 28 14:07:20 xxx docker[7686]: time="2017-06-28T14:07:20.892703572Z" level=error msg="Error streaming logs: invalid character 'l' after object key:value pair"

The invalid character can be anything in our case.

We are still investigating, but would like to see if we can find the issue together.
Based on our current information, it looks like one of our containers is outputting some kind of JSON, under specific conditions, that dockerd doesn't like at all. Didn't find any related moby issue though :/

We'll let you know of our findings.

Our setup is different though: docker 1.12 + k8s 1.7.5, with the scalyr agent ingesting logs from dockerd.

@mjablecnik

Hello, I have:

$ docker version
Client:
 Version:      17.09.0-ce
 API version:  1.32
 Go version:   go1.8.3
 Git commit:   afdb6d4
 Built:        Tue Sep 26 22:42:09 2017
 OS/Arch:      linux/amd64

Server:
 Version:      17.09.0-ce
 API version:  1.32 (minimum version 1.12)
 Go version:   go1.8.3
 Git commit:   afdb6d4
 Built:        Tue Sep 26 22:40:48 2017
 OS/Arch:      linux/amd64
 Experimental: false

System: Debian 9.1

logspout setup in Rancher:
screenshot

and in syslog, dockerd errors:

Oct 24 10:03:39 web4-vultr dockerd[31377]: time="2017-10-24T10:03:39.819294736+01:00" level=error msg="Error streaming logs: invalid character '\x00' looking for beginning of value" container=db62afc089ec method="(*Daemon).ContainerLogs" module=daemon
Oct 24 10:03:41 web4-vultr dockerd[31377]: time="2017-10-24T10:03:41.950449384+01:00" level=error msg="Error streaming logs: invalid character '\x00' looking for beginning of value" container=db62afc089ec method="(*Daemon).ContainerLogs" module=daemon
Oct 24 10:03:42 web4-vultr dockerd[31377]: time="2017-10-24T10:03:42.717452402+01:00" level=error msg="Error streaming logs: invalid character '\x00' looking for beginning of value" container=6488521fdc8f method="(*Daemon).ContainerLogs" module=daemon

and CPU usage is very high.

I don't know why, but downgrading logspout to v3.1 fixes it.

@rhuddleston

rhuddleston commented Oct 30, 2017

We're on AWS and see this issue regularly. We have ~100 EC2 instances, and when AWS suddenly retires or shuts down an instance, the server restarts but some of the docker JSON log files end up corrupt.

I can see where the corruption is, and on which line, by doing the following:

for FILE in /var/lib/docker/containers/*/*-json.log; do cat $FILE | jq . > /dev/null; done

I used to just shut down the container that corresponded to the corrupt log(s), manually remove the corrupted (half-written) JSON line, then restart the container, and all was well. With newer versions of docker that no longer fixes it; I now have to remove all the containers and re-deploy them. I think this is really a Docker issue: if Docker detects a half-written file, it needs to handle it correctly. Basically 100% of the time AWS restarts a node unexpectedly, this happens for us. Typically it's faster to completely replace the node than to spend a bunch of time trying to fix this by hand.
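
If it helps, here is a small variation on the loop above that also prints which file is the corrupt one (jq exits non-zero on a parse error, but does not name the file):

for FILE in /var/lib/docker/containers/*/*-json.log; do
  # jq chokes on the truncated/corrupt line; print the file name when it does
  jq . "$FILE" > /dev/null 2>&1 || echo "corrupt: $FILE"
done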

@ghost

ghost commented Dec 12, 2017

We have the same issue, but found the offending log files using the grep below.

grep -P '\x00' /var/lib/docker/containers/**/*json.log

After manually removing the null "\x00" characters from the log, the issue went away.
It seems to be a docker issue (docker/for-linux#140) related to unexpected host/daemon restarts.
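
A minimal sketch of that cleanup, assuming you stop the affected container first and keep a backup (the path below is a placeholder for the corrupt file found by the grep above):

LOG=/var/lib/docker/containers/<container-id>/<container-id>-json.log
cp "$LOG" "$LOG.bak"
# tr drops only the NUL bytes; everything else is left untouched
tr -d '\000' < "$LOG.bak" > "$LOG"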

@ppiccolo

ppiccolo commented Feb 14, 2018

Me too, same issue... any suggestions?

@shahar-davidson

We resolved to use #321 as a workaround for the time being (note that BACKLOG should not be set, or should be set to true).

@mahnunchik

I've run into 100% CPU usage and this error:

Mar 23 16:48:52 burger dockerd[569]: time="2018-03-23T16:48:52.971499270+02:00" level=error msg="Error streaming logs: EOF" container=b1dc4286ba72 method="(*Daemon).ContainerLogs" module=daemon

on the cadvisor container.

My config:

version: '2.3'

services:

  logspout:
    image: gliderlabs/logspout:v3.2.4
    container_name: logspout
    command: syslog+tcp://logs-01.loggly.com:514
    restart: always
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
    environment:
      TAIL: 10

avoinea added a commit to eea/eea.rancher.catalog that referenced this issue Jun 14, 2018
@arunesh90

Any update on this? Just had it happen on a server, which became pretty unresponsive because of it.

@kamaev

kamaev commented Aug 5, 2018

+1 for rolling back to 3.1.

I've just force pushed ca675f3 and manually added:

- glide.lock
- glide.yaml
- Dockerfile
- build.sh
- and also /adapters/multiline/multiline.go to use multiline

from the latest master to build an image.
The new container works well.

@biguphpc

biguphpc commented Jan 22, 2021

I just wanted to confirm this issue with the latest CE version of Docker (19.03.12) and logspout (syslog+tls). We've had production servers crashing because of this. Dockerd was using 1600% CPU and then became unresponsive on a 32-CPU server. I noticed a very high number of goroutines showing in docker info in debug mode. Stopping the logspout container and waiting a few hours resolved the issue, and the dockerd CPU was back to normal (along with the number of goroutines). I tried workarounds like TAIL=0 without success. Reverting to v3.1 solves the issue for now.

We had about 30 containers running on that machine; logging was heavy but not excessive. It's easily reproducible with the latest logspout version: starting logspout with a lot of containers logging shows dockerd using high CPU almost immediately, and it increases over time.

Maybe the newer versions query the docker daemon differently than v3.1 and flood it with requests, leading to the high number of goroutines and high CPU?
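
For anyone who wants to watch that goroutine count (it shows up in docker info when the daemon runs with debug enabled), a rough way to track it while reproducing the problem:

# print the goroutine count once a minute
while true; do
  docker info 2>/dev/null | grep -i goroutines
  sleep 60
done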
