CPU usage on dockerd 100% #246
Comments
Experiencing the same issue here: $logspout --version $docker version Server: [root@ip-XXXXXXX rancher]# docker info Logspout container logs with DEBUG enabled show a constant stream of:
|
@pythianarceri Are you able to provide a docker-compose.yml I can use to reproduce the issue? |
We use a custom built image compiled with the logstash output module,
|
Any luck getting this fixed, or at least getting the issue identified? Is it normal for the pumps to be started/stopped all the time? |
Same thing here. I end up seeing a lot of these messages in
It's happened a couple times. Sometimes that "invalid character" is different but the gist of the log message is the same. We see |
Hi all, We've managed to reproduce the same issue as @pythianarceri - on Jan 26. To add some detail: running a This is reproducible on the latest Our container is running a few processes inside, all of which have Whilst I consider this to be a bug within our container not outputting to Can we add some kind of counter when the
|
Same issue here. Reverted back to docker image v3.1 and logging and CPU usage is back to normal. |
Same issue, we also reverted to docker image v3.1. We are running Docker version 17.04.0-ce, build 4845c56, and deploying our logspout container with the new swarm mode ('docker stack deploy'). We're not using logstash, but forward logs to a remote syslog directly. Here is the content of our docker-compose.yml file:
|
I have the same problem, using the most recent master image ( It's fairly difficult to reproduce, and seems to occur only after a period of time - I'm not sure whether it's a result of complex log messages (or potentially unexpected charsets), or just the volume of messages shipped off. During the time that it's at high CPU usage, I have also seen a number of weird messages such as what mattdodge described @woutor @sebgl after you reverted back to version 3.1 - did you see the problem pop up again? |
@gaieges: Nope, with 3.1 the CPU usage remains normal. |
Same issue here with Docker 17.05.0-ce and logspout 3.3. We can't go back to 3.1 because we need the label filtering feature. |
Same issue here. Reverted back to docker image v3.1 and logging and CPU usage is back to normal. |
I'm using v3.3 in production with papertrail. Is there anybody experiencing this issue that is NOT using logstash? I'm thinking the problem might be in that module... |
@michaelshobbs I don't use the logstash module, I just use the tcp+syslog output. |
Interesting....anybody have any thoughts on this one? |
Same issue here, but I cannot confirm the high CPU load. I only see a lot of log lines in /var/log/syslog and the pump.pumpLogs() messages when debugging is enabled. I am also using the logstash adapter
|
We encountered the same issue today, randomly, on one of our hosts. Unfortunately, I do not have any more info than what others have reported. |
Hello folks, I was having the same problem:
and it turns out to be corrupt
So it isn't valid JSON and it breaks If you try to run, So look for corrupted/badly formatted I just removed that line and everything was working normally again. Hope that helps. |
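The kind of check described above can be sketched in Python. This is only an illustration, not something posted in the thread: it assumes the container uses Docker's json-file log driver, where each line of the log file is expected to be one JSON object, and the function name `find_invalid_json_lines` is made up for this example.

```python
import json


def find_invalid_json_lines(log_path):
    """Return (line_number, reason) for every line that is not a JSON object.

    Assumption: log_path points at a json-file driver log in which each
    line should hold exactly one JSON object.
    """
    problems = []
    with open(log_path, "rb") as handle:
        for line_number, raw in enumerate(handle, start=1):
            if not raw.strip():
                continue  # skip lines that contain only whitespace
            try:
                entry = json.loads(raw.decode("utf-8"))
            except (UnicodeDecodeError, json.JSONDecodeError) as exc:
                # Covers half-written lines, stray bytes, and NUL padding.
                problems.append((line_number, str(exc)))
                continue
            if not isinstance(entry, dict):
                problems.append((line_number, "not a JSON object"))
    return problems
```

Running it against a suspect log file returns the line numbers worth inspecting by hand. The sketch only reads the file; whether to edit or remove the log afterwards is the manual step described in the comment above.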
On @sfrique's note - I wonder if using the journald log driver (if possible) instead of json-file would help with this? |
@gaieges I tried with journald, same problem |
We're seeing the same thing - 100% CPU usage when running logspout (pumping to logstash). Enabling logspout's debug logs showed rapid repetition of log entries such as:
Initially, we were seeing this on most of our hosts, with two container IDs showing up on each. The containers in question were Rancher's metadata and DNS containers. We upgraded Rancher from 1.6.2 to 1.6.5 and upgraded all the system services, and logspout now seems to run happily on most of our hosts. However, we're still getting the stopped/started log spam (and corresponding CPU load) on our DB hosts - with only a single container ID flopping: mongodb. |
@garceri, we're currently running Docker 17.03 CE, though we're looking at upgrading to 17.06 CE in the near future. |
My thoughts on this issue:
|
@robcza thanks for the additional details. Can you provide a full debug log from start to stop of the logspout container that has entered into this state? I'm looking for any other clues here. To your question about the difference between versions: we tag each release, so you can browse the history by comparing tags. Additionally, we have a |
Same for me on docker I have managed to get a full debug log since logspout container creation.
|
Thanks @ko-christ for the additional detail! Next time this happens it would be great to see if there's anything odd about the output of the container with the "thrashing" log pump. (i.e. Same request to others that have noticed the stop/start thrash symptom. |
Hi all,
Apparently, docker will print this error and stop returning the log lines, even if there were many more lines after the corrupt json line. Possible workarounds for such an issue (I didn't verify these options yet):
So this actually seems to be a docker issue. The best way to deal with this, IMHO, is to have docker detect such cases, roll the log file, put the corrupt log file aside, and emit a meaningful warning about it. One thing that puzzles me is why reverting to v3.1 "fixes" this issue (I didn't try reverting back to v3.1 yet, I'm using v3.3). |
Hello,
And the docker logs for
I have trimmed some of the docker-json logs and restarted the containers, but it is not helping: once the logspout container starts, it makes dockerd eat almost all the CPU. Thanks in advance! |
@mattatcha thoughts on this given new info? |
@pythiancastaneda To figure out which container is giving you trouble:
That way you will see in the logs:
And then you can remove the log or the offending line. Hope that helps. I recently had the same problem, and following the steps above fixed the issue. |
Thanks for the update, @stevecalvert! In my case the issue was the same: corrupt container logs. For me, 3.2.2 was also logging an additional error. Posting it just in case:
The only way for me to fix the issue, as for others, was to remove and recreate the container. |
@leriel I'm not sure I understand the workaround... Which container do you recreate: logspout, or the containers with corrupted json-file logs? Is there any way to fix this in the logspout code? |
Don't know if there is any relationship ...
|
More info ... here is my code:
|
One more information, |
Some news: I collected every container ID that logspout reports as stopped:
I did indeed find '\x00' in each of the corrupted files, and the files are big
|
We experienced the same issue yesterday.
The invalid character varies in our case. We are still investigating, but would like to see if we can find the issue together. We'll let you know of our findings. Our setup is different though: docker 1.12 + k8s 1.7.5, with the scalyr agent ingesting logs from dockerd. |
Hello I have:
System: Debian 9.1, and in syslog there are dockerd errors:
and CPU usage is very high. I don't know why, but downgrading logspout to version v3.1 fixes it. |
We're on AWS and see this issue regularly. We have ~100 EC2 instances, and when AWS suddenly retires or shuts down an instance, the server restarts but some of the docker json log files end up corrupt. I can see where the corruption is, and on which line, by doing the following:
I used to just shut down the container that corresponded to the corrupt log(s), manually remove the corrupted (half-written) JSON line, then restart the container, and all was well. With newer versions of Docker, though, that no longer fixes the issue; I now have to remove all the containers and redeploy them. I think this is really a Docker issue. If Docker detects a half-written file, I think it needs to handle this correctly. This happens for us basically 100% of the time AWS restarts a node unexpectedly. Typically it's faster to completely replace the node than to spend a bunch of time trying to fix this by hand. |
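For the half-written last line described above, a quick check only needs the end of the file. The Python sketch below is an assumption-laden illustration rather than the command used in this comment: the name `last_line_is_complete` and the `tail_bytes` default are hypothetical, and it assumes a single log line is shorter than `tail_bytes`.

```python
import json
import os


def last_line_is_complete(log_path, tail_bytes=65536):
    """Return True if the file ends with a newline-terminated JSON line.

    Assumption: no single log line is longer than tail_bytes; a longer
    final line would be cut by the seek and reported as incomplete.
    """
    size = os.path.getsize(log_path)
    if size == 0:
        return True  # an empty log has nothing half-written
    with open(log_path, "rb") as handle:
        handle.seek(max(0, size - tail_bytes))
        tail = handle.read()
    if not tail.endswith(b"\n"):
        return False  # the writer was interrupted before finishing the line
    last = tail.rstrip(b"\n").split(b"\n")[-1]
    try:
        json.loads(last.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError):
        return False
    return True
```

Because it reads at most `tail_bytes` from the end, it stays cheap on large log files. It says nothing about corruption earlier in the file.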
We have the same issue but found the offending log files using the grep below.
After manually removing the null "\x00" characters from the log, the issue went away. |
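A Python equivalent of that search-and-clean step might look like the sketch below. It is not the grep from this comment: the function names and the `-json.log` suffix filter are assumptions for illustration, and the cleanup writes a separate copy instead of editing the log in place.

```python
import os


def files_containing_nul(root, suffix="-json.log", chunk_size=1 << 20):
    """Yield paths under root whose name ends with suffix and that contain a NUL byte."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(suffix):
                continue
            path = os.path.join(dirpath, name)
            with open(path, "rb") as handle:
                while True:
                    chunk = handle.read(chunk_size)
                    if not chunk:
                        break
                    if b"\x00" in chunk:
                        yield path
                        break  # one hit is enough for this file


def strip_nul_bytes(src, dst, chunk_size=1 << 20):
    """Copy src to dst with every NUL byte removed; return how many were dropped."""
    removed = 0
    with open(src, "rb") as reader, open(dst, "wb") as writer:
        while True:
            chunk = reader.read(chunk_size)
            if not chunk:
                break
            removed += chunk.count(b"\x00")
            writer.write(chunk.replace(b"\x00", b""))
    return removed
```

Both functions read in chunks, so large log files are not loaded into memory at once. Writing to a copy leaves the original untouched; swapping the cleaned file in, and whether to stop the container first, is left to the operator.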
Me too, same issue. Any suggestions? |
We resolved to use #321 as a workaround for the time being. (Note that BACKLOG should either be unset or set to true.) |
I ran into 100% CPU usage and this error:
on My config:
version: '2.3'
services:
  logspout:
    image: gliderlabs/logspout:v3.2.4
    container_name: logspout
    command: syslog+tcp://logs-01.loggly.com:514
    restart: always
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
    environment:
      TAIL: 10
|
Any update on this? Just had it happen on a server, which became pretty unresponsive because of it. |
+1, upvoting the rollback to 3.1. I've just force-pushed ca675f3 and manually added:
from latest master to build an image |
I just wanted to confirm this issue with the latest CE version of Docker (19.03.12) and logspout (syslog+tls). We've had production servers crashing because of this. Dockerd was using 1600% CPU and then became unresponsive on a 32 CPU server. Noticed a very high number of We had about 30 containers running on that machine; logging was heavy but not excessive. It's easily reproducible with the latest logspout version: starting logspout with a lot of containers logging shows dockerd using high CPU almost immediately and increasing over time. Maybe newer versions query the docker daemon differently than v3.1 and flood it with requests, leading to a high number of goroutines and high CPU? |
If I start logspout-logstash using this docker-compose the cpu usage spikes to 100% and stays there...
before starting logspout
after starting logspout