High CPU Load and intermediate image count failure #1342

Closed
sukrit007 opened this issue Feb 3, 2015 · 5 comments · Fixed by #1345

Comments

@sukrit007

When there are a lot of intermediate Docker images, the following line leads to high CPU utilization by the Docker daemon:

https://github.com/DataDog/dd-agent/blob/5.1.x/checks.d/docker.py#L150

The call also fails due to a timeout.

Can we make the intermediate image counter configurable (and turn it off by default)?

Feb 03 06:32:23 ip-10-102-169-13.ec2.internal dockerd[21250]: time="2015-02-03T06:32:23Z" level="info" msg="GET /containers/json?all=False&size=False"
Feb 03 06:32:23 ip-10-102-169-13.ec2.internal dockerd[21250]: time="2015-02-03T06:32:23Z" level="info" msg="+job containers()"
Feb 03 06:32:23 ip-10-102-169-13.ec2.internal dockerd[21250]: time="2015-02-03T06:32:23Z" level="info" msg="-job containers() = OK (0)"
Feb 03 06:32:23 ip-10-102-169-13.ec2.internal dockerd[21250]: time="2015-02-03T06:32:23Z" level="info" msg="GET /containers/json?all=True&size=False"
Feb 03 06:32:23 ip-10-102-169-13.ec2.internal dockerd[21250]: time="2015-02-03T06:32:23Z" level="info" msg="+job containers()"
Feb 03 06:32:23 ip-10-102-169-13.ec2.internal dockerd[21250]: time="2015-02-03T06:32:23Z" level="info" msg="-job containers() = OK (0)"
Feb 03 06:32:23 ip-10-102-169-13.ec2.internal dockerd[21250]: time="2015-02-03T06:32:23Z" level="info" msg="GET /events?since=1422945112&until=1422945143"
Feb 03 06:32:23 ip-10-102-169-13.ec2.internal dockerd[21250]: time="2015-02-03T06:32:23Z" level="info" msg="+job events()"
Feb 03 06:32:23 ip-10-102-169-13.ec2.internal dockerd[21250]: time="2015-02-03T06:32:23Z" level="info" msg="-job events() = OK (0)"
Feb 03 06:32:26 ip-10-102-169-13.ec2.internal dockerd[21250]: write unix @: broken pipe
Feb 03 06:32:26 ip-10-102-169-13.ec2.internal dockerd[21250]: time="2015-02-03T06:32:26Z" level="info" msg="-job images() = ERR (1)"
Feb 03 06:32:26 ip-10-102-169-13.ec2.internal dockerd[21250]: time="2015-02-03T06:32:26Z" level="error" msg="Handler for GET /images/json returned error: write unix @: broken pipe"
Feb 03 06:32:26 ip-10-102-169-13.ec2.internal dockerd[21250]: time="2015-02-03T06:32:26Z" level="error" msg="HTTP Error: statusCode=500 write unix @: broken pipe"

Env Info:
Docker version: Docker version 1.4.1, build 5bc2ff8-dirty
Agent Image: datadog/docker-dd-agent:latest image-id: 8f65aeca1481

@LotharSee
Contributor

Hi @sukrit007,

Can you try increasing the socket_timeout option?
https://github.com/DataDog/dd-agent/blob/5.1.x/conf.d/docker.yaml.example#L14

Listing a large number of images may take longer than the default timeout (5 seconds).
If that doesn't help, please share your collector logs with log_level: DEBUG.
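
The timeout on the agent side and the `broken pipe` errors in the daemon log are plausibly two sides of the same event: the client gives up waiting and closes the socket, and the daemon later fails when it tries to write its response. The sketch below reproduces that sequence with a plain Unix socket pair. It does not use the Docker API or any agent code; the function name and the short timeout value are illustrative assumptions.

```python
import socket


def demo_timeout_then_broken_pipe(timeout_seconds=0.05):
    """Simulate a client that times out before a slow server responds."""
    # socketpair() gives two connected Unix sockets, standing in for the
    # agent (client) and the Docker daemon (server).
    client, server = socket.socketpair()
    outcome = {}
    try:
        client.settimeout(timeout_seconds)
        try:
            # The server has not written anything yet, so this read
            # blocks until the timeout expires.
            client.recv(1)
            outcome["client"] = "received data"
        except socket.timeout:
            outcome["client"] = "timed out"

        # The client gives up and closes its end of the connection.
        client.close()

        try:
            # The server finishes its slow work and tries to respond.
            server.sendall(b"late response")
            outcome["server"] = "sent"
        except BrokenPipeError:
            outcome["server"] = "broken pipe"
        except ConnectionResetError:
            outcome["server"] = "connection reset"
    finally:
        client.close()
        server.close()
    return outcome


if __name__ == "__main__":
    print(demo_timeout_then_broken_pipe())
```

Raising the timeout only helps if the daemon can finish within the new limit; it does not reduce the work the daemon does for each request.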

@sukrit007
Author

In theory, I think that should get rid of the timeouts. However, this call seems expensive when used with the all=true parameter (it results in CPU spikes). I am wondering whether it can be made configurable.

Ideally, Docker would provide an API that publishes the counts directly. Since intermediate image layers may not have much relevance, I would prefer to turn this off.
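
To make the cost concrete, here is a minimal sketch of how an intermediate image count could be derived from two image listings. It assumes the count is the difference between a full listing (the `all=true` call) and a top-level listing; the function name and sample data are hypothetical, and the metric names are the ones mentioned in the fix for this issue. The full listing includes every intermediate layer, so its size grows with the number of layers on the host.

```python
def count_images(top_level_images, all_images):
    """Derive image counts from two listings.

    top_level_images: result of listing images without intermediate layers.
    all_images: result of listing images with intermediate layers included.
    """
    available = len(top_level_images)
    # Everything in the full listing that is not a top-level image is
    # treated as an intermediate layer (an assumption of this sketch).
    intermediate = len(all_images) - available
    return {
        "docker.images.available": available,
        "docker.images.intermediate": intermediate,
    }


if __name__ == "__main__":
    # Hypothetical data: 2 tagged images built from 5 layers in total.
    top_level = [{"Id": "a1"}, {"Id": "b2"}]
    everything = top_level + [{"Id": "l1"}, {"Id": "l2"}, {"Id": "l3"}]
    print(count_images(top_level, everything))
```

Note that this approach needs the full listing even though only its length is used, which is why skipping the call entirely is cheaper than tuning it.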

LeoCavaille added a commit that referenced this issue Feb 3, 2015
Fixes #1342.
We should always cast boolean options from the config with
the `_is_affirmative` tool, otherwise you're exposed to
oddities like bool('false') == True.
This introduces a new option `collect_images_stats` that is
enabled by default. Disabling it skips the collection of the
metrics `docker.images.available` and `docker.images.intermediate`,
which is sometimes very slow through the Docker API if you have
a lot of intermediate layer images.
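
The `bool('false') == True` oddity mentioned in the commit message comes from Python treating any non-empty string as truthy. The sketch below shows the pitfall and one way to guard against it. The helper `is_affirmative` is a hypothetical stand-in written for this example, not the agent's actual `_is_affirmative` implementation, and the set of accepted strings is an assumption. Only the option name `collect_images_stats` and its enabled-by-default behavior come from the commit message.

```python
def is_affirmative(value):
    """Interpret a config value as a boolean (illustrative helper)."""
    if isinstance(value, bool):
        return value
    if value is None:
        return False
    # Accepted spellings are an assumption of this sketch.
    return str(value).strip().lower() in ("yes", "true", "1", "y", "on")


def should_collect_image_stats(instance):
    """Read the option from an instance config dict, defaulting to enabled."""
    return is_affirmative(instance.get("collect_images_stats", True))


if __name__ == "__main__":
    # Any non-empty string is truthy, including 'false'.
    print(bool("false"))            # True
    print(is_affirmative("false"))  # False
    print(should_collect_image_stats({}))
    print(should_collect_image_stats({"collect_images_stats": "false"}))
```

Gating the image listing behind a check like `should_collect_image_stats` means the expensive API call is never issued when the option is turned off.
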
@sukrit007
Author

Is there an easy way to install from the latest source?
I was looking at https://github.com/DataDog/dd-agent/blob/master/packaging/datadog-agent/source/setup_agent.sh , but it seems to install from the 5.1.1 tag. Alternatively, when will this be published to datadog/docker-dd-agent?

@LotharSee
Contributor

This will be part of Agent 5.2, which will be released this month. The packages and the container will be upgraded together.

But if you can't wait, you can put a modified version of the docker check into /etc/dd-agent/checks.d. This will override the default one.

You can copy the check and remove this line: https://github.com/DataDog/dd-agent/blob/5.1.x/checks.d/docker.py#L130
This way it will no longer collect image metrics.

@sukrit007
Author

Thanks. For now I have created a customized Docker image and commented out the line you mentioned:

https://github.com/totem/dd-agent/blob/master/Dockerfile#L4

Note: I had to modify this file:

/opt/datadog-agent/agent/checks.d/docker.py

CPU load is much better than before. I will investigate to see if I run into anything else. Thanks for the quick turnaround.
