Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

stats: add a interval parameter to cli and api stats streaming #11003

Merged
merged 2 commits into from
Aug 4, 2021

Conversation

towe75
Copy link
Contributor

@towe75 towe75 commented Jul 21, 2021

podman stats cli and api polls reports by default in a 1 sec period.

This can put some load on a machine if you run many containers.

A simple test is to run e.g. a range of pause containers like so:

seq 35 |xargs -rn1 podman run -d --rm gcr.io/google_containers/pause

Next, run "podman stats" (or the API request), and monitor e.g. "Top" to see the load.
I found that the compat api endpoint polls by default in a 5 sec period which is not so aggressive.

Making it configurable seemed to be a good solution, hence this PR. You can now easily align the poll period and e.g. your metrics collector interval.
There was also a unused 5 sec const in the libpod endpoint, i removed it.

You can now change this interval with a optional --interval cli flag.
The api request got a interval query parameter for the same purpose.
Default behavior is not changed.

I could not find a unit or e2e test for the stats streaming mode, so i was unsure how to provide/enhance a test.
CLI Manual and REST API doc should be covered in the PR.

Copy link
Member

@vrothberg vrothberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes LGTM, thank you! Shall we do the same pod stats?

I have no good idea how we could test that (Cc @edsantiago).

Copy link
Member

@Luap99 Luap99 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR.
Some comments which should be addressed. Also you have to add this new parameter to the pkg/bindings here to make it work with podman-remote, make sure to run make generate-bindings afterwards.

Your commit is also missing the sign-off-by line.

cmd/podman/containers/stats.go Outdated Show resolved Hide resolved
docs/source/markdown/podman-stats.1.md Outdated Show resolved Hide resolved
pkg/api/server/register_containers.go Outdated Show resolved Hide resolved
@edsantiago
Copy link
Member

Possible test, e.g. 460-stats.bats

#!/usr/bin/env bats   -*- bats -*-
#
# tests for podman stats
#

load helpers

@test "podman stats --interval" {
    # Simple boring process
    run_podman run -d --name foo $IMAGE sleep 30

    # Run stats for 15 seconds. This blocks.
    PODMAN_TIMEOUT=15 run_podman 124 stats --interval 10

    # Confirm that we see some headers, on first and third lines
    is "${lines[0]}" "ID.*NAME.*CPU.*MEM USAGE.*CPU TIME" "stat headers seen"
    is "${lines[2]}" "ID.*NAME.*CPU.*MEM USAGE.*CPU TIME" \
       "stat headers seen (second iteration)"

    # Last line should be a timeout
    is "${lines[4]}" "timeout: sending signal TERM.*" "timeout received"

    # Confirm that --interval worked (remember that we have a timeout line)
    is "${#lines[*]}" "5" "number of stat lines in 15 seconds"

    run_podman stop --time=0 foo
    run_podman rm -f foo
}

# vim: filetype=sh

I really hate this kind of test, because it adds wall-clock time to our already-much-too-long CI. I'm commenting simply to offer the option, but I hope that others decide it is not worth the CI bloat.

@rhatdan
Copy link
Member

rhatdan commented Jul 21, 2021

I am fine with not testing this.

@mheon
Copy link
Member

mheon commented Jul 21, 2021

Can we add a test that invalid times are rejected (e.g. --interval 0 or --interval -1)?

Otherwise LGTM
/approve

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 21, 2021
@rhatdan
Copy link
Member

rhatdan commented Jul 21, 2021

LGTM, after you add invalid --interval check.

@towe75
Copy link
Contributor Author

towe75 commented Jul 22, 2021

@mheon before i write the test: do you expect that the code actively rejects (fails) when called with zero value or negative interval? Currently it makes a graceful fallback to 1 second. Failing would make a cli and api test simpler, of course.

@rhatdan
Copy link
Member

rhatdan commented Jul 22, 2021

Yes the code should fail, on invalid values being passed in.

@Luap99
Copy link
Member

Luap99 commented Jul 22, 2021

Sry I forgot you also have to add this for podman-remote as well

diff --git a/pkg/domain/infra/tunnel/containers.go b/pkg/domain/infra/tunnel/containers.go
index c17d7b54f..1df79c373 100644
--- a/pkg/domain/infra/tunnel/containers.go
+++ b/pkg/domain/infra/tunnel/containers.go
@@ -871,7 +871,7 @@ func (ic *ContainerEngine) ContainerStats(ctx context.Context, namesOrIds []stri
        if options.Latest {
                return nil, errors.New("latest is not supported for the remote client")
        }
-       return containers.Stats(ic.ClientCtx, namesOrIds, new(containers.StatsOptions).WithStream(options.Stream))
+       return containers.Stats(ic.ClientCtx, namesOrIds, new(containers.StatsOptions).WithStream(options.Stream).WithInterval(options.Interval))
 }
 
 // ShouldRestart reports back whether the container will restart

@mheon
Copy link
Member

mheon commented Jul 22, 2021

Concur with @rhatdan - we should actively error on bad values

Copy link
Member

@TomSweeneyRedHat TomSweeneyRedHat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any thoughts on adding this to the pods side of things too? https://github.com/containers/podman/blob/main/cmd/podman/pods/stats.go

@towe75
Copy link
Contributor Author

towe75 commented Jul 24, 2021

@Luap99 : is addressed in latest commit
@rhatdan, @mheon : function call fails now with appropriate message, i will add a test, soon.
@TomSweeneyRedHat : maybe it can go into a separate issue/PR if there is a demand for it?

@towe75
Copy link
Contributor Author

towe75 commented Jul 25, 2021

I tried to get into your e2e tests in order to extend the existing stats_test.go.
I burned some time because i was not able to run the tests. Gingko seemed to ignore this file when invoked with

TESTFLAGS='--dryRun --focus="stats"' make localintegration

After an hour or so i found the "// +build" line at the top of the file and removed it. It's the only test with exactly this annotation. Is it intentional behavior or a copy/paste bug?

Also i would recommend to change the spec descriptions in pod_stats_test.go to differ from stats_test.go. Eg. "podman pod stats ..." vs "podman stats ...". Should i include it in this PR?

@Luap99
Copy link
Member

Luap99 commented Jul 26, 2021

I tried to get into your e2e tests in order to extend the existing stats_test.go.
I burned some time because i was not able to run the tests. Gingko seemed to ignore this file when invoked with

TESTFLAGS='--dryRun --focus="stats"' make localintegration

After an hour or so i found the "// +build" line at the top of the file and removed it. It's the only test with exactly this annotation. Is it intentional behavior or a copy/paste bug?

That's definitely a real bug, please remove the // +build

Also i would recommend to change the spec descriptions in pod_stats_test.go to differ from stats_test.go. Eg. "podman pod stats ..." vs "podman stats ...". Should i include it in this PR?

Yes

@towe75
Copy link
Contributor Author

towe75 commented Jul 26, 2021

Ok, seems like i found a rabbit hole here. Re-enabled the stats tests, found broken test, found that podman stats exits with 0 if a invalid format string is given.

Find my tests for the interval flag for negative, zero and positive value. Hope everything is fine now.

@edsantiago
Copy link
Member

@towe75 the failing check is (correctly) noting that your commit message summary (the first line) is too long.

podman stats polled by default in a 1 sec period.
This can put quite some load on a machine if you run many containers.

The default value is now 5 seconds.
You can change this interval with a new, optional, --interval, -i cli flag.
The api request got also a interval query parameter for the same purpose.

Additionally a unused const was removed.
Api and cli will fail the request if a 0 or negative value is passed in.

Signed-off-by: Thomas Weber <[email protected]>
@towe75
Copy link
Contributor Author

towe75 commented Jul 27, 2021

@edsantiago : sorry, fixed

Something is still wrong with the tests. My added tests are fine when running "localintegration". I can see that your CI tests invoke tests using podman remote and errors/exit-code is not propagated automatically/correctly.

  Running: podman-remote [options] stats -a --no-reset --no-stream --interval=-1

Did i miss some plumbing?

@Luap99
Copy link
Member

Luap99 commented Aug 2, 2021

@towe75 Sorry for the silence, I lost track of this PR. You did nothing wrong the stats binding ignored server errors. This diff should make it work:

diff --git a/pkg/bindings/containers/containers.go b/pkg/bindings/containers/containers.go
index 86304f392..bc7b0c8c9 100644
--- a/pkg/bindings/containers/containers.go
+++ b/pkg/bindings/containers/containers.go
@@ -223,6 +223,9 @@ func Stats(ctx context.Context, containers []string, options *StatsOptions) (cha
        if err != nil {
                return nil, err
        }
+       if !response.IsSuccess() {
+               return nil, response.Process(nil)
+       }
 
        statsChan := make(chan entities.ContainerStatsReport)

Renamed podman pod stats test specs to distinguish them from podman stats tests.
podman stats tests where disabled by a +build flag.
Fix podman stats format test, add negative test.
Fix podman stats cli command, exit non-zero on invalid format string.
Add tests for podman stats interval flag.

Signed-off-by: Thomas Weber <[email protected]>
@towe75
Copy link
Contributor Author

towe75 commented Aug 4, 2021

@Luap99 i think it's ok now?

Copy link
Member

@Luap99 Luap99 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM
I just want to point out that is is changing the default interval from 1 to 5 seconds. I am fine with that. Let's hear what other Maintainers think. @containers/podman-maintainers PTAL

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Aug 4, 2021

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Luap99, mheon, towe75

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@rhatdan
Copy link
Member

rhatdan commented Aug 4, 2021

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Aug 4, 2021
@openshift-ci openshift-ci bot merged commit 3a922cb into containers:main Aug 4, 2021
@towe75 towe75 deleted the f_stats branch August 4, 2021 13:04
@Luap99 Luap99 mentioned this pull request Aug 17, 2021
@github-actions github-actions bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Sep 22, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 22, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants