WIP: Added support for log policy and log rate limit in conmon #4663

Closed
wants to merge 2 commits into from

@syedriko syedriko commented Dec 9, 2019

Another PR to address containers/conmon#84
Its sister PR is containers/conmon#92

@openshift-ci-robot openshift-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Dec 9, 2019
@openshift-ci-robot
Collaborator

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: syedriko
To complete the pull request process, please assign umohnani8
You can assign the PR to them by writing /assign @umohnani8 in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot openshift-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Dec 9, 2019
@openshift-ci-robot
Collaborator

Hi @syedriko. Thanks for your PR.

I'm waiting for a containers member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@mheon
Member

mheon commented Dec 9, 2019

/ok-to-test

I'll do a full review once you drop the WIP, but on the whole, this looks fine.

@openshift-ci-robot openshift-ci-robot added ok-to-test and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Dec 9, 2019
		return define.ErrCtrFinalized
	}
	if rateLimit == "" {
		return errors.Wrapf(define.ErrInvalidArg, "log rate limit must be set")

Do we need the rate limit specified if the policy is ignore or passthrough?

Author

A rate limit sure makes no sense for ignore or passthrough (which is the current unrestricted behavior, btw). How do you think the CLI should behave in that case: error out, ignore, or warn?


So with no changes by the user, they will get passthrough. That case should keep working as it does today, but I think this code as it stands will raise an error for it.

If they specify ignore or passthrough, then they should not specify a rate, and it should be an error if they do.

If they specify backpressure or drop, then they must specify the rate.
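
The three rules in this comment can be sketched as a small validation function. This is a minimal sketch, not the PR's code: the `LogPolicy` type, the `validateLogOptions` name, and the local `errInvalidArg` sentinel (standing in for `define.ErrInvalidArg`) are assumptions made for illustration. Only the four policy names come from the discussion.

```go
package main

import (
	"errors"
	"fmt"
)

// LogPolicy is a hypothetical type; the four names come from the discussion.
type LogPolicy string

const (
	PolicyPassthrough  LogPolicy = "passthrough"
	PolicyIgnore       LogPolicy = "ignore"
	PolicyBackpressure LogPolicy = "backpressure"
	PolicyDrop         LogPolicy = "drop"
)

// errInvalidArg stands in for define.ErrInvalidArg so the sketch is self-contained.
var errInvalidArg = errors.New("invalid argument")

// validateLogOptions applies the rules proposed in the review:
//   - no policy given: default to passthrough
//   - ignore or passthrough: setting a rate limit is an error
//   - backpressure or drop: a rate limit is required
func validateLogOptions(policy LogPolicy, rateLimit string) error {
	if policy == "" {
		policy = PolicyPassthrough
	}
	switch policy {
	case PolicyPassthrough, PolicyIgnore:
		if rateLimit != "" {
			return fmt.Errorf("%w: log rate limit must not be set with policy %q", errInvalidArg, policy)
		}
	case PolicyBackpressure, PolicyDrop:
		if rateLimit == "" {
			return fmt.Errorf("%w: log rate limit must be set with policy %q", errInvalidArg, policy)
		}
	default:
		return fmt.Errorf("%w: unknown log policy %q", errInvalidArg, policy)
	}
	return nil
}

func main() {
	fmt.Println(validateLogOptions("", ""))             // user changed nothing: accepted
	fmt.Println(validateLogOptions(PolicyDrop, ""))     // rejected: rate limit required
	fmt.Println(validateLogOptions(PolicyIgnore, "10")) // rejected: rate limit not allowed
}
```

The rate limit is kept as an opaque string here because the thread does not say what unit or syntax it uses.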

Author

Alright, makes perfect sense.


We're writing to files, and file writes block until the data is written, so the current default is "backpressure" with no rate-limit. I think you could simplify validation so that rate-limit is optional regardless of policy. The "ignore" policy trivially respects any rate-limit (even 0). There is no such thing as "passthrough": the existing behaviour is backpressure with no rate-limit.

Author

The semantics start to get interesting if we consider where the backpressure comes from: from the code we're looking at, where the rate would surely apply, or from downstream of this code. In that frame of reference, how would we implement drop without a rate limit? By making the log fd non-blocking, etc.?


A non-blocking partial write followed by a rate calculation, I'd say.

The whole thing would be simpler as a poll loop with non-blocking reads/writes and a timer fd.
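
One way to read "a non-blocking partial write followed by a rate calculation" is a byte budget that refills over time and caps how much of each chunk may be written. The sketch below uses a token bucket, which is an assumption and not something the thread specifies. `byteBudget`, `refill`, and `take` are hypothetical names, and elapsed time is passed in explicitly (a real loop would take it from a timer) so the arithmetic is easy to follow.

```go
package main

import "fmt"

// byteBudget is a hypothetical token bucket measured in bytes.
type byteBudget struct {
	rate   float64 // refill rate in bytes per second
	burst  float64 // maximum number of bytes that may accumulate
	tokens float64 // bytes currently available
}

// newByteBudget starts with a full bucket.
func newByteBudget(rate, burst float64) *byteBudget {
	return &byteBudget{rate: rate, burst: burst, tokens: burst}
}

// refill credits the budget for the elapsed time, capped at burst.
func (b *byteBudget) refill(elapsedSeconds float64) {
	b.tokens += b.rate * elapsedSeconds
	if b.tokens > b.burst {
		b.tokens = b.burst
	}
}

// take returns how many of n bytes may be written now and deducts them.
// The caller decides what happens to the remainder: drop it, or wait and retry.
func (b *byteBudget) take(n int) int {
	allowed := int(b.tokens)
	if allowed > n {
		allowed = n
	}
	b.tokens -= float64(allowed)
	return allowed
}

func main() {
	b := newByteBudget(1000, 500) // 1000 bytes/s with bursts of up to 500 bytes
	fmt.Println(b.take(800))      // 500: only the initial burst is available
	b.refill(0.25)                // 250 ms later, 250 bytes have been credited
	fmt.Println(b.take(800))      // 250
}
```

With a "drop" policy the bytes beyond the returned count would be discarded. With "backpressure" the loop would simply stop reading until the next refill.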

Author

Without a rate limit there's no basis for calculation though. A simple implementation can attempt to write to the non-blocking log fd and drop on seeing EAGAIN. A more involved one could poll() the fd and flip a "drop" flag.
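
The simple variant described here (write to a non-blocking fd, drop on EAGAIN) might look roughly like the following. It is a sketch under stated assumptions: it targets Linux or another Unix, a pipe that nobody reads from stands in for a slow log consumer, and `writeOrDrop` and `fillUntilDrop` are hypothetical helper names, not conmon functions.

```go
package main

import (
	"fmt"
	"syscall"
)

// writeOrDrop attempts a single write to a non-blocking fd. It reports how
// many bytes were accepted and whether any part of the chunk was dropped.
func writeOrDrop(fd int, chunk []byte) (int, bool, error) {
	n, err := syscall.Write(fd, chunk)
	if err == syscall.EAGAIN {
		return 0, true, nil // consumer is not keeping up: drop the whole chunk
	}
	if err != nil {
		return 0, false, err
	}
	return n, n < len(chunk), nil // a partial write drops the tail
}

// fillUntilDrop writes 4 KiB chunks into a pipe that is never read and
// returns the number of bytes accepted before the first drop.
func fillUntilDrop() (int, bool, error) {
	var p [2]int
	if err := syscall.Pipe(p[:]); err != nil {
		return 0, false, err
	}
	defer syscall.Close(p[0])
	defer syscall.Close(p[1])
	if err := syscall.SetNonblock(p[1], true); err != nil {
		return 0, false, err
	}

	chunk := make([]byte, 4096)
	accepted := 0
	for i := 0; i < 1<<16; i++ { // upper bound so the sketch always terminates
		n, dropped, err := writeOrDrop(p[1], chunk)
		if err != nil {
			return accepted, false, err
		}
		accepted += n
		if dropped {
			return accepted, true, nil
		}
	}
	return accepted, false, nil
}

func main() {
	accepted, dropped, err := fillUntilDrop()
	fmt.Println(accepted, dropped, err)
}
```

The number of bytes accepted before the first drop depends on the kernel's pipe capacity, so no particular value should be relied on.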


For backpressure there's nothing to calculate: you just block on write and delay reading as a consequence, so the downstream determines how fast you go. For drop, what you describe is right, but I would add a small buffer to avoid dropping due to minor jitter in the read/write rate. Then drop on read when the buffer is full.
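
The "small buffer, drop on read when full" idea can be sketched with a bounded queue. This is an illustration only: `dropBuffer`, `offer`, and `next` are hypothetical names, and a buffered Go channel is used as the queue, which is an assumption about the mechanism and not something taken from conmon (which is written in C).

```go
package main

import "fmt"

// dropBuffer is a hypothetical bounded queue between the side that reads
// container output and the side that writes the log.
type dropBuffer struct {
	lines   chan []byte
	dropped int // touched only by the reading side in this sketch
}

func newDropBuffer(capacity int) *dropBuffer {
	return &dropBuffer{lines: make(chan []byte, capacity)}
}

// offer is called on the read side: enqueue if there is room, otherwise drop.
func (b *dropBuffer) offer(line []byte) bool {
	select {
	case b.lines <- line:
		return true
	default:
		b.dropped++
		return false
	}
}

// next is called on the write side; it returns false when nothing is buffered.
func (b *dropBuffer) next() ([]byte, bool) {
	select {
	case line := <-b.lines:
		return line, true
	default:
		return nil, false
	}
}

func main() {
	buf := newDropBuffer(2)
	for _, s := range []string{"a", "b", "c"} {
		fmt.Println(s, "buffered:", buf.offer([]byte(s)))
	}
	line, _ := buf.next()
	fmt.Println("wrote:", string(line), "dropped so far:", buf.dropped)
}
```

The capacity sets how much jitter is absorbed before lines start being dropped. A capacity of zero makes every `offer` drop here because no receiver is waiting.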

@portante portante left a comment

Do we need to read from some sort of "sysconfig" file for defaults, and to allow the SRE to specify the logging rate which the user can't exceed?

@syedriko
Author

> Do we need to read from some sort of "sysconfig" file for defaults, and to allow the SRE to specify the logging rate which the user can't exceed?

I have to defer to someone who's familiar with the UX conventions in this area, which are likely to be different for podman and cri-o...

@mheon
Member

mheon commented Dec 13, 2019

The eventual containers.conf work will unify most configuration for our container tools, but that's not ready yet.

@rhatdan
Member

rhatdan commented Dec 13, 2019

Well, containers.conf is getting closer, and I would love to have a PR from the logging guys with the suggested fields that they would like to see in the config file.

@syedriko
Author

> Well containers.conf is getting closer, and I would love to have a PR from the logging guys with the suggested fields, that they would like to see in the config file.

@rhatdan #4569 - is this the containers.conf work?

@rhatdan
Member

rhatdan commented Dec 13, 2019

@syedriko It used to be. Replaced by #4698

@syedriko
Author

> @syedriko It used to be. Replaced by #4698

@rhatdan Our logging modifications aren't quite ready yet, let us circle back to this when they are.

@rh-atomic-bot
Collaborator

☔ The latest upstream changes (presumably #4805) made this pull request unmergeable. Please resolve the merge conflicts.

@openshift-ci-robot
Collaborator

@syedriko: PR needs rebase.

@openshift-ci-robot openshift-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jan 10, 2020
@rhatdan
Member

rhatdan commented Jan 13, 2020

@syedriko now that we are into the new year, can you continue working on this?

@syedriko
Author

> @syedriko now that we are into the new year, can you continue working on this?

@rhatdan Short term, I got pulled back into Knative. I will come back to this as soon as I wrap up a few items there.

@github-actions

A friendly reminder that this PR had no activity for 30 days.

@rhatdan
Member

rhatdan commented Feb 13, 2020

@syedriko @haircommander I would love to get this one started again.

@syedriko
Author

> @syedriko @haircommander I would love to get this one started again.

@rhatdan I'm getting back to this in the next sprint, just need to wrap up a few things first.

@vrothberg
Member

Friendly ping.

@syedriko
Author

syedriko commented May 4, 2020

@vrothberg Thanks for this ping, I'll close this PR. The work on making log collection more reliable is ongoing, but this branch as it stands is unlikely to be directly usable.

@syedriko syedriko closed this May 4, 2020
@github-actions github-actions bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Sep 25, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 25, 2023
Labels
do-not-merge/work-in-progress: Indicates that a PR should not merge because it is a work in progress.
locked - please file new issue/PR: Assist humans wanting to comment on an old issue or PR with locked comments.
needs-rebase: Indicates a PR cannot be merged because it has merge conflicts with HEAD.
ok-to-test
stale-pr

8 participants