fix ever-accumulating memory in logger #5284
Conversation
/cherry-pick release-1.16
@nojnhuh: once the present PR merges, I will cherry-pick it on top of release-1.16 in a new PR and assign it to you.
Thanks to @andreipantelimon and @dkoshkin for all the detail provided in #5245 and #2639, that was immensely helpful!
Codecov Report

@@            Coverage Diff             @@
##             main    #5284      +/-   ##
==========================================
- Coverage   53.00%   52.53%   -0.47%
==========================================
  Files         272      272
  Lines       29429    29433       +4
==========================================
- Hits        15598    15463     -135
- Misses      13027    13167     +140
+ Partials      804      803       -1
==========================================

☔ View full report in Codecov by Sentry.
func (s *spanLogSink) WithValues(keysAndValues ...interface{}) logr.LogSink {
	s.vals = append(s.vals, keysAndValues...)
The issue with this is that s is being mutated through the pointer receiver, so this slice will grow forever and can never be garbage collected. The method receiver here is changed to a value, which fixes the memory issue.
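For illustration, a minimal standalone sketch of that growth (`leakySink` here is a hypothetical stand-in, not the actual spanLogSink):

```go
package main

import "fmt"

type leakySink struct{ vals []interface{} }

// Pointer receiver: every call mutates the one shared slice on the
// long-lived sink, so per-call values accumulate and are never freed.
func (s *leakySink) WithValues(kv ...interface{}) *leakySink {
	s.vals = append(s.vals, kv...)
	return s
}

func main() {
	logger := &leakySink{} // long-lived, e.g. created once in SetupWithManager
	for i := 0; i < 1000; i++ {
		logger.WithValues("reconcileID", i) // intended to be per-call scope
	}
	fmt.Println(len(logger.vals)) // 2000: the slice grows without bound
}
```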
This is how the default klog implementation is set up: https://github.com/kubernetes/klog/blob/75663bb798999a49e3e4c0f2375ed5cca8164194/klogr.go#L94
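Roughly the shape of that value-receiver pattern (a hedged sketch with a stand-in `sink` type, not klog's actual implementation):

```go
package logger

type sink struct{ vals []interface{} }

// Value receiver: s is a copy, so assigning to s.vals never mutates the
// caller's sink, and each derived sink gets its own freshly allocated slice.
func (s sink) WithValues(kv ...interface{}) sink {
	vals := make([]interface{}, 0, len(s.vals)+len(kv))
	vals = append(vals, s.vals...)
	vals = append(vals, kv...)
	s.vals = vals
	return s
}
```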
// always create a new slice to avoid multiple loggers writing to the same backing array
vals := make([]interface{}, len(s.vals)+len(keysAndValues))
copy(vals, s.vals)
copy(vals[len(s.vals):], keysAndValues)
s.vals = vals
Beyond just the memory issue, I noticed a separate issue where a plain append could be clobbering the values in other sibling loggers when the backing array happens to have extra capacity. Replacing all of this with s.vals = append(s.vals, keysAndValues...) and running the new unit test should show that issue.
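That clobbering is easy to demonstrate in isolation (a minimal sketch with made-up key/value data, not the actual logger types):

```go
package main

import "fmt"

func main() {
	// A parent slice with spare capacity, as an earlier append can leave behind.
	parent := make([]interface{}, 2, 8)
	parent[0], parent[1] = "cluster", "my-cluster"

	// Two sibling derivations via plain append share the parent's backing array.
	a := append(parent, "controller", "machinepool")
	b := append(parent, "controller", "azurecluster") // writes over a's appended values

	fmt.Println(a) // [cluster my-cluster controller azurecluster] -- clobbered
	fmt.Println(b) // [cluster my-cluster controller azurecluster]
}
```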
/lgtm
/approve
LGTM label has been added. Git tree hash: b9d9a1484b33849f7c9ffce92f878b4365821cf4
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: willie-yao

The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment.
@nojnhuh: new pull request created: #5285
@nojnhuh: new pull request created: #5286
Hey, thank you for the fix. It will greatly improve our system!

We are planning to release a new minor version along with patch releases this week!
@willie-yao: new pull request could not be created: failed to create pull request against kubernetes-sigs/cluster-api-provider-azure#release-1.16 from head k8s-infra-cherrypick-robot:cherry-pick-5284-to-release-1.16: status code 422 not one of [201], body: {"message":"Validation Failed","errors":[{"resource":"PullRequest","code":"custom","message":"No commits between kubernetes-sigs:release-1.16 and k8s-infra-cherrypick-robot:cherry-pick-5284-to-release-1.16"}],"documentation_url":"https://docs.github.com/rest/pulls/pulls#create-a-pull-request","status":"422"}
/cherry-pick release-1.17
@willie-yao: new pull request could not be created: failed to create pull request against kubernetes-sigs/cluster-api-provider-azure#release-1.17 from head k8s-infra-cherrypick-robot:cherry-pick-5284-to-release-1.17: status code 422 not one of [201], body: {"message":"Validation Failed","errors":[{"resource":"PullRequest","code":"custom","message":"No commits between kubernetes-sigs:release-1.17 and k8s-infra-cherrypick-robot:cherry-pick-5284-to-release-1.17"}],"documentation_url":"https://docs.github.com/rest/pulls/pulls#create-a-pull-request","status":"422"}
ah it has already been cherry-picked, my b!
What type of PR is this?
/kind bug
What this PR does / why we need it:
This PR fixes an issue where the long-lived loggers created in the SetupWithManager methods for each controller were accumulating a constantly-growing set of key/value pairs that could never be garbage collected.

Here is a Prometheus graph showing the fix enabled halfway through:
![memleak](https://private-user-images.githubusercontent.com/16093815/387325501-4a310fc1-ec92-474a-bc41-1e54811749aa.jpeg?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MzkxNzcxOTEsIm5iZiI6MTczOTE3Njg5MSwicGF0aCI6Ii8xNjA5MzgxNS8zODczMjU1MDEtNGEzMTBmYzEtZWM5Mi00NzRhLWJjNDEtMWU1NDgxMTc0OWFhLmpwZWc_WC1BbXotQWxnb3JpdGhtPUFXUzQtSE1BQy1TSEEyNTYmWC1BbXotQ3JlZGVudGlhbD1BS0lBVkNPRFlMU0E1M1BRSzRaQSUyRjIwMjUwMjEwJTJGdXMtZWFzdC0xJTJGczMlMkZhd3M0X3JlcXVlc3QmWC1BbXotRGF0ZT0yMDI1MDIxMFQwODQxMzFaJlgtQW16LUV4cGlyZXM9MzAwJlgtQW16LVNpZ25hdHVyZT04YTUxYzBkZTdjYmM1MjQ1MDU2ODA0Y2ZiYjgzNWIyY2MwM2I0ZmJhNjE5NzJkZmFlOWNmOTAyY2M5MDkxYjhlJlgtQW16LVNpZ25lZEhlYWRlcnM9aG9zdCJ9.3WJ6z_5aWolrbKZFhpSAg79vgI4zmrNszQzpfphF17k)
Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):

Fixes #5245
Special notes for your reviewer:
Reproducing this is actually fairly straightforward. I was testing this by creating about 10 spec.paused: true Clusters from the machinepool flavor in Tilt, then running this command in 3 or 4 terminal windows to spam updates to the CAPZ resources to generate logs:

Pretty much immediately in Prometheus you'll see memory start to climb.
TODOs:
Release note: