-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reduce memory usage of cainjector by only caching the metadata of Secret resources #7161
Reduce memory usage of cainjector by only caching the metadata of Secret resources #7161
Conversation
04f1386
to
a376b7b
Compare
Signed-off-by: Richard Wall <[email protected]>
Signed-off-by: Richard Wall <[email protected]>
a376b7b
to
15084fd
Compare
Signed-off-by: Richard Wall <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
📖 Documentation for builder.OnlyMetadata
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This documentation brings back memories... I might have been the one who added these two emojis in the comments: kubernetes-sigs/controller-runtime#1747 😅
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
📖 Documentation for client.CacheOptions.DisableFor.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed the signature of owningCertForSecret
so that it can be passed
- a
metav1.PartialObjectMetadata
when called from theWatch
map function, which is now operating on a metadata-only informer cache - a
corev1.Secret
when called from thecertificateSource.ReadCA
function, where the full Secret (including data) has been read direct from the API server.
Hey, well done with the benchmarks on this PR. Super happy to see more data-driven approaches like this one. And thank you for providing the detailed instructions for reproducing your experiments. 👏
Are there any downsides to disabling caching? When using Upbound's Crossplane, at least 900 CRDs are installed and are expected to be CA-injected by cert-manager's cainjector. Would disabling be less favorable in that case? I imagine that on startup all the secrets will need to be listed to check that they match the contents of the CA injected in the CRDs.
I think the improvement would be much more dramatic in a cluster with lots of large Helm release secrets hanging around. This is something you will often see in prod clusters.
Are you sure the I haven't reviewed the code yet, I'll come back to it later. |
@maelvls I added some example curl commands to the description showing the difference in response size when you send the PartialObjectMetadataList content negotiation header. |
Richard presented this PR to yesterday's dev biweekly, some notes:
|
// NOTE: "owning" here does not mean [ownerReference][1], because | ||
// cert-manager does not set the ownerReference of the Secret, | ||
// unless the [`--enable-certificate-owner-ref` flag is true][2]. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the clarification!!
// Only use Secrets that have been created by this Certificate. | ||
// The Secret must have a `cert-manager.io/certificate-name` annotation | ||
// value matching the name of this Certificate.. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks again for adding these bits of information! Nit:
// Only use Secrets that have been created by this Certificate. | |
// The Secret must have a `cert-manager.io/certificate-name` annotation | |
// value matching the name of this Certificate.. | |
// Only use Secrets that have been created by this Certificate. | |
// The Secret must have a `cert-manager.io/certificate-name` annotation | |
// value matching the name of this Certificate. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have reviewed the changes, the changes look straightforward. Thank you again for having spent the time writing additional comments. These are super useful, especially regarding "owners" vs. ownerReferences
.
/lgtm
/approve
/hold in case you want to fix the nit
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: maelvls The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Thanks @maelvls I'll fix that nit in another PR to save another round of e2e tests and review. /unhold |
Update cainjector to use metadata-only caching for Secret resources:
With 100M of Secrets in the cluster, this reduces the maximum memory usage from ~290M to ~46M:
Behind the scenes the reflector is asking the server for a
PartialObjectMetadataList
projection instead of the full dataSecretList
.You can see the reduction in the response size using curl as follows:
Background
Fixes: #7147, #3748
/kind feature
Testing
Measuring peak memory use at startup
I used
time
to measure the maximum resident set size of cainjector when it starts up with a cluster containing 100M of Secrets.I create a cluster, load it with 100M of Secrets and then run
time cainjector
on my laptop to measure its resource usage for a few seconds, as it starts up.Leader election is disabled, so that cainjector can immediately begin listing and watching resources.
Benchmarks
I ran
tlspk-bench
to create 1000 RSA 2048 Certificates to compare the memory usage of cert-manager from master and this branch.1000 RSA 2048 Secrets amounts to ~6MB of data so the difference in memory usage is not dramatic.
But it is visible in the graphs below.
master
branch