Possible memory leak in API server #8383
Comments
Hi, same here with Kubernetes v1.28.4 installed with kubeadm and Calico 3.27.0 installed with the tigera-operator Helm chart, all default settings. calico-apiserver calico-apiserver-6b4865847c-l792r 2m 5292Mi
Same here for one of my clients running Kubernetes 1.28.4 clusters with Calico 3.27.0 and containerd 1.6.26 on Ubuntu 22.04.3. Those clusters, too, were updated just before Christmas, and now nodes need restarting every few days. Calico was installed and upgraded via Helm. The issue did not occur with Kubernetes 1.27.3 clusters with Calico 3.26.1 and containerd 1.6.21 (Kubernetes was upgraded via 1.27.8 at the same time as the Calico and containerd upgrades).
We believe this issue is related to the OpenTelemetry tracing instrumentation in the API server. Unfortunately, even with the no-op TracerProvider, we notice growing memory usage from the OpenTelemetry internal int64/float64 histograms. As an extension apiserver we don't configure external etcd servers, so the health check is disabled in #8394 (along with some other unused profiling and metrics). We have been running this fix over the weekend and it seems to stabilize the memory usage.
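For context, the no-op TracerProvider mentioned above is installed roughly as in the sketch below; this is an illustration of the general OpenTelemetry Go API, not code taken from calico-apiserver. The point being made is that even with tracing neutered this way, memory held by OpenTelemetry's internal int64/float64 histograms kept growing, which is why #8394 disables the unused etcd health check along with other unused profiling and metrics instead.

```go
package main

import (
	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/trace/noop"
)

func main() {
	// Register a no-op TracerProvider globally: tracers obtained via
	// otel.Tracer(...) produce spans that record nothing.
	// According to the comment above, this alone did not stop the
	// growth in OpenTelemetry's internal histogram memory.
	otel.SetTracerProvider(noop.NewTracerProvider())
}
```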
Thanks for the fix! Has the date for releasing this been finalised? Our nodes are running out of memory. We have limited the resources for Calico, but it would be nice to get this fix deployed soon.
Same here. |
I have downgraded back to 3.26 without any problem and it resolved the memory issue. I have an operator installation.
Thanks for the fast reply! I will give that a look if no fix release is expected soon.
The fix for this is now released in v3.27.2 (v3.27.1 was never released publicly due to a build issue). Please open a new issue if you still hit it on v3.27.2.
Over Christmas and the new year, we saw a high amount of memory pressure in our Kubernetes cluster. After some research, we figured out calico-apiserver was eating away most of the memory on the affected nodes. Based on our memory metrics, we believe there is a memory leak in calico-apiserver.
Expected Behavior
We upgraded to Calico 3.27.0 just before Christmas. Before the upgrade we were using Calico 3.25.x and we were not seeing this kind of RAM usage over time. We expect Calico 3.27.0 to behave in the same fashion.
Current Behavior
As is visible from the metrics shared above, all calico-apiserver replicas across multiple clusters start eating more and more RAM over time. We see a somewhat similar growth in CPU usage over time, but we suspect that is because the Go garbage collector has to perform more work over time.
Possible Solution
We have not figured out the cause of the memory leak as of yet.
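In case it helps anyone trying to narrow this down: a common way to locate a Go memory leak is to compare heap profiles taken some time apart (for example with go tool pprof and its -base option). The sketch below only shows how a generic Go process can expose such profiles via net/http/pprof; whether calico-apiserver exposes a comparable endpoint is not stated in this thread, so treat the address and setup as assumptions.

```go
package main

import (
	"log"
	"net/http"
	_ "net/http/pprof" // registers /debug/pprof/* handlers on the default mux
)

func main() {
	// Expose pprof endpoints on localhost only. Heap snapshots fetched
	// from /debug/pprof/heap at two points in time can be diffed with
	// `go tool pprof -base old.pprof new.pprof` to see which allocation
	// sites are growing.
	log.Fatal(http.ListenAndServe("localhost:6060", nil))
}
```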
We mitigated the problem by setting stricter memory limits for calico-apiserver. That causes calico-apiserver to be periodically OOMKilled, but it prevents our nodes from running out of memory.
Steps to Reproduce (for bugs)
calico-apiserver
Context
Our Calico API server is deployed through the Tigera Operator. I do not think that is relevant to the problem, but I wanted to mention it anyway.
Your Environment