New component: cgroup aware go runtime extension #30289
Comments
cc @open-telemetry/helm-maintainers: I would love to know what you think. Is this something we would want in the Helm chart? If so, is there anything specific about the design we should take into account?
Seems reasonable to me. In the Helm chart, an extension like this would allow us to remove some templates like https://github.com/open-telemetry/opentelemetry-helm-charts/blob/main/charts/opentelemetry-collector/templates/_helpers.tpl#L169. A solution like this also helps users running the collector outside of Kubernetes. I'd like to see some example configs of how the extension would be configured.
maybe something like:
Would |
I assume the relevant Linux and Go interfaces will remain stable for a long time, so I guess not. There may be additional sections.
Setting GOMAXPROCS as an environment variable does not reduce core usage as expected.
It would be interesting to have this extension. Recently, Prometheus added automatic memory limit handling based on the same mechanism.
This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping |
I won't be able to work on this issue. @mx-psi do you have someone to assign?
@tomershafir no, but I can remove the |
@tomershafir @mx-psi I can help with the implementation of this extension. I have started a PoC that works as expected, but it relies entirely on the underlying packages: #35472
My main concern is regarding the testing, after taking a look at the packages, they seem not to provide an easy way to mock the |
Just to clarify: I am still willing to sponsor this component :)
**Description:** This PR adds the initial implementation of a new component that dynamically sets the values of `GOMEMLIMIT` and `GOMAXPROCS` used by the Go runtime. These values are normally aligned by hand with the cgroup resource limits to prevent CPU throttling or out-of-memory scenarios. The component would remove the manual steps of configuring these environment variables in K8s deployments (e.g. Helm [templates](https://github.com/open-telemetry/opentelemetry-helm-charts/blob/main/charts/opentelemetry-collector/templates/_helpers.tpl#L169)) and would also allow fine-grained values (e.g. 90% of the resource memory limit).

**Link to tracking Issue:** #30289

**Testing:** Unit tests for the component have been added (config and extension start/stop). Ideally, an integration test that actually asserts the runtime modifications should be added as well. The extension relies on the "github.com/KimMachineGun/automemlimit/memlimit" and "go.uber.org/automaxprocs/maxprocs" packages for the runtime modifications, but they don't provide a way to mock the cgroup file system, which is what they read to get the resource quota limits.
- Automemlimit package tests expect to run in a cgroup environment: https://github.com/KimMachineGun/automemlimit/blob/main/memlimit/cgroups_test.go#L18
- Automaxprocs does not expose the CPU quota retrieval: https://github.com/uber-go/automaxprocs/blob/master/maxprocs/maxprocs.go#L41

Any suggestions on how to perform these integration tests in the contrib repository? One possibility is to use the https://github.com/containerd/cgroups package to set the quota, but this requires privileged permissions (also in the GHA).

Co-authored-by: Pablo Baeyens <[email protected]>
The purpose and use-cases of the new component
Set Go runtime variables automatically based on the Linux cgroupfs, or let the user set a cgroup-relative value. For example, set GOMAXPROCS and GOMEMLIMIT by importing https://github.com/uber-go/automaxprocs and https://github.com/KimMachineGun/automemlimit.
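To make the mechanism concrete, here is a minimal standard-library sketch of deriving both settings from cgroup v2 values. It does not use the two packages named above and is not the proposed extension's implementation. The helper names (`cpuQuota`, `memoryLimit`, `procsFor`, `softLimit`), the round-down policy, and the 0.9 ratio are assumptions chosen for illustration. The sample strings stand in for the contents of `cpu.max` and `memory.max`.

```go
package main

import (
	"fmt"
	"math"
	"runtime"
	"runtime/debug"
	"strconv"
	"strings"
)

// cpuQuota parses cgroup v2 cpu.max content ("<quota> <period>" or
// "max <period>") and returns the quota in cores. ok is false when no
// limit is set or the content is malformed.
func cpuQuota(cpuMax string) (cores float64, ok bool) {
	fields := strings.Fields(cpuMax)
	if len(fields) != 2 || fields[0] == "max" {
		return 0, false
	}
	quota, err1 := strconv.ParseFloat(fields[0], 64)
	period, err2 := strconv.ParseFloat(fields[1], 64)
	if err1 != nil || err2 != nil || quota <= 0 || period <= 0 {
		return 0, false
	}
	return quota / period, true
}

// memoryLimit parses cgroup v2 memory.max content (a byte count or "max").
func memoryLimit(memoryMax string) (bytes int64, ok bool) {
	s := strings.TrimSpace(memoryMax)
	n, err := strconv.ParseInt(s, 10, 64)
	if err != nil || n <= 0 {
		return 0, false // covers "max" and malformed input
	}
	return n, true
}

// procsFor rounds the CPU quota down, with a minimum of 1 (illustrative policy).
func procsFor(cores float64) int {
	return int(math.Max(1, math.Floor(cores)))
}

// softLimit returns the given fraction of the hard memory limit.
func softLimit(limit int64, ratio float64) int64 {
	return int64(float64(limit) * ratio)
}

func main() {
	// Sample values: 2.5 cores and 1 GiB. In a container these would be
	// read from the cgroup file system instead.
	if cores, ok := cpuQuota("250000 100000\n"); ok {
		runtime.GOMAXPROCS(procsFor(cores))
	}
	if limit, ok := memoryLimit("1073741824\n"); ok {
		debug.SetMemoryLimit(softLimit(limit, 0.9))
	}
	fmt.Println("GOMAXPROCS:", runtime.GOMAXPROCS(0))
	// A negative argument reports the current limit without changing it.
	fmt.Println("GOMEMLIMIT:", debug.SetMemoryLimit(-1))
}
```

Setting the values through `runtime.GOMAXPROCS` and `debug.SetMemoryLimit` at startup is what distinguishes this approach from exporting fixed environment variables in a deployment template.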
Example configuration for the component
Proxy config for https://github.com/uber-go/automaxprocs and https://github.com/KimMachineGun/automemlimit
Telemetry data types supported
It is independent of the telemetry data type.
Is this a vendor-specific component?
Code Owner(s)
No response
Sponsor (optional)
@mx-psi
Additional context
I am thinking of a single extension that could, in principle, cover the entire Go runtime, starting with the two variables mentioned above.
Core issue ref: open-telemetry/opentelemetry-collector#9203