Bring in newer cryptnono version #3569
Conversation
I've been upgrading cryptnono quite a bit over the last few months, bringing in new detectors that have been quite effective on mybinder.org. We automatically bump cryptnono on our clusters (2i2c-org#3482), but recent progress has included some breaking changes to the helm chart config.

This PR just brings in the new config changes, but does not change behavior in any real way. No new detectors are enabled.

I've re-measured resource usage for the individual daemonset container (rather than the initContainer), as that can now be set separately. This probably requires us to redo some of the generated resource allocation profiles, which I'll do once this is merged. However, this is an overall reduction in daemonset requests, so deploying it shouldn't result in any profile being undeployable.

Merging this should allow 2i2c-org#3482 to move forward as well.
Merging this PR will trigger the following deployment actions:
- Support and Staging deployments
- Production deployments
helm-charts/support/values.yaml (outdated)

```yaml
    cpu: 0.005
  requests:
    memory: 64Mi
    cpu: 0.0001
```
I think the lowest resolution is 1/1024 of a CPU, and I recall that 2m was the practical minimum value for dockerd and containerd. I'm not sure whether there is any enforcement to prevent specifying 0.1m or similarly extreme values, but I'd be inclined not to optimize this far and to go for a value of at least 1m to avoid issues.
Good catch, @consideRatio. Based on the 'note' in https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/#meaning-of-cpu, I switched to using m units and specified 1m - 0.0001 is definitely not enforceable.
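For clarity, here is a sketch of what the corrected resource block would look like in m units. The values are taken from the diff above (0.005 CPU = 5m); the surrounding structure is an assumption about the chart's values layout, not a verbatim quote of the final change.

```yaml
# Hypothetical sketch: same values as the diff above, expressed in
# millicores so sub-1m requests can't slip in.
resources:
  limits:
    memory: 64Mi
    cpu: 5m
  requests:
    memory: 64Mi
    cpu: 1m
```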
I made a better-safe-than-sorry comment on a detail, but this lgtm!
🎉🎉🎉🎉 Monitor the deployment of the hubs here 👉 https://github.com/2i2c-org/infrastructure/actions/runs/7411484364
2i2c-org#3569 changed the cryptnono daemonset to have different resource requests for the init containers and the main container. While working on 2i2c-org#3566, I noticed this was generating wrong choices: the calculated overhead was too small. We were intentionally ignoring init containers while calculating overhead, but it turns out the scheduler and the autoscaler both take them into consideration.

The effective resource request for a pod is the higher of the resource requests for the containers *or* the init containers - this ensures that a pod with higher requests for init containers than containers (like our cryptnono pod!) will actually run. This is documented at https://kubernetes.io/docs/concepts/workloads/pods/init-containers/#resource-sharing-within-containers, and implemented in Kubernetes itself at https://github.com/kubernetes/kubernetes/blob/9bd0ef5f173de3cc2d1d629a4aee499d53690aee/pkg/api/v1/resource/helpers.go#L50 (this is the library code that the cluster autoscaler uses).

This PR updates the two places we currently have that calculate effective resource requests (I assume eventually these will be merged into one - I haven't kept up with the team's work last quarter here). I've updated the node-capacity-info.json file, which is what seems to be used by the generator script right now.
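The rule described above can be sketched in a few lines. This is a minimal illustration, not the PR's actual code: the function name, the use of millicore integers, and the example values are my own choices.

```python
def effective_cpu_request_m(containers, init_containers):
    """Effective pod CPU request, in millicores.

    The scheduler uses the larger of (a) the sum of the app containers'
    requests and (b) the largest single init container request - init
    containers run one at a time, so only the biggest one matters.
    """
    app_total = sum(containers)
    init_max = max(init_containers, default=0)
    return max(app_total, init_max)


# Mirrors the cryptnono situation: the init container requests more
# than the long-running container, so it dominates the effective request.
print(effective_cpu_request_m([1], [100]))   # 100
# When the app containers dominate, their sum wins instead.
print(effective_cpu_request_m([50, 60], [30]))  # 110
```

This is why ignoring init containers undercounts overhead exactly when the init request exceeds the main container's request.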