
Bring in newer cryptnono version #3569

Merged: 2 commits, Jan 4, 2024

Conversation

yuvipanda
Member

I've been upgrading cryptnono quite a bit over the last few months, bringing in new detectors that have been quite effective on mybinder.org. We automatically bump cryptnono on our clusters (#3482), but recent progress has included some breaking changes to the helm chart config.

This PR just brings in the new config changes, but does not change behavior in any real way. No new detectors are enabled.

I've re-measured resource usage for the individual daemonset container (rather than the initContainer), since that can now be set separately. This probably means we need to regenerate some of the resource allocation profiles, which I'll do once this is merged. However, it is an overall reduction in daemonset requests, so deploying this shouldn't result in any profile being undeployable.

Merging this should allow #3482 to move forward as well.

yuvipanda requested a review from a team as a code owner on January 4, 2024 at 01:57

github-actions bot commented Jan 4, 2024

Merging this PR will trigger the following deployment actions.

Support and Staging deployments

| Cloud Provider | Cluster Name | Upgrade Support? | Reason for Support Redeploy | Upgrade Staging? | Reason for Staging Redeploy |
| --- | --- | --- | --- | --- | --- |
| aws | gridsst | Yes | Support helm chart has been modified | No | |
| gcp | linked-earth | Yes | Support helm chart has been modified | No | |
| aws | nasa-esdis | Yes | Support helm chart has been modified | No | |
| gcp | hhmi | Yes | Support helm chart has been modified | No | |
| gcp | pangeo-hubs | Yes | Support helm chart has been modified | No | |
| aws | jupyter-meets-the-earth | Yes | Support helm chart has been modified | No | |
| aws | nasa-ghg | Yes | Support helm chart has been modified | No | |
| gcp | 2i2c-uk | Yes | Support helm chart has been modified | No | |
| gcp | awi-ciroh | Yes | Support helm chart has been modified | No | |
| gcp | meom-ige | Yes | Support helm chart has been modified | No | |
| kubeconfig | utoronto | Yes | Support helm chart has been modified | No | |
| aws | smithsonian | Yes | Support helm chart has been modified | No | |
| gcp | leap | Yes | Support helm chart has been modified | No | |
| gcp | qcl | Yes | Support helm chart has been modified | No | |
| aws | openscapes | Yes | Support helm chart has been modified | No | |
| aws | nasa-veda | Yes | Support helm chart has been modified | No | |
| gcp | catalystproject-latam | Yes | Support helm chart has been modified | No | |
| aws | ubc-eoas | Yes | Support helm chart has been modified | No | |
| aws | catalystproject-africa | Yes | Support helm chart has been modified | No | |
| aws | nasa-cryo | Yes | Support helm chart has been modified | No | |
| gcp | 2i2c | Yes | Support helm chart has been modified | No | |
| aws | victor | Yes | Support helm chart has been modified | No | |
| gcp | cloudbank | Yes | Support helm chart has been modified | Yes | Following prod hubs require redeploy: csulb |
| gcp | callysto | Yes | Support helm chart has been modified | No | |
| aws | 2i2c-aws-us | Yes | Support helm chart has been modified | No | |

Production deployments

| Cloud Provider | Cluster Name | Hub Name | Reason for Redeploy |
| --- | --- | --- | --- |
| gcp | cloudbank | csulb | Following helm chart values files were modified: csulb.values.yaml |

Review thread on the cryptnono resource requests in the diff:

        cpu: 0.005
      requests:
        memory: 64Mi
        cpu: 0.0001
consideRatio (Contributor) commented:

I think the lowest resolution is 1/1024 of a CPU, and I recall that 2m was the practical minimum value for dockerd and containerd. I'm not sure if there is any enforcement to prevent specifying 0.1m or extremely low values, but I'd be inclined not to optimize to this extreme and to go for a value of at least 1m to avoid issues.

yuvipanda (Member, Author) replied:

Good catch, @consideRatio. Based on the 'note' in https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/#meaning-of-cpu, I switched to using m units and specified 1m - 0.0001 is definitely not enforceable.
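For illustration, a minimal sketch of what the corrected request block could look like; the exact field path in the cryptnono chart values is assumed, and only the 64Mi memory value and the 1m CPU figure come from this thread:

    resources:
      requests:
        memory: 64Mi
        cpu: 1m  # 1m (0.001 CPU) is the finest granularity the Kubernetes API accepts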

consideRatio (Contributor) left a review comment:

I made a better-safe-than-sorry comment on a detail, but this lgtm!

yuvipanda merged commit 9a16b57 into 2i2c-org:master on Jan 4, 2024
32 checks passed

github-actions bot commented Jan 4, 2024

🎉🎉🎉🎉

Monitor the deployment of the hubs here 👉 https://github.com/2i2c-org/infrastructure/actions/runs/7411484364

yuvipanda added a commit to yuvipanda/pilot-hubs that referenced this pull request Jan 4, 2024
2i2c-org#3569 changed the cryptnono daemonset to have separate
resource requests for the init containers and the main container.
While working on 2i2c-org#3566, I noticed this was generating
wrong choices - the calculated overhead was too small.

We were intentionally ignoring init containers while calculating
overhead, but it turns out the scheduler and the autoscaler both
take them into consideration. The effective resource request
for a pod is the higher of the resource requests for the containers
*or* the init containers - this ensures that a pod with higher
requests for init containers than containers (like our cryptnono pod!)
will actually run. This is documented at
https://kubernetes.io/docs/concepts/workloads/pods/init-containers/#resource-sharing-within-containers,
and implemented in Kubernetes itself at
https://github.com/kubernetes/kubernetes/blob/9bd0ef5f173de3cc2d1d629a4aee499d53690aee/pkg/api/v1/resource/helpers.go#L50
(this is the library code that the cluster autoscaler uses).

This PR updates the two places we currently have that calculate
effective resource requests (I assume eventually these will be
merged into one - I haven't kept up with the team's work last
quarter here).

I've updated the node-capacity-info.json file, which is what seems
to be used by the generator script right now.
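For reference, a minimal Python sketch of the effective-request rule described above, following the Kubernetes documentation linked in the commit message; this is illustrative, not the deployer's actual code, and all names are made up:

    def effective_cpu_request_millicores(container_requests, init_container_requests):
        # Per the Kubernetes docs: a pod's effective request is the larger of
        # (a) the sum of the regular containers' requests and
        # (b) the largest single init container request.
        regular_total = sum(container_requests)
        init_max = max(init_container_requests, default=0)
        return max(regular_total, init_max)

    # Example: a cryptnono-like pod whose init container requests more CPU
    # than its long-running container (values in millicores, illustrative only).
    print(effective_cpu_request_millicores([1], [100]))  # -> 100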