Add variable for disk size and increase default #782

Merged: 2 commits merged into awslabs:main on Aug 25, 2023

@ananth102 ananth102 commented Aug 23, 2023

Which issue is resolved by this Pull Request:

DLC GPU images take up too much disk space, which leads to disk pressure errors on the nodes.

Second revision:

Adding a variable to configure disk size.

Description of your changes:
This PR increases the disk size for GPU nodes from 50 GB (the default) to 75 GB.

Testing:

  • [ ] Unit tests pass
  • [x] e2e tests pass
  • Details about new tests (If this PR adds a new feature)
  • Details about any manual tests performed
  • Manually tested with public.ecr.aws/kubeflow-on-aws/notebook-servers/jupyter-pytorch:2.0.0-gpu-py310-cu118-ubuntu20.04-ec2-v1.0, no disk pressure errors.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@ananth102 ananth102 changed the title from "Increasing disk size for gpu nodes" to "Increase disk size for gpu nodes" Aug 23, 2023
@ryansteakley
Contributor

Have we verified that this change resolved the disk-space issue you encountered?

@ananth102
Contributor Author

> Have we verified that this change resolved the disk-space issue you encountered?

Yea

@@ -40,6 +40,7 @@ locals {
 desired_size = 3
 max_size = 5
 ami_type = "AL2_x86_64_GPU"
+disk_size = 75

Can we make this a configurable variable instead?
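
For illustration, here is a minimal sketch of how the hard-coded value could become a configurable input. The variable name `node_disk_size_gpu` and the local name `gpu_node_group` are hypothetical, since the thread does not show the names used in the merged change; only the four attribute values come from the diff hunk above.

```hcl
# Hypothetical variable name; the name used in the merged PR is not shown in this thread.
variable "node_disk_size_gpu" {
  description = "Disk size for GPU worker nodes"
  type        = number
  default     = 75 # new default proposed in this PR (previously 50)
}

locals {
  # Illustrative local mirroring the attributes visible in the diff hunk.
  gpu_node_group = {
    desired_size = 3
    max_size     = 5
    ami_type     = "AL2_x86_64_GPU"
    disk_size    = var.node_disk_size_gpu # read from the variable instead of a literal
  }
}
```

With a variable like this, the default applies unless the caller overrides it, for example with `terraform apply -var="node_disk_size_gpu=100"` or an entry in a `.tfvars` file.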

@ananth102 ananth102 changed the title from "Increase disk size for gpu nodes" to "Add variable for disk size and increase default" Aug 25, 2023
@ananth102 ananth102 merged commit 12b51ce into awslabs:main Aug 25, 2023
rakuto pushed a commit to tne-ai/kubeflow-manifests that referenced this pull request Sep 27, 2023
3 participants