Adds swap space for ALI runners and reports to CI and metrics if it is used #6058
@@ -138,6 +138,11 @@ fi

${post_install}

sudo fallocate -l 3G /swapfile
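The review context above shows only the `fallocate` line. Before the kernel uses a swapfile it normally also needs restricted permissions, a swap signature from `mkswap`, and activation with `swapon`. The following is a minimal sketch of that sequence under stated assumptions, not the exact contents of #6058; the `SWAPFILE`, `SWAP_SIZE`, and `DRY_RUN` variables and the `run`/`setup_swapfile` functions are hypothetical, and the dry-run switch only exists so the commands can be inspected without root.

```shell
#!/bin/sh
# Minimal sketch of enabling a swapfile on a Linux runner.
# Assumption: SWAPFILE, SWAP_SIZE, DRY_RUN, run and setup_swapfile are
# hypothetical names chosen for illustration, not taken from #6058.
set -eu

SWAPFILE="${SWAPFILE:-/swapfile}"
SWAP_SIZE="${SWAP_SIZE:-3G}"

# Run a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

setup_swapfile() {
  # Preallocate the file (same command as in the diff).
  run sudo fallocate -l "$SWAP_SIZE" "$SWAPFILE"
  # swapon warns about insecure permissions; 0600 is the suggested mode.
  run sudo chmod 600 "$SWAPFILE"
  # Write the swap signature, then enable the file as swap.
  run sudo mkswap "$SWAPFILE"
  run sudo swapon "$SWAPFILE"
}
```

`DRY_RUN=1 setup_swapfile` prints the four commands without executing them; calling `setup_swapfile` on its own runs them for real.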
Should we have a swap file that scales with the amount of memory on the runner, e.g. https://github.com/pytorch/pytorch/pull/142293/files#diff-b317d4da565a9e329ccf67e669c2ff1f4d4bc5fb0ffa4d74132545ad66f84339R229 ?
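One way to act on this suggestion is to derive the swap size from `MemTotal` in `/proc/meminfo`. The sketch below assumes a hypothetical policy of half the RAM, clamped to between 1 and 16 GiB; it is not taken from the linked diff, and the `mem_total_kb` and `swap_size_gib` function names are illustrative.

```shell
#!/bin/sh
# Sketch: size the swapfile from the runner's memory.
# Assumption: the "RAM / 2, clamped to [1, 16] GiB" policy and the
# function names are hypothetical, chosen only for illustration.
set -eu

# Print MemTotal in kB (defaults to /proc/meminfo; a path can be passed for testing).
mem_total_kb() {
  awk '/^MemTotal:/ { print $2 }' "${1:-/proc/meminfo}"
}

# Print the swap size in whole GiB for a given MemTotal in kB.
swap_size_gib() {
  mem_gib=$(( $1 / 1024 / 1024 ))   # kB -> GiB, rounded down
  size=$(( mem_gib / 2 ))
  if [ "$size" -lt 1 ]; then size=1; fi
  if [ "$size" -gt 16 ]; then size=16; fi
  echo "$size"
}
```

On a runner this could feed the existing command as `sudo fallocate -l "$(swap_size_gib "$(mem_total_kb)")G" /swapfile`. Because the kernel reports `MemTotal` slightly below the nominal RAM size, rounding down means a "16 GiB" machine gets 7 GiB rather than 8 under this policy.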
LGTM! I'll test this out and land it next Monday to avoid deploying during the weekend.
Fix a potential bug from #6058. Here is what I see in CI when it happens: https://github.com/pytorch/pytorch/actions/runs/12360769784/job/34497379880?pr=143316#step:5:165
A swapfile on Linux runners has been prepared by pytorch/test-infra#6058. This PR does two things:

* Start using the swapfile on all Linux build and test jobs
* Test the rollout of https://github.com/pytorch-labs/pytorch-gha-infra/pull/582

### Testing

Run `swapon` inside the container and the swapfile shows up correctly:

```
jenkins@259dfb0a314c:~/workspace$ swapon
NAME      TYPE SIZE USED PRIO
/swapfile file   3G 256K   -2
```

Pull Request resolved: #143316
Approved by: https://github.com/ZainRizvi, https://github.com/atalman
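The manual `swapon` check in the Testing section can also be scripted. This sketch assumes util-linux `swapon`, whose `--show`, `--noheadings`, and `--raw` options print one swap area per line using the `NAME TYPE SIZE USED PRIO` columns shown in that output; `swapfile_active` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: check whether a given swapfile appears in `swapon` output.
# Assumption: swapfile_active is a hypothetical helper; its input is the
# table printed by util-linux `swapon --show` (header line optional).
set -eu

# Read swapon output on stdin; succeed if a row's NAME column equals $1.
swapfile_active() {
  awk -v path="$1" '
    $1 == "NAME" { next }            # skip the header row, if present
    $1 == path   { found = 1 }
    END          { exit (found ? 0 : 1) }
  '
}
```

On a runner or inside the job container: `swapon --show --noheadings --raw | swapfile_active /swapfile && echo "swapfile enabled"`. Matching on the `NAME` column keeps the check independent of column alignment.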
Adds swap space to the autoscaled runners.
In the `post_job` step, prints a message if swap usage was detected while the job was running, and sends per-job swap usage metrics.
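The `post_job` reporting can be approximated by sampling the kernel's cumulative `pswpout` counter in `/proc/vmstat` (pages swapped out since boot) when the job starts and again when it ends. Any increase means memory was pushed to swap during the job, even if it was swapped back in before the job finished. This is an assumed approach, not necessarily what #6058 implements; the `swap_pages_out` and `report_swap_usage` names and the `swap_pages_out=` output line (a stand-in for whatever metrics sink the runner uses) are hypothetical.

```shell
#!/bin/sh
# Sketch: detect swap activity during a job from /proc/vmstat.
# Assumption: function names and the key=value output format are
# hypothetical; a real runner would forward the value to its metrics backend.
set -eu

# Print the cumulative pswpout counter (pages swapped out since boot).
swap_pages_out() {
  awk '$1 == "pswpout" { print $2 }' "${1:-/proc/vmstat}"
}

# Compare start/end samples ($1, $2); print a notice and a metric line.
report_swap_usage() {
  delta=$(( $2 - $1 ))
  if [ "$delta" -gt 0 ]; then
    echo "Swap was used during this job: $delta pages swapped out."
  fi
  echo "swap_pages_out=$delta"
}
```

Record `start="$(swap_pages_out)"` when the job begins, then call `report_swap_usage "$start" "$(swap_pages_out)"` in `post_job`. The counter is system-wide, so the delta is only a per-job figure when the runner executes one job at a time.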