Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

remove delete-temp-images job #709

Merged
merged 2 commits into from
Aug 22, 2024

Conversation

jameslamb
Copy link
Member

@jameslamb jameslamb commented Aug 20, 2024

Follow-up to #708.

Proposes completely removing the delete-temp-images job, in favor of relying on the scheduled nightly cleanup at https://github.com/rapidsai/workflows/blob/main/.github/workflows/cleanup_staging.yaml.

Notes for Reviewers

Details

CI here writes images to the rapidsai/staging repo on DockerHub, then later copies them to individual user-facing repos.
To avoid those temporary CI artifacts piling up in the rapidsai/staging repo, pull requests and branch builds run a job called delete-temp-images which does what it sounds like.

In exchange for more aggressive cleaning, this job introduces significant complexity for development here. Most notably, we've observed several instances where that job deletes images before all CI jobs needing them have completed successfully, leading to all of CI needing to be re-run.

Significant effort has been put into trying to avoid that, and we've found it's been difficult to get it right:

some attempts:

a recent example:

Ok so how will we clean up?

The workflow at https://github.com/rapidsai/workflows/blob/main/.github/workflows/cleanup_staging.yaml.

It runs once a day and deletes anything from rapidsai/staging that's more than 30 days old.

Benefits of these changes

As described in #708 (comment) ...

CI here will work as it does in other RAPIDS repos.... if any jobs fail for retryable reasons (like network issues), you can safely click "re-run failed jobs" and make incremental progress towards all builds passing.

Also reduces the need to maintain code that has to keep up with the DockerHub API in two places (by deleting ci/delete-temp-images.sh here).

@jameslamb jameslamb added containers 2 - In Progress Currenty a work in progress testing labels Aug 20, 2024
@jameslamb jameslamb marked this pull request as ready for review August 20, 2024 22:19
@jameslamb jameslamb requested a review from a team as a code owner August 20, 2024 22:19
Copy link
Contributor

@bdice bdice left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool. This is what I hoped #708 would help with.

@jameslamb jameslamb added 3 - Ready for Review Ready for review by team and removed 2 - In Progress Currenty a work in progress labels Aug 20, 2024
@jameslamb
Copy link
Member Author

cuspatial notebook failures 😬

Error during notebook tests!
Errors during cuspatial/Taxi_Dropoff_Reverse_Geocoding.ipynb
Errors during cuspatial/trajectory_clustering.ipynb

(build link)

Those are both tested in cuspatial CI (the are not excluded in this list), so that means cuspatial CI will be blocked. Will open an issue there and link it here.

@bdice
Copy link
Contributor

bdice commented Aug 21, 2024

@jameslamb I am checking cuSpatial's CI with this PR: rapidsai/cuspatial#1441 (comment)

@jameslamb
Copy link
Member Author

Great, thanks for pushing this forward @bdice . Looks like CI there did fail on these same notebooks (build link), and that rapidsai/cuspatial#1442 is up to hopefully fix it.

@mroeschke
Copy link

rapidsai/cuspatial#1442 has been merged so hopefully a rerun of this PR should pass the CI now

@jameslamb
Copy link
Member Author

Thanks as always @mroeschke !

I just restarted all CI jobs here (assuming we need to do rebuilds to pull in the newest cuspatial).

@bdice
Copy link
Contributor

bdice commented Aug 21, 2024

@jameslamb There was a solver error so I dropped Python 3.9 to make CI pass. See b6e37e3.

@jameslamb
Copy link
Member Author

Ok sure, works for me. Thank you!

Copy link
Member

@jakirkham jakirkham left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks James! 🙏

Also thanks everyone who reviewed and pushed fixes here and elsewhere! 🙏

@jameslamb jameslamb removed the 3 - Ready for Review Ready for review by team label Aug 22, 2024
@jameslamb
Copy link
Member Author

Given the set of approvals and passing jobs, I think this is ready to merge. Thanks yall.

@jameslamb
Copy link
Member Author

/merge

@rapids-bot rapids-bot bot merged commit b007a92 into rapidsai:branch-24.10 Aug 22, 2024
32 checks passed
@jameslamb jameslamb deleted the stop-deleting-temp-images branch August 22, 2024 16:12
@jakirkham
Copy link
Member

Thanks James and everyone for reviewing! 🙏

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants