Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Garbage collector #29

Closed
wants to merge 5 commits into from
Closed

Conversation

IvanDev
Copy link
Contributor

@IvanDev IvanDev commented Apr 14, 2024

This PR introduces a garbage collector.
When we remove an image or tag from the repository, blobs referenced by deleted manifests are not removed from R2. With this new PR, we'll schedule garbage collecting after any modifying operations.
GC will wait 10 minutes after any modifications are done to the repository this ensures we'll not start garbage collecting of freshly uploaded blobs without parenting manifest.

We have 2 modes for the garbage collector, unreferenced and untagged:

  1. Unreferenced will delete all blobs that are not referenced by any manifest.
  2. Untagged will delete all blobs that are not referenced by any manifest and are not tagged.

Users can skip the GARBAGE_COLLECTOR_MODE variable which will disable GC.

Some considerations:

  • PR utilizes new RPC, so we have updated compatibility_date
  • At this moment Set is used for keeping track of blobs, we can run out of memory and should use the bloom filter instead, which means we'll have a new dependency.
  • GC doesn't remove dummy manifests that reference non-existent blobs.

@gabivlj
Copy link
Collaborator

gabivlj commented Apr 14, 2024

Thanks for the contribution! I agree we should have some kind of garbage collection to remove dangling layers.
However, I might have some concerns with having garbage collection done by a DO alarm or asynchronously.

AFAIU, there is a small chance that there is a new manifest uploaded while the garbage collection is happening and new layers might be removed. I think this is just something we will have to accept for this in some way to keep the registry implementation simple.
If we go with this route, users of the registry that use garbage collection really need to know that it comes with some risks.

I am going to propose some things and let me know what you think (just throwing some stuff here for discussion):

  • I have mixed feelings about these passes being scheduled after a tag removal. Another simpler alternative is exposing a garbage collection endpoint and let the admin just trigger garbage collection when they deem safe to do so.
  • If we use DO: We could take a lock from the GC DO when we try to upload a manifest, but makes the implementation have more complexity.
  • If we use DO: when we remove a manifest couldn't we send over to the DO the layers we should look for so GC is a bit cheaper to do? No need to list all blobs through R2. If we are using DO we could also use some state.

IMO I'd go for the GC endpoint and just let anybody that needs GC to explicitly call it.

If we are feeling fancy with DOs we could do these passes in there, but still with workers unbounded plans + ctx.waitUntil() we can extend computation time already by a bit.

Hope you find this overall feedback fair! I am open to have GC just wanting to see your point of view on these 😄.

@IvanDev
Copy link
Contributor Author

IvanDev commented Apr 15, 2024

Here are my 2 bits:

  1. I totally agree with having a manual GC request being the safest option. In the recent changes, I added /:name+/collectgarbage route with optional mode=unreferenced|untagged parameter so users can trigger GC whenever they feel safe.
  2. I also added a date check for the uploaded date field on R2 objects, so now when we scan through the blobs and if there are any changes within the last 10 minutes we'll abort GC scan to prevent damage to freshly uploaded layers. This should make the damage by GC even more unlikely. In addition, GC is now scheduled after 30 minutes this should cover for any network delays.
  3. Introducing a lock will also introduce complexity and the potential for a deadlock. So unless we really want it, I guess we can skip it and use the current implementation since it should cover most of the cases.
  4. Unfortunately, we can't minimize the request count to manifests since layers potentially can be referenced from multiple manifests (e.g. with tag/digest manifests). Counting the strong consistency of R2, relatively rare writes to the registry, and cost savings on storage by using even current primitive GC I think it should be good enough for now. As you mentioned, having DO state would make scanning almost x4.5 cheaper, but ROI on investing that much time into all of this complexity probably would be negligible, at least on most use cases with relatively rare writes (and users can even schedule GC manually now).

I agree with you on your concerns and I'm too a bit afraid of over-engineering :) I would prefer to keep the implementation simple and introduce complexity only when we really need it.

@gabivlj
Copy link
Collaborator

gabivlj commented Apr 15, 2024

I like the date change. I'd suggest we remove the asynchronous garbage collection if we have the endpoint. And in that case I do not think DO is necessary (trying to stay within R2 as much as possible here).

@IvanDev
Copy link
Contributor Author

IvanDev commented Apr 15, 2024

Sure, let's keep it safe. Had to remove date checks since it's useless now. Let's keep users responsible for their own actions :)

@gabivlj
Copy link
Collaborator

gabivlj commented Apr 19, 2024

I think you will have to run the formatter on this PR and the other https://github.com/cloudflare/serverless-registry/pull/30/files

@IvanDev
Copy link
Contributor Author

IvanDev commented Apr 20, 2024

Sure!

@martinwang2002
Copy link

upvote on this

@gabivlj
Copy link
Collaborator

gabivlj commented Sep 23, 2024

Hello folks! Due to garbage collection popular demand I have cherry picked @IvanDev's commit for rebase and add a bit of concurrency safety that I thought could be nice. Thank you for the feedback.
#48

@gabivlj gabivlj mentioned this pull request Sep 23, 2024
@gabivlj
Copy link
Collaborator

gabivlj commented Sep 23, 2024

Cherry picked commit from this PR has been merged here #48. Thank you @IvanDev for the implementation and the discussion!

@gabivlj gabivlj closed this Sep 23, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants