Improve garbage collection strategy #602

hiddeco · 2022-03-04T09:54:17Z

At present, the garbage collection of an Artifact happens on the next reconciliation run and removes all but current Artifact. As reported in fluxcd/flux2#2464, this can lead to problems when multiple commits are made in a row. Because there is a chance a controller reacting to an update is not able to download the advertised Artifact in time.

In addition, there has been chatter about complying with audits, which may require a (number of) Artifact(s) to be present for a longer period of time to provide a track record.

To improve and/or address this, there are a couple of options we can pursue.

Option 1: taking factor time into account

This would be the simplest option to solve primarily the first mentioned issue. Instead of deleting all but one Artifact, we could delete all that are older than x compared to current Artifact. This would provide the controllers consuming Source objects with a certain time frame to complete their started operations.

Option 2: allow a controller global retention strategy

This would add flags to the controller config to e.g. keep a specific number of Artifacts around. Allowing cluster administrators to provide a configuration which addresses both the first and second issue.

Option 3: allow an object retention strategy

This would allow specifying a retention strategy on an object, giving the user fine-grain control over the garbage collection process. Allowing users to provide a configuration which addresses both the first and second issue.

Option 4: allow an object retention strategy with controller global fallback

This is a combination of 2 and 3, allowing more fine-grain control in total.

The text was updated successfully, but these errors were encountered:

pjbgf · 2022-03-04T11:53:26Z

I think we should handle the two concerns separately, which makes option 1 more appealing to me.
With that in mind, the TTL could potentially start as an arbitrary value, as those archives are solely used internally by flux and not recommended for user consumption.

On the audit/compliance perspective, I would argue that source controller is not the source of truth in itself for such data, and therefore an audit log feature covering externally consumed data would probably be a more resource-optimised approach towards compliance.

stefanprodan · 2022-03-04T12:27:15Z

I would introduce retention as global settings with the following defaults:

--artifact-retention-ttl=120s
--artifact-retention-records=2

For the above, the controller will delete artifacts older than 120s while preserving the two most recent ones.

I'm reluctant on having these settings in custom resources since they would allow a tenant to fill the source-controller storage by setting:

apiVersion: source.toolkit.fluxcd.io/v1beta1
kind: GitRepository
spec:
  retention: 
    ttl: 50000m
    records: 50000

hiddeco · 2022-03-04T14:12:09Z

On the audit/compliance perspective, I would argue that source controller is not the source of truth in itself for such data, and therefore an audit log feature covering externally consumed data would probably be a more resource-optimised approach towards compliance.

Due to the fact that we make last-mile modifications, I am not sure this is true.

I'm reluctant on having these settings in custom resources since they would allow a tenant to fill the source-controller storage by setting:

I would say this could be prevented by allowing an administrator to configure a max, at which the user defined value is capped.

stefanprodan · 2022-03-04T15:37:32Z

I would say this could be prevented by allowing an administrator to configure a max, at which the user defined value is capped.

Ok so maybe the flags should have a max suffix.

aryan9600 · 2022-04-22T11:28:41Z

Closed via #638

hiddeco added the enhancement New feature or request label Mar 4, 2022

hiddeco added this to Maintainers' Focus Mar 4, 2022

hiddeco moved this to Up-Next in Maintainers' Focus Mar 4, 2022

aryan9600 self-assigned this Mar 22, 2022

pjbgf added this to the GA milestone Mar 29, 2022

aryan9600 mentioned this issue Apr 5, 2022

Garbage collect with provided retention options #638

Merged

aryan9600 closed this as completed Apr 22, 2022

Repository owner moved this from Up-Next to Done in Maintainers' Focus Apr 22, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Improve garbage collection strategy #602

Improve garbage collection strategy #602

hiddeco commented Mar 4, 2022 •

edited

Loading

pjbgf commented Mar 4, 2022

stefanprodan commented Mar 4, 2022

hiddeco commented Mar 4, 2022

stefanprodan commented Mar 4, 2022

aryan9600 commented Apr 22, 2022

Improve garbage collection strategy #602

Improve garbage collection strategy #602

Comments

hiddeco commented Mar 4, 2022 • edited Loading

Option 1: taking factor time into account

Option 2: allow a controller global retention strategy

Option 3: allow an object retention strategy

Option 4: allow an object retention strategy with controller global fallback

pjbgf commented Mar 4, 2022

stefanprodan commented Mar 4, 2022

hiddeco commented Mar 4, 2022

stefanprodan commented Mar 4, 2022

aryan9600 commented Apr 22, 2022

hiddeco commented Mar 4, 2022 •

edited

Loading