fix: allow artifact gc to delete directory. Fixes #12857 #13091

Merged
merged 6 commits on Jul 3, 2024

Conversation

tczhao
Member

@tczhao tczhao commented May 25, 2024

Fixes #12857

Motivation

We see the error "The specified key does not exist" when the artifact being garbage-collected is not compressed and is a directory.

Modifications

Determine whether the artifact is a directory and delete it accordingly (used in artifact GC).

Verification

Test log

time="2024-05-25T03:16:16.108Z" level=info msg="S3 Delete artifactttt: key: fanout/collected-other-datasets/yo-test-zl4wk/"
time="2024-05-25T03:16:16.108Z" level=info msg="Creating minio client using static credentials" endpoint="minio:9000"
time="2024-05-25T03:16:16.112Z" level=info msg="Listing directory from s3" bucket=my-bucket endpoint="minio:9000" key=fanout/collected-other-datasets/yo-test-zl4wk/
time="2024-05-25T03:16:16.113Z" level=info msg="Deleting object from s3" bucket=my-bucket endpoint="minio:9000" key=fanout/collected-other-datasets/yo-test-zl4wk/hello-world.txt
time="2024-05-25T03:16:16.115Z" level=info msg="Deleting object from s3" bucket=my-bucket endpoint="minio:9000" key=fanout/collected-other-datasets/yo-test-zl4wk/hello-world2.txt

@tczhao tczhao marked this pull request as ready for review May 25, 2024 04:06
@tczhao tczhao changed the title fix: allow artifact delete on directory fix: allow artifact delete to delete directory May 25, 2024
@tczhao tczhao changed the title fix: allow artifact delete to delete directory fix: allow artifact delete to delete directory. Fixes #12857 May 25, 2024
@agilgur5 agilgur5 added the area/artifacts S3/GCP/OSS/Git/HDFS etc label May 25, 2024

@agilgur5 agilgur5 left a comment

Nice work figuring out the problem and fixing it!

Left a few modifications below and some optimization suggestions

workflow/artifacts/s3/s3.go (outdated, resolved)
workflow/artifacts/s3/s3.go (outdated, resolved)
Comment on lines 184 to 191
isDir, err := s3cli.IsDirectory(artifact.S3.Bucket, artifact.S3.Key)
if err != nil {
    return fmt.Errorf("failed to test if %s is a directory: %v", artifact.S3.Key, err)
}

if !isDir {
    return s3cli.Delete(artifact.S3.Bucket, artifact.S3.Key)
} else {

this could potentially be optimized I'm thinking -- try deleting and if it doesn't work and gives the appropriate error, then proceed with the directory deletion

similar to what I did in #12974

Member Author

Makes sense; the current change doubles the API calls.

Member Author

@tczhao tczhao Jun 2, 2024

However, looking deeper, I realise that for

err = deleteFile
if isNotFound(err) {
    deleteDir
}

to work, deleteFile needs to return the appropriate error.
In S3, delete only adds a marker and will return 204 regardless of whether the key exists or not.
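
To make the point above concrete, here is a minimal sketch using an in-memory stand-in for a bucket. The names fakeBucket, errNotFound, and deleteWithFallback are hypothetical and are not part of s3cli; the only behaviour assumed is the one described above, namely that a delete reports success whether or not the key exists. Under that assumption the fallback branch never runs:

```go
package main

import (
	"errors"
	"fmt"
)

// errNotFound is a hypothetical sentinel that a "delete, then fall back"
// design would rely on. It is an assumption for illustration only.
var errNotFound = errors.New("key not found")

// fakeBucket is an in-memory stand-in for a bucket, not a real S3 client.
type fakeBucket struct {
	objects map[string]bool
}

// Delete mimics the behaviour described in the thread: it reports success
// whether or not the key exists, so it never returns errNotFound.
func (b *fakeBucket) Delete(key string) error {
	delete(b.objects, key)
	return nil
}

// deleteWithFallback tries a plain delete first and only falls back to a
// directory delete on "not found". It reports whether the fallback ran.
func deleteWithFallback(b *fakeBucket, key string, deleteDir func(string) error) (bool, error) {
	err := b.Delete(key)
	if errors.Is(err, errNotFound) {
		return true, deleteDir(key)
	}
	return false, err
}

func main() {
	b := &fakeBucket{objects: map[string]bool{
		"out/dir/a.txt": true,
		"out/dir/b.txt": true,
	}}
	ranFallback, err := deleteWithFallback(b, "out/dir/", func(string) error { return nil })
	// The directory's objects are still present and no error was reported.
	fmt.Println(ranFallback, err, len(b.objects)) // false <nil> 2
}
```

Because the plain delete succeeds silently, the caller has no signal that the key was really a directory prefix, which is why some other check is needed.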

Member Author

@tczhao tczhao Jun 10, 2024

Updated: this now uses a simple path suffix check instead of s3cli.ListDirectory to determine whether the path is a directory.
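
A minimal sketch of what a suffix-based check could look like, assuming directory artifacts are recorded with a trailing "/" in their key (as in the test log above). The objectStore interface, memStore type, and the isDirKey and deleteArtifact helpers are hypothetical names for illustration; they are not the s3cli API or the code merged in this PR:

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// objectStore is a hypothetical minimal interface, not the s3cli API.
type objectStore interface {
	ListKeys(prefix string) ([]string, error)
	Delete(key string) error
}

// memStore is an in-memory implementation used only for this sketch.
type memStore struct{ objects map[string]bool }

func (m *memStore) ListKeys(prefix string) ([]string, error) {
	var keys []string
	for k := range m.objects {
		if strings.HasPrefix(k, prefix) {
			keys = append(keys, k)
		}
	}
	sort.Strings(keys)
	return keys, nil
}

func (m *memStore) Delete(key string) error {
	delete(m.objects, key)
	return nil
}

// isDirKey treats a key ending in "/" as a directory. No API call is needed.
func isDirKey(key string) bool { return strings.HasSuffix(key, "/") }

// deleteArtifact deletes a single object, or every object under the prefix
// when the key looks like a directory.
func deleteArtifact(s objectStore, key string) error {
	if !isDirKey(key) {
		return s.Delete(key)
	}
	keys, err := s.ListKeys(key)
	if err != nil {
		return fmt.Errorf("failed to list %s: %w", key, err)
	}
	for _, k := range keys {
		if err := s.Delete(k); err != nil {
			return fmt.Errorf("failed to delete %s: %w", k, err)
		}
	}
	return nil
}

func main() {
	s := &memStore{objects: map[string]bool{
		"fanout/run/hello-world.txt":  true,
		"fanout/run/hello-world2.txt": true,
		"fanout/other.tgz":            true,
	}}
	if err := deleteArtifact(s, "fanout/run/"); err != nil {
		fmt.Println("error:", err)
	}
	fmt.Println(len(s.objects)) // 1
}
```

The trade-off is that the check is only as reliable as the key convention: a directory key stored without the trailing slash would be treated as a single object.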

Huh, does the / check actually work? Great workaround if so!

Comment on lines 196 to 197
for _, objKey := range keys {
    err = s3cli.Delete(artifact.S3.Bucket, objKey)

this can probably be parallelized.

I am also a bit concerned that for large directories this could hit a rate limit or cause a lot of load. That said, archive: none is already an edge case and a large directory would be another edge case, so it is probably not necessary to handle that here, though it may be worth leaving a comment for future reference.

Member Author

err := retry.OnError(retry.DefaultBackoff, isTransientS3Err, func() error {
Currently we have retry.DefaultBackoff and isTransientS3Err to handle the rate-limit situation.
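
For readers unfamiliar with the pattern, here is a standard-library-only sketch of retrying on transient errors with a growing delay. It is loosely modelled on the call shown above but is not the retry package's implementation; retryOnError, isTransient, and errSlowDown are hypothetical names, and the step count and delays are arbitrary:

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errSlowDown is a hypothetical transient error, standing in for a
// rate-limit response.
var errSlowDown = errors.New("SlowDown")

// isTransient reports whether an error is worth retrying.
func isTransient(err error) bool { return errors.Is(err, errSlowDown) }

// retryOnError calls fn up to `steps` times. It stops early on success or
// on an error that is not retriable, and doubles the delay between attempts.
func retryOnError(steps int, delay time.Duration, retriable func(error) bool, fn func() error) error {
	var err error
	for i := 0; i < steps; i++ {
		if err = fn(); err == nil || !retriable(err) {
			return err
		}
		if i < steps-1 {
			time.Sleep(delay)
			delay *= 2
		}
	}
	return err
}

func main() {
	attempts := 0
	err := retryOnError(4, time.Millisecond, isTransient, func() error {
		attempts++
		if attempts < 3 {
			return errSlowDown // the first two attempts are rate limited
		}
		return nil
	})
	fmt.Println(attempts, err) // 3 <nil>
}
```

Wrapping the whole deletion in such a retry handles throttling, at the cost of repeating already-completed deletes on a retry.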

Ah right above this code/surrounding it, good point.

I do think this could still be parallelized, although that can be a separate PR as it seems we lack parallelization in several (most?) places for artifacts (related: #12442)
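
One possible shape for the parallelization, sketched with the standard library only. The deleteAll helper and its worker limit are hypothetical and not part of this PR; bounding the number of in-flight deletes is one way to address the load concern raised earlier in this thread:

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// deleteAll calls del for every key using at most `workers` concurrent
// goroutines and returns all failures joined into one error.
func deleteAll(keys []string, workers int, del func(key string) error) error {
	if workers < 1 {
		workers = 1
	}
	sem := make(chan struct{}, workers)
	var (
		wg   sync.WaitGroup
		mu   sync.Mutex
		errs []error
	)
	for _, key := range keys {
		wg.Add(1)
		sem <- struct{}{} // blocks once `workers` deletes are in flight
		go func(key string) {
			defer wg.Done()
			defer func() { <-sem }()
			if err := del(key); err != nil {
				mu.Lock()
				errs = append(errs, fmt.Errorf("delete %s: %w", key, err))
				mu.Unlock()
			}
		}(key)
	}
	wg.Wait()
	return errors.Join(errs...) // nil when no delete failed
}

func main() {
	var mu sync.Mutex
	deleted := map[string]bool{}
	keys := []string{"dir/a.txt", "dir/b.txt", "dir/c.txt"}
	err := deleteAll(keys, 2, func(key string) error {
		mu.Lock()
		defer mu.Unlock()
		deleted[key] = true
		return nil
	})
	fmt.Println(len(deleted), err) // 3 <nil>
}
```

This sketch needs Go 1.20 or later for errors.Join. It collects every failure rather than stopping at the first one, which may or may not be the right choice for artifact GC.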

@agilgur5 agilgur5 changed the title fix: allow artifact delete to delete directory. Fixes #12857 fix: allow artifact gc to delete directory. Fixes #12857 May 27, 2024
@agilgur5 agilgur5 added the area/gc Garbage collection, such as TTLs, retentionPolicy, delays, and more label May 27, 2024
tczhao and others added 4 commits June 2, 2024 20:43
Co-authored-by: Anton Gilgur <[email protected]>
Signed-off-by: Tianchu Zhao <[email protected]>
Co-authored-by: Anton Gilgur <[email protected]>
Signed-off-by: Tianchu Zhao <[email protected]>
Signed-off-by: Tianchu Zhao <[email protected]>
@tczhao tczhao requested a review from agilgur5 June 10, 2024 07:24
Signed-off-by: Anton Gilgur <[email protected]>

Signed-off-by: Anton Gilgur <[email protected]>

@agilgur5 agilgur5 left a comment

LGTM if the suffix check works. It might be good to have a test case for archive: none as well

I'll leave it up to you if you want to do parallelization in this PR or separately

@juliev0
Contributor

juliev0 commented Jul 3, 2024

@tczhao do you want to re-trigger the CI, and then it sounds like this can be merged?

@agilgur5

agilgur5 commented Jul 3, 2024

CI doesn't need to be retriggered, as only the Windows tests are failing, and those are not required to pass. Can just hit merge.

@juliev0 juliev0 merged commit a929c8f into argoproj:main Jul 3, 2024
27 of 28 checks passed
@juliev0
Contributor

juliev0 commented Jul 3, 2024

CI doesn't need to be retriggered, as only the Windows tests are failing, and those are not required to pass. Can just hit merge.

ahh, thanks

agilgur5 pushed a commit that referenced this pull request Jul 6, 2024
Signed-off-by: Tianchu Zhao <[email protected]>
Signed-off-by: Anton Gilgur <[email protected]>
Co-authored-by: Anton Gilgur <[email protected]>
(cherry picked from commit a929c8f)
Labels
area/artifacts S3/GCP/OSS/Git/HDFS etc area/gc Garbage collection, such as TTLs, retentionPolicy, delays, and more
Development

Successfully merging this pull request may close these issues.

artifactGC does not run when the output artifact uses archive: none
3 participants