Skip to content
This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Fix ignore_file option in S3Bucket #139

Merged
merged 2 commits into from
Oct 22, 2022

Conversation

neumann-nico
Copy link
Contributor

@neumann-nico neumann-nico commented Oct 21, 2022

Closes #140

Example

When using S3Bucket.put_directory(ignore_file=".prefectignore") the filter_files function is called which returns the included_files as strings. But the actual files are of type path.PosixPath and includes the whole absolute path.
As an example:

# .prefectignore
**/folder2/*
# called with S3Bucket.put_directory(local_path=str(tmp_path / "folder1"), ignore_file=".prefectignore"))
included_files = {'file3.txt', 'folder2', 'file4.txt'}
# not sure why the directory folder2 is included here, but the files inside folder 2 are excluded. The upload function skips directories, so thats not a big problem.
# looping over the files using local_file_path
pathlib.PosixPath("/private/var/folders/h0/jxdxc2_93m74hnycmnckqb3w0000gn/T/pytest-of-niconeumann/pytest-21/test_put_directory_with_ignore1/folder1/file3.txt")
pathlib.PosixPath("/private/var/folders/h0/jxdxc2_93m74hnycmnckqb3w0000gn/T/pytest-of-niconeumann/pytest-21/test_put_directory_with_ignore1/folder1/file4.txt")
pathlib.PosixPath("/private/var/folders/h0/jxdxc2_93m74hnycmnckqb3w0000gn/T/pytest-of-niconeumann/pytest-21/test_put_directory_with_ignore1/folder1/folder2")
pathlib.PosixPath("/private/var/folders/h0/jxdxc2_93m74hnycmnckqb3w0000gn/T/pytest-of-niconeumann/pytest-21/test_put_directory_with_ignore1/folder1/folder2/file5.txt")

It tries to find those absolute posix paths in the included files, which does not work and thus skips files which should be uploaded:

if included_files is not None and local_path not in included_files:
    continue

Instead I use the relative path and parse it to a string:

if (
    included_files is not None
    and str(local_file_path.relative_to(local_path)) not in included_files
):
    continue
# called with S3Bucket.put_directory(local_path=str(tmp_path / "folder1"), ignore_file=".prefectignore"))
included_files = {'file3.txt', 'folder2', 'file4.txt'}
# looping over the files using str(local_file_path.relative_to(local_path)
"file3.txt"
"file4.txt"
"folder2"
"folder2/file5.txt"

Checklist

  • This pull request references any related issue by including "Closes #<ISSUE_NUMBER>"
    • If no issue exists and your change is not a small fix, please create an issue first.
  • This pull request includes tests or only affects documentation.
  • Summarized PR's changes in CHANGELOG.md

Copy link
Member

@desertaxle desertaxle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for identifying and contributing a fix so quickly!

@desertaxle desertaxle merged commit 584de91 into PrefectHQ:main Oct 22, 2022
@neumann-nico neumann-nico deleted the fix_s3bucket_ignore_file branch October 23, 2022 10:01
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Using ignore_file in S3Bucket.put_directory skips files which should be uploaded
2 participants