Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add file type validation and mime type extraction #1893

Merged

Conversation

jasonlai1218
Copy link
Contributor

@jasonlai1218 jasonlai1218 commented Oct 15, 2023

TL;DR

add file type validation and mime type extraction

Type

  • Bug Fix
  • Feature
  • Plugin

Are all requirements met?

  • Code completed
  • Smoke tested
  • Unit tests added
  • Code documentation added
  • Any pending items have an associated Issue

Complete description

  • Import the magic module in flytekit/types/file/file.py
  • Add a method get_mime_type_from_python_type in FlyteFilePathTransformer class in flytekit/types/file/file.py
  • Add a validation for file type in FlyteFilePathTransformer.to_literal method in flytekit/types/file/file.py
  • Modify setup.py to include python-magic>=0.4.27 as a dependency
  • Add a test case test_real_file_type_in_workflow in test_flyte_file.py

Screenshot

Local Test

(Scenario: actual file type does not match the expected file type)

Screenshot 2023-10-15 at 11 34 13 PM

(Scenario: actual file type matches the expected file type)

Screenshot 2023-10-15 at 11 35 05 PM

(Scenario: actual pdf file type does match the expected pdf file type)

Screenshot 2023-10-15 at 11 33 01 PM


Remote Test

docker build . -f Dockerfile.dev -t localhost:30000/flytekit:dev1 --build-arg PYTHON_VERSION=3.9
docker push localhost:30000/flytekit:dev1
pyflyte run --remote --image localhost:30000/flytekit:dev1 jpg_png_mismatch.py wf

(Scenario: actual file type does not match the expected file type)

Screenshot 2023-10-17 at 9 50 37 AM

Tracking Issue

Fixes flyteorg/flyte#3834

Follow-up issue

NA

- Import the `magic` module in `flytekit/types/file/file.py`
- Add a method `get_mime_type_from_python_type` in `FlyteFilePathTransformer` class in `flytekit/types/file/file.py`
- Add a validation for file type in `FlyteFilePathTransformer.to_literal` method in `flytekit/types/file/file.py`
- Modify `setup.py` to include `python-magic>=0.4.27` as a dependency
- Add a test case `test_real_file_type_in_workflow` in `test_flyte_file.py`

Signed-off-by: jason.lai <[email protected]>
@welcome
Copy link

welcome bot commented Oct 15, 2023

Thank you for opening this pull request! 🙌

These tips will help get your PR across the finish line:

  • Most of the repos have a PR template; if not, fill it out to the best of your knowledge.
  • Sign off your commits (Reference: DCO Guide).

@parag477
Copy link

I want to work on this issue.

- Modify the test function name from `test_file_type_in_workflow_with_bad_format` to `test_matching_file_types_in_workflow`
- Remove the `print(type(res))` line
- Remove the test function `test_mismatching_file_types`
- Add tests for the `get_mime_type_from_python_type` function

Signed-off-by: jason.lai <[email protected]>
@codecov
Copy link

codecov bot commented Oct 15, 2023

Codecov Report

Attention: 21 lines in your changes are missing coverage. Please review.

Comparison is base (28adeee) 62.69% compared to head (7dc39e6) 62.63%.
Report is 2 commits behind head on master.

Files Patch % Lines
flytekit/types/file/file.py 12.50% 21 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1893      +/-   ##
==========================================
- Coverage   62.69%   62.63%   -0.06%     
==========================================
  Files         313      313              
  Lines       23187    23211      +24     
  Branches     3513     3518       +5     
==========================================
+ Hits        14536    14539       +3     
- Misses       8229     8250      +21     
  Partials      422      422              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

…ould be:

- docs: Non-code changes, such as fixing typos or adding new documentation (example scopes: Markdown file)
- refactor: A code change that neither fixes a bug nor adds a feature
- build: Changes that affect the build system or external dependencies (example scopes: gulp, broccoli, npm): update file handling and dependencies

- Fix a typo in the file `flytekit/types/file/file.py`
- Modify the `expected_type` variable assignment in `flytekit/types/file/file.py` for better readability
- Update the `setup.py` file to include the `python-magic` package as a requirement

Signed-off-by: jason.lai <[email protected]>
- Add `python-magic` to the dev requirements
- Remove `python-magic>=0.4.27` from the setup.py dependencies

Signed-off-by: jason.lai <[email protected]>
- Remove `python-magic` from dev requirements
- Add `python-magic` to setup.py dependencies

Signed-off-by: jason.lai <[email protected]>
- Import the `magic` module after the existing imports in `flytekit/types/file/file.py`

Signed-off-by: jason.lai <[email protected]>
- Fix a typo in the error message for incorrect file type

Signed-off-by: jason.lai <[email protected]>
- Add the installation of libmagic1 package to the Dockerfile.dev

Signed-off-by: jason.lai <[email protected]>
- Add a `validate_file_type` method to the `FlyteFilePathTransformer` class
- Validate the file type in the `to_literal` method
- Remove file type validation from the `to_literal` method for `pathlib.Path` and `str` inputs

Signed-off-by: jason.lai <[email protected]>
- Import the `patch` function from `unittest.mock` in `tests/flytekit/unit/core/test_flyte_file.py`
- Add a new test `test_validate_file_type_incorrect` in `tests/flytekit/unit/core/test_flyte_file.py`
- In the new test, mock the return value of `FlyteFilePathTransformer.get_format` and `magic.from_file`
- Test that `transformer.validate_file_type` raises a `ValueError` with the correct message

Signed-off-by: jason.lai <[email protected]>
flytekit/types/file/file.py Outdated Show resolved Hide resolved
flytekit/types/file/file.py Outdated Show resolved Hide resolved
flytekit/types/file/file.py Show resolved Hide resolved
tests/flytekit/unit/core/test_flyte_file.py Outdated Show resolved Hide resolved
tests/flytekit/unit/core/test_flyte_file.py Show resolved Hide resolved
- Update the type hint for the `validate_file_type` method to include the types `typing.Type[FlyteFile]` and `typing.Union[str, os.PathLike]`

Signed-off-by: jason.lai <[email protected]>
- Remove the import of the `magic` module
- Add a try-except block for importing `magic` and log a warning if it fails
- Modify the `validate_file_type` method to handle the case where `magic` is not installed
- Add a new test for file types with a naked `FlyteFile` in the workflow

Signed-off-by: jason.lai <[email protected]>
- Add a fixture `can_import_magic` to check if `magic` can be imported
- Remove a print statement in `test_file_type_in_workflow_with_bad_format`
- Remove a print statement in `test_matching_file_types_in_workflow`
- Replace the test function `test_mismatching_file_types` with `test_mismatching_file_types(can_import_magic)`
- Add a condition to check if `can_import_magic` is `True` in `test_mismatching_file_types`
- Replace the test function `test_validate_file_type_incorrect` with `test_validate_file_type_incorrect(can_import_magic)`

Signed-off-by: jason.lai <[email protected]>
…ange would be "refactor". The changes mentioned in the summaries do not fix a bug or introduce a new feature, but rather involve code restructuring and formatting improvements.: refactor import statements, conditional statements, and assertions

- Import statement `import magic` was added (line 40)
- Import statement `import magic` was added (line 45)
- Conditional statement changed from `if can_import_magic == True` to `if can_import_magic` (line 52)
- Assertion message changed from "Incorrect type, expected image/jpeg, got text/plain" to "Incorrect type, expected image/jpeg, got text/plain" (line 58)
- Code within the `with patch.object` block was indented (lines 66-69)
- Conditional statement changed from `if can_import_magic == True` to `if can_import_magic` (line 71)
- Code within the `with patch.object` block was indented (lines 73-76)
- Assertion message changed from "Incorrect file type, expected image/jpeg, got {source_file_mime_type}" to "Incorrect file type, expected image/jpeg, got {source_file_mime_type}" (line 79)

Signed-off-by: jason.lai <[email protected]>
- Add a new test for the `FlyteFile` type with an annotated hash method
- Define a `calc_hash` function to calculate the hash of a `FlyteFile` path
- Add a new task `t1` that returns a `HashedFlyteFile`
- Add a new task `t2` that prints the path of a `HashedFlyteFile`
- Add a new workflow `wf` that takes a path and creates a `HashedFlyteFile` and passes it to `t2`
- Call the `wf` workflow with a local dummy file path

Signed-off-by: jason.lai <[email protected]>
…use the change does not fix a bug or add a new feature, but instead modifies the MIME type for CSV files and adds a new test file.: change MIME type for CSV files and add test file

- Change the MIME type for CSV files from "text/csv" to "text/plain" in `flytekit/types/file/file.py`
- Add a new test file `tests/flytekit/unit/extras/tasks/testdata/test.csv` with the content "1,2"

Signed-off-by: jason.lai <[email protected]>
…ould be "fix". This is because the changes mentioned involve fixing typos, fixing assertion error messages, and updating expected values in tests.: fix various issues and improve tests in file handling

- Fix a typo in the logger warning message
- Ignore file type comparison when the file does not exist
- Fix assertion error message in the test for mismatching file types
- Update the expected MIME type for the csv file type in a test
- Add a new test for the annotated hash method, with an assertion error message when the file type is incorrect

Signed-off-by: jason.lai <[email protected]>
- Change the MIME type for `hdf5` files from `application/x-hdf` to `text/plain`
- Change the MIME type for `ipynb` files from `application/x-ipynb+json` to `application/json`
- Change the MIME type for `onnx` files from `application/octet-stream` to `application/json`

Signed-off-by: jason.lai <[email protected]>
- Modify the MIME type returned for the `hdf5` file type
- Modify the MIME type returned for the `ipynb` file type
- Modify the MIME type returned for the `onnx` file type

Signed-off-by: jason.lai <[email protected]>
@jasonlai1218 jasonlai1218 requested a review from pingsutw November 7, 2023 01:51
flytekit/types/file/file.py Outdated Show resolved Hide resolved
tests/flytekit/unit/core/test_flyte_file.py Outdated Show resolved Hide resolved
tests/flytekit/unit/core/test_flyte_file.py Outdated Show resolved Hide resolved
flytekit/types/file/file.py Outdated Show resolved Hide resolved
- Modify the `get_mime_type_from_python_type` method to `get_mime_type_from_extension`
- Remove several file type mappings from the `extension_to_mime_type` dictionary
- Add a loop to populate the `extension_to_mime_type` dictionary with all file type mappings
- Modify the `validate_file_type` method to use `get_mime_type_from_extension` instead of `get_mime_type_from_python_type`
- In the `test_get_mime_type_from_extension_success` test, assert the correct mime type for various file extensions
- In the `test_get_mime_type_from_extension_failure` test, assert that a `KeyError` is raised for an unknown file extension

Signed-off-by: jason.lai <[email protected]>
@jasonlai1218 jasonlai1218 requested a review from pingsutw November 7, 2023 03:45
@eapolinario
Copy link
Collaborator

Something weird is happening to the windows tests. They are hanging, running for multiple hours until they are canceled.

Can you take a look, @jasonlai1218 ?

- Add `python-magic-bin` as a requirement when installing on Windows
- Add `python-magic` as a requirement when installing on Darwin or Linux
- Remove `python-magic` as a requirement when installing on Windows

Signed-off-by: jason.lai <[email protected]>
@jasonlai1218
Copy link
Contributor Author

@eapolinario
According to the doc, the lib that Windows OS needs to install is python-magic-bin, not libmagic1. I guess this is the cause of the build error.

Screenshot 2023-11-08 at 11 49 56 PM

so I adjusted it like this

# Because DLLs for libmagic are required on Windows,
# we must specify python-magic-bin when installing the pypi package.
python-magic-bin; platform_system=='Windows'
python-magic; (platform_system=='Darwin' or platform_system=='Linux')

@jasonlai1218 jasonlai1218 reopened this Nov 8, 2023
@eapolinario
Copy link
Collaborator

@jasonlai1218 , how about we disable the feature on windows? Leave a TODO explaining the limitation and create a github issue.

- Comment out the `python-magic-bin` requirement on Windows due to build errors with DLLs for `libmagic`
- Add a TODO comment about finding a solution to support `python-magic` on Windows
- Update the comment about `python-magic` to mention that it is currently used for Mac OS and Linux only
- Add a comment about a related GitHub issue for more details on the `python-magic` usage

Signed-off-by: jason.lai <[email protected]>
@jasonlai1218
Copy link
Contributor Author

@jasonlai1218 , how about we disable the feature on windows? Leave a TODO explaining the limitation and create a github issue.

OK, I updated it.

# TODO: Currently, the python-magic library causes build errors on Windows due to its dependency on DLLs for libmagic.
# We have temporarily disabled this feature on Windows and are using python-magic for Mac OS and Linux instead.
# For more details, see the related GitHub issue.
# Once a solution is found, this should be updated to support Windows as well.
python-magic; (platform_system=='Darwin' or platform_system=='Linux')

and I create a github issue
flyteorg/flyte#4391

@eapolinario eapolinario merged commit 2861267 into flyteorg:master Nov 16, 2023
70 of 72 checks passed
Copy link

welcome bot commented Nov 16, 2023

Congrats on merging your first pull request! 🎉

ringohoffman pushed a commit to ringohoffman/flytekit that referenced this pull request Nov 24, 2023
* feat: add file type validation and mime type extraction

- Import the `magic` module in `flytekit/types/file/file.py`
- Add a method `get_mime_type_from_python_type` in `FlyteFilePathTransformer` class in `flytekit/types/file/file.py`
- Add a validation for file type in `FlyteFilePathTransformer.to_literal` method in `flytekit/types/file/file.py`
- Modify `setup.py` to include `python-magic>=0.4.27` as a dependency
- Add a test case `test_real_file_type_in_workflow` in `test_flyte_file.py`

Signed-off-by: jason.lai <[email protected]>

* test: refactor test functions and add new tests

- Modify the test function name from `test_file_type_in_workflow_with_bad_format` to `test_matching_file_types_in_workflow`
- Remove the `print(type(res))` line
- Remove the test function `test_mismatching_file_types`
- Add tests for the `get_mime_type_from_python_type` function

Signed-off-by: jason.lai <[email protected]>

* Based on the file summaries provided, the best label for the commit would be:

- docs: Non-code changes, such as fixing typos or adding new documentation (example scopes: Markdown file)
- refactor: A code change that neither fixes a bug nor adds a feature
- build: Changes that affect the build system or external dependencies (example scopes: gulp, broccoli, npm): update file handling and dependencies

- Fix a typo in the file `flytekit/types/file/file.py`
- Modify the `expected_type` variable assignment in `flytekit/types/file/file.py` for better readability
- Update the `setup.py` file to include the `python-magic` package as a requirement

Signed-off-by: jason.lai <[email protected]>

* chore: update dev requirements and setup.py dependencies

- Add `python-magic` to the dev requirements
- Remove `python-magic>=0.4.27` from the setup.py dependencies

Signed-off-by: jason.lai <[email protected]>

* chore: update dependencies and remove unnecessary dev requirement

- Remove `python-magic` from dev requirements
- Add `python-magic` to setup.py dependencies

Signed-off-by: jason.lai <[email protected]>

* chore: import `magic` module in `file.py`

- Import the `magic` module after the existing imports in `flytekit/types/file/file.py`

Signed-off-by: jason.lai <[email protected]>

* docs: fix typo in error message for incorrect file type

- Fix a typo in the error message for incorrect file type

Signed-off-by: jason.lai <[email protected]>

* build: add libmagic1 package to Dockerfile.dev

- Add the installation of libmagic1 package to the Dockerfile.dev

Signed-off-by: jason.lai <[email protected]>

* feat: refactor file type validation in FlyteFilePathTransformer class

- Add a `validate_file_type` method to the `FlyteFilePathTransformer` class
- Validate the file type in the `to_literal` method
- Remove file type validation from the `to_literal` method for `pathlib.Path` and `str` inputs

Signed-off-by: jason.lai <[email protected]>

* test: add test for invalid file type validation

- Import the `patch` function from `unittest.mock` in `tests/flytekit/unit/core/test_flyte_file.py`
- Add a new test `test_validate_file_type_incorrect` in `tests/flytekit/unit/core/test_flyte_file.py`
- In the new test, mock the return value of `FlyteFilePathTransformer.get_format` and `magic.from_file`
- Test that `transformer.validate_file_type` raises a `ValueError` with the correct message

Signed-off-by: jason.lai <[email protected]>

* refactor: update type hint for `validate_file_type` method

- Update the type hint for the `validate_file_type` method to include the types `typing.Type[FlyteFile]` and `typing.Union[str, os.PathLike]`

Signed-off-by: jason.lai <[email protected]>

* refactor: handle missing `magic` module gracefully

- Remove the import of the `magic` module
- Add a try-except block for importing `magic` and log a warning if it fails
- Modify the `validate_file_type` method to handle the case where `magic` is not installed
- Add a new test for file types with a naked `FlyteFile` in the workflow

Signed-off-by: jason.lai <[email protected]>

* test: refactor file type tests to use `can_import_magic` fixture

- Add a fixture `can_import_magic` to check if `magic` can be imported
- Remove a print statement in `test_file_type_in_workflow_with_bad_format`
- Remove a print statement in `test_matching_file_types_in_workflow`
- Replace the test function `test_mismatching_file_types` with `test_mismatching_file_types(can_import_magic)`
- Add a condition to check if `can_import_magic` is `True` in `test_mismatching_file_types`
- Replace the test function `test_validate_file_type_incorrect` with `test_validate_file_type_incorrect(can_import_magic)`

Signed-off-by: jason.lai <[email protected]>

* Based on the file summaries provided, the best label for this code change would be "refactor". The changes mentioned in the summaries do not fix a bug or introduce a new feature, but rather involve code restructuring and formatting improvements.: refactor import statements, conditional statements, and assertions

- Import statement `import magic` was added (line 40)
- Import statement `import magic` was added (line 45)
- Conditional statement changed from `if can_import_magic == True` to `if can_import_magic` (line 52)
- Assertion message changed from "Incorrect type, expected image/jpeg, got text/plain" to "Incorrect type, expected image/jpeg, got text/plain" (line 58)
- Code within the `with patch.object` block was indented (lines 66-69)
- Conditional statement changed from `if can_import_magic == True` to `if can_import_magic` (line 71)
- Code within the `with patch.object` block was indented (lines 73-76)
- Assertion message changed from "Incorrect file type, expected image/jpeg, got {source_file_mime_type}" to "Incorrect file type, expected image/jpeg, got {source_file_mime_type}" (line 79)

Signed-off-by: jason.lai <[email protected]>

* feat: refactor file handling and introduce hash calculation

- Add a new test for the `FlyteFile` type with an annotated hash method
- Define a `calc_hash` function to calculate the hash of a `FlyteFile` path
- Add a new task `t1` that returns a `HashedFlyteFile`
- Add a new task `t2` that prints the path of a `HashedFlyteFile`
- Add a new workflow `wf` that takes a path and creates a `HashedFlyteFile` and passes it to `t2`
- Call the `wf` workflow with a local dummy file path

Signed-off-by: jason.lai <[email protected]>

* The label that best describes this change is "refactor". This is because the change does not fix a bug or add a new feature, but instead modifies the MIME type for CSV files and adds a new test file.: change MIME type for CSV files and add test file

- Change the MIME type for CSV files from "text/csv" to "text/plain" in `flytekit/types/file/file.py`
- Add a new test file `tests/flytekit/unit/extras/tasks/testdata/test.csv` with the content "1,2"

Signed-off-by: jason.lai <[email protected]>

* Based on the file summaries provided, the best label for the commit would be "fix". This is because the changes mentioned involve fixing typos, fixing assertion error messages, and updating expected values in tests.: fix various issues and improve tests in file handling

- Fix a typo in the logger warning message
- Ignore file type comparison when the file does not exist
- Fix assertion error message in the test for mismatching file types
- Update the expected MIME type for the csv file type in a test
- Add a new test for the annotated hash method, with an assertion error message when the file type is incorrect

Signed-off-by: jason.lai <[email protected]>

* refactor: change MIME types for `hdf5`, `ipynb`, and `onnx` files

- Change the MIME type for `hdf5` files from `application/x-hdf` to `text/plain`
- Change the MIME type for `ipynb` files from `application/x-ipynb+json` to `application/json`
- Change the MIME type for `onnx` files from `application/octet-stream` to `application/json`

Signed-off-by: jason.lai <[email protected]>

* refactor: update MIME types for specific file types

- Modify the MIME type returned for the `hdf5` file type
- Modify the MIME type returned for the `ipynb` file type
- Modify the MIME type returned for the `onnx` file type

Signed-off-by: jason.lai <[email protected]>

* The label best describing this change is "test".: update file.py and test_flyte_file.py for improved file type handling

- Import `mimetypes` module in `file.py`
- Add `pdf`, `txt`, `csv`, and `svg` extensions to `extension_to_mime_type` dictionary in `get_mime_type_from_python_type` method in `file.py`
- Add a new fixture `local_dummy_txt_file` in `test_flyte_file.py`
- Modify `test_matching_file_types_in_workflow` method in `test_flyte_file.py` to accept `local_dummy_txt_file` fixture
- Modify `my_wf` method in `test_matching_file_types_in_workflow` method in `test_flyte_file.py` to accept `path` parameter
- Modify assertions in `test_matching_file_types_in_workflow` method in `test_flyte_file.py` to remove the newline character in the expected output
- Modify `test_file_types_with_naked_flytefile_in_workflow` method in `test_flyte_file.py` to accept `local_dummy_txt_file` fixture
- Modify `my_wf` method in `test_file_types_with_naked_flytefile_in_workflow` method in `test_flyte_file.py` to accept `path` parameter
- Modify assertions in `test_file_types_with_naked_flytefile_in_workflow` method in `test_flyte_file.py` to remove the newline character in the expected output
- Modify `test_mismatching_file_types` method in `test_flyte_file.py`

Signed-off-by: jason.lai <[email protected]>

* feat: refactor file handling and input parameter in test cases

- Modify the `test_input_output_substitution_files` function in `tests/flytekit/unit/extras/tasks/test_shell.py`
- Change the `inputs` parameter in the `test_input_output_substitution_files` function from `kwtypes(f=CSVFile)` to `kwtypes(f=FlyteFile)`
- Modify the data written to the `test.csv` file in `tests/flytekit/unit/extras/tasks/testdata/test.csv`
- Add a new file `write_csv_format_file.py`

Signed-off-by: jason.lai <[email protected]>

* chore: remove write_csv_format_file.py

- Delete the file `write_csv_format_file.py`

Signed-off-by: jason.lai <[email protected]>

* feat: add `python-magic` library for file type validation and logging improvements

- Add `python-magic` to the `dev-requirements.in` file
- Modify the `validate_file_type` method in `file.py` to include debug logging
- Remove `python-magic` from the `setup.py` file
- Modify the `test_flyte_file.py` file:
  - Remove the `can_import_magic` fixture
  - Skip the test functions if `magic` is not installed
  - Modify the `test_mismatching_file_types` function to remove the `can_import_magic` check
  - Modify the `test_validate_file_type_incorrect` function to remove the `can_import_magic` check
  - Modify the `test_flyte_file_type_annotated_hashmethod` function to remove the `can_import_magic` check

Signed-off-by: jason.lai <[email protected]>

* feat: add validation method for file types in FlyteFilePathTransformer

- Add a validation method for file types in FlyteFilePathTransformer

Signed-off-by: jason.lai <[email protected]>

* feat: refactor file handling and add new tests

- Add two new files: `custom_type_example.py` and `custom_type_wf.py`
- Modify the file `flytekit/types/file/file.py`:
  - Remove the try-except block for handling `FileNotFoundError`
  - Modify the logic for comparing file types
- Modify the file `tests/flytekit/unit/core/test_flyte_file.py`:
  - Add a new function `can_import(module_name)`
  - Add a new test `test_file_type_in_workflow_with_bad_format()`
  - Modify the test `test_mismatching_file_types()`
  - Modify the test `test_get_mime_type_from_python_type_failure()`
  - Modify the test `test_validate_file_type_incorrect()`
  - Modify the test `test_flyte_file_type_annotated_hashmethod()`
- Modify the file `tests/flytekit/unit/core/test_type_engine.py`:
  - Import the function `can_import` from `tests/flytekit/unit/core/test_flyte_file.py`
  - Add a new test `test_flyte_file_in_dataclassjsonmixin()`
  - Skip the test `test_flyte_file_in_dataclassjsonmixin()` if `magic` module is imported

Signed-off-by: jason.lai <[email protected]>

* The label best describing this change is "refactor".: remove unnecessary imports from test files

- Remove `import sys` from `test_flyte_file.py`
- Remove `import sys` from `test_type_engine.py`

Signed-off-by: jason.lai <[email protected]>

* test: add skipif marks to test functions for dataclass and enum in dataclassjsonmixin

- Add a skipif mark to the `test_flyte_file_in_dataclass` function
- Add a skipif mark to the parametrize block in the `test_enum_in_dataclassjsonmixin` function

Signed-off-by: jason.lai <[email protected]>

* test: add import statement and skip test in unit test file

- Add a `can_import` import statement from `tests.flytekit.unit.core.test_flyte_file`
- Add a `pytest.mark.skipif` decorator with a reason for skipping the test

Signed-off-by: jason.lai <[email protected]>

* test: remove unnecessary tests in core and type_hints modules

- Remove the `can_import` import statement in `tests/flytekit/unit/core/test_type_engine.py`
- Remove the `test_flyte_file_in_dataclass` test in `tests/flytekit/unit/core/test_type_engine.py`
- Remove the `test_flyte_file_in_dataclassjsonmixin` test in `tests/flytekit/unit/core/test_type_engine.py`
- Remove the `test_enum_in_dataclassjsonmixin` test in `tests/flytekit/unit/core/test_type_engine.py`
- Remove the `test_dict_to_literal_map` test in `tests/flytekit/unit/core/test_type_engine.py`
- Remove the `test_wf1_with_sql_with_patch` test in `tests/flytekit/unit/core/test_type_hints.py`
- Remove the `test_flyte_file_in_dataclass` test in `tests/flytekit/unit/core/test_type_hints.py`

Signed-off-by: jason.lai <[email protected]>

* chore: remove custom type files

- Delete `custom_type_example.py`
- Delete `custom_type_wf.py`

Signed-off-by: jason.lai <[email protected]>

* refactor: remove unnecessary validation in file transformations

- Skip validation for remote files in the `FlyteFilePathTransformer` class
- Remove unnecessary validation in the `test_flyte_file_in_dataclass` and `test_flyte_file_in_dataclassjsonmixin` tests
- Remove unnecessary validation in the `test_dict_to_literal_map` test

Signed-off-by: jason.lai <[email protected]>

* refactor: refactor file validation and test updates

- Remove unnecessary code for validating file format
- Update tests for `test_flyte_file_in_dataclass` to use `wf` instead of `ctx`

Signed-off-by: jason.lai <[email protected]>

* refactor: refactor file imports and remove unnecessary code

- Remove unnecessary code that does not affect the functionality of `FlyteFilePathTransformer.get_format()` method.
- Replace the import of `PNGImageFile` with `FlyteFile` in the `test_type_hints.py` file.
- Update the `test_flyte_file_in_dataclass()` method to use `FlyteFile` instead of `PNGImageFile`.

Signed-off-by: jason.lai <[email protected]>

* feat: refactor file type handling

- Modify the `get_mime_type_from_python_type` method to `get_mime_type_from_extension`
- Remove several file type mappings from the `extension_to_mime_type` dictionary
- Add a loop to populate the `extension_to_mime_type` dictionary with all file type mappings
- Modify the `validate_file_type` method to use `get_mime_type_from_extension` instead of `get_mime_type_from_python_type`
- In the `test_get_mime_type_from_extension_success` test, assert the correct mime type for various file extensions
- In the `test_get_mime_type_from_extension_failure` test, assert that a `KeyError` is raised for an unknown file extension

Signed-off-by: jason.lai <[email protected]>

* build: update requirements for installing `python-magic` library

- Add `python-magic-bin` as a requirement when installing on Windows
- Add `python-magic` as a requirement when installing on Darwin or Linux
- Remove `python-magic` as a requirement when installing on Windows

Signed-off-by: jason.lai <[email protected]>

* chore: update `python-magic` usage comments for Windows compatibility

- Comment out the `python-magic-bin` requirement on Windows due to build errors with DLLs for `libmagic`
- Add a TODO comment about finding a solution to support `python-magic` on Windows
- Update the comment about `python-magic` to mention that it is currently used for Mac OS and Linux only
- Add a comment about a related GitHub issue for more details on the `python-magic` usage

Signed-off-by: jason.lai <[email protected]>

---------

Signed-off-by: jason.lai <[email protected]>
@eapolinario eapolinario mentioned this pull request Nov 28, 2023
3 tasks
mark-thm pushed a commit to mark-thm/flytekit that referenced this pull request Dec 8, 2023
* feat: add file type validation and mime type extraction

- Import the `magic` module in `flytekit/types/file/file.py`
- Add a method `get_mime_type_from_python_type` in `FlyteFilePathTransformer` class in `flytekit/types/file/file.py`
- Add a validation for file type in `FlyteFilePathTransformer.to_literal` method in `flytekit/types/file/file.py`
- Modify `setup.py` to include `python-magic>=0.4.27` as a dependency
- Add a test case `test_real_file_type_in_workflow` in `test_flyte_file.py`

Signed-off-by: jason.lai <[email protected]>

* test: refactor test functions and add new tests

- Modify the test function name from `test_file_type_in_workflow_with_bad_format` to `test_matching_file_types_in_workflow`
- Remove the `print(type(res))` line
- Remove the test function `test_mismatching_file_types`
- Add tests for the `get_mime_type_from_python_type` function

Signed-off-by: jason.lai <[email protected]>

* Based on the file summaries provided, the best label for the commit would be:

- docs: Non-code changes, such as fixing typos or adding new documentation (example scopes: Markdown file)
- refactor: A code change that neither fixes a bug nor adds a feature
- build: Changes that affect the build system or external dependencies (example scopes: gulp, broccoli, npm): update file handling and dependencies

- Fix a typo in the file `flytekit/types/file/file.py`
- Modify the `expected_type` variable assignment in `flytekit/types/file/file.py` for better readability
- Update the `setup.py` file to include the `python-magic` package as a requirement

Signed-off-by: jason.lai <[email protected]>

* chore: update dev requirements and setup.py dependencies

- Add `python-magic` to the dev requirements
- Remove `python-magic>=0.4.27` from the setup.py dependencies

Signed-off-by: jason.lai <[email protected]>

* chore: update dependencies and remove unnecessary dev requirement

- Remove `python-magic` from dev requirements
- Add `python-magic` to setup.py dependencies

Signed-off-by: jason.lai <[email protected]>

* chore: import `magic` module in `file.py`

- Import the `magic` module after the existing imports in `flytekit/types/file/file.py`

Signed-off-by: jason.lai <[email protected]>

* docs: fix typo in error message for incorrect file type

- Fix a typo in the error message for incorrect file type

Signed-off-by: jason.lai <[email protected]>

* build: add libmagic1 package to Dockerfile.dev

- Add the installation of libmagic1 package to the Dockerfile.dev

Signed-off-by: jason.lai <[email protected]>

* feat: refactor file type validation in FlyteFilePathTransformer class

- Add a `validate_file_type` method to the `FlyteFilePathTransformer` class
- Validate the file type in the `to_literal` method
- Remove file type validation from the `to_literal` method for `pathlib.Path` and `str` inputs

Signed-off-by: jason.lai <[email protected]>

* test: add test for invalid file type validation

- Import the `patch` function from `unittest.mock` in `tests/flytekit/unit/core/test_flyte_file.py`
- Add a new test `test_validate_file_type_incorrect` in `tests/flytekit/unit/core/test_flyte_file.py`
- In the new test, mock the return value of `FlyteFilePathTransformer.get_format` and `magic.from_file`
- Test that `transformer.validate_file_type` raises a `ValueError` with the correct message

Signed-off-by: jason.lai <[email protected]>

* refactor: update type hint for `validate_file_type` method

- Update the type hint for the `validate_file_type` method to include the types `typing.Type[FlyteFile]` and `typing.Union[str, os.PathLike]`

Signed-off-by: jason.lai <[email protected]>

* refactor: handle missing `magic` module gracefully

- Remove the import of the `magic` module
- Add a try-except block for importing `magic` and log a warning if it fails
- Modify the `validate_file_type` method to handle the case where `magic` is not installed
- Add a new test for file types with a naked `FlyteFile` in the workflow

Signed-off-by: jason.lai <[email protected]>

* test: refactor file type tests to use `can_import_magic` fixture

- Add a fixture `can_import_magic` to check if `magic` can be imported
- Remove a print statement in `test_file_type_in_workflow_with_bad_format`
- Remove a print statement in `test_matching_file_types_in_workflow`
- Replace the test function `test_mismatching_file_types` with `test_mismatching_file_types(can_import_magic)`
- Add a condition to check if `can_import_magic` is `True` in `test_mismatching_file_types`
- Replace the test function `test_validate_file_type_incorrect` with `test_validate_file_type_incorrect(can_import_magic)`

Signed-off-by: jason.lai <[email protected]>

* Based on the file summaries provided, the best label for this code change would be "refactor". The changes mentioned in the summaries do not fix a bug or introduce a new feature, but rather involve code restructuring and formatting improvements.: refactor import statements, conditional statements, and assertions

- Import statement `import magic` was added (line 40)
- Import statement `import magic` was added (line 45)
- Conditional statement changed from `if can_import_magic == True` to `if can_import_magic` (line 52)
- Assertion message changed from "Incorrect type, expected image/jpeg, got text/plain" to "Incorrect type, expected image/jpeg, got text/plain" (line 58)
- Code within the `with patch.object` block was indented (lines 66-69)
- Conditional statement changed from `if can_import_magic == True` to `if can_import_magic` (line 71)
- Code within the `with patch.object` block was indented (lines 73-76)
- Assertion message changed from "Incorrect file type, expected image/jpeg, got {source_file_mime_type}" to "Incorrect file type, expected image/jpeg, got {source_file_mime_type}" (line 79)

Signed-off-by: jason.lai <[email protected]>

* feat: refactor file handling and introduce hash calculation

- Add a new test for the `FlyteFile` type with an annotated hash method
- Define a `calc_hash` function to calculate the hash of a `FlyteFile` path
- Add a new task `t1` that returns a `HashedFlyteFile`
- Add a new task `t2` that prints the path of a `HashedFlyteFile`
- Add a new workflow `wf` that takes a path and creates a `HashedFlyteFile` and passes it to `t2`
- Call the `wf` workflow with a local dummy file path

Signed-off-by: jason.lai <[email protected]>

* The label that best describes this change is "refactor". This is because the change does not fix a bug or add a new feature, but instead modifies the MIME type for CSV files and adds a new test file.: change MIME type for CSV files and add test file

- Change the MIME type for CSV files from "text/csv" to "text/plain" in `flytekit/types/file/file.py`
- Add a new test file `tests/flytekit/unit/extras/tasks/testdata/test.csv` with the content "1,2"

Signed-off-by: jason.lai <[email protected]>

* Based on the file summaries provided, the best label for the commit would be "fix". This is because the changes mentioned involve fixing typos, fixing assertion error messages, and updating expected values in tests.: fix various issues and improve tests in file handling

- Fix a typo in the logger warning message
- Ignore file type comparison when the file does not exist
- Fix assertion error message in the test for mismatching file types
- Update the expected MIME type for the csv file type in a test
- Add a new test for the annotated hash method, with an assertion error message when the file type is incorrect

Signed-off-by: jason.lai <[email protected]>

* refactor: change MIME types for `hdf5`, `ipynb`, and `onnx` files

- Change the MIME type for `hdf5` files from `application/x-hdf` to `text/plain`
- Change the MIME type for `ipynb` files from `application/x-ipynb+json` to `application/json`
- Change the MIME type for `onnx` files from `application/octet-stream` to `application/json`

Signed-off-by: jason.lai <[email protected]>

* refactor: update MIME types for specific file types

- Modify the MIME type returned for the `hdf5` file type
- Modify the MIME type returned for the `ipynb` file type
- Modify the MIME type returned for the `onnx` file type

Signed-off-by: jason.lai <[email protected]>

* The label best describing this change is "test".: update file.py and test_flyte_file.py for improved file type handling

- Import `mimetypes` module in `file.py`
- Add `pdf`, `txt`, `csv`, and `svg` extensions to `extension_to_mime_type` dictionary in `get_mime_type_from_python_type` method in `file.py`
- Add a new fixture `local_dummy_txt_file` in `test_flyte_file.py`
- Modify `test_matching_file_types_in_workflow` method in `test_flyte_file.py` to accept `local_dummy_txt_file` fixture
- Modify `my_wf` method in `test_matching_file_types_in_workflow` method in `test_flyte_file.py` to accept `path` parameter
- Modify assertions in `test_matching_file_types_in_workflow` method in `test_flyte_file.py` to remove the newline character in the expected output
- Modify `test_file_types_with_naked_flytefile_in_workflow` method in `test_flyte_file.py` to accept `local_dummy_txt_file` fixture
- Modify `my_wf` method in `test_file_types_with_naked_flytefile_in_workflow` method in `test_flyte_file.py` to accept `path` parameter
- Modify assertions in `test_file_types_with_naked_flytefile_in_workflow` method in `test_flyte_file.py` to remove the newline character in the expected output
- Modify `test_mismatching_file_types` method in `test_flyte_file.py`

Signed-off-by: jason.lai <[email protected]>

* feat: refactor file handling and input parameter in test cases

- Modify the `test_input_output_substitution_files` function in `tests/flytekit/unit/extras/tasks/test_shell.py`
- Change the `inputs` parameter in the `test_input_output_substitution_files` function from `kwtypes(f=CSVFile)` to `kwtypes(f=FlyteFile)`
- Modify the data written to the `test.csv` file in `tests/flytekit/unit/extras/tasks/testdata/test.csv`
- Add a new file `write_csv_format_file.py`

Signed-off-by: jason.lai <[email protected]>

* chore: remove write_csv_format_file.py

- Delete the file `write_csv_format_file.py`

Signed-off-by: jason.lai <[email protected]>

* feat: add `python-magic` library for file type validation and logging improvements

- Add `python-magic` to the `dev-requirements.in` file
- Modify the `validate_file_type` method in `file.py` to include debug logging
- Remove `python-magic` from the `setup.py` file
- Modify the `test_flyte_file.py` file:
  - Remove the `can_import_magic` fixture
  - Skip the test functions if `magic` is not installed
  - Modify the `test_mismatching_file_types` function to remove the `can_import_magic` check
  - Modify the `test_validate_file_type_incorrect` function to remove the `can_import_magic` check
  - Modify the `test_flyte_file_type_annotated_hashmethod` function to remove the `can_import_magic` check

Signed-off-by: jason.lai <[email protected]>

* feat: add validation method for file types in FlyteFilePathTransformer

- Add a validation method for file types in FlyteFilePathTransformer

Signed-off-by: jason.lai <[email protected]>

* feat: refactor file handling and add new tests

- Add two new files: `custom_type_example.py` and `custom_type_wf.py`
- Modify the file `flytekit/types/file/file.py`:
  - Remove the try-except block for handling `FileNotFoundError`
  - Modify the logic for comparing file types
- Modify the file `tests/flytekit/unit/core/test_flyte_file.py`:
  - Add a new function `can_import(module_name)`
  - Add a new test `test_file_type_in_workflow_with_bad_format()`
  - Modify the test `test_mismatching_file_types()`
  - Modify the test `test_get_mime_type_from_python_type_failure()`
  - Modify the test `test_validate_file_type_incorrect()`
  - Modify the test `test_flyte_file_type_annotated_hashmethod()`
- Modify the file `tests/flytekit/unit/core/test_type_engine.py`:
  - Import the function `can_import` from `tests/flytekit/unit/core/test_flyte_file.py`
  - Add a new test `test_flyte_file_in_dataclassjsonmixin()`
  - Skip the test `test_flyte_file_in_dataclassjsonmixin()` if `magic` module is imported

Signed-off-by: jason.lai <[email protected]>

* The label best describing this change is "refactor".: remove unnecessary imports from test files

- Remove `import sys` from `test_flyte_file.py`
- Remove `import sys` from `test_type_engine.py`

Signed-off-by: jason.lai <[email protected]>

* test: add skipif marks to test functions for dataclass and enum in dataclassjsonmixin

- Add a skipif mark to the `test_flyte_file_in_dataclass` function
- Add a skipif mark to the parametrize block in the `test_enum_in_dataclassjsonmixin` function

Signed-off-by: jason.lai <[email protected]>

* test: add import statement and skip test in unit test file

- Add a `can_import` import statement from `tests.flytekit.unit.core.test_flyte_file`
- Add a `pytest.mark.skipif` decorator with a reason for skipping the test

Signed-off-by: jason.lai <[email protected]>

* test: remove unnecessary tests in core and type_hints modules

- Remove the `can_import` import statement in `tests/flytekit/unit/core/test_type_engine.py`
- Remove the `test_flyte_file_in_dataclass` test in `tests/flytekit/unit/core/test_type_engine.py`
- Remove the `test_flyte_file_in_dataclassjsonmixin` test in `tests/flytekit/unit/core/test_type_engine.py`
- Remove the `test_enum_in_dataclassjsonmixin` test in `tests/flytekit/unit/core/test_type_engine.py`
- Remove the `test_dict_to_literal_map` test in `tests/flytekit/unit/core/test_type_engine.py`
- Remove the `test_wf1_with_sql_with_patch` test in `tests/flytekit/unit/core/test_type_hints.py`
- Remove the `test_flyte_file_in_dataclass` test in `tests/flytekit/unit/core/test_type_hints.py`

Signed-off-by: jason.lai <[email protected]>

* chore: remove custom type files

- Delete `custom_type_example.py`
- Delete `custom_type_wf.py`

Signed-off-by: jason.lai <[email protected]>

* refactor: remove unnecessary validation in file transformations

- Skip validation for remote files in the `FlyteFilePathTransformer` class
- Remove unnecessary validation in the `test_flyte_file_in_dataclass` and `test_flyte_file_in_dataclassjsonmixin` tests
- Remove unnecessary validation in the `test_dict_to_literal_map` test

Signed-off-by: jason.lai <[email protected]>

* refactor: refactor file validation and test updates

- Remove unnecessary code for validating file format
- Update tests for `test_flyte_file_in_dataclass` to use `wf` instead of `ctx`

Signed-off-by: jason.lai <[email protected]>

* refactor: refactor file imports and remove unnecessary code

- Remove unnecessary code that does not affect the functionality of `FlyteFilePathTransformer.get_format()` method.
- Replace the import of `PNGImageFile` with `FlyteFile` in the `test_type_hints.py` file.
- Update the `test_flyte_file_in_dataclass()` method to use `FlyteFile` instead of `PNGImageFile`.

Signed-off-by: jason.lai <[email protected]>

* feat: refactor file type handling

- Modify the `get_mime_type_from_python_type` method to `get_mime_type_from_extension`
- Remove several file type mappings from the `extension_to_mime_type` dictionary
- Add a loop to populate the `extension_to_mime_type` dictionary with all file type mappings
- Modify the `validate_file_type` method to use `get_mime_type_from_extension` instead of `get_mime_type_from_python_type`
- In the `test_get_mime_type_from_extension_success` test, assert the correct mime type for various file extensions
- In the `test_get_mime_type_from_extension_failure` test, assert that a `KeyError` is raised for an unknown file extension

Signed-off-by: jason.lai <[email protected]>

* build: update requirements for installing `python-magic` library

- Add `python-magic-bin` as a requirement when installing on Windows
- Add `python-magic` as a requirement when installing on Darwin or Linux
- Remove `python-magic` as a requirement when installing on Windows

Signed-off-by: jason.lai <[email protected]>

* chore: update `python-magic` usage comments for Windows compatibility

- Comment out the `python-magic-bin` requirement on Windows due to build errors with DLLs for `libmagic`
- Add a TODO comment about finding a solution to support `python-magic` on Windows
- Update the comment about `python-magic` to mention that it is currently used for Mac OS and Linux only
- Add a comment about a related GitHub issue for more details on the `python-magic` usage

Signed-off-by: jason.lai <[email protected]>

---------

Signed-off-by: jason.lai <[email protected]>
RRap0so pushed a commit to RRap0so/flytekit that referenced this pull request Dec 15, 2023
* feat: add file type validation and mime type extraction

- Import the `magic` module in `flytekit/types/file/file.py`
- Add a method `get_mime_type_from_python_type` in `FlyteFilePathTransformer` class in `flytekit/types/file/file.py`
- Add a validation for file type in `FlyteFilePathTransformer.to_literal` method in `flytekit/types/file/file.py`
- Modify `setup.py` to include `python-magic>=0.4.27` as a dependency
- Add a test case `test_real_file_type_in_workflow` in `test_flyte_file.py`

Signed-off-by: jason.lai <[email protected]>

* test: refactor test functions and add new tests

- Modify the test function name from `test_file_type_in_workflow_with_bad_format` to `test_matching_file_types_in_workflow`
- Remove the `print(type(res))` line
- Remove the test function `test_mismatching_file_types`
- Add tests for the `get_mime_type_from_python_type` function

Signed-off-by: jason.lai <[email protected]>

* Based on the file summaries provided, the best label for the commit would be:

- docs: Non-code changes, such as fixing typos or adding new documentation (example scopes: Markdown file)
- refactor: A code change that neither fixes a bug nor adds a feature
- build: Changes that affect the build system or external dependencies (example scopes: gulp, broccoli, npm): update file handling and dependencies

- Fix a typo in the file `flytekit/types/file/file.py`
- Modify the `expected_type` variable assignment in `flytekit/types/file/file.py` for better readability
- Update the `setup.py` file to include the `python-magic` package as a requirement

Signed-off-by: jason.lai <[email protected]>

* chore: update dev requirements and setup.py dependencies

- Add `python-magic` to the dev requirements
- Remove `python-magic>=0.4.27` from the setup.py dependencies

Signed-off-by: jason.lai <[email protected]>

* chore: update dependencies and remove unnecessary dev requirement

- Remove `python-magic` from dev requirements
- Add `python-magic` to setup.py dependencies

Signed-off-by: jason.lai <[email protected]>

* chore: import `magic` module in `file.py`

- Import the `magic` module after the existing imports in `flytekit/types/file/file.py`

Signed-off-by: jason.lai <[email protected]>

* docs: fix typo in error message for incorrect file type

- Fix a typo in the error message for incorrect file type

Signed-off-by: jason.lai <[email protected]>

* build: add libmagic1 package to Dockerfile.dev

- Add the installation of libmagic1 package to the Dockerfile.dev

Signed-off-by: jason.lai <[email protected]>

* feat: refactor file type validation in FlyteFilePathTransformer class

- Add a `validate_file_type` method to the `FlyteFilePathTransformer` class
- Validate the file type in the `to_literal` method
- Remove file type validation from the `to_literal` method for `pathlib.Path` and `str` inputs

Signed-off-by: jason.lai <[email protected]>

* test: add test for invalid file type validation

- Import the `patch` function from `unittest.mock` in `tests/flytekit/unit/core/test_flyte_file.py`
- Add a new test `test_validate_file_type_incorrect` in `tests/flytekit/unit/core/test_flyte_file.py`
- In the new test, mock the return value of `FlyteFilePathTransformer.get_format` and `magic.from_file`
- Test that `transformer.validate_file_type` raises a `ValueError` with the correct message

Signed-off-by: jason.lai <[email protected]>

* refactor: update type hint for `validate_file_type` method

- Update the type hint for the `validate_file_type` method to include the types `typing.Type[FlyteFile]` and `typing.Union[str, os.PathLike]`

Signed-off-by: jason.lai <[email protected]>

* refactor: handle missing `magic` module gracefully

- Remove the import of the `magic` module
- Add a try-except block for importing `magic` and log a warning if it fails
- Modify the `validate_file_type` method to handle the case where `magic` is not installed
- Add a new test for file types with a naked `FlyteFile` in the workflow

Signed-off-by: jason.lai <[email protected]>

* test: refactor file type tests to use `can_import_magic` fixture

- Add a fixture `can_import_magic` to check if `magic` can be imported
- Remove a print statement in `test_file_type_in_workflow_with_bad_format`
- Remove a print statement in `test_matching_file_types_in_workflow`
- Replace the test function `test_mismatching_file_types` with `test_mismatching_file_types(can_import_magic)`
- Add a condition to check if `can_import_magic` is `True` in `test_mismatching_file_types`
- Replace the test function `test_validate_file_type_incorrect` with `test_validate_file_type_incorrect(can_import_magic)`

Signed-off-by: jason.lai <[email protected]>

* Based on the file summaries provided, the best label for this code change would be "refactor". The changes mentioned in the summaries do not fix a bug or introduce a new feature, but rather involve code restructuring and formatting improvements.: refactor import statements, conditional statements, and assertions

- Import statement `import magic` was added (line 40)
- Import statement `import magic` was added (line 45)
- Conditional statement changed from `if can_import_magic == True` to `if can_import_magic` (line 52)
- Assertion message changed from "Incorrect type, expected image/jpeg, got text/plain" to "Incorrect type, expected image/jpeg, got text/plain" (line 58)
- Code within the `with patch.object` block was indented (lines 66-69)
- Conditional statement changed from `if can_import_magic == True` to `if can_import_magic` (line 71)
- Code within the `with patch.object` block was indented (lines 73-76)
- Assertion message changed from "Incorrect file type, expected image/jpeg, got {source_file_mime_type}" to "Incorrect file type, expected image/jpeg, got {source_file_mime_type}" (line 79)

Signed-off-by: jason.lai <[email protected]>

* feat: refactor file handling and introduce hash calculation

- Add a new test for the `FlyteFile` type with an annotated hash method
- Define a `calc_hash` function to calculate the hash of a `FlyteFile` path
- Add a new task `t1` that returns a `HashedFlyteFile`
- Add a new task `t2` that prints the path of a `HashedFlyteFile`
- Add a new workflow `wf` that takes a path and creates a `HashedFlyteFile` and passes it to `t2`
- Call the `wf` workflow with a local dummy file path

Signed-off-by: jason.lai <[email protected]>

* The label that best describes this change is "refactor". This is because the change does not fix a bug or add a new feature, but instead modifies the MIME type for CSV files and adds a new test file.: change MIME type for CSV files and add test file

- Change the MIME type for CSV files from "text/csv" to "text/plain" in `flytekit/types/file/file.py`
- Add a new test file `tests/flytekit/unit/extras/tasks/testdata/test.csv` with the content "1,2"

Signed-off-by: jason.lai <[email protected]>

* Based on the file summaries provided, the best label for the commit would be "fix". This is because the changes mentioned involve fixing typos, fixing assertion error messages, and updating expected values in tests.: fix various issues and improve tests in file handling

- Fix a typo in the logger warning message
- Ignore file type comparison when the file does not exist
- Fix assertion error message in the test for mismatching file types
- Update the expected MIME type for the csv file type in a test
- Add a new test for the annotated hash method, with an assertion error message when the file type is incorrect

Signed-off-by: jason.lai <[email protected]>

* refactor: change MIME types for `hdf5`, `ipynb`, and `onnx` files

- Change the MIME type for `hdf5` files from `application/x-hdf` to `text/plain`
- Change the MIME type for `ipynb` files from `application/x-ipynb+json` to `application/json`
- Change the MIME type for `onnx` files from `application/octet-stream` to `application/json`

Signed-off-by: jason.lai <[email protected]>

* refactor: update MIME types for specific file types

- Modify the MIME type returned for the `hdf5` file type
- Modify the MIME type returned for the `ipynb` file type
- Modify the MIME type returned for the `onnx` file type

Signed-off-by: jason.lai <[email protected]>

* The label best describing this change is "test".: update file.py and test_flyte_file.py for improved file type handling

- Import `mimetypes` module in `file.py`
- Add `pdf`, `txt`, `csv`, and `svg` extensions to `extension_to_mime_type` dictionary in `get_mime_type_from_python_type` method in `file.py`
- Add a new fixture `local_dummy_txt_file` in `test_flyte_file.py`
- Modify `test_matching_file_types_in_workflow` method in `test_flyte_file.py` to accept `local_dummy_txt_file` fixture
- Modify `my_wf` method in `test_matching_file_types_in_workflow` method in `test_flyte_file.py` to accept `path` parameter
- Modify assertions in `test_matching_file_types_in_workflow` method in `test_flyte_file.py` to remove the newline character in the expected output
- Modify `test_file_types_with_naked_flytefile_in_workflow` method in `test_flyte_file.py` to accept `local_dummy_txt_file` fixture
- Modify `my_wf` method in `test_file_types_with_naked_flytefile_in_workflow` method in `test_flyte_file.py` to accept `path` parameter
- Modify assertions in `test_file_types_with_naked_flytefile_in_workflow` method in `test_flyte_file.py` to remove the newline character in the expected output
- Modify `test_mismatching_file_types` method in `test_flyte_file.py`

Signed-off-by: jason.lai <[email protected]>

* feat: refactor file handling and input parameter in test cases

- Modify the `test_input_output_substitution_files` function in `tests/flytekit/unit/extras/tasks/test_shell.py`
- Change the `inputs` parameter in the `test_input_output_substitution_files` function from `kwtypes(f=CSVFile)` to `kwtypes(f=FlyteFile)`
- Modify the data written to the `test.csv` file in `tests/flytekit/unit/extras/tasks/testdata/test.csv`
- Add a new file `write_csv_format_file.py`

Signed-off-by: jason.lai <[email protected]>

* chore: remove write_csv_format_file.py

- Delete the file `write_csv_format_file.py`

Signed-off-by: jason.lai <[email protected]>

* feat: add `python-magic` library for file type validation and logging improvements

- Add `python-magic` to the `dev-requirements.in` file
- Modify the `validate_file_type` method in `file.py` to include debug logging
- Remove `python-magic` from the `setup.py` file
- Modify the `test_flyte_file.py` file:
  - Remove the `can_import_magic` fixture
  - Skip the test functions if `magic` is not installed
  - Modify the `test_mismatching_file_types` function to remove the `can_import_magic` check
  - Modify the `test_validate_file_type_incorrect` function to remove the `can_import_magic` check
  - Modify the `test_flyte_file_type_annotated_hashmethod` function to remove the `can_import_magic` check

Signed-off-by: jason.lai <[email protected]>

* feat: add validation method for file types in FlyteFilePathTransformer

- Add a validation method for file types in FlyteFilePathTransformer

Signed-off-by: jason.lai <[email protected]>

* feat: refactor file handling and add new tests

- Add two new files: `custom_type_example.py` and `custom_type_wf.py`
- Modify the file `flytekit/types/file/file.py`:
  - Remove the try-except block for handling `FileNotFoundError`
  - Modify the logic for comparing file types
- Modify the file `tests/flytekit/unit/core/test_flyte_file.py`:
  - Add a new function `can_import(module_name)`
  - Add a new test `test_file_type_in_workflow_with_bad_format()`
  - Modify the test `test_mismatching_file_types()`
  - Modify the test `test_get_mime_type_from_python_type_failure()`
  - Modify the test `test_validate_file_type_incorrect()`
  - Modify the test `test_flyte_file_type_annotated_hashmethod()`
- Modify the file `tests/flytekit/unit/core/test_type_engine.py`:
  - Import the function `can_import` from `tests/flytekit/unit/core/test_flyte_file.py`
  - Add a new test `test_flyte_file_in_dataclassjsonmixin()`
  - Skip the test `test_flyte_file_in_dataclassjsonmixin()` if `magic` module is imported

Signed-off-by: jason.lai <[email protected]>

* The label best describing this change is "refactor".: remove unnecessary imports from test files

- Remove `import sys` from `test_flyte_file.py`
- Remove `import sys` from `test_type_engine.py`

Signed-off-by: jason.lai <[email protected]>

* test: add skipif marks to test functions for dataclass and enum in dataclassjsonmixin

- Add a skipif mark to the `test_flyte_file_in_dataclass` function
- Add a skipif mark to the parametrize block in the `test_enum_in_dataclassjsonmixin` function

Signed-off-by: jason.lai <[email protected]>

* test: add import statement and skip test in unit test file

- Add a `can_import` import statement from `tests.flytekit.unit.core.test_flyte_file`
- Add a `pytest.mark.skipif` decorator with a reason for skipping the test

Signed-off-by: jason.lai <[email protected]>

* test: remove unnecessary tests in core and type_hints modules

- Remove the `can_import` import statement in `tests/flytekit/unit/core/test_type_engine.py`
- Remove the `test_flyte_file_in_dataclass` test in `tests/flytekit/unit/core/test_type_engine.py`
- Remove the `test_flyte_file_in_dataclassjsonmixin` test in `tests/flytekit/unit/core/test_type_engine.py`
- Remove the `test_enum_in_dataclassjsonmixin` test in `tests/flytekit/unit/core/test_type_engine.py`
- Remove the `test_dict_to_literal_map` test in `tests/flytekit/unit/core/test_type_engine.py`
- Remove the `test_wf1_with_sql_with_patch` test in `tests/flytekit/unit/core/test_type_hints.py`
- Remove the `test_flyte_file_in_dataclass` test in `tests/flytekit/unit/core/test_type_hints.py`

Signed-off-by: jason.lai <[email protected]>

* chore: remove custom type files

- Delete `custom_type_example.py`
- Delete `custom_type_wf.py`

Signed-off-by: jason.lai <[email protected]>

* refactor: remove unnecessary validation in file transformations

- Skip validation for remote files in the `FlyteFilePathTransformer` class
- Remove unnecessary validation in the `test_flyte_file_in_dataclass` and `test_flyte_file_in_dataclassjsonmixin` tests
- Remove unnecessary validation in the `test_dict_to_literal_map` test

Signed-off-by: jason.lai <[email protected]>

* refactor: refactor file validation and test updates

- Remove unnecessary code for validating file format
- Update tests for `test_flyte_file_in_dataclass` to use `wf` instead of `ctx`

Signed-off-by: jason.lai <[email protected]>

* refactor: refactor file imports and remove unnecessary code

- Remove unnecessary code that does not affect the functionality of `FlyteFilePathTransformer.get_format()` method.
- Replace the import of `PNGImageFile` with `FlyteFile` in the `test_type_hints.py` file.
- Update the `test_flyte_file_in_dataclass()` method to use `FlyteFile` instead of `PNGImageFile`.

Signed-off-by: jason.lai <[email protected]>

* feat: refactor file type handling

- Modify the `get_mime_type_from_python_type` method to `get_mime_type_from_extension`
- Remove several file type mappings from the `extension_to_mime_type` dictionary
- Add a loop to populate the `extension_to_mime_type` dictionary with all file type mappings
- Modify the `validate_file_type` method to use `get_mime_type_from_extension` instead of `get_mime_type_from_python_type`
- In the `test_get_mime_type_from_extension_success` test, assert the correct mime type for various file extensions
- In the `test_get_mime_type_from_extension_failure` test, assert that a `KeyError` is raised for an unknown file extension

Signed-off-by: jason.lai <[email protected]>

* build: update requirements for installing `python-magic` library

- Add `python-magic-bin` as a requirement when installing on Windows
- Add `python-magic` as a requirement when installing on Darwin or Linux
- Remove `python-magic` as a requirement when installing on Windows

Signed-off-by: jason.lai <[email protected]>

* chore: update `python-magic` usage comments for Windows compatibility

- Comment out the `python-magic-bin` requirement on Windows due to build errors with DLLs for `libmagic`
- Add a TODO comment about finding a solution to support `python-magic` on Windows
- Update the comment about `python-magic` to mention that it is currently used for Mac OS and Linux only
- Add a comment about a related GitHub issue for more details on the `python-magic` usage

Signed-off-by: jason.lai <[email protected]>

---------

Signed-off-by: jason.lai <[email protected]>
Signed-off-by: Rafael Raposo <[email protected]>
@wild-endeavor wild-endeavor mentioned this pull request Mar 29, 2024
3 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

providing validation for specialized FlyteFormat types
5 participants