Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
feat: add file type validation and mime type extraction (flyteorg#1893)
* feat: add file type validation and mime type extraction - Import the `magic` module in `flytekit/types/file/file.py` - Add a method `get_mime_type_from_python_type` in `FlyteFilePathTransformer` class in `flytekit/types/file/file.py` - Add a validation for file type in `FlyteFilePathTransformer.to_literal` method in `flytekit/types/file/file.py` - Modify `setup.py` to include `python-magic>=0.4.27` as a dependency - Add a test case `test_real_file_type_in_workflow` in `test_flyte_file.py` Signed-off-by: jason.lai <[email protected]> * test: refactor test functions and add new tests - Modify the test function name from `test_file_type_in_workflow_with_bad_format` to `test_matching_file_types_in_workflow` - Remove the `print(type(res))` line - Remove the test function `test_mismatching_file_types` - Add tests for the `get_mime_type_from_python_type` function Signed-off-by: jason.lai <[email protected]> * Based on the file summaries provided, the best label for the commit would be: - docs: Non-code changes, such as fixing typos or adding new documentation (example scopes: Markdown file) - refactor: A code change that neither fixes a bug nor adds a feature - build: Changes that affect the build system or external dependencies (example scopes: gulp, broccoli, npm): update file handling and dependencies - Fix a typo in the file `flytekit/types/file/file.py` - Modify the `expected_type` variable assignment in `flytekit/types/file/file.py` for better readability - Update the `setup.py` file to include the `python-magic` package as a requirement Signed-off-by: jason.lai <[email protected]> * chore: update dev requirements and setup.py dependencies - Add `python-magic` to the dev requirements - Remove `python-magic>=0.4.27` from the setup.py dependencies Signed-off-by: jason.lai <[email protected]> * chore: update dependencies and remove unnecessary dev requirement - Remove `python-magic` from dev requirements - Add `python-magic` to setup.py dependencies Signed-off-by: jason.lai <[email protected]> * chore: import `magic` module in `file.py` - Import the `magic` module after the existing imports in `flytekit/types/file/file.py` Signed-off-by: jason.lai <[email protected]> * docs: fix typo in error message for incorrect file type - Fix a typo in the error message for incorrect file type Signed-off-by: jason.lai <[email protected]> * build: add libmagic1 package to Dockerfile.dev - Add the installation of libmagic1 package to the Dockerfile.dev Signed-off-by: jason.lai <[email protected]> * feat: refactor file type validation in FlyteFilePathTransformer class - Add a `validate_file_type` method to the `FlyteFilePathTransformer` class - Validate the file type in the `to_literal` method - Remove file type validation from the `to_literal` method for `pathlib.Path` and `str` inputs Signed-off-by: jason.lai <[email protected]> * test: add test for invalid file type validation - Import the `patch` function from `unittest.mock` in `tests/flytekit/unit/core/test_flyte_file.py` - Add a new test `test_validate_file_type_incorrect` in `tests/flytekit/unit/core/test_flyte_file.py` - In the new test, mock the return value of `FlyteFilePathTransformer.get_format` and `magic.from_file` - Test that `transformer.validate_file_type` raises a `ValueError` with the correct message Signed-off-by: jason.lai <[email protected]> * refactor: update type hint for `validate_file_type` method - Update the type hint for the `validate_file_type` method to include the types `typing.Type[FlyteFile]` and `typing.Union[str, os.PathLike]` Signed-off-by: jason.lai <[email protected]> * refactor: handle missing `magic` module gracefully - Remove the import of the `magic` module - Add a try-except block for importing `magic` and log a warning if it fails - Modify the `validate_file_type` method to handle the case where `magic` is not installed - Add a new test for file types with a naked `FlyteFile` in the workflow Signed-off-by: jason.lai <[email protected]> * test: refactor file type tests to use `can_import_magic` fixture - Add a fixture `can_import_magic` to check if `magic` can be imported - Remove a print statement in `test_file_type_in_workflow_with_bad_format` - Remove a print statement in `test_matching_file_types_in_workflow` - Replace the test function `test_mismatching_file_types` with `test_mismatching_file_types(can_import_magic)` - Add a condition to check if `can_import_magic` is `True` in `test_mismatching_file_types` - Replace the test function `test_validate_file_type_incorrect` with `test_validate_file_type_incorrect(can_import_magic)` Signed-off-by: jason.lai <[email protected]> * Based on the file summaries provided, the best label for this code change would be "refactor". The changes mentioned in the summaries do not fix a bug or introduce a new feature, but rather involve code restructuring and formatting improvements.: refactor import statements, conditional statements, and assertions - Import statement `import magic` was added (line 40) - Import statement `import magic` was added (line 45) - Conditional statement changed from `if can_import_magic == True` to `if can_import_magic` (line 52) - Assertion message changed from "Incorrect type, expected image/jpeg, got text/plain" to "Incorrect type, expected image/jpeg, got text/plain" (line 58) - Code within the `with patch.object` block was indented (lines 66-69) - Conditional statement changed from `if can_import_magic == True` to `if can_import_magic` (line 71) - Code within the `with patch.object` block was indented (lines 73-76) - Assertion message changed from "Incorrect file type, expected image/jpeg, got {source_file_mime_type}" to "Incorrect file type, expected image/jpeg, got {source_file_mime_type}" (line 79) Signed-off-by: jason.lai <[email protected]> * feat: refactor file handling and introduce hash calculation - Add a new test for the `FlyteFile` type with an annotated hash method - Define a `calc_hash` function to calculate the hash of a `FlyteFile` path - Add a new task `t1` that returns a `HashedFlyteFile` - Add a new task `t2` that prints the path of a `HashedFlyteFile` - Add a new workflow `wf` that takes a path and creates a `HashedFlyteFile` and passes it to `t2` - Call the `wf` workflow with a local dummy file path Signed-off-by: jason.lai <[email protected]> * The label that best describes this change is "refactor". This is because the change does not fix a bug or add a new feature, but instead modifies the MIME type for CSV files and adds a new test file.: change MIME type for CSV files and add test file - Change the MIME type for CSV files from "text/csv" to "text/plain" in `flytekit/types/file/file.py` - Add a new test file `tests/flytekit/unit/extras/tasks/testdata/test.csv` with the content "1,2" Signed-off-by: jason.lai <[email protected]> * Based on the file summaries provided, the best label for the commit would be "fix". This is because the changes mentioned involve fixing typos, fixing assertion error messages, and updating expected values in tests.: fix various issues and improve tests in file handling - Fix a typo in the logger warning message - Ignore file type comparison when the file does not exist - Fix assertion error message in the test for mismatching file types - Update the expected MIME type for the csv file type in a test - Add a new test for the annotated hash method, with an assertion error message when the file type is incorrect Signed-off-by: jason.lai <[email protected]> * refactor: change MIME types for `hdf5`, `ipynb`, and `onnx` files - Change the MIME type for `hdf5` files from `application/x-hdf` to `text/plain` - Change the MIME type for `ipynb` files from `application/x-ipynb+json` to `application/json` - Change the MIME type for `onnx` files from `application/octet-stream` to `application/json` Signed-off-by: jason.lai <[email protected]> * refactor: update MIME types for specific file types - Modify the MIME type returned for the `hdf5` file type - Modify the MIME type returned for the `ipynb` file type - Modify the MIME type returned for the `onnx` file type Signed-off-by: jason.lai <[email protected]> * The label best describing this change is "test".: update file.py and test_flyte_file.py for improved file type handling - Import `mimetypes` module in `file.py` - Add `pdf`, `txt`, `csv`, and `svg` extensions to `extension_to_mime_type` dictionary in `get_mime_type_from_python_type` method in `file.py` - Add a new fixture `local_dummy_txt_file` in `test_flyte_file.py` - Modify `test_matching_file_types_in_workflow` method in `test_flyte_file.py` to accept `local_dummy_txt_file` fixture - Modify `my_wf` method in `test_matching_file_types_in_workflow` method in `test_flyte_file.py` to accept `path` parameter - Modify assertions in `test_matching_file_types_in_workflow` method in `test_flyte_file.py` to remove the newline character in the expected output - Modify `test_file_types_with_naked_flytefile_in_workflow` method in `test_flyte_file.py` to accept `local_dummy_txt_file` fixture - Modify `my_wf` method in `test_file_types_with_naked_flytefile_in_workflow` method in `test_flyte_file.py` to accept `path` parameter - Modify assertions in `test_file_types_with_naked_flytefile_in_workflow` method in `test_flyte_file.py` to remove the newline character in the expected output - Modify `test_mismatching_file_types` method in `test_flyte_file.py` Signed-off-by: jason.lai <[email protected]> * feat: refactor file handling and input parameter in test cases - Modify the `test_input_output_substitution_files` function in `tests/flytekit/unit/extras/tasks/test_shell.py` - Change the `inputs` parameter in the `test_input_output_substitution_files` function from `kwtypes(f=CSVFile)` to `kwtypes(f=FlyteFile)` - Modify the data written to the `test.csv` file in `tests/flytekit/unit/extras/tasks/testdata/test.csv` - Add a new file `write_csv_format_file.py` Signed-off-by: jason.lai <[email protected]> * chore: remove write_csv_format_file.py - Delete the file `write_csv_format_file.py` Signed-off-by: jason.lai <[email protected]> * feat: add `python-magic` library for file type validation and logging improvements - Add `python-magic` to the `dev-requirements.in` file - Modify the `validate_file_type` method in `file.py` to include debug logging - Remove `python-magic` from the `setup.py` file - Modify the `test_flyte_file.py` file: - Remove the `can_import_magic` fixture - Skip the test functions if `magic` is not installed - Modify the `test_mismatching_file_types` function to remove the `can_import_magic` check - Modify the `test_validate_file_type_incorrect` function to remove the `can_import_magic` check - Modify the `test_flyte_file_type_annotated_hashmethod` function to remove the `can_import_magic` check Signed-off-by: jason.lai <[email protected]> * feat: add validation method for file types in FlyteFilePathTransformer - Add a validation method for file types in FlyteFilePathTransformer Signed-off-by: jason.lai <[email protected]> * feat: refactor file handling and add new tests - Add two new files: `custom_type_example.py` and `custom_type_wf.py` - Modify the file `flytekit/types/file/file.py`: - Remove the try-except block for handling `FileNotFoundError` - Modify the logic for comparing file types - Modify the file `tests/flytekit/unit/core/test_flyte_file.py`: - Add a new function `can_import(module_name)` - Add a new test `test_file_type_in_workflow_with_bad_format()` - Modify the test `test_mismatching_file_types()` - Modify the test `test_get_mime_type_from_python_type_failure()` - Modify the test `test_validate_file_type_incorrect()` - Modify the test `test_flyte_file_type_annotated_hashmethod()` - Modify the file `tests/flytekit/unit/core/test_type_engine.py`: - Import the function `can_import` from `tests/flytekit/unit/core/test_flyte_file.py` - Add a new test `test_flyte_file_in_dataclassjsonmixin()` - Skip the test `test_flyte_file_in_dataclassjsonmixin()` if `magic` module is imported Signed-off-by: jason.lai <[email protected]> * The label best describing this change is "refactor".: remove unnecessary imports from test files - Remove `import sys` from `test_flyte_file.py` - Remove `import sys` from `test_type_engine.py` Signed-off-by: jason.lai <[email protected]> * test: add skipif marks to test functions for dataclass and enum in dataclassjsonmixin - Add a skipif mark to the `test_flyte_file_in_dataclass` function - Add a skipif mark to the parametrize block in the `test_enum_in_dataclassjsonmixin` function Signed-off-by: jason.lai <[email protected]> * test: add import statement and skip test in unit test file - Add a `can_import` import statement from `tests.flytekit.unit.core.test_flyte_file` - Add a `pytest.mark.skipif` decorator with a reason for skipping the test Signed-off-by: jason.lai <[email protected]> * test: remove unnecessary tests in core and type_hints modules - Remove the `can_import` import statement in `tests/flytekit/unit/core/test_type_engine.py` - Remove the `test_flyte_file_in_dataclass` test in `tests/flytekit/unit/core/test_type_engine.py` - Remove the `test_flyte_file_in_dataclassjsonmixin` test in `tests/flytekit/unit/core/test_type_engine.py` - Remove the `test_enum_in_dataclassjsonmixin` test in `tests/flytekit/unit/core/test_type_engine.py` - Remove the `test_dict_to_literal_map` test in `tests/flytekit/unit/core/test_type_engine.py` - Remove the `test_wf1_with_sql_with_patch` test in `tests/flytekit/unit/core/test_type_hints.py` - Remove the `test_flyte_file_in_dataclass` test in `tests/flytekit/unit/core/test_type_hints.py` Signed-off-by: jason.lai <[email protected]> * chore: remove custom type files - Delete `custom_type_example.py` - Delete `custom_type_wf.py` Signed-off-by: jason.lai <[email protected]> * refactor: remove unnecessary validation in file transformations - Skip validation for remote files in the `FlyteFilePathTransformer` class - Remove unnecessary validation in the `test_flyte_file_in_dataclass` and `test_flyte_file_in_dataclassjsonmixin` tests - Remove unnecessary validation in the `test_dict_to_literal_map` test Signed-off-by: jason.lai <[email protected]> * refactor: refactor file validation and test updates - Remove unnecessary code for validating file format - Update tests for `test_flyte_file_in_dataclass` to use `wf` instead of `ctx` Signed-off-by: jason.lai <[email protected]> * refactor: refactor file imports and remove unnecessary code - Remove unnecessary code that does not affect the functionality of `FlyteFilePathTransformer.get_format()` method. - Replace the import of `PNGImageFile` with `FlyteFile` in the `test_type_hints.py` file. - Update the `test_flyte_file_in_dataclass()` method to use `FlyteFile` instead of `PNGImageFile`. Signed-off-by: jason.lai <[email protected]> * feat: refactor file type handling - Modify the `get_mime_type_from_python_type` method to `get_mime_type_from_extension` - Remove several file type mappings from the `extension_to_mime_type` dictionary - Add a loop to populate the `extension_to_mime_type` dictionary with all file type mappings - Modify the `validate_file_type` method to use `get_mime_type_from_extension` instead of `get_mime_type_from_python_type` - In the `test_get_mime_type_from_extension_success` test, assert the correct mime type for various file extensions - In the `test_get_mime_type_from_extension_failure` test, assert that a `KeyError` is raised for an unknown file extension Signed-off-by: jason.lai <[email protected]> * build: update requirements for installing `python-magic` library - Add `python-magic-bin` as a requirement when installing on Windows - Add `python-magic` as a requirement when installing on Darwin or Linux - Remove `python-magic` as a requirement when installing on Windows Signed-off-by: jason.lai <[email protected]> * chore: update `python-magic` usage comments for Windows compatibility - Comment out the `python-magic-bin` requirement on Windows due to build errors with DLLs for `libmagic` - Add a TODO comment about finding a solution to support `python-magic` on Windows - Update the comment about `python-magic` to mention that it is currently used for Mac OS and Linux only - Add a comment about a related GitHub issue for more details on the `python-magic` usage Signed-off-by: jason.lai <[email protected]> --------- Signed-off-by: jason.lai <[email protected]>
- Loading branch information