Commit

feat: add file type validation and mime type extraction (flyteorg#1893)
* feat: add file type validation and mime type extraction

- Import the `magic` module in `flytekit/types/file/file.py`
- Add a method `get_mime_type_from_python_type` in `FlyteFilePathTransformer` class in `flytekit/types/file/file.py`
- Add a validation for file type in `FlyteFilePathTransformer.to_literal` method in `flytekit/types/file/file.py`
- Modify `setup.py` to include `python-magic>=0.4.27` as a dependency
- Add a test case `test_real_file_type_in_workflow` in `test_flyte_file.py`

Signed-off-by: jason.lai <[email protected]>

* test: refactor test functions and add new tests

- Modify the test function name from `test_file_type_in_workflow_with_bad_format` to `test_matching_file_types_in_workflow`
- Remove the `print(type(res))` line
- Remove the test function `test_mismatching_file_types`
- Add tests for the `get_mime_type_from_python_type` function

Signed-off-by: jason.lai <[email protected]>

* build: update file handling and dependencies

- Fix a typo in the file `flytekit/types/file/file.py`
- Modify the `expected_type` variable assignment in `flytekit/types/file/file.py` for better readability
- Update the `setup.py` file to include the `python-magic` package as a requirement

Signed-off-by: jason.lai <[email protected]>

* chore: update dev requirements and setup.py dependencies

- Add `python-magic` to the dev requirements
- Remove `python-magic>=0.4.27` from the setup.py dependencies

Signed-off-by: jason.lai <[email protected]>

* chore: update dependencies and remove unnecessary dev requirement

- Remove `python-magic` from dev requirements
- Add `python-magic` to setup.py dependencies

Signed-off-by: jason.lai <[email protected]>

* chore: import `magic` module in `file.py`

- Import the `magic` module after the existing imports in `flytekit/types/file/file.py`

Signed-off-by: jason.lai <[email protected]>

* docs: fix typo in error message for incorrect file type

- Fix a typo in the error message for incorrect file type

Signed-off-by: jason.lai <[email protected]>

* build: add libmagic1 package to Dockerfile.dev

- Add the installation of libmagic1 package to the Dockerfile.dev

Signed-off-by: jason.lai <[email protected]>

* feat: refactor file type validation in FlyteFilePathTransformer class

- Add a `validate_file_type` method to the `FlyteFilePathTransformer` class
- Validate the file type in the `to_literal` method
- Remove file type validation from the `to_literal` method for `pathlib.Path` and `str` inputs

Signed-off-by: jason.lai <[email protected]>
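
The validation step described in this commit can be sketched roughly as follows. This is a minimal illustration, not the flytekit implementation: `detect_mime` is a hypothetical injected callable standing in for a content sniffer such as `magic.from_file(path, mime=True)`, and the expected type is looked up in the standard `mimetypes.types_map` table.

```python
import mimetypes
from typing import Callable


def validate_file_type(
    expected_format: str,
    source_path: str,
    detect_mime: Callable[[str], str],
) -> None:
    """Raise ValueError when the detected MIME type differs from the expected one.

    `detect_mime` is a hypothetical stand-in for a content sniffer; it is
    injected so that this sketch runs without libmagic installed.
    """
    if expected_format == "":
        # No format was declared on the type, so there is nothing to validate.
        return

    # Raises KeyError for an extension the standard table does not know.
    expected_type = mimetypes.types_map["." + expected_format]
    real_type = detect_mime(source_path)
    if real_type != expected_type:
        raise ValueError(f"Incorrect file type, expected {expected_type}, got {real_type}")
```

Injecting the detector keeps the comparison logic testable on machines where libmagic is unavailable.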

* test: add test for invalid file type validation

- Import the `patch` function from `unittest.mock` in `tests/flytekit/unit/core/test_flyte_file.py`
- Add a new test `test_validate_file_type_incorrect` in `tests/flytekit/unit/core/test_flyte_file.py`
- In the new test, mock the return value of `FlyteFilePathTransformer.get_format` and `magic.from_file`
- Test that `transformer.validate_file_type` raises a `ValueError` with the correct message

Signed-off-by: jason.lai <[email protected]>
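
The mocking pattern this test relies on can be sketched with the standard library alone. `Sniffer`, `check`, and `run_check_with_mocked_sniffer` are hypothetical names invented for this illustration; only `unittest.mock.patch.object` is the real API being demonstrated.

```python
from unittest.mock import patch


class Sniffer:
    """Hypothetical stand-in for a module that inspects file contents."""

    @staticmethod
    def from_file(path: str) -> str:
        raise RuntimeError("real file access is not wanted in a unit test")


def check(expected: str, path: str) -> None:
    """Compare the sniffed MIME type against the expected one."""
    real = Sniffer.from_file(path)
    if real != expected:
        raise ValueError(f"Incorrect file type, expected {expected}, got {real}")


def run_check_with_mocked_sniffer() -> str:
    # The sniffer is replaced only for the duration of the with-block.
    with patch.object(Sniffer, "from_file", return_value="image/png"):
        try:
            check("image/jpeg", "/tmp/flytekit_test.png")
        except ValueError as e:
            return str(e)
    return ""
```

Because the sniffer is mocked, the path never has to exist on disk.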

* refactor: update type hint for `validate_file_type` method

- Update the type hint for the `validate_file_type` method to include the types `typing.Type[FlyteFile]` and `typing.Union[str, os.PathLike]`

Signed-off-by: jason.lai <[email protected]>

* refactor: handle missing `magic` module gracefully

- Remove the import of the `magic` module
- Add a try-except block for importing `magic` and log a warning if it fails
- Modify the `validate_file_type` method to handle the case where `magic` is not installed
- Add a new test for file types with a naked `FlyteFile` in the workflow

Signed-off-by: jason.lai <[email protected]>
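
A minimal sketch of the graceful-degradation idea in this commit, assuming the optional module exposes `from_file(path, mime=True)` as `python-magic` does. The helper name `sniff_mime_type` and its `module_name` parameter are hypothetical, added so the fallback path can be exercised without the real library.

```python
import importlib
import logging
from typing import Optional

logger = logging.getLogger(__name__)


def sniff_mime_type(path: str, module_name: str = "magic") -> Optional[str]:
    """Return the MIME type of `path`, or None when the sniffer module is missing."""
    try:
        # Import lazily so a missing optional dependency never breaks module import.
        sniffer = importlib.import_module(module_name)
    except ImportError as e:
        logger.debug(f"{module_name} is not installed. Error message: {e}")
        return None
    return sniffer.from_file(path, mime=True)
```

Callers can treat a `None` result as "validation skipped" rather than as a failure.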

* test: refactor file type tests to use `can_import_magic` fixture

- Add a fixture `can_import_magic` to check if `magic` can be imported
- Remove a print statement in `test_file_type_in_workflow_with_bad_format`
- Remove a print statement in `test_matching_file_types_in_workflow`
- Replace the test function `test_mismatching_file_types` with `test_mismatching_file_types(can_import_magic)`
- Add a condition to check if `can_import_magic` is `True` in `test_mismatching_file_types`
- Replace the test function `test_validate_file_type_incorrect` with `test_validate_file_type_incorrect(can_import_magic)`

Signed-off-by: jason.lai <[email protected]>

* refactor: tidy up import statements, conditional statements, and assertions

- Import statement `import magic` was added (line 40)
- Import statement `import magic` was added (line 45)
- Conditional statement changed from `if can_import_magic == True` to `if can_import_magic` (line 52)
- Assertion message changed from "Incorrect type, expected image/jpeg, got text/plain" to "Incorrect type, expected image/jpeg, got text/plain" (line 58)
- Code within the `with patch.object` block was indented (lines 66-69)
- Conditional statement changed from `if can_import_magic == True` to `if can_import_magic` (line 71)
- Code within the `with patch.object` block was indented (lines 73-76)
- Assertion message changed from "Incorrect file type, expected image/jpeg, got {source_file_mime_type}" to "Incorrect file type, expected image/jpeg, got {source_file_mime_type}" (line 79)

Signed-off-by: jason.lai <[email protected]>

* feat: refactor file handling and introduce hash calculation

- Add a new test for the `FlyteFile` type with an annotated hash method
- Define a `calc_hash` function to calculate the hash of a `FlyteFile` path
- Add a new task `t1` that returns a `HashedFlyteFile`
- Add a new task `t2` that prints the path of a `HashedFlyteFile`
- Add a new workflow `wf` that takes a path and creates a `HashedFlyteFile` and passes it to `t2`
- Call the `wf` workflow with a local dummy file path

Signed-off-by: jason.lai <[email protected]>

* refactor: change MIME type for CSV files and add test file

- Change the MIME type for CSV files from "text/csv" to "text/plain" in `flytekit/types/file/file.py`
- Add a new test file `tests/flytekit/unit/extras/tasks/testdata/test.csv` with the content "1,2"

Signed-off-by: jason.lai <[email protected]>

* fix: resolve various issues and improve tests in file handling

- Fix a typo in the logger warning message
- Ignore file type comparison when the file does not exist
- Fix assertion error message in the test for mismatching file types
- Update the expected MIME type for the csv file type in a test
- Add a new test for the annotated hash method, with an assertion error message when the file type is incorrect

Signed-off-by: jason.lai <[email protected]>

* refactor: change MIME types for `hdf5`, `ipynb`, and `onnx` files

- Change the MIME type for `hdf5` files from `application/x-hdf` to `text/plain`
- Change the MIME type for `ipynb` files from `application/x-ipynb+json` to `application/json`
- Change the MIME type for `onnx` files from `application/octet-stream` to `application/json`

Signed-off-by: jason.lai <[email protected]>

* refactor: update MIME types for specific file types

- Modify the MIME type returned for the `hdf5` file type
- Modify the MIME type returned for the `ipynb` file type
- Modify the MIME type returned for the `onnx` file type

Signed-off-by: jason.lai <[email protected]>

* test: update file.py and test_flyte_file.py for improved file type handling

- Import `mimetypes` module in `file.py`
- Add `pdf`, `txt`, `csv`, and `svg` extensions to `extension_to_mime_type` dictionary in `get_mime_type_from_python_type` method in `file.py`
- Add a new fixture `local_dummy_txt_file` in `test_flyte_file.py`
- Modify `test_matching_file_types_in_workflow` method in `test_flyte_file.py` to accept `local_dummy_txt_file` fixture
- Modify `my_wf` method in `test_matching_file_types_in_workflow` method in `test_flyte_file.py` to accept `path` parameter
- Modify assertions in `test_matching_file_types_in_workflow` method in `test_flyte_file.py` to remove the newline character in the expected output
- Modify `test_file_types_with_naked_flytefile_in_workflow` method in `test_flyte_file.py` to accept `local_dummy_txt_file` fixture
- Modify `my_wf` method in `test_file_types_with_naked_flytefile_in_workflow` method in `test_flyte_file.py` to accept `path` parameter
- Modify assertions in `test_file_types_with_naked_flytefile_in_workflow` method in `test_flyte_file.py` to remove the newline character in the expected output
- Modify `test_mismatching_file_types` method in `test_flyte_file.py`

Signed-off-by: jason.lai <[email protected]>

* feat: refactor file handling and input parameter in test cases

- Modify the `test_input_output_substitution_files` function in `tests/flytekit/unit/extras/tasks/test_shell.py`
- Change the `inputs` parameter in the `test_input_output_substitution_files` function from `kwtypes(f=CSVFile)` to `kwtypes(f=FlyteFile)`
- Modify the data written to the `test.csv` file in `tests/flytekit/unit/extras/tasks/testdata/test.csv`
- Add a new file `write_csv_format_file.py`

Signed-off-by: jason.lai <[email protected]>

* chore: remove write_csv_format_file.py

- Delete the file `write_csv_format_file.py`

Signed-off-by: jason.lai <[email protected]>

* feat: add `python-magic` library for file type validation and logging improvements

- Add `python-magic` to the `dev-requirements.in` file
- Modify the `validate_file_type` method in `file.py` to include debug logging
- Remove `python-magic` from the `setup.py` file
- Modify the `test_flyte_file.py` file:
  - Remove the `can_import_magic` fixture
  - Skip the test functions if `magic` is not installed
  - Modify the `test_mismatching_file_types` function to remove the `can_import_magic` check
  - Modify the `test_validate_file_type_incorrect` function to remove the `can_import_magic` check
  - Modify the `test_flyte_file_type_annotated_hashmethod` function to remove the `can_import_magic` check

Signed-off-by: jason.lai <[email protected]>
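
Skipping tests when an optional module is absent can be sketched as below. The commit itself uses `pytest.mark.skipif`; this variant uses the standard library's `unittest.skipUnless` so that it runs without pytest. The test class and method names are hypothetical.

```python
import unittest


def can_import(module_name: str) -> bool:
    """Return True when the named module can actually be imported."""
    try:
        __import__(module_name)
        return True
    except ImportError:
        return False


class MimeSniffingTests(unittest.TestCase):
    # The decorator is evaluated at class-definition time, so the test is
    # reported as skipped (not failed) on machines without libmagic.
    @unittest.skipUnless(can_import("magic"), "Libmagic is not installed")
    def test_from_file_is_callable(self):
        import magic

        self.assertTrue(callable(magic.from_file))
```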

* feat: add validation method for file types in FlyteFilePathTransformer

- Add a validation method for file types in FlyteFilePathTransformer

Signed-off-by: jason.lai <[email protected]>

* feat: refactor file handling and add new tests

- Add two new files: `custom_type_example.py` and `custom_type_wf.py`
- Modify the file `flytekit/types/file/file.py`:
  - Remove the try-except block for handling `FileNotFoundError`
  - Modify the logic for comparing file types
- Modify the file `tests/flytekit/unit/core/test_flyte_file.py`:
  - Add a new function `can_import(module_name)`
  - Add a new test `test_file_type_in_workflow_with_bad_format()`
  - Modify the test `test_mismatching_file_types()`
  - Modify the test `test_get_mime_type_from_python_type_failure()`
  - Modify the test `test_validate_file_type_incorrect()`
  - Modify the test `test_flyte_file_type_annotated_hashmethod()`
- Modify the file `tests/flytekit/unit/core/test_type_engine.py`:
  - Import the function `can_import` from `tests/flytekit/unit/core/test_flyte_file.py`
  - Add a new test `test_flyte_file_in_dataclassjsonmixin()`
  - Skip the test `test_flyte_file_in_dataclassjsonmixin()` if `magic` module is imported

Signed-off-by: jason.lai <[email protected]>

* refactor: remove unnecessary imports from test files

- Remove `import sys` from `test_flyte_file.py`
- Remove `import sys` from `test_type_engine.py`

Signed-off-by: jason.lai <[email protected]>

* test: add skipif marks to test functions for dataclass and enum in dataclassjsonmixin

- Add a skipif mark to the `test_flyte_file_in_dataclass` function
- Add a skipif mark to the parametrize block in the `test_enum_in_dataclassjsonmixin` function

Signed-off-by: jason.lai <[email protected]>

* test: add import statement and skip test in unit test file

- Add a `can_import` import statement from `tests.flytekit.unit.core.test_flyte_file`
- Add a `pytest.mark.skipif` decorator with a reason for skipping the test

Signed-off-by: jason.lai <[email protected]>

* test: remove unnecessary tests in core and type_hints modules

- Remove the `can_import` import statement in `tests/flytekit/unit/core/test_type_engine.py`
- Remove the `test_flyte_file_in_dataclass` test in `tests/flytekit/unit/core/test_type_engine.py`
- Remove the `test_flyte_file_in_dataclassjsonmixin` test in `tests/flytekit/unit/core/test_type_engine.py`
- Remove the `test_enum_in_dataclassjsonmixin` test in `tests/flytekit/unit/core/test_type_engine.py`
- Remove the `test_dict_to_literal_map` test in `tests/flytekit/unit/core/test_type_engine.py`
- Remove the `test_wf1_with_sql_with_patch` test in `tests/flytekit/unit/core/test_type_hints.py`
- Remove the `test_flyte_file_in_dataclass` test in `tests/flytekit/unit/core/test_type_hints.py`

Signed-off-by: jason.lai <[email protected]>

* chore: remove custom type files

- Delete `custom_type_example.py`
- Delete `custom_type_wf.py`

Signed-off-by: jason.lai <[email protected]>

* refactor: remove unnecessary validation in file transformations

- Skip validation for remote files in the `FlyteFilePathTransformer` class
- Remove unnecessary validation in the `test_flyte_file_in_dataclass` and `test_flyte_file_in_dataclassjsonmixin` tests
- Remove unnecessary validation in the `test_dict_to_literal_map` test

Signed-off-by: jason.lai <[email protected]>
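
The "validate only local files" rule can be sketched as follows. In flytekit the check is `ctx.file_access.is_remote(source_path)`; the `is_remote` helper here is a simplified, hypothetical stand-in that only looks at the URL scheme, and it would misclassify Windows drive paths such as `C:\data`.

```python
from urllib.parse import urlparse


def is_remote(path: str) -> bool:
    """Treat any path whose URL scheme is neither empty nor file:// as remote."""
    scheme = urlparse(path).scheme
    return scheme not in ("", "file")


def should_validate(path: str) -> bool:
    # Content sniffing needs the bytes on local disk, so remote paths are skipped.
    return not is_remote(path)
```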

* refactor: refactor file validation and test updates

- Remove unnecessary code for validating file format
- Update tests for `test_flyte_file_in_dataclass` to use `wf` instead of `ctx`

Signed-off-by: jason.lai <[email protected]>

* refactor: refactor file imports and remove unnecessary code

- Remove unnecessary code that does not affect the functionality of `FlyteFilePathTransformer.get_format()` method.
- Replace the import of `PNGImageFile` with `FlyteFile` in the `test_type_hints.py` file.
- Update the `test_flyte_file_in_dataclass()` method to use `FlyteFile` instead of `PNGImageFile`.

Signed-off-by: jason.lai <[email protected]>

* feat: refactor file type handling

- Modify the `get_mime_type_from_python_type` method to `get_mime_type_from_extension`
- Remove several file type mappings from the `extension_to_mime_type` dictionary
- Add a loop to populate the `extension_to_mime_type` dictionary with all file type mappings
- Modify the `validate_file_type` method to use `get_mime_type_from_extension` instead of `get_mime_type_from_python_type`
- In the `test_get_mime_type_from_extension_success` test, assert the correct mime type for various file extensions
- In the `test_get_mime_type_from_extension_failure` test, assert that a `KeyError` is raised for an unknown file extension

Signed-off-by: jason.lai <[email protected]>
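
A standalone sketch of the lookup this commit describes: a handful of hand-written entries, extended by a loop over the standard `mimetypes.types_map` table. The function name `build_extension_to_mime_type` is hypothetical. Standard entries are applied last, so they win if the same extension appears in both places.

```python
import mimetypes
from typing import Dict


def build_extension_to_mime_type() -> Dict[str, str]:
    # Hand-written entries for formats the standard table may not cover.
    mapping = {
        "hdf5": "text/plain",
        "joblib": "application/octet-stream",
        "python_pickle": "application/octet-stream",
        "ipynb": "application/json",
        "onnx": "application/json",
        "tfrecord": "application/octet-stream",
    }
    # Keys in types_map look like ".png"; drop the leading dot.
    for ext, mimetype in mimetypes.types_map.items():
        mapping[ext.lstrip(".")] = mimetype
    return mapping


def get_mime_type_from_extension(extension: str) -> str:
    # Raises KeyError for an unknown extension.
    return build_extension_to_mime_type()[extension]
```

Rebuilding the dictionary on every call is simple but wasteful; caching the mapping once at import time would be a reasonable refinement.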

* build: update requirements for installing `python-magic` library

- Add `python-magic-bin` as a requirement when installing on Windows
- Add `python-magic` as a requirement when installing on Darwin or Linux
- Remove `python-magic` as a requirement when installing on Windows

Signed-off-by: jason.lai <[email protected]>

* chore: update `python-magic` usage comments for Windows compatibility

- Comment out the `python-magic-bin` requirement on Windows due to build errors with DLLs for `libmagic`
- Add a TODO comment about finding a solution to support `python-magic` on Windows
- Update the comment about `python-magic` to mention that it is currently used for Mac OS and Linux only
- Add a comment about a related GitHub issue for more details on the `python-magic` usage

Signed-off-by: jason.lai <[email protected]>

---------

Signed-off-by: jason.lai <[email protected]>
jasonlai1218 authored and mark-thm committed Dec 8, 2023
1 parent 1ae7134 commit 4610312
Showing 7 changed files with 203 additions and 11 deletions.
2 changes: 1 addition & 1 deletion Dockerfile.dev
@@ -15,7 +15,7 @@ WORKDIR /root

 ARG VERSION

-RUN apt-get update && apt-get install build-essential vim -y
+RUN apt-get update && apt-get install build-essential vim libmagic1 -y

 COPY . /flytekit

6 changes: 6 additions & 0 deletions dev-requirements.in
@@ -28,6 +28,12 @@ torch<=1.12.1; python_version<'3.11'
 # pytorch 2 supports python 3.11
 torch<=2.0.0; python_version>='3.11' or platform_system!='Windows'

+# TODO: Currently, the python-magic library causes build errors on Windows due to its dependency on DLLs for libmagic.
+# We have temporarily disabled this feature on Windows and are using python-magic for Mac OS and Linux instead.
+# For more details, see the related GitHub issue.
+# Once a solution is found, this should be updated to support Windows as well.
+python-magic; (platform_system=='Darwin' or platform_system=='Linux')
+
 pillow
 scikit-learn
 types-protobuf
54 changes: 54 additions & 0 deletions flytekit/types/file/file.py
@@ -1,5 +1,6 @@
 from __future__ import annotations

+import mimetypes
 import os
 import pathlib
 import typing
@@ -324,6 +325,57 @@ def assert_type(
     def get_literal_type(self, t: typing.Union[typing.Type[FlyteFile], os.PathLike]) -> LiteralType:
         return LiteralType(blob=self._blob_type(format=FlyteFilePathTransformer.get_format(t)))

+    def get_mime_type_from_extension(self, extension: str) -> str:
+        extension_to_mime_type = {
+            "hdf5": "text/plain",
+            "joblib": "application/octet-stream",
+            "python_pickle": "application/octet-stream",
+            "ipynb": "application/json",
+            "onnx": "application/json",
+            "tfrecord": "application/octet-stream",
+        }
+
+        for ext, mimetype in mimetypes.types_map.items():
+            extension_to_mime_type[ext.split(".")[1]] = mimetype
+
+        return extension_to_mime_type[extension]
+
+    def validate_file_type(
+        self, python_type: typing.Type[FlyteFile], source_path: typing.Union[str, os.PathLike]
+    ) -> None:
+        """
+        This method validates the type of the file at source_path against the expected python_type.
+        It uses the magic library to determine the real type of the file. If the magic library is not installed,
+        it logs a debug message and returns. If the actual file does not exist, it returns without raising an error.
+        :param python_type: The expected type of the file
+        :param source_path: The path to the file to validate
+        :raises ValueError: If the real type of the file is not the same as the expected python_type
+        """
+        if FlyteFilePathTransformer.get_format(python_type) == "":
+            return
+
+        try:
+            # isolate the exception to the libmagic import
+            import magic
+
+        except ImportError as e:
+            logger.debug(f"Libmagic is not installed. Error message: {e}")
+            return
+
+        ctx = FlyteContext.current_context()
+        if ctx.file_access.is_remote(source_path):
+            # Skip validation for remote files. One of the use cases for FlyteFile is to point to remote files,
+            # you might have access to a remote file (e.g., in s3) that you want to pass to a Flyte workflow.
+            # Therefore, we should only validate FlyteFiles for which their path is considered local.
+            return
+
+        if FlyteFilePathTransformer.get_format(python_type):
+            real_type = magic.from_file(source_path, mime=True)
+            expected_type = self.get_mime_type_from_extension(FlyteFilePathTransformer.get_format(python_type))
+            if real_type != expected_type:
+                raise ValueError(f"Incorrect file type, expected {expected_type}, got {real_type}")
+
     def to_literal(
         self,
         ctx: FlyteContext,
@@ -348,6 +400,7 @@ def to_literal(

         if isinstance(python_val, FlyteFile):
             source_path = python_val.path
+            self.validate_file_type(python_type, source_path)

             # If the object has a remote source, then we just convert it back. This means that if someone is just
             # going back and forth between a FlyteFile Python value and a Blob Flyte IDL value, we don't do anything.
@@ -373,6 +426,7 @@ def to_literal(
         elif isinstance(python_val, pathlib.Path) or isinstance(python_val, str):
             source_path = str(python_val)
             if issubclass(python_type, FlyteFile):
+                self.validate_file_type(python_type, source_path)
                 if ctx.file_access.is_remote(source_path):
                     should_upload = False
                 else:
133 changes: 131 additions & 2 deletions tests/flytekit/unit/core/test_flyte_file.py
@@ -2,7 +2,7 @@
 import pathlib
 import tempfile
 import typing
-from unittest.mock import MagicMock
+from unittest.mock import MagicMock, patch

 import pytest
 from typing_extensions import Annotated
@@ -19,7 +19,7 @@
 from flytekit.core.workflow import workflow
 from flytekit.models.core.types import BlobType
 from flytekit.models.literals import LiteralMap
-from flytekit.types.file.file import FlyteFile
+from flytekit.types.file.file import FlyteFile, FlyteFilePathTransformer


 # Fixture that ensures a dummy local file
@@ -34,6 +34,25 @@ def local_dummy_file():
         os.remove(path)


+@pytest.fixture
+def local_dummy_txt_file():
+    fd, path = tempfile.mkstemp(suffix=".txt")
+    try:
+        with os.fdopen(fd, "w") as tmp:
+            tmp.write("Hello World")
+        yield path
+    finally:
+        os.remove(path)
+
+
+def can_import(module_name) -> bool:
+    try:
+        __import__(module_name)
+        return True
+    except ImportError:
+        return False
+
+
 def test_file_type_in_workflow_with_bad_format():
     @task
     def t1() -> FlyteFile[typing.TypeVar("txt")]:
@@ -52,6 +71,116 @@ def my_wf() -> FlyteFile[typing.TypeVar("txt")]:
         assert fh.read() == "Hello World\n"


+def test_matching_file_types_in_workflow(local_dummy_txt_file):
+    # TXT
+    @task
+    def t1(path: FlyteFile[typing.TypeVar("txt")]) -> FlyteFile[typing.TypeVar("txt")]:
+        return path
+
+    @workflow
+    def my_wf(path: FlyteFile[typing.TypeVar("txt")]) -> FlyteFile[typing.TypeVar("txt")]:
+        f = t1(path=path)
+        return f
+
+    res = my_wf(path=local_dummy_txt_file)
+    with open(res, "r") as fh:
+        assert fh.read() == "Hello World"
+
+
+def test_file_types_with_naked_flytefile_in_workflow(local_dummy_txt_file):
+    @task
+    def t1(path: FlyteFile[typing.TypeVar("txt")]) -> FlyteFile:
+        return path
+
+    @workflow
+    def my_wf(path: FlyteFile[typing.TypeVar("txt")]) -> FlyteFile:
+        f = t1(path=path)
+        return f
+
+    res = my_wf(path=local_dummy_txt_file)
+    with open(res, "r") as fh:
+        assert fh.read() == "Hello World"
+
+
+@pytest.mark.skipif(not can_import("magic"), reason="Libmagic is not installed")
+def test_mismatching_file_types(local_dummy_txt_file):
+    @task
+    def t1(path: FlyteFile[typing.TypeVar("txt")]) -> FlyteFile[typing.TypeVar("jpeg")]:
+        return path
+
+    @workflow
+    def my_wf(path: FlyteFile[typing.TypeVar("txt")]) -> FlyteFile[typing.TypeVar("jpeg")]:
+        f = t1(path=path)
+        return f
+
+    with pytest.raises(TypeError) as excinfo:
+        my_wf(path=local_dummy_txt_file)
+    assert "Incorrect file type, expected image/jpeg, got text/plain" in str(excinfo.value)
+
+
+def test_get_mime_type_from_extension_success():
+    transformer = TypeEngine.get_transformer(FlyteFile)
+    assert transformer.get_mime_type_from_extension("html") == "text/html"
+    assert transformer.get_mime_type_from_extension("jpeg") == "image/jpeg"
+    assert transformer.get_mime_type_from_extension("png") == "image/png"
+    assert transformer.get_mime_type_from_extension("hdf5") == "text/plain"
+    assert transformer.get_mime_type_from_extension("joblib") == "application/octet-stream"
+    assert transformer.get_mime_type_from_extension("pdf") == "application/pdf"
+    assert transformer.get_mime_type_from_extension("python_pickle") == "application/octet-stream"
+    assert transformer.get_mime_type_from_extension("ipynb") == "application/json"
+    assert transformer.get_mime_type_from_extension("svg") == "image/svg+xml"
+    assert transformer.get_mime_type_from_extension("csv") == "text/csv"
+    assert transformer.get_mime_type_from_extension("onnx") == "application/json"
+    assert transformer.get_mime_type_from_extension("tfrecord") == "application/octet-stream"
+    assert transformer.get_mime_type_from_extension("txt") == "text/plain"
+
+
+def test_get_mime_type_from_extension_failure():
+    transformer = TypeEngine.get_transformer(FlyteFile)
+    with pytest.raises(KeyError):
+        transformer.get_mime_type_from_extension("unknown_extension")
+
+
+@pytest.mark.skipif(not can_import("magic"), reason="Libmagic is not installed")
+def test_validate_file_type_incorrect():
+    transformer = TypeEngine.get_transformer(FlyteFile)
+    source_path = "/tmp/flytekit_test.png"
+    source_file_mime_type = "image/png"
+    user_defined_format = "jpeg"
+
+    with patch.object(FlyteFilePathTransformer, "get_format", return_value=user_defined_format):
+        with patch("magic.from_file", return_value=source_file_mime_type):
+            with pytest.raises(
+                ValueError, match=f"Incorrect file type, expected image/jpeg, got {source_file_mime_type}"
+            ):
+                transformer.validate_file_type(user_defined_format, source_path)
+
+
+@pytest.mark.skipif(not can_import("magic"), reason="Libmagic is not installed")
+def test_flyte_file_type_annotated_hashmethod(local_dummy_file):
+    def calc_hash(ff: FlyteFile) -> str:
+        return str(ff.path)
+
+    HashedFlyteFile = Annotated[FlyteFile["jpeg"], HashMethod(calc_hash)]
+
+    @task
+    def t1(path: str) -> HashedFlyteFile:
+        return HashedFlyteFile(path)
+
+    @task
+    def t2(ff: HashedFlyteFile) -> None:
+        print(ff.path)
+
+    @workflow
+    def wf(path: str) -> None:
+        ff = t1(path=path)
+        t2(ff=ff)
+
+    with pytest.raises(TypeError) as excinfo:
+        wf(path=local_dummy_file)
+    assert "Incorrect file type, expected image/jpeg, got text/plain" in str(excinfo.value)
+
+
 def test_file_handling_remote_default_wf_input():
     SAMPLE_DATA = "https://raw.githubusercontent.com/jbrownlee/Datasets/master/pima-indians-diabetes.data.csv"

6 changes: 3 additions & 3 deletions tests/flytekit/unit/core/test_type_hints.py
@@ -41,7 +41,7 @@
 from flytekit.models.types import LiteralType, SimpleType
 from flytekit.tools.translator import get_serializable
 from flytekit.types.directory import FlyteDirectory, TensorboardLogs
-from flytekit.types.file import FlyteFile, PNGImageFile
+from flytekit.types.file import FlyteFile
 from flytekit.types.schema import FlyteSchema, SchemaOpenMode
 from flytekit.types.structured.structured_dataset import StructuredDataset

@@ -390,7 +390,7 @@ def test_flyte_file_in_dataclass():
     @dataclass
     class InnerFileStruct(DataClassJsonMixin):
         a: FlyteFile
-        b: PNGImageFile
+        b: FlyteFile

     @dataclass
     class FileStruct(DataClassJsonMixin):
@@ -400,7 +400,7 @@ class FileStruct(DataClassJsonMixin):
     @task
     def t1(path: str) -> FileStruct:
         file = FlyteFile(path)
-        fs = FileStruct(a=file, b=InnerFileStruct(a=file, b=PNGImageFile(path)))
+        fs = FileStruct(a=file, b=InnerFileStruct(a=file, b=FlyteFile(path)))
         return fs

     @dynamic
9 changes: 4 additions & 5 deletions tests/flytekit/unit/extras/tasks/test_shell.py
@@ -117,7 +117,7 @@ def test_input_output_substitution_files():
         name="test",
         debug=True,
         script=script,
-        inputs=kwtypes(f=CSVFile),
+        inputs=kwtypes(f=FlyteFile),
         output_locs=[
             OutputLocation(var="y", var_type=FlyteFile, location="{inputs.f}.mod"),
         ],
@@ -127,11 +127,10 @@ def test_input_output_substitution_files():

     contents = "1,2,3,4\n"
     with tempfile.TemporaryDirectory() as tmp:
-        csv = os.path.join(tmp, "abc.csv")
-        print(csv)
-        with open(csv, "w") as f:
+        test_data = os.path.join(tmp, "abc.txt")
+        with open(test_data, "w") as f:
             f.write(contents)
-        y = t(f=csv)
+        y = t(f=test_data)
         assert y.path[-4:] == ".mod"
         assert os.path.exists(y.path)
         with open(y.path) as f:
4 changes: 4 additions & 0 deletions tests/flytekit/unit/extras/tasks/testdata/test.csv
@@ -0,0 +1,4 @@
+SN,Name,Contribution
+1,Linus Torvalds,Linux Kernel
+2,Tim Berners-Lee,World Wide Web
+3,Guido van Rossum,Python Programming
