Commit

clean up data types and I/O files
Signed-off-by: nikki everett <[email protected]>
nikki everett committed Apr 17, 2024
1 parent 04c76fe commit 87c1642
Showing 8 changed files with 194 additions and 921 deletions.
137 changes: 25 additions & 112 deletions docs/user_guide/data_types_and_io/accessing_attributes.md
@@ -1,22 +1,3 @@
---
jupytext:
cell_metadata_filter: all
formats: md:myst
main_language: python
notebook_metadata_filter: all
text_representation:
extension: .md
format_name: myst
format_version: 0.13
jupytext_version: 1.16.1
kernelspec:
display_name: Python 3
language: python
name: python3
---

+++ {"lines_to_next_cell": 0}

(attribute_access)=

# Accessing attributes
@@ -25,131 +6,61 @@ kernelspec:
.. tags:: Basic
```

You can directly access attributes on output promises for lists, dicts, dataclasses and combinations of these types in Flyte.
This functionality facilitates the direct passing of output attributes within workflows,
You can directly access attributes on output promises for lists, dicts, dataclasses and combinations of these types in Flyte. This functionality facilitates the direct passing of output attributes within workflows,
enhancing the convenience of working with complex data structures.

To begin, import the required dependencies and define a common task for subsequent use.

```{code-cell}
from dataclasses import dataclass
from dataclasses_json import dataclass_json
from flytekit import task, workflow
```{note}
To clone and run the example code on this page, see the [Flytesnacks repo][flytesnacks].
```

To begin, import the required dependencies and define a common task for subsequent use:

@task
def print_message(message: str):
    print(message)
    return
```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/data_types_and_io/data_types_and_io/attribute_access.py
:caption: data_types_and_io/attribute_access.py
:lines: 1-10
```

+++ {"lines_to_next_cell": 0}

## List
You can access an output list using index notation.

:::{important}
Flyte currently does not support output promise access through list slicing.
:::

```{code-cell}
@task
def list_task() -> list[str]:
    return ["apple", "banana"]


@workflow
def list_wf():
    items = list_task()
    first_item = items[0]
    print_message(message=first_item)
```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/data_types_and_io/data_types_and_io/attribute_access.py
:caption: data_types_and_io/attribute_access.py
:lines: 14-23
```

+++ {"lines_to_next_cell": 0}

## Dictionary
Access the output dictionary by specifying the key.

```{code-cell}
@task
def dict_task() -> dict[str, str]:
    return {"fruit": "banana"}


@workflow
def dict_wf():
    fruit_dict = dict_task()
    print_message(message=fruit_dict["fruit"])
```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/data_types_and_io/data_types_and_io/attribute_access.py
:caption: data_types_and_io/attribute_access.py
:lines: 27-35
```

+++ {"lines_to_next_cell": 0}

## Data class
Directly access an attribute of a dataclass.

```{code-cell}
@dataclass_json
@dataclass
class Fruit:
    name: str


@task
def dataclass_task() -> Fruit:
    return Fruit(name="banana")


@workflow
def dataclass_wf():
    fruit_instance = dataclass_task()
    print_message(message=fruit_instance.name)
```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/data_types_and_io/data_types_and_io/attribute_access.py
:caption: data_types_and_io/attribute_access.py
:lines: 39-53
```

+++ {"lines_to_next_cell": 0}

## Complex type
Combinations of list, dict and dataclass also work effectively.

```{code-cell}
@task
def advance_task() -> (dict[str, list[str]], list[dict[str, str]], dict[str, Fruit]):
    return {"fruits": ["banana"]}, [{"fruit": "banana"}], {"fruit": Fruit(name="banana")}


@task
def print_list(fruits: list[str]):
    print(fruits)


@task
def print_dict(fruit_dict: dict[str, str]):
    print(fruit_dict)


@workflow
def advanced_workflow():
    dictionary_list, list_dict, dict_dataclass = advance_task()
    print_message(message=dictionary_list["fruits"][0])
    print_message(message=list_dict[0]["fruit"])
    print_message(message=dict_dataclass["fruit"].name)
    print_list(fruits=dictionary_list["fruits"])
    print_dict(fruit_dict=list_dict[0])
```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/data_types_and_io/data_types_and_io/attribute_access.py
:caption: data_types_and_io/attribute_access.py
:lines: 57-80
```
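
As a rough illustration of what each access expression in the removed `advanced_workflow` cell resolves to, the sketch below repeats the same lookups on plain Python values. It deliberately leaves out Flytekit (an assumption made so that the snippet runs anywhere): in a real workflow, `dictionary_list`, `list_dict` and `dict_dataclass` are output promises, and Flyte resolves the index, key and attribute access at run time.

```python
from dataclasses import dataclass


@dataclass
class Fruit:
    name: str


# Plain values standing in for the three outputs of advance_task().
dictionary_list = {"fruits": ["banana"]}
list_dict = [{"fruit": "banana"}]
dict_dataclass = {"fruit": Fruit(name="banana")}

# The same access expressions used in advanced_workflow.
accessed = [
    dictionary_list["fruits"][0],  # dict key, then list index
    list_dict[0]["fruit"],         # list index, then dict key
    dict_dataclass["fruit"].name,  # dict key, then dataclass attribute
]
print(accessed)
```

All three expressions reach the same string through a different nesting of list, dict and dataclass.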

+++ {"lines_to_next_cell": 0}

You can run all the workflows locally as follows:

```{code-cell}
:lines_to_next_cell: 2
if __name__ == "__main__":
    list_wf()
    dict_wf()
    dataclass_wf()
    advanced_workflow()
```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/data_types_and_io/data_types_and_io/attribute_access.py
:caption: data_types_and_io/attribute_access.py
:lines: 84-88
```

## Failure scenario
@@ -174,3 +85,5 @@ def failed_workflow():
    print_message(message=fruit_dict["fruits"])  # Accessing a non-existent key
    print_message(message=fruit_instance.fruit)  # Accessing a non-existent param
```

[flytesnacks]: https://github.com/flyteorg/flytesnacks/tree/master/examples/data_types_and_io/
138 changes: 25 additions & 113 deletions docs/user_guide/data_types_and_io/dataclass.md
@@ -1,22 +1,3 @@
---
jupytext:
cell_metadata_filter: all
formats: md:myst
main_language: python
notebook_metadata_filter: all
text_representation:
extension: .md
format_name: myst
format_version: 0.13
jupytext_version: 1.16.1
kernelspec:
display_name: Python 3
language: python
name: python3
---

+++ {"lines_to_next_cell": 0}

(dataclass)=

# Dataclass
@@ -38,36 +19,25 @@ If you're using Flytekit version >= v1.11.1, you don't need to decorate with `@d
inherit from Mashumaro's `DataClassJSONMixin`.
:::

To begin, import the necessary dependencies.
```{note}
To clone and run the example code on this page, see the [Flytesnacks repo][flytesnacks].
```

```{code-cell}
import os
import tempfile
from dataclasses import dataclass
To begin, import the necessary dependencies:

import pandas as pd
from flytekit import task, workflow
from flytekit.types.directory import FlyteDirectory
from flytekit.types.file import FlyteFile
from flytekit.types.structured import StructuredDataset
from mashumaro.mixins.json import DataClassJSONMixin
```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/data_types_and_io/data_types_and_io/dataclass.py
:caption: data_types_and_io/dataclass.py
:lines: 1-10
```

+++ {"lines_to_next_cell": 0}

## Python types
We define a `dataclass` with `int`, `str` and `dict` as the data types.

```{code-cell}
@dataclass
class Datum(DataClassJSONMixin):
    x: int
    y: str
    z: dict[int, str]
```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/data_types_and_io/data_types_and_io/dataclass.py
:caption: data_types_and_io/dataclass.py
:pyobject: Datum
```

+++ {"lines_to_next_cell": 0}

You can send a `dataclass` between different tasks written in various languages, and input it through the Flyte console as raw JSON.
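
To make the "raw JSON" remark concrete, here is a minimal sketch of how a `Datum`-shaped value looks as JSON. It uses only the standard library (`dataclasses.asdict` and `json`) rather than Flytekit or Mashumaro, so the exact wire format that Flyte produces is an assumption, not something this page states. One detail holds for JSON in general: object keys are always strings, so the `int` keys of `z` come back as `str` unless they are converted.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Datum:
    # Stand-in for the page's Datum, without DataClassJSONMixin.
    x: int
    y: str
    z: dict[int, str]


datum = Datum(x=5, y="5", z={5: "5"})

# Serialize: json.dumps turns the int key 5 into the string "5".
raw = json.dumps(asdict(datum))

# Deserialize: convert the keys of z back to int by hand.
loaded = json.loads(raw)
restored = Datum(
    x=loaded["x"],
    y=loaded["y"],
    z={int(k): v for k, v in loaded["z"].items()},
)
print(raw)
print(restored == datum)
```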

:::{note}
@@ -76,100 +46,42 @@ All variables in a data class should be **annotated with their type**. Failure t

Once declared, a dataclass can be returned as an output or accepted as an input.

```{code-cell}
@task
def stringify(s: int) -> Datum:
    """
    A dataclass return will be treated as a single complex JSON return.
    """
    return Datum(x=s, y=str(s), z={s: str(s)})


@task
def add(x: Datum, y: Datum) -> Datum:
    """
    Flytekit automatically converts the provided JSON into a data class.
    If the structures don't match, it triggers a runtime failure.
    """
    x.z.update(y.z)
    return Datum(x=x.x + y.x, y=x.y + y.y, z=x.z)
```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/data_types_and_io/data_types_and_io/dataclass.py
:caption: data_types_and_io/dataclass.py
:lines: 28-43
```
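
For a quick check of what `stringify` and `add` compute, the sketch below reruns the same function bodies as ordinary Python, with the `@task` decorator and `DataClassJSONMixin` removed (an assumption made so that it runs without Flytekit installed). The inputs 10 and 20 match the local run shown further down the page.

```python
from dataclasses import dataclass


@dataclass
class Datum:
    x: int
    y: str
    z: dict[int, str]


def stringify(s: int) -> Datum:
    # Same body as the task, minus the @task decorator.
    return Datum(x=s, y=str(s), z={s: str(s)})


def add(x: Datum, y: Datum) -> Datum:
    # Note: this mutates x.z in place before building the result.
    x.z.update(y.z)
    return Datum(x=x.x + y.x, y=x.y + y.y, z=x.z)


result = add(x=stringify(s=10), y=stringify(s=20))
print(result)
```

The integers are summed, the strings are concatenated, and the two dictionaries are merged into one.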

+++ {"lines_to_next_cell": 0}

## Flyte types
We also define a data class that accepts {std:ref}`StructuredDataset <structured_dataset>`,
{std:ref}`FlyteFile <files>` and {std:ref}`FlyteDirectory <folder>`.

```{code-cell}
@dataclass
class FlyteTypes(DataClassJSONMixin):
    dataframe: StructuredDataset
    file: FlyteFile
    directory: FlyteDirectory


@task
def upload_data() -> FlyteTypes:
    """
    Flytekit will upload FlyteFile, FlyteDirectory and StructuredDataset to the blob store,
    such as GCP or S3.
    """
    # 1. StructuredDataset
    df = pd.DataFrame({"Name": ["Tom", "Joseph"], "Age": [20, 22]})

    # 2. FlyteDirectory
    temp_dir = tempfile.mkdtemp(prefix="flyte-")
    df.to_parquet(temp_dir + "/df.parquet")

    # 3. FlyteFile
    file_path = tempfile.NamedTemporaryFile(delete=False)
    file_path.write(b"Hello, World!")

    fs = FlyteTypes(
        dataframe=StructuredDataset(dataframe=df),
        file=FlyteFile(file_path.name),
        directory=FlyteDirectory(temp_dir),
    )
    return fs


@task
def download_data(res: FlyteTypes):
    assert pd.DataFrame({"Name": ["Tom", "Joseph"], "Age": [20, 22]}).equals(res.dataframe.open(pd.DataFrame).all())
    f = open(res.file, "r")
    assert f.read() == "Hello, World!"
    assert os.listdir(res.directory) == ["df.parquet"]
```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/data_types_and_io/data_types_and_io/dataclass.py
:caption: data_types_and_io/dataclass.py
:lines: 47-84
```

+++ {"lines_to_next_cell": 0}

A data class supports the usage of data associated with Python types, data classes,
flyte file, flyte directory and structured dataset.

We define a workflow that calls the tasks created above.

```{code-cell}
@workflow
def dataclass_wf(x: int, y: int) -> (Datum, FlyteTypes):
    o1 = add(x=stringify(s=x), y=stringify(s=y))
    o2 = upload_data()
    download_data(res=o2)
    return o1, o2
```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/data_types_and_io/data_types_and_io/dataclass.py
:caption: data_types_and_io/dataclass.py
:pyobject: dataclass_wf
```

+++ {"lines_to_next_cell": 0}

You can run the workflow locally as follows:

```{code-cell}
if __name__ == "__main__":
    dataclass_wf(x=10, y=20)
```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/data_types_and_io/data_types_and_io/dataclass.py
:caption: data_types_and_io/dataclass.py
:lines: 97-98
```

To trigger a task that accepts a dataclass as an input with `pyflyte run`, you can provide a JSON file as an input:
```
pyflyte run \
https://raw.githubusercontent.com/flyteorg/flytesnacks/master/example_code/data_types_and_io/data_types_and_io/dataclass.py \
https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/data_types_and_io/data_types_and_io/dataclass.py \
add --x dataclass_input.json --y dataclass_input.json
```
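
The command above refers to a `dataclass_input.json` file that this page does not show. One way to produce such a file is sketched below. The field names follow the `Datum` dataclass defined earlier, while the sample values and the assumption that `pyflyte run` accepts this exact layout are illustrative and not taken from the Flytesnacks repo.

```python
import json
from pathlib import Path

# Field names mirror Datum (x: int, y: str, z: dict[int, str]).
# JSON object keys must be strings, so the key of z is written as "5".
payload = {"x": 5, "y": "5", "z": {"5": "5"}}

path = Path("dataclass_input.json")
path.write_text(json.dumps(payload, indent=2))

# Read the file back to confirm that it is valid JSON.
print(json.loads(path.read_text()))
```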

[flytesnacks]: https://github.com/flyteorg/flytesnacks/tree/master/examples/data_types_and_io/