Async tasks and eager revamp #2927

wild-endeavor · 2024-11-13T19:32:47Z

To support the growing interest in Python async-style programming, and to bring eager mode out of experimental, this PR revamps the call patterns in eager to support native Python async semantics. This PR also introduces the concept of async tasks. Note that while the documentation refers to eager entities as "workflows", using the @eager decorator produces a task, not a workflow, so they will continue to show up under the Tasks tab of the Flyte UI.

This is a backwards incompatible change for @eager.

Usage

Below are some examples of how you might call the new eager workflow.

Patterns

Simple Example

This is the simplest case. An eager workflow invokes a simple task. The container running the eager function will reach out to the control plane, and kick off a single-task execution.

@task
def add_one(x: int) -> int:  
    return x + 1  
  
  
@eager
async def simple_eager_workflow(x: int) -> int:  
    # This is the normal way of calling tasks. Call normal tasks in an effectively async way by hanging and waiting for
    # the result.
    out = add_one(x=x)
  
    return out

This is akin to an async function in Python calling a synchronous function. It will automatically block until the results are available.

Controlling Execution

This example shows how you might do other work while a task kicked off by eager is running. Again, this follows the same semantics as Python: the executing function needs to relinquish control of the CPU with an await so that other items on the event loop can progress.

@eager
async def simple_await() -> int:
    t1 = asyncio.create_task(add_one(x=10))

    # This allows the loop to run in the background, actually kicking
    # off the execution
    await asyncio.sleep(0)
    # <can do more CPU intensive things here while>
  
    # don't forget the comma if just awaiting one
    i1, = await asyncio.gather(t1)  # can have more of course
    return i1
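
The ordering described above can be reproduced with plain asyncio and no Flyte involved. The following is a minimal sketch, not flytekit code: fake_remote_task is a hypothetical stand-in for an execution launched by eager, and the events list only exists to record the order in which things run.

```python
import asyncio

events = []


async def fake_remote_task(x: int) -> int:
    # Hypothetical stand-in for a task execution launched by eager.
    events.append("task started")
    await asyncio.sleep(0.01)
    events.append("task finished")
    return x + 1


async def main() -> int:
    t1 = asyncio.create_task(fake_remote_task(10))
    # create_task only schedules the coroutine; it has not run yet.
    events.append("task created")
    # Yield to the event loop once so the scheduled task can start.
    await asyncio.sleep(0)
    # The task is now in flight; other work could overlap with it here.
    events.append("back in caller")
    (i1,) = await asyncio.gather(t1)
    return i1


result = asyncio.run(main())
```

Without the await asyncio.sleep(0), the task would not begin until the caller reached its next await, so any CPU-bound work placed before gather would run first.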

Nested Example

Eager workflows can also be nested. If an eager workflow encounters another eager workflow, it will be launched against Admin as a single task execution, just like any other task.

@task
def add_one(x: int) -> int:
    return x + 1

@task
def double(x: int) -> int:  
    return x * 2  

@dynamic
def level_3_dt(x: int) -> int:
    out = add_one(x=x)
    return out

@eager
async def level_2(x: int) -> int:
    out = add_one(x=x)
    level_3_res = level_3_dt(x=out)
    final_res = double(x=level_3_res)
    return final_res

@eager
async def level_1() -> typing.Tuple[int, int]:
    i1 = add_one(x=10)
    t2 = asyncio.create_task(level_2(x=1))

    i2 = await t2
    return i1, i2

Errors

If an eager task runs a task on a remote Flyte cluster and that task fails, there is no way to recreate the exact type (AssertionError, ValueError, etc.) that caused the failure, so all failures are interpreted as an EagerException. This behavior is the same as before, but the import location has changed:

from flytekit import EagerException

@eager
async def base_wf(x: int) -> int:
    try:
        out = add_one(x=x)
    except EagerException as ee:
        # Fall back to another task (add_two is assumed to be defined elsewhere).
        out = add_two(x=x)
    return out

Developer/Admin Notes

Naming/Searching

When an eager task launches downstream entities, the execution names are deterministic. A hash is made from the current eager task's execution ID, the entity type and name being run, the call order (if a task is called multiple times), and the inputs. This makes the execution idempotent for future recovery work.
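
As an illustration of the idea, here is a minimal sketch of deriving a stable name from those four ingredients. It is not the flytekit implementation: the function name, the JSON serialization, the SHA-256 digest, the "e" prefix, and the 20-character length are all assumptions made for the example.

```python
import hashlib
import json
from typing import Any, Dict


def deterministic_exec_name(
    parent_exec_id: str,
    entity_type: str,
    entity_name: str,
    call_order: int,
    inputs: Dict[str, Any],
    length: int = 20,
) -> str:
    # Hypothetical helper: serialize every ingredient in a stable order so the
    # same call always hashes to the same value.
    payload = json.dumps(
        {
            "parent": parent_exec_id,
            "type": entity_type,
            "name": entity_name,
            "order": call_order,
            "inputs": inputs,
        },
        sort_keys=True,
        default=str,
    )
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    # Assumed convention for this sketch: a leading letter plus a truncated digest.
    return "e" + digest[: length - 1]


first = deterministic_exec_name("exec-abc", "task", "add_one", 0, {"x": 10})
retry = deterministic_exec_name("exec-abc", "task", "add_one", 0, {"x": 10})
second_call = deterministic_exec_name("exec-abc", "task", "add_one", 1, {"x": 10})
```

Re-running the same call yields the same name, while a second call to the same task with the same inputs gets a different name because the call order differs.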

Labels

Two labels are attached to executions launched by an eager execution: the current eager execution's name (under eager-exec), and the root eager execution's name (under root-eager-exec, in the case of nested eager tasks).
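
A rough sketch of how the two labels could be computed for a child execution follows. Only the label keys come from this PR; the helper name and the rule that a top-level eager execution is its own root are assumptions for illustration.

```python
from typing import Dict, Optional


def eager_labels(current_exec_name: str, inherited: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    # Hypothetical helper. If the current eager execution was itself launched by
    # another eager execution, it inherits that execution's root label.
    inherited = inherited or {}
    # Assumption: a top-level eager execution is treated as its own root.
    root = inherited.get("root-eager-exec", current_exec_name)
    return {"eager-exec": current_exec_name, "root-eager-exec": root}


top_level = eager_labels("level-1-exec")
nested = eager_labels("level-2-exec", inherited=top_level)
```

Filtering executions on root-eager-exec would then return everything launched under one top-level eager run, regardless of nesting depth.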

Signal Handling

A signal handler is added when an eager task runs in the backend. It listens for SIGINT (Ctrl-C) and SIGTERM (the signal that K8s sends when the pod is deleted, i.e. kill). The handler iterates through all the executions and terminates anything that is not already in a terminal state.
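
The pattern can be sketched with the standard signal module. This is an illustration under stated assumptions, not the flytekit handler: FakeExecution, the phase names in TERMINAL_PHASES, and both helper functions are hypothetical.

```python
import signal
from dataclasses import dataclass
from typing import Callable, List

# Assumed set of terminal phases for this sketch.
TERMINAL_PHASES = {"SUCCEEDED", "FAILED", "ABORTED"}


@dataclass
class FakeExecution:
    # Hypothetical stand-in for an execution launched by the eager task.
    name: str
    phase: str

    def terminate(self) -> None:
        self.phase = "ABORTED"


def make_handler(executions: List[FakeExecution]) -> Callable:
    def handler(signum, frame) -> None:
        # Terminate everything that has not already reached a terminal state.
        for ex in executions:
            if ex.phase not in TERMINAL_PHASES:
                ex.terminate()

    return handler


def install(handler: Callable) -> None:
    # signal.signal must be called from the main thread of the interpreter.
    signal.signal(signal.SIGINT, handler)
    signal.signal(signal.SIGTERM, handler)


executions = [FakeExecution("a", "RUNNING"), FakeExecution("b", "SUCCEEDED")]
handler = make_handler(executions)
# Invoke the handler directly to show its effect without sending a real signal.
handler(signal.SIGTERM, None)
```

In a real process, install(handler) would be called once at startup from the main thread, which is the same main-thread constraint that the developer note further down runs into.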

Changes

High level changes

Added a new AsyncPythonFunctionTask and EagerAsyncPythonFunctionTask.
Introduced two new execution modes, one for eager local execution and one for eager backend execution. This made more sense than adding a single eager execution mode the way dynamic is handled.
Added an async version of the call handler. The main flyte_entity_call_handler and this new async call handler now call each other recursively.
Updated the main call handler to allow calling other Flyte entities inside an eager task.
Added as_python_native to the LiteralsResolver so it can produce proper unpackable outputs as the result of calling Flyte entities.
Added two classes for watching executions launched by an eager task, Controller and Informer (the names are not final; they are not user facing, so they can change later).

Note

A note for developers: originally we were only going to have the async version of the call handler, but this failed because of the way flytekit/Python async works. Because we cannot guarantee that we'll only ever make the jump from async to sync once, async functions are run on a separate thread in flytekit. We ran into issues where plugins and other libraries depended on being run in the main Python thread (signal handling is one example). The synchronous version of the call handler was added back for this reason, to keep the non-eager, non-async call flow effectively the same as before.
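
To make the threading consequence concrete, here is a minimal sketch of running coroutines from synchronous code on a dedicated event-loop thread. It is not flytekit's loop manager: BackgroundLoop and where_am_i are hypothetical names, and the sketch uses a single loop, so calling run_sync from inside that loop's own thread would deadlock.

```python
import asyncio
import threading
from typing import Any, Coroutine


class BackgroundLoop:
    # Hypothetical helper: owns an event loop that runs on its own thread.
    def __init__(self) -> None:
        self._loop = asyncio.new_event_loop()
        self._thread = threading.Thread(
            target=self._loop.run_forever, name="async-runner", daemon=True
        )
        self._thread.start()

    def run_sync(self, coro: Coroutine[Any, Any, Any]) -> Any:
        # Submit the coroutine to the background loop and block for its result.
        future = asyncio.run_coroutine_threadsafe(coro, self._loop)
        return future.result()

    def stop(self) -> None:
        self._loop.call_soon_threadsafe(self._loop.stop)
        self._thread.join()
        self._loop.close()


async def where_am_i() -> str:
    await asyncio.sleep(0)
    return threading.current_thread().name


runner = BackgroundLoop()
thread_name = runner.run_sync(where_am_i())
caller_name = threading.current_thread().name
runner.stop()
```

The coroutine reports the runner thread's name rather than the caller's, which is why code that must execute on the main thread, such as signal registration, breaks when it is routed through the async path.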

Other items

For agents, signal handling is redirected through the context manager for now, but more work could be done here to make it more rigorous.
Added an exception handler to the main asyn.py loop_manager to log errors so they're more visible.
Updated idl requirement to pick up new is_eager bit.

Remaining items

These probably make more sense to do after this is merged. Eager should be considered not completely usable until these are in.

Introduce async workflows that compile the same way as normal workflows, but are callable within other async workflows, and can call eager tasks directly.
Documentation updates outside of code comments

Investigate

Things that came up in the course of this PR.

Not related to eager, but the naming of entities changed drastically if I was two folders up from the code rather than one.

How was this patch tested?

Tested locally using sandbox. More testing needed in deployed environments.

Setup process

Screenshots

Check all the applicable boxes

I updated the documentation accordingly.
All new and existing tests passed.
All commits are signed-off.

Related PRs

Docs link

Signed-off-by: Yee Hing Tong <[email protected]>

codecov · 2024-11-14T02:01:35Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 90.05%. Comparing base (47fe660) to head (802bae1).
Report is 6 commits behind head on master.

Additional details and impacted files

@@             Coverage Diff             @@
##           master    #2927       +/-   ##
===========================================
+ Coverage   51.35%   90.05%   +38.70%     
===========================================
  Files         200       85      -115     
  Lines       20940     3820    -17120     
  Branches     2697        0     -2697     
===========================================
- Hits        10753     3440     -7313     
+ Misses       9586      380     -9206     
+ Partials      601        0      -601



kumare3 · 2024-12-02T06:28:48Z

tests/flytekit/unit/core/test_async.py

+
+
+@pytest.mark.sandbox_test
+def test_easy_2():


I think we should test two concurrent eager executions that run in the same Python VM but are independent.

For example
exec-1 -> eager
exec-2 -> eager -> eager

Logically these are independent executions

wild-endeavor · 2024-12-02

Let me make an issue for this, and for some of the other todos that came up. We will have to overhaul the Flyte context & manager. I think the scenario you're thinking of can be described like this.

@task
def t1():
    print("normal task")

@eager
def eg_1():
    some_other_task()

@eager
def eg_p():
    t1()
    eg_1()

The issue is that the FlyteContext is a shared global across all coroutines (it's a thread local, not a coroutine local) which means that when t1's call handler runs, it'll set the execution state one way, and when eg_1's call handler runs it'll try to set it another way (actually it'll fail because it'll think it's running inside t1 rather than eg_p). What we want is a sort of tree of context objects rather than the list it is today.
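
The difference between thread-local and coroutine-local state can be shown with the standard library alone. This sketch does not use FlyteContext; the worker names simply mirror the scenario above, and contextvars is shown only as a contrast, not as the fix adopted in flytekit.

```python
import asyncio
import contextvars
import threading

# One slot per thread: every coroutine on the same loop thread shares it.
thread_state = threading.local()
# One slot per asyncio task: each task works on its own copy of the context.
task_state = contextvars.ContextVar("task_state", default="unset")


async def worker(name: str, seen: dict) -> None:
    thread_state.value = name
    task_state.set(name)
    # Yield so the other coroutine runs and writes its own state.
    await asyncio.sleep(0)
    seen[name] = (thread_state.value, task_state.get())


async def main() -> dict:
    seen: dict = {}
    await asyncio.gather(worker("t1", seen), worker("eg_1", seen))
    return seen


seen = asyncio.run(main())
```

After the interleaving, t1 reads back eg_1 from the thread-local slot but still sees its own value in the context variable, which is the kind of cross-talk described for the shared context.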


eapolinario

This is amazing.


wild-endeavor added 13 commits November 1, 2024 18:36

3e1d5a2 eod
b45c4b7 notes
999cea7 changes
8be77c5 need to verify tests
11f3242 quick lint pass and async
8ceec0a more tests
528b063 add some assertions even though they're not correct
2b6f698 nested eager in real execution calls the backend
3ad46d1 comment
38581a3 note
469c53f Merge remote-tracking branch 'origin/master' into async/tasks
adbf189 comments, pre-worker queu
b45cc87 replace queue

wild-endeavor changed the title ~~Async/tasks~~ wip Async/tasks Nov 14, 2024

wild-endeavor added 15 commits November 14, 2024 11:19

52555eb add turning back to native values
9530595 remote
809e6b6 remote
88537e2 remote
27e7ee2 return
a0fc558 remove older comments
4dfd857 Async/tasks cleanup (#2937): deck and exceptions
e0c6b7b merge in consistent exec ids and signals
f9b45d6 add cb to watch function, add exception handler, add try/catch around the launch function, pass the work item directly so that exceptions can always be set, unit tests
d4fad13 fmt
7c62e20 fix one test and skip rest for now
8c80bb6 wrong marker
bb04b64 remove old eager tests
65df34a Merge branch 'master' into async/tasks
af631bc Async/tasks agent (#2955) (also signed off by Kevin Su)

wild-endeavor marked this pull request as ready for review November 27, 2024 23:40

wild-endeavor requested review from kumare3, eapolinario, pingsutw, cosmicBboy, samhita-alla, thomasjpfan and Future-Outlier as code owners November 27, 2024 23:40

kumare3 reviewed Dec 2, 2024


wild-endeavor added 12 commits December 2, 2024 11:30

b214d77 merge master
6ee16ca bring mashumaro dep back in line with master
be750f8 remove debug code and clean up comments
222f65c add gh issue links
9931735 add one more gh issue
95ae041 merge master
09d473c skip all sandbox tests for now, wait until released and add integration tests
84cf0ff try adding simplest integration test
452105c lint
f114f53 Async/tasks torch (#2972)
c4dc438 revert changes for base agent (#2973)
7f66c92 de-duplicate call handler code

wild-endeavor changed the title ~~wip Async/tasks~~ Async tasks and eager revamp Dec 4, 2024

wild-endeavor added 2 commits December 4, 2024 17:01

802bae1 try redirecting back (#2978)
7684ba2 merge master

eapolinario previously approved these changes Dec 9, 2024


3071f56 docs changes

wild-endeavor dismissed eapolinario’s stale review via 3071f56 December 9, 2024 20:31

pingsutw approved these changes Dec 9, 2024


wild-endeavor merged commit 276c464 into master Dec 9, 2024
100 of 102 checks passed
