-
Notifications
You must be signed in to change notification settings - Fork 300
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Async tasks and eager revamp #2927
Conversation
Signed-off-by: Yee Hing Tong <[email protected]>
Signed-off-by: Yee Hing Tong <[email protected]>
Signed-off-by: Yee Hing Tong <[email protected]>
Signed-off-by: Yee Hing Tong <[email protected]>
Signed-off-by: Yee Hing Tong <[email protected]>
Signed-off-by: Yee Hing Tong <[email protected]>
Signed-off-by: Yee Hing Tong <[email protected]>
Signed-off-by: Yee Hing Tong <[email protected]>
Signed-off-by: Yee Hing Tong <[email protected]>
Signed-off-by: Yee Hing Tong <[email protected]>
Signed-off-by: Yee Hing Tong <[email protected]>
Signed-off-by: Yee Hing Tong <[email protected]>
Signed-off-by: Yee Hing Tong <[email protected]>
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## master #2927 +/- ##
===========================================
+ Coverage 51.35% 90.05% +38.70%
===========================================
Files 200 85 -115
Lines 20940 3820 -17120
Branches 2697 0 -2697
===========================================
- Hits 10753 3440 -7313
+ Misses 9586 380 -9206
+ Partials 601 0 -601 ☔ View full report in Codecov by Sentry. |
Signed-off-by: Yee Hing Tong <[email protected]>
Signed-off-by: Yee Hing Tong <[email protected]>
Signed-off-by: Yee Hing Tong <[email protected]>
Signed-off-by: Yee Hing Tong <[email protected]>
Signed-off-by: Yee Hing Tong <[email protected]>
Signed-off-by: Yee Hing Tong <[email protected]>
deck and exceptions Signed-off-by: Yee Hing Tong <[email protected]>
Signed-off-by: Yee Hing Tong <[email protected]>
… the launch function, pass the work item directly so that exceptions can always be set, unit tests Signed-off-by: Yee Hing Tong <[email protected]>
Signed-off-by: Yee Hing Tong <[email protected]>
Signed-off-by: Yee Hing Tong <[email protected]>
Signed-off-by: Yee Hing Tong <[email protected]>
Signed-off-by: Yee Hing Tong <[email protected]>
Signed-off-by: Yee Hing Tong <[email protected]>
Signed-off-by: Yee Hing Tong <[email protected]> Signed-off-by: Kevin Su <[email protected]>
|
||
|
||
@pytest.mark.sandbox_test | ||
def test_easy_2(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think we should test 2 concurrent eager executions running in the same python vm but are independent
For example
exec-1 -> eager
exec-2 -> eager -> eager
Logically these are independent executions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let me make an issue for this, and some of the other todos that came up. will have to overhaul the flyte context & manager. I think the scenario you're thinking of can be described like this.
@task
def t1():
print("normal task")
@eager
def eg_1():
some_other_task()
@eager
def eg_p():
t1()
eg_1()
The issue is that the FlyteContext is a shared global across all coroutines (it's a thread local, not a coroutine local) which means that when t1's call handler runs, it'll set the execution state one way, and when eg_1's call handler runs it'll try to set it another way (actually it'll fail because it'll think it's running inside t1
rather than eg_p
). What we want is a sort of tree of context objects rather than the list it is today.
Signed-off-by: Yee Hing Tong <[email protected]>
Signed-off-by: Yee Hing Tong <[email protected]>
Signed-off-by: Yee Hing Tong <[email protected]>
Signed-off-by: Yee Hing Tong <[email protected]>
Signed-off-by: Yee Hing Tong <[email protected]>
Signed-off-by: Yee Hing Tong <[email protected]>
…on tests Signed-off-by: Yee Hing Tong <[email protected]>
Signed-off-by: Yee Hing Tong <[email protected]>
Signed-off-by: Yee Hing Tong <[email protected]>
Signed-off-by: Yee Hing Tong <[email protected]>
Signed-off-by: Yee Hing Tong <[email protected]>
Signed-off-by: Yee Hing Tong <[email protected]>
Signed-off-by: Yee Hing Tong <[email protected]>
Signed-off-by: Yee Hing Tong <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is amazing.
Signed-off-by: Yee Hing Tong <[email protected]>
To support the growing interest to support Python async style programming, and to bring the
eager
mode out of experimental, this pr revamps the call patters in eager to support native Python async semantics. This PR also introduces the concept of async tasks. Note that while documentation refers to eager entities as "workflows", using the@eager
decorator produces a task, not a workflow, so they'll continue to show up under the Tasks tab of the Flyte UI.This is a backwards incompatible change for
@eager
.Usage
Below are some examples of how you might call the new eager workflow.
Patterns
Simple Example
This is the simplest case. An eager workflow invokes a simple task. The container running the eager function will reach out to the control plane, and kick off a single-task execution.
This is akin to an async function in Python calling a synchronous function. It will automatically block until the results are available.
Controlling Execution
This example shows how you might do work at the same time as a task kicked off by eager is running. Again this follows the same semantics as Python. In Python the executing function also needs to relinquish control of the CPU by calling an await so that other items on the loop can progress.
Nested Example
Eager workflows can also be nested. If an eager workflow encounters another eager workflow, it will be launched against Admin as a single task execution, just like any other task.
Errors
If an eager task runs a task on a remote Flyte cluster and that task fails, there is no way to recreate the exact type (AssertionError, ValueError, etc.) that caused the failure, so all failures are interpreted as an
EagerException
. This behavior is the same as before, but the import location has changedDeveloper/Admin Notes
Naming/Searching
When an eager task launches downstream entities the execution names are deterministic. A hash is made from the current eager task's execution ID, the entity type and name being run, the call order (if a task is called multiple times) and the inputs. This makes the execution idempotent for future recovery work.
Labels
Two labels are attached to executions launched by an eager execution, the current eager execution's name (under
eager-exec
), and the root eager execution's name (underroot-eager-exec
in the case of nested eager tasks).Signal Handling
A signal handler gets added when an eager task runs in the backend, and listens to sigint (ctrl-c) and sigterm (the signal that is sent by K8s when the pod is deleted, aka kill). The handler will iterate through all the executions and terminate anything that's not already in a terminal state.
Changes
High level changes
AsyncPythonFunctionTask
andEagerAsyncPythonFunctionTask
.flyte_entity_call_handler
and this new async call handler now call each other recursively.as_python_native
to theLiteralsResolver
so it can produce proper un-packable outputs as the result of calling flyte entities.Controller
andInformer
(not set on the names, these are not user facing so we can change later)Note
A note for developers, originally we were only going to have the async version of the call handler, but this failed because of the way flytekit/Python async works. Because we cannot guarantee that we'll only ever make the jump from async to sync once, async functions are run on a separate thread in flytekit. We ran into issues where plugins and other libraries depended on being run in the main Python thread (one example is signal handling for instance). The synchronous version of the call handler was added back for this reason, to keep the non-eager, non-async call flow effectively the same as before.
Other items
asyn.py
loop_manager to log errors so they're more visible.is_eager
bit.Remaining items
These probably make more sense to do after this is merged. Eager should be considered not completely usable until these are in.
Investigate
Things that came up in the course of this PR.
How was this patch tested?
Tested locally using sandbox. More testing needed in deployed environments.
Setup process
Screenshots
Check all the applicable boxes
Related PRs
Docs link