Signed-off-by: Kevin Su <[email protected]>
Codecov Report
@@            Coverage Diff             @@
##           master     #378      +/-   ##
==========================================
+ Coverage   76.11%   78.49%   +2.37%
==========================================
  Files          18       18
  Lines        1390     1195     -195
==========================================
- Hits         1058      938     -120
+ Misses        280      205      -75
  Partials       52       52
Looks great! The only two things I'm still not grasping are:
- Where does the task type multiplexing happen? Each request here has a `task_type` variable, which makes it sound like there will be a single server running that handles all requests and just has multiple plugins registered. However, the flyteplugins PR has an `EndpointForTaskTypes` flag where you specify separate endpoints for each task type. Is the idea to support both use cases?
- All but the delete responses have an `error_message` field, which implies failure. We changed this in the create function to a `oneof`, meaning that the create either succeeded or failed, but this wasn't changed in the get response message. Is there a reason to add this explicit message, or can we rely on the gRPC error handling to pass failures from server to client?
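The two deployment modes raised in the first question can be pictured with a small sketch. This is only an illustration in plain Python, not the flyteplugins implementation: `PluginRegistry`, `resolve_endpoint`, the task type names, and the endpoint strings are all hypothetical. The first mode dispatches on `task_type` inside one server; the second looks up a per-task-type endpoint and falls back to a default.

```python
from typing import Callable, Dict

# Hypothetical handler signature: takes a request payload, returns a job id.
Handler = Callable[[dict], str]


class PluginRegistry:
    """Single-server mode: one process dispatches on the request's task_type."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Handler] = {}

    def register(self, task_type: str, handler: Handler) -> None:
        self._handlers[task_type] = handler

    def create(self, task_type: str, request: dict) -> str:
        try:
            handler = self._handlers[task_type]
        except KeyError:
            raise LookupError(f"no plugin registered for task type {task_type!r}") from None
        return handler(request)


def resolve_endpoint(task_type: str, default_endpoint: str,
                     endpoint_for_task_types: Dict[str, str]) -> str:
    """Per-endpoint mode: an override map wins, otherwise use the default server."""
    return endpoint_for_task_types.get(task_type, default_endpoint)


# Illustrative task types and endpoints only (assumptions, not from the PR).
registry = PluginRegistry()
registry.register("bigquery", lambda req: "bq-" + req["name"])
registry.register("snowflake", lambda req: "sf-" + req["name"])

job_id = registry.create("bigquery", {"name": "job1"})
endpoint = resolve_endpoint("snowflake", "plugins.default:8000",
                            {"snowflake": "localhost:9000"})
```

Under these assumptions the two modes compose: a task type without an override goes to the default server, which then dispatches internally on `task_type`.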
Re multiplexing: both. In Flyte, we will ship all the plugins in the flyte repo in one plugin container; it is also much easier for us to update a single binary and Helm chart. Some users may want to submit jobs to a different gRPC server for development, so we'd like to support both. See more discussion here.
Re error_message: yup, you're right, we should rely on the gRPC error handling. I just removed all the error messages from the IDL and will update the gRPC server in flytekit to return an error code and message.
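To make the error-handling decision concrete, here is a minimal sketch of a response type with no `error_message` field, where failure travels as a status code plus details instead. It deliberately avoids the real gRPC library so it runs with the standard library alone: `StatusCode` and `RpcError` are stand-ins that mirror a few of gRPC's canonical codes, and `GetResponse`, `get_task`, `describe`, and the `JOBS` table are hypothetical names, not the flytekit server.

```python
from dataclasses import dataclass
from enum import Enum


class StatusCode(Enum):
    # Stand-in for a few of gRPC's canonical status codes.
    OK = 0
    NOT_FOUND = 5
    INTERNAL = 13


class RpcError(Exception):
    """Stand-in for the error a gRPC client sees on a non-OK status."""

    def __init__(self, code: StatusCode, details: str) -> None:
        super().__init__(f"{code.name}: {details}")
        self.code = code
        self.details = details


@dataclass
class GetResponse:
    # No error_message field: a response object only exists on success.
    state: str


# Hypothetical job table held by the plugin server.
JOBS = {"job-1": "RUNNING"}


def get_task(job_id: str) -> GetResponse:
    """Server side: fail with a status code instead of filling an error field."""
    if job_id not in JOBS:
        raise RpcError(StatusCode.NOT_FOUND, f"job {job_id} does not exist")
    return GetResponse(state=JOBS[job_id])


def describe(job_id: str) -> str:
    """Client side: success and failure arrive on separate paths."""
    try:
        return get_task(job_id).state
    except RpcError as err:
        return f"failed with {err.code.name}: {err.details}"
```

The client never has to check whether a returned message is "really" an error, which is the ambiguity the `error_message` field introduced.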
import "flyteidl/core/interface.proto";

// BackendPluginService defines an RPC Service that allows propeller to send the request to the backend plugin server.
service BackendPluginService {
nit - does something like `ExternalPluginService` make more sense?
like it!
This seems to be a reasonable MVP. Would love another reviewer to go through in some depth though!
* Add backend plugin system service
* Add backend plugin system service
* nit
* nit
* nit
* nit
* update state
* update state
* dics
* Remove output prefix from get request
* update
* remove prev state
* update proto
* remove error message
* update comment
* make generate
* Rename the service
* nit

Signed-off-by: Kevin Su <[email protected]>
fixes flyteorg/flyte#3282
TL;DR
Define a new service and messages for out-of-core plugins.
Propeller uses these messages to send create / get / delete requests to the backend plugin server.
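The create / get / delete flow can be sketched as a small polling loop. This is an assumption-laden illustration rather than propeller's actual logic: `FakePluginServer`, `run_to_completion`, the state names, and the choice to call delete when polling gives up are all hypothetical.

```python
import itertools
from typing import Dict, List


class FakePluginServer:
    """In-memory stand-in for a backend plugin server (all names hypothetical)."""

    def __init__(self, polls_until_done: int = 2) -> None:
        self._ids = itertools.count(1)
        self._remaining: Dict[str, int] = {}
        self._polls_until_done = polls_until_done

    def create_task(self, task_type: str) -> str:
        # Start a job and hand back an id the caller uses for later requests.
        job_id = f"{task_type}-{next(self._ids)}"
        self._remaining[job_id] = self._polls_until_done
        return job_id

    def get_task(self, job_id: str) -> str:
        # Report RUNNING for a fixed number of polls, then SUCCEEDED.
        if self._remaining[job_id] > 0:
            self._remaining[job_id] -= 1
            return "RUNNING"
        return "SUCCEEDED"

    def delete_task(self, job_id: str) -> None:
        # Forget the job; assumed here to act as an abort.
        self._remaining.pop(job_id, None)


def run_to_completion(server: FakePluginServer, task_type: str,
                      max_polls: int = 10) -> List[str]:
    """Create a job, poll its state, and delete it if it never finishes."""
    job_id = server.create_task(task_type)
    states: List[str] = []
    for _ in range(max_polls):
        state = server.get_task(job_id)
        states.append(state)
        if state == "SUCCEEDED":
            break
    else:
        server.delete_task(job_id)  # gave up: ask the server to drop the job
    return states


states = run_to_completion(FakePluginServer(polls_until_done=2), "demo")
```

The caller only keeps the job id between requests, so in this sketch all job state lives on the server side.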
https://hackmd.io/k_hMtUsGTbKl2IksC3IjkA
Type
Are all requirements met?
Complete description
^^^
Tracking Issue
flyteorg/flyte#3282
Follow-up issue
NA