Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RFC] Flyte Agent #3558

Merged
merged 3 commits into from
Aug 25, 2023
Merged

[RFC] Flyte Agent #3558

merged 3 commits into from
Aug 25, 2023

Conversation

pingsutw
Copy link
Member

@pingsutw pingsutw commented Mar 30, 2023

Describe your changes

Add RFC for Flyte Agent. Hackmd

@davidmirror-ops davidmirror-ops added the rfc A label for RFC issues label Apr 6, 2023
rfc/system/0000-exteranl-plugin-service.md Outdated Show resolved Hide resolved
rfc/system/0000-exteranl-plugin-service.md Outdated Show resolved Hide resolved
rfc/system/0000-exteranl-plugin-service.md Outdated Show resolved Hide resolved
rfc/system/0000-exteranl-plugin-service.md Outdated Show resolved Hide resolved
rfc/system/0000-exteranl-plugin-service.md Outdated Show resolved Hide resolved
rfc/system/0000-exteranl-plugin-service.md Outdated Show resolved Hide resolved
rfc/system/0000-exteranl-plugin-service.md Outdated Show resolved Hide resolved
```

### Register a backend plugin
Users should be able to write the plugin using a rest-like interface. In this proposal we recommend using a simplified form of WebAPI plugins. the goal of the plugins is to enable the following functions
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The last sentence of this paragraph seems incomplete: I expected a list of functions that the plugins enable



### Run a Flytekit Backend Plugin
The workflow code running Snowflake, Databricks should not be changed. We can add a new config `enable_backend_flytekit_plugin_system`. If it's true, the job will be handled by plugin system. If can't find the plugin in the registry, falling back to use the default web api plugin in the propeller.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

where would this be configured? the flytectl yaml config file?

The workflow code running Snowflake, Databricks should not be changed. We can add a new config `enable_backend_flytekit_plugin_system`. If it's true, the job will be handled by plugin system. If can't find the plugin in the registry, falling back to use the default web api plugin in the propeller.

### Secrets Mmanagement
Secrets management should not be imposed using Flyte’s convention, though we should provide a simplified secrets reader using flytekits secret system?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have we answered this question?


### Create Job
```sequence
Propeller->Grpc API Server: REST (Task spec)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Grpc API Server" in this example is the user-written plugin?


### Create Job
```sequence
Propeller->Grpc API Server: REST (Task spec)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This suggests that propeller will talk to the gRPC api server via REST? Is this correct? The paragraph below confirms that users should be able to write plugins using a "rest-like interface". Why do we call it gRPC API server here then?

return {"job_id": job_id, "error_code": error_code, "message": message}

```
**Ideally**, we should provide a simplified interface to this in flytekit. This would mean that the flytekit plugin can simply use this interface to create a local plugin and a backendplugin.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would be very nice.

rfc/system/0000-exteranl-plugin-service.md Outdated Show resolved Hide resolved
Comment on lines 165 to 181
### 1. All the plugin running in one deployment
Pros:
* Easy to maintain.
* One image, and one deployment yaml file.

Cons:
* Dependecy conflict between two plugins.
* Need to restart the deployment and rebuild the image when we add a new backend plugin to flytekit.

### 2. One plugin per deployment

Pros:
* We can scale up the specific plugin deployments when the reqeusts increase in some plugins.
* Only need to deploy the plugins that people will use.

Cons:
* Hard to maintain. (tons of images, and deployment yaml files)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If all plugins need to be in one image/deployment, how would it work if users want to create plugins private to their organisation? At least for that use case a separate image/deployment would be nice. Otherwise they would have to fork flytekit to build the image?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think about adding such a list in the values file of the helm chart?

plugins:
    plugin_1_name: gcr.io/.../plugin_1:version
    plugin_2_name: ...

We wouldn't have dependency conflicts because each plugin comes with its own Dockerfile. The deployment and service manifests, however, could then be templated, at least greatly reducing the number of yaml files that need to be maintained.

Also adding a private plugin becomes trivial since you only need to build the image and modify the values file.

Copy link
Member

@pradithya pradithya Apr 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For plugins internal to an organization, I think we can have the mapping of plugin --> plugin endpoint in the flyte configuration and let the plugin developer (or the system administrator) deploy the plugin separately from flyte.
e.g.

plugins:
    plugin-1: plugin-1.plugin-namespace.svc.cluster.local/plugin1
    plugin-2: plugins.plugin-namespace.svc.cluster.local/plugin2
    plugin-3: plugins.plugin-namespace.svc.cluster.local/plugin3

Note that since it only accept endpoint, the plugin developer can have more freedom to implement their plugin as one deployment (a single Docker image) or multiple deployments.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense. Or even outside of K8s as cloud function or similar if they prefer.

* One image, and one deployment yaml file.

Cons:
* Dependecy conflict between two plugins.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a very important point because this is currently one of the difficult things in flyteplugins. I once tried upgrading the spark-on-k8s plugin which was made almost impossible because I would have had to then upgrade many other plugins at the same time.

Dependency conflicts in python are bad as well so I'd be worried to recreate the same problem outside of flyteplugins.


## Open Question:

1. Should we provide a way to deploy these plugins using Flyte? IMO this should be optional and not required for MVP?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Helm chart should be the way to go in my opinion. See comment about adding a list of plugins to the values above.

@davidmirror-ops
Copy link
Contributor

04-13-2023 Contributor's meetup notes:
[Byron Hsu]: potential dependency issues with different API versions?
[BH] What benefits from moving to the plugin system?
[Kevin] Move the overhead of running the plugin outside propeller
[Dan] There could be a minor performance impact as a tradeoff to the flexibility of the plugin system. No feedback loop to propeller to enqueue the workflow
[Fabio] Is it one image for all plugins or one per plugin?
[Kevin] 1:1 relationship images to plugins.

bstadlbauer
bstadlbauer previously approved these changes May 10, 2023
Copy link
Member

@bstadlbauer bstadlbauer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pingsutw I guess we can merge this? 🤔

fg91
fg91 previously approved these changes May 10, 2023
Copy link
Member

@fg91 fg91 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The questions I had were addressed in the last contributors sync, LGTM 👍

Signed-off-by: Kevin Su <[email protected]>
@pingsutw pingsutw dismissed stale reviews from fg91 and bstadlbauer via 00d5073 May 10, 2023 09:48
@eapolinario
Copy link
Contributor

We should rename this to Agents to indicate recent development. Otherwise, LGTM.

@davidmirror-ops
Copy link
Contributor

davidmirror-ops commented Jul 4, 2023

@pingsutw would you mind checking if we need to rename and in any case merge this?

@pingsutw pingsutw mentioned this pull request Aug 9, 2023
11 tasks
@pingsutw pingsutw changed the title [RFC] External Plugin Service [RFC] Flyte Agent Aug 17, 2023
@eapolinario eapolinario merged commit 7094b11 into master Aug 25, 2023
@eapolinario eapolinario deleted the ofc-rfc branch August 25, 2023 15:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
rfc A label for RFC issues
Projects
Status: Implemented
Development

Successfully merging this pull request may close these issues.

7 participants