[META] Create APIs to save and provision a workflow #49

Closed
3 tasks done
joshpalis opened this issue Sep 21, 2023 · 22 comments

@joshpalis
Member

joshpalis commented Sep 21, 2023

Is your feature request related to a problem?

Note: These API designs assume that the global context index will automatically generate a document ID upon insert. This document ID will be passed back to the user, who can use it to either update a previously stored use case template or execute its infrastructure provisioning process. The document ID of the global context index entry will be mapped to a corresponding entry in the status index (pending implementation), which the user can query to determine the status of the infrastructure provisioning process.

Goals

  • Implement a POST API to store a use case template. This API will accept a use case template, which will be indexed into the global context index and the generated documentId will be returned to the user.
  • Implement a PUT API to update a stored use case template. This API will accept a documentId parameter and a use case template request body. The documentId will be returned.
  • Implement a POST API to execute a stored use case template's infrastructure provisioning process. This API will accept a documentId which will identify the stored use case template and return the same documentId. This documentId can then be used to query the status of the provisioning process.
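
The three goals above can be sketched with a small in-memory model. This is only an illustration of the intended ID flow, not the plugin implementation: the `WorkflowStore` class, its method names, and the status strings are hypothetical, and two plain dictionaries stand in for the global context index and the status index.

```python
import uuid


class WorkflowStore:
    """Hypothetical in-memory stand-in for the global context and status indices."""

    def __init__(self):
        self.templates = {}  # workflow_id -> stored use case template
        self.status = {}     # workflow_id -> provisioning status (names are assumed)

    def create(self, template):
        # POST: store the template and return the generated document ID.
        workflow_id = uuid.uuid4().hex
        self.templates[workflow_id] = template
        self.status[workflow_id] = "NOT_STARTED"
        return {"workflow_id": workflow_id}

    def update(self, workflow_id, template):
        # PUT: overwrite a stored template; the same ID is returned.
        if workflow_id not in self.templates:
            raise KeyError(workflow_id)
        self.templates[workflow_id] = template
        return {"workflow_id": workflow_id}

    def provision(self, workflow_id):
        # POST .../_provision: look up the stored template and mark it as provisioning.
        if workflow_id not in self.templates:
            raise KeyError(workflow_id)
        self.status[workflow_id] = "PROVISIONING"
        return {"workflow_id": workflow_id}
```

The same `workflow_id` is returned by all three calls, so a caller only has to hold one identifier to update, provision, and later check status.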

Example use case template

The following example use case template format may evolve, but it represents the request body of the API:

{
    "name": "semantic-search",
    "description": "My semantic search use case",
    "use_case": "SEMANTIC_SEARCH",
    "operations": [
        "PROVISION",
        "INGEST",
        "QUERY"
    ],
    "version": {
        "template": "1.0",
        "compatibility": [
            "2.9",
            "3.0"
        ]
    },
    "user_inputs": {
        "index_name": "my-knn-index",
        "index_settings": {
        }
    },
    "workflows": {
        "provision": {
            "nodes": [{
                    "id": "create_index",
                    "inputs": {
                        "name": "user_inputs.index_name",
                        "settings": "user_inputs.index_settings"
                    }
                },
                {
                    "id": "create_ingest_pipeline",
                    "inputs": {
                        "id": "my-ingest-pipeline",
                        "description": "some description",
                        "processors": [
                            {
                                "type": "text_embedding",
                                "params":
                                    {
                                        "model_id": "my-existing-model-id",
                                        "input_field_name": "text_passage",
                                        "output_field_name": "text_embedding"
                                    }
                            }
                        ]
                    }
                }
            ],
            "edges": [{
                "source": "create_index",
                "dest": "create_ingest_pipeline"
            }]
        },
        "ingest": {
            "user_params": {
                "document": {}
            },
            "nodes": [{
                "id": "ingest_index",
                "inputs": {
                    "index": "user_inputs.index_name",
                    "ingest_pipeline": "my-ingest-pipeline",
                    "document": "user_params.document"
                }
            }]
        },
        "query": {
            "user_params": {
                "plaintext": "string"
            },
            "nodes": [{
                    "id": "transform_query",
                    "inputs": {
                        "template": "neural-search-template-1",
                        "plaintext": "user_params.plaintext"
                    }
                },
                {
                    "id": "query_index",
                    "inputs": {
                        "index": "user_inputs.index_name",
                        "query": "{{output-from-prev-step}}.query",
                        "search_request_processors": [],
                        "search_response_processors": []
                    }
                }
            ],
            "edges": [{
                "source": "transform_query",
                "dest": "query_index"
            }]
        }
    }
}
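
As a rough illustration of how the `nodes` and `edges` of one workflow in the template could be consumed, the sketch below checks that every edge refers to a declared node and derives an execution order. It assumes that an edge means "run `source` before `dest`", which the template format above suggests but does not state; the `execution_order` function is hypothetical, and `graphlib` requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter


def execution_order(workflow):
    """Return node ids ordered so that each edge's source precedes its dest."""
    node_ids = [node["id"] for node in workflow["nodes"]]
    known = set(node_ids)
    # Map each node to the set of nodes that must run before it.
    graph = {node_id: set() for node_id in node_ids}
    for edge in workflow.get("edges", []):
        source, dest = edge["source"], edge["dest"]
        if source not in known or dest not in known:
            raise ValueError(f"edge references unknown node: {source} -> {dest}")
        graph[dest].add(source)
    # static_order() raises graphlib.CycleError if the edges form a cycle.
    return list(TopologicalSorter(graph).static_order())
```

Applied to the `provision` workflow above, this yields `create_index` followed by `create_ingest_pipeline`.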

Create / Update Workflow API

Creates and stores a use case template in the global context index and returns the workflow_id of the document.

POST _plugins/_flow_framework/workflow
{
    // use case template
}
{
    "workflow_id" : "..." // the document Id of the `global context index` entry
}

Updates a stored use case template

PUT _plugins/_flow_framework/workflow/<workflow_id>
{
    // use case template
}
{
    "workflow_id" : "..." // the document Id of the `global context index` entry
}

Provision Workflow API

Executes the infrastructure provisioning process for the stored use case template

POST _plugins/_flow_framework/workflow/<workflow_id>/_provision
{}
{
    "workflow_id" : "..." // the document Id of the `global context` index entry, for use in determining the status of the provisioning process
}
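
For reference, the three requests described above can be expressed as small request builders. This is a sketch only: the function names are hypothetical, each function returns a plain `(method, path, body)` tuple rather than sending anything, and the paths are copied from the API descriptions in this issue.

```python
BASE_PATH = "_plugins/_flow_framework/workflow"


def create_request(template):
    # Store a new use case template; the response carries the workflow_id.
    return ("POST", BASE_PATH, template)


def update_request(workflow_id, template):
    # Replace the stored template identified by workflow_id.
    return ("PUT", f"{BASE_PATH}/{workflow_id}", template)


def provision_request(workflow_id):
    # Provision a stored template; the request body is an empty object.
    return ("POST", f"{BASE_PATH}/{workflow_id}/_provision", {})
```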
@joshpalis joshpalis added enhancement New feature or request untriaged labels Sep 21, 2023
@joshpalis joshpalis self-assigned this Sep 21, 2023
@ohltyler
Member

store param seems a little confusing to me since it is doing that additionally without that param. WDYT about
PUT _plugins/ai-flow/execute?provision=false, where by default it will be true unless specified this way?

@ohltyler
Member

ohltyler commented Sep 22, 2023

Given provisioning could be expensive, should it return a task ID right away to track the progress, instead of a synchronous wait for the workflow ID to be returned? If I recall this is the plan for the orchestrate API as well.

If provisioning is false, maybe we return with the doc ID right away

@joshpalis
Member Author

store param seems a little confusing to me since it is doing that additionally without that param. WDYT about
PUT _plugins/ai-flow/execute?provision=false, where by default it will be true unless specified this way?

Sure, this sounds like a better idea. I'll go ahead and modify this.

Given provisioning could be expensive, should it return a task ID right away to track the progress, instead of a synchronous wait for the workflow ID to be returned? If I recall this is the plan for the orchestrate API as well.
If provisioning is false, maybe we return with the doc ID right away

I agree, returning the ID directly and then executing the provisioning process would benefit us in cases where the provisioning process would take some time. The user can then take that returned ID to query for the status, but it does raise the question of whether or not the taskID and the workflow ID would be the same. @owaiskazi19 WDYT

@amitgalitz
Member

Given provisioning could be expensive, should it return a task ID right away to track the progress, instead of a synchronous wait for the workflow ID to be returned? If I recall this is the plan for the orchestrate API as well.

If provisioning is false, maybe we return with the doc ID right away

If provisioning takes long enough to require async execution, which it might, as you suggest, then I think it's a good idea to return a task ID with an accompanying get task API. We could also do something similar to ml-commons, which has an async param to allow either async or sync training, but that may be over-engineering for P0.

@amitgalitz
Member

PUT _plugins/ai-flow/execute?provision=false - Will not execute the provisioning process and just stores the given use case template, returning the document ID

Should we maybe have a save API to separate this, since we are only really indexing here?

Additionally, should we also add a section for the API to save other information, like UI metadata and status, to a state index upon starting to execute?

@owaiskazi19
Member

I agree, returning the ID directly and then executing the provisioning process would benefit us in cases where the provisioning process would take some time. The user can then take that returned ID to query for the status, but it does raise the question of whether or not the taskID and the workflow ID would be the same. @owaiskazi19 WDYT

I agree with returning a taskID asynchronously and then using the status API to track the provision workflow. We also need to keep in mind that we are building a framework to simplify the complex setup for use cases. We don't want to make our own setup complex in any way. Moreover, this API can be built incrementally:
P0. Do the provision in the background synchronously.
P1. Once we have the status API out, return a taskID as a response.

It would be better to convert this to a META issue, @joshpalis. It would also be great if you could add the request and response payloads for each phase.
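
The "return an ID right away, then poll for status" idea discussed above can be sketched as follows. This is a minimal illustration under assumptions, not the plugin's design: the `Provisioner` class, its methods, and the status strings are hypothetical, and it reuses the workflow ID as the tracking key, which the thread leaves as an open question.

```python
from concurrent.futures import ThreadPoolExecutor


class Provisioner:
    """Hypothetical async provisioner: submit work, return the ID immediately."""

    def __init__(self, run_workflow):
        self._run = run_workflow          # callable that does the actual provisioning
        self._pool = ThreadPoolExecutor(max_workers=1)
        self._futures = {}                # workflow_id -> Future

    def provision(self, workflow_id):
        # Hand the work to a background thread and respond without waiting.
        self._futures[workflow_id] = self._pool.submit(self._run, workflow_id)
        return {"workflow_id": workflow_id}

    def status(self, workflow_id):
        # Status names are assumptions for this sketch.
        future = self._futures[workflow_id]
        if not future.done():
            return "IN_PROGRESS"
        return "FAILED" if future.exception() else "COMPLETED"

    def wait(self, workflow_id, timeout=None):
        # Future.exception() blocks until the task finishes without re-raising its error.
        self._futures[workflow_id].exception(timeout=timeout)
```

A caller would invoke `provision(...)`, keep the returned ID, and call `status(...)` until it stops reporting `IN_PROGRESS`.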

@joshpalis joshpalis changed the title [FEATURE] Create Execute API [META] Create Execute API Sep 25, 2023
@joshpalis joshpalis changed the title [META] Create Execute API [META] Create APIs to save and provision a workflow Sep 27, 2023
@owaiskazi19
Member

owaiskazi19 commented Oct 9, 2023

@ohltyler is there any specific use case from the frontend side for the below API:

POST _plugins/_flow_framework/workflows/_provision

Can we use

1. POST _plugins/_flow_framework/workflows

to save the template and then invoke

2. POST _plugins/_flow_framework/workflows/<workflow_id>/_provision
{}

to execute the workflow?
We should keep our APIs as simple as possible, so I was thinking from that direction.

@ohltyler
Member

ohltyler commented Oct 9, 2023

@owaiskazi19 yes, I agree with the above, which also aligns with the updated proposal in this issue. This follows the conventions of AD & other plugins.

@owaiskazi19
Member

@owaiskazi19 yes, I agree with the above, which also aligns with the updated proposal in this issue. This follows the conventions of AD & other plugins.

Is it fine if we remove POST _plugins/_flow_framework/workflows/_provision API completely from the backend then?

@ohltyler
Member

ohltyler commented Oct 9, 2023

@owaiskazi19 yes, I agree with the above, which also aligns with the updated proposal in this issue. This follows the conventions of AD & other plugins.

Is it fine if we remove POST _plugins/_flow_framework/workflows/_provision API completely from the backend then?

Yes, I'm onboard with that

@joshpalis
Member Author

joshpalis commented Oct 9, 2023

@tyler

Here are the updated URIs based on discussion with @owaiskazi19:

POST _plugins/_flow_framework/workflows/_save - will be our save API
PUT _plugins/_flow_framework/workflows/<workflow_id>/_save - will be our update API
POST _plugins/_flow_framework/workflows/<workflow_id>/_provision - will retrieve the template from the global context using the workflow ID and execute the provision workflow

This proposal will remove the following URI path:
POST _plugins/_flow_framework/workflows/_provision - will first save the use case template and execute the provision workflow (use case template is given in the request body)

That API was meant to handle both saving and executing a use case template in one call. With it removed, users will have to save the template first and then provision it as a separate step.

I'll update this meta issue and then update the PR as well

@owaiskazi19
Member

@joshpalis update API should be

PUT _plugins/_flow_framework/workflows/<workflow_id>

@ohltyler
Member

ohltyler commented Oct 9, 2023

@joshpalis wait, what is the reasoning for changing from the last iteration to adding _save? Whether it is saving or updating should be implied by the POST or PUT action. AD does not have any extra _save path. Also, it seems a little weird that save/update are different paths.

@owaiskazi19
Member

owaiskazi19 commented Oct 9, 2023

@ohltyler How about _create in the URI instead of _save, to make the API self-explanatory? Or if you think we should not include anything, that's fine too. The major concern was the additional provision API; anything else is fine.
The update API shouldn't have _save, as mentioned here @joshpalis.

@joshpalis
Member Author

joshpalis commented Oct 9, 2023

@ohltyler I have no strong opinions regarding the URI for the create API

We can do one of the following:

Option 1
POST _plugins/_flow_framework/workflows/_save
PUT _plugins/_flow_framework/workflows/<workflow_id>/_save

OR

Option 2
POST _plugins/_flow_framework/workflows/_create
PUT _plugins/_flow_framework/workflows/<workflow_id>/_create

OR

Option 3
POST _plugins/_flow_framework/workflows
PUT _plugins/_flow_framework/workflows/<workflow_id>

@ohltyler
Member

ohltyler commented Oct 9, 2023

@joshpalis I would prefer option 3 for simplicity and for consistency with existing OpenSearch plugins.

@owaiskazi19
Member

@joshpalis let's go with the 3rd one

@joshpalis
Member Author

Option 3 it is :)
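
To make the Option 3 layout concrete, here is a hedged sketch of how the agreed method and path combinations could be told apart. The `ROUTES` table and `resolve` function are hypothetical and are not taken from the plugin code; the paths use the plural `workflows` form as written in Option 3 at this point in the discussion.

```python
import re

# (HTTP method, path pattern, action name); action names are assumptions.
ROUTES = [
    ("POST", re.compile(r"^_plugins/_flow_framework/workflows$"), "create"),
    ("PUT", re.compile(r"^_plugins/_flow_framework/workflows/(?P<workflow_id>[^/]+)$"), "update"),
    ("POST", re.compile(r"^_plugins/_flow_framework/workflows/(?P<workflow_id>[^/]+)/_provision$"), "provision"),
]


def resolve(method, path):
    """Return (action, params) for a known route, or None if nothing matches."""
    for route_method, pattern, action in ROUTES:
        match = pattern.match(path)
        if method == route_method and match:
            return action, match.groupdict()
    return None
```

With this table, the removed combined endpoint `POST _plugins/_flow_framework/workflows/_provision` resolves to nothing, since saving and provisioning are now separate calls.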

@dbwiddis
Member

dbwiddis commented Oct 9, 2023

Am I too late to vote for option 3?

@austintlee

workflows, not workflow? Why is it plural? I am looking at ingest and search pipelines (pipeline) and there we use singular.

@dbwiddis
Member

Good point @austintlee

@joshpalis
Member Author

Thanks for the suggestion @austintlee, I'll make this change


7 participants