[FEATURE] Support update connector without undeploying the model #2496

Open
zane-neo opened this issue Jun 4, 2024 · 6 comments
Assignees
Labels
enhancement New feature or request

Comments

@zane-neo
Collaborator

zane-neo commented Jun 4, 2024

Is your feature request related to a problem?
Currently, the update connector API checks every usage of the connector, and the update only goes through when no model is using it. This doesn't seem reasonable, especially for remote models.

  1. When a remote model is deployed, an object is created and put into a map: https://github.com/opensearch-project/ml-commons/blob/main/plugin/src/main/java/org/opensearch/ml/model/MLModelManager.java#L1144
  2. When connector information changes, both the connector information and the cached model info can be retrieved, so updating the model in the cache should be able to redeploy the model with the new connector info.
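The two points above can be sketched roughly as follows. This is only an illustration of the idea, not the ml-commons implementation: `ConnectorInfo`, `CachedRemoteModel`, `RemoteModelCache`, and `refreshConnector` are hypothetical names, and the real deployed-model object carries far more state than shown here.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-ins for illustration; these are not ml-commons classes.
final class ConnectorInfo {
    final String id;
    final String endpoint;
    final int connectionTimeoutMs;

    ConnectorInfo(String id, String endpoint, int connectionTimeoutMs) {
        this.id = id;
        this.endpoint = endpoint;
        this.connectionTimeoutMs = connectionTimeoutMs;
    }
}

final class CachedRemoteModel {
    final String modelId;
    final ConnectorInfo connector;

    CachedRemoteModel(String modelId, ConnectorInfo connector) {
        this.modelId = modelId;
        this.connector = connector;
    }
}

class RemoteModelCache {
    // modelId -> deployed model object ("an object is created and put into a map").
    private final Map<String, CachedRemoteModel> deployed = new ConcurrentHashMap<>();

    void deploy(String modelId, ConnectorInfo connector) {
        deployed.put(modelId, new CachedRemoteModel(modelId, connector));
    }

    CachedRemoteModel get(String modelId) {
        return deployed.get(modelId);
    }

    // Rebuild every cached model that references the updated connector.
    // Models that use a different connector are left untouched.
    void refreshConnector(ConnectorInfo updated) {
        deployed.replaceAll((modelId, current) ->
            current.connector.id.equals(updated.id)
                ? new CachedRemoteModel(modelId, updated)
                : current);
    }
}
```

Because each entry is replaced as a whole, a reader of the map sees either the old model object or the new one, never a partially updated one.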

What solution would you like?
Adding a new URL parameter such as redeploy_model=true can reduce the manual effort of undeploying and redeploying the model.
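One way to picture how the proposed flag could interact with the existing usage check is the decision sketch below. It is an assumption about the shape of the logic, not existing code: `UpdateConnectorDecision`, `Action`, and `decide` are hypothetical, and `redeployModel` stands for the value of the proposed `redeploy_model` parameter.

```java
import java.util.List;

// Hypothetical decision helper; not part of ml-commons.
class UpdateConnectorDecision {
    enum Action { REJECT, UPDATE_ONLY, UPDATE_AND_REDEPLOY }

    // modelsUsingConnector: IDs of models that reference the connector
    //   (assumed to come from the existing usage check).
    // redeployModel: value of the proposed redeploy_model URL parameter.
    static Action decide(List<String> modelsUsingConnector, boolean redeployModel) {
        if (modelsUsingConnector.isEmpty()) {
            // No model uses the connector: the update goes through, as it does today.
            return Action.UPDATE_ONLY;
        }
        // The connector is in use: today the update is blocked; with the flag set,
        // the update would proceed and the affected models would be redeployed.
        return redeployModel ? Action.UPDATE_AND_REDEPLOY : Action.REJECT;
    }
}
```

Keeping the flag off by default would preserve the current behavior for existing callers.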

What alternatives have you considered?
Change the default behavior to automatically redeploy the model after the connector is updated.

Do you have any additional context?
Add any other context or screenshots about the feature request here.

zane-neo added the enhancement and untriaged labels on Jun 4, 2024
@ylwu-amzn
Collaborator

@b4sjoo has built an update connector API that doesn't require redeploying the model for internal connectors. Sicheng, can you help take this on to support standalone connectors too?

@Zhangxunmt
Collaborator

With auto deploy for remote models, this can easily be done as follows:

  1. Update the connector and save the new connector meta into the ml-connector index. (already in the current API)
  2. Undeploy the models that are associated with the connector. (single line change)

After UpdateConnector is done, the next time any model uses the connector, that model will be auto-deployed with the updated connector. People may ask: why was my model un-deployed once the connector was updated? Because important metadata was updated, which means your model has changed, so we un-deployed it. However, this doesn't introduce any downtime or disturb how you use your model. From the user's point of view, availability and usability remain the same.
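A minimal sketch of this update-then-undeploy flow with lazy auto-deploy, under simplifying assumptions: the connector index, the model-to-connector mapping, and the in-memory deployed models are each modeled as a plain map, and `AutoDeploySketch` and all of its methods are hypothetical names rather than ml-commons APIs.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical single-node model of the flow; not the ml-commons implementation.
class AutoDeploySketch {
    // Stand-in for the connector index: connectorId -> endpoint.
    private final Map<String, String> connectorIndex = new ConcurrentHashMap<>();
    // modelId -> connectorId.
    private final Map<String, String> modelToConnector = new ConcurrentHashMap<>();
    // In-memory deployed models: modelId -> endpoint captured at deploy time.
    private final Map<String, String> deployed = new ConcurrentHashMap<>();

    void putConnector(String connectorId, String endpoint) {
        connectorIndex.put(connectorId, endpoint);
    }

    void registerModel(String modelId, String connectorId) {
        modelToConnector.put(modelId, connectorId);
    }

    boolean isDeployed(String modelId) {
        return deployed.containsKey(modelId);
    }

    void updateConnector(String connectorId, String newEndpoint) {
        // Step 1: save the new connector metadata.
        connectorIndex.put(connectorId, newEndpoint);
        // Step 2: undeploy every model associated with the connector.
        modelToConnector.forEach((modelId, cid) -> {
            if (cid.equals(connectorId)) {
                deployed.remove(modelId);
            }
        });
    }

    String predict(String modelId, String input) {
        String connectorId = modelToConnector.get(modelId);
        if (connectorId == null) {
            throw new IllegalArgumentException("unknown model: " + modelId);
        }
        // Auto-deploy: if the model is not in memory, load it with the stored connector.
        String endpoint = deployed.computeIfAbsent(modelId, id -> connectorIndex.get(connectorId));
        return endpoint + " <- " + input;
    }
}
```

In this sketch the first `predict` call after `updateConnector` redeploys the model with the new endpoint, so the caller never has to issue a deploy request.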

Related: #1148, #2376

@zane-neo
Collaborator Author

Is there any possibility that the model's un-deploy and auto-deploy happen at the same time, causing an unexpected status?

@Zhangxunmt
Collaborator

"Unexpected status" is too general, so it's hard to imagine every edge case or race condition that could happen. We need to state clearly that predicting with a model while its connector is being updated is not recommended. Before the un-deploy finishes, auto-deploy will not happen because the old model is still in memory, so if you predict while the connector is being updated, predictions are still based on the old model.

@zane-neo
Collaborator Author

That doesn't seem like a good user experience. If a user is updating HTTP client parameters, e.g. the connection timeout, a good experience would be: predictions keep running; on instances that have received the update, subsequent predictions honor the updated connection timeout, and on instances that haven't received it yet, predictions honor the old connection timeout.
The issue is that updating a connector can currently cause data loss in production, since it requires un-deploying the model. Our goal should be to avoid this and give users a seamless experience without data loss.
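The behavior described here could look roughly like the sketch below, where each instance holds its client settings behind an atomic reference and swaps them when the update arrives. `ClientSettings` and `HotSwappableModel` are hypothetical names used only for illustration; this is one possible approach, not what ml-commons does today.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical immutable snapshot of HTTP client settings.
final class ClientSettings {
    final int connectionTimeoutMs;

    ClientSettings(int connectionTimeoutMs) {
        this.connectionTimeoutMs = connectionTimeoutMs;
    }
}

// Hypothetical per-instance model handle; not an ml-commons class.
class HotSwappableModel {
    private final AtomicReference<ClientSettings> settings;

    HotSwappableModel(ClientSettings initial) {
        this.settings = new AtomicReference<>(initial);
    }

    // Called on an instance when it receives the connector update.
    void applyConnectorUpdate(ClientSettings updated) {
        settings.set(updated);
    }

    // Each prediction reads the settings once, so it sees either the old
    // or the new timeout, never a mix, and the model stays deployed throughout.
    int timeoutForNextPrediction() {
        return settings.get().connectionTimeoutMs;
    }
}
```

With two instances created with the same settings, calling `applyConnectorUpdate` on only one of them leaves the other serving predictions with the old timeout until it receives the update too.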

@Zhangxunmt
Collaborator

There will be no data loss. Auto-deploy will refresh the connector for you with the updated params. It may only introduce data inconsistency for a short time window.

ylwu-amzn assigned b4sjoo and unassigned Zhangxunmt on Jul 19, 2024
Projects
Status: On-deck
Development

No branches or pull requests

4 participants