[BUG] Models are not auto-redeployed when all the nodes starts at the same time #2177

Open
khoaisohd opened this issue Mar 4, 2024 · 1 comment
Labels
bug Something isn't working

Comments

@khoaisohd

What is the bug?
During disaster recovery, all the nodes go down and need to be restarted. After all the nodes start and rejoin the cluster, the models fail to be auto-redeployed.

How can one reproduce the bug?
Steps to reproduce the behavior:

  1. Stop all the nodes
  2. Start all the nodes at the same time
  3. None of the models that were previously deployed are redeployed

What is the expected behavior?
Models should be redeployed automatically.

Root cause
Model auto-deployment is triggered when nodes join the cluster, but at that point the model index is not yet available to be queried.

Proposal
Instead of querying the models from the model index immediately, we can wait for the model config health to turn yellow.
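A minimal sketch of that idea, written in Python for brevity rather than as plugin code: poll a health check until it reports at least yellow, and only then query and redeploy the models. The names `wait_for_health`, `redeploy_when_ready`, `get_health`, `list_models`, and `deploy` are hypothetical placeholders and not part of the plugin; the timeout and polling interval are arbitrary assumptions.

```python
import time
from typing import Callable, Iterable, List

# Health values ordered from worst to best.
_HEALTH_RANK = {"red": 0, "yellow": 1, "green": 2}


def wait_for_health(
    get_health: Callable[[], str],
    minimum: str = "yellow",
    timeout_s: float = 60.0,
    poll_interval_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll get_health() until it reports at least `minimum`, or time out."""
    deadline = clock() + timeout_s
    while True:
        status = get_health()
        # Unknown statuses rank below red, so they never satisfy the check.
        if _HEALTH_RANK.get(status, -1) >= _HEALTH_RANK[minimum]:
            return True
        if clock() >= deadline:
            return False
        sleep(poll_interval_s)


def redeploy_when_ready(
    get_health: Callable[[], str],
    list_models: Callable[[], Iterable[str]],
    deploy: Callable[[str], None],
    **wait_kwargs,
) -> List[str]:
    """Query and redeploy models only after health is at least yellow."""
    if not wait_for_health(get_health, **wait_kwargs):
        return []  # never became queryable; the caller may retry later
    deployed = []
    for model_id in list_models():
        deploy(model_id)
        deployed.append(model_id)
    return deployed
```

Here `get_health` would wrap whatever health check the plugin uses for the model index, and `list_models` would be the existing query that currently runs too early. Returning an empty list on timeout leaves the retry policy to the caller.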

What is your host/environment?

  • OS: Linux
  • Version [e.g. 22]
  • Plugins


@khoaisohd khoaisohd added bug Something isn't working untriaged labels Mar 4, 2024
@zane-neo
Collaborator

zane-neo commented Mar 5, 2024

@khoaisohd Is the error a ClusterBlockException? Also, deploying the model automatically can be complicated in this case (the cluster status is not ready while the query runs). There is an existing issue that addresses this case: #1148 (model automatically when inferencing). Please take a look at that issue and comment there if you have further concerns.

@Zhangxunmt Zhangxunmt self-assigned this Jan 7, 2025