
[BUG] Error "model content changed" while loading pretrained models #1571

Closed
iyerurmi opened this issue Oct 30, 2023 · 5 comments
Labels
bug Something isn't working

Comments

@iyerurmi

What is the bug?
Loading pretrained models on smaller instances such as t3.small fails with the error "model content changed" due to insufficient memory.

How can one reproduce the bug?
Steps to reproduce the behavior:

Create an OpenSearch cluster with low memory instances like t3.small (2GB Memory) and perform the below steps.

Step 1) Register the model. The response provides a task ID that tracks the registration.

POST /_plugins/_ml/models/_upload
{
"name": "huggingface/sentence-transformers/all-MiniLM-L12-v2",
"version": "1.0.1",
"model_format": "TORCH_SCRIPT"
}
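
For reference, a successful registration call typically returns a task ID along these lines (the task_id value here is illustrative, not from the original report):

{
"task_id": "ew8I44MBhyWuIwnfvDIH",
"status": "CREATED"
}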

Step 2) Verify the model registration. The completed task provides the registered model_id.

GET /_plugins/_ml/tasks/<task_id>
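
Once the registration task completes, its response includes the registered model_id. A rough sketch of the response shape (IDs, timestamps, and exact field values are illustrative):

{
"model_id": "WWQI44MBbzI2oUKAvNUt",
"task_type": "REGISTER_MODEL",
"function_name": "TEXT_EMBEDDING",
"state": "COMPLETED",
"worker_node": ["4p6FVOmJRtu3wehDD74hzQ"],
"create_time": 1698678000000,
"last_update_time": 1698678020000,
"is_async": true
}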

Step 3) Deploy the model. The response provides a task ID that tracks the deployment.

POST /_plugins/_ml/models/<model_id>/_load
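
The deploy call again returns a task ID for tracking, something like the following (value illustrative):

{
"task_id": "vVePb4kBJ1eYAeTMpctj",
"status": "CREATED"
}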

Step 4) Verify the deployment of the model.

GET /_plugins/_ml/tasks/<task_id>
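
On a low-memory instance such as t3.small, the deploy task ends up FAILED and the only hint is the "model content changed" message. Assuming the error surfaces in the task's error field the way other deploy failures do, the response looks roughly like this (node ID and exact shape are illustrative):

{
"task_type": "DEPLOY_MODEL",
"function_name": "TEXT_EMBEDDING",
"state": "FAILED",
"error": "{\"4p6FVOmJRtu3wehDD74hzQ\":\"model content changed\"}",
"is_async": true
}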

What is the expected behavior?
The resolution is to upgrade to instances with more memory. However, the error message itself is not indicative of the cause. The error could be rephrased to something like "Insufficient memory" so that users know the cause of the error.

What is your host/environment?

  • OS: OpenSearch 2.9


@iyerurmi iyerurmi added bug Something isn't working untriaged labels Oct 30, 2023
@austintlee
Collaborator

Dup of #844?

@dhrubo-os
Collaborator

@austintlee Although the message is the same, I think the underlying issue is different. #844 only happens on macOS, when you try to redeploy the model.

But in this case, I was able to reproduce the issue in the managed service only when the instance capacity is small:

https://forum.opensearch.org/t/error-when-loading-embedding-model-into-memory/15351/6

On bigger instances I didn't see this issue.

@alexmlopez15

Using version 2.11, this only happens when I try to upload models by URL. When I use pretrained models from Hugging Face, it works just fine.
Hope it helps

@dhrubo-os
Collaborator

Using version 2.11, this only happens when I try to upload models by URL. When I use pretrained models from Hugging Face, it works just fine.
Hope it helps

For your case, you faced this issue

@ylwu-amzn
Collaborator

Duplicate of #844 (comment), closing this one.

@github-project-automation github-project-automation bot moved this from Untriaged to Done in ml-commons projects Nov 20, 2023