
[BUG] Error "model content changed" while loading pretrained models #1571

Closed
iyerurmi opened this issue Oct 30, 2023 · 5 comments
Labels
bug Something isn't working

Comments

@iyerurmi

What is the bug?
Loading pretrained models on smaller instances such as t3.small fails with the error "model content changed" due to insufficient memory.

How can one reproduce the bug?
Steps to reproduce the behavior:

Create an OpenSearch cluster with low memory instances like t3.small (2GB Memory) and perform the below steps.

Step 1) Register the model. The response provides a task ID that tracks the registration.

POST /_plugins/_ml/models/_upload
{
"name": "huggingface/sentence-transformers/all-MiniLM-L12-v2",
"version": "1.0.1",
"model_format": "TORCH_SCRIPT"
}
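
For reference, a successful registration call typically returns a task ID along these lines (the task_id value here is illustrative, not from the original report):

{
"task_id": "ew8I44MBhyWuIwnfvDIH",
"status": "CREATED"
}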

Step 2) Verify the model registration. The completed task provides the registered model_id.

GET /_plugins/_ml/tasks/<task_id>
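
Once the registration task completes, its response includes the registered model_id. A rough sketch of the response shape (IDs, timestamps, and exact field values are illustrative):

{
"model_id": "WWQI44MBbzI2oUKAvNUt",
"task_type": "REGISTER_MODEL",
"function_name": "TEXT_EMBEDDING",
"state": "COMPLETED",
"worker_node": ["4p6FVOmJRtu3wehDD74hzQ"],
"create_time": 1698678000000,
"last_update_time": 1698678020000,
"is_async": true
}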

Step 3) Deploy the model. The response provides a task ID that tracks the deployment.

POST /_plugins/_ml/models/<model_id>/_load
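
The deploy call again returns a task ID for tracking, something like the following (value illustrative):

{
"task_id": "vVePb4kBJ1eYAeTMpctj",
"status": "CREATED"
}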

Step 4) Verify the deployment of the model.

GET /_plugins/_ml/tasks/<task_id>
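
On a low-memory instance such as t3.small, the deploy task ends up FAILED and the only hint is the "model content changed" message. Assuming the error surfaces in the task's error field the way other deploy failures do, the response looks roughly like this (node ID and exact shape are illustrative):

{
"task_type": "DEPLOY_MODEL",
"function_name": "TEXT_EMBEDDING",
"state": "FAILED",
"error": "{\"4p6FVOmJRtu3wehDD74hzQ\":\"model content changed\"}",
"is_async": true
}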

What is the expected behavior?
The resolution is to upgrade to instances with more memory. However, the error message itself is not indicative of the cause. The error could be rephrased to something like "Insufficient memory" so that users know the cause of the error.

What is your host/environment?

  • OS: OpenSearch 2.9


@iyerurmi iyerurmi added bug Something isn't working untriaged labels Oct 30, 2023
@austintlee
Collaborator

Dup of #844?

@dhrubo-os
Collaborator

@austintlee Although the message is the same, I think the underlying issue is different. #844 only happens on macOS, when you try to redeploy the model.

But in this case, I was able to reproduce the issue in the managed service only when the instance capacity is small:

https://forum.opensearch.org/t/error-when-loading-embedding-model-into-memory/15351/6

On bigger instances I didn't see this issue.

@alexmlopez15

Using version 2.11, this only happens when I try to upload models by URL. When I use pretrained models from Hugging Face, it works just fine.
Hope it helps

@dhrubo-os
Collaborator

Using version 2.11, this only happens when I try to upload models by URL. When I use pretrained models from Hugging Face, it works just fine.
Hope it helps

For your case, you faced this issue

@ylwu-amzn
Collaborator

Duplicate of #844 (comment), closing this one.

@github-project-automation github-project-automation bot moved this from Untriaged to Done in ml-commons projects Nov 20, 2023