[BUG]Model content hash can't match original hash value #844
Comments
@dhrubo-os I'm trying to understand how ml-commons works, and this issue seemed like a good one to pick up. I am trying to reproduce the problem, but so far I have been unable to.
I see
Can you please try to set all these values?
Thanks @dhrubo-os. I was able to reproduce the problem.
@dhrubo-os I added a failing IntegTest: #999.
Finally, after all the breaking changes were merged, I've learnt that this bug only happens on macOS.
I'm suffering from this as well, but interestingly enough I'm only using macOS in the client environment. I'm port-forwarding https into an EC2 instance on port 5601 running the raw tarballs for 2.9. Should I still be encountering this one, or am I bumping into something else?
@nateynateynate from what I've tested (I added an integration test in #1016), you shouldn't see the problem when OpenSearch is running on a Linux/Windows host.
Ideally you shouldn't; the client really shouldn't matter. Can you post your stack trace, your setup, and the steps to reproduce the problem?
@saratvemulapalli This also happens with Docker running on macOS. Are there any alternatives?
I had a chat with @Saikumar282 on the OpenSearch public Slack; this is an expected problem on Darwin, which ML-Commons doesn't officially support yet.
I think we might be conflating two issues here. A lot of people are going to try to register a model via a URL without knowing to include the model content hash value, as a result of the poor example here: https://opensearch.org/docs/latest/ml-commons-plugin/ml-framework/ . We have no official instructions on how to generate this hash value, nor do we mention it anywhere except in our call examples. This happens regardless of which OS you're on.
https://opensearch.org/docs/latest/ml-commons-plugin/api/#request-fields We mention it in this request fields table.
I am having the same problem when uploading my custom model, but then when registering the model:
Also running OpenSearch in Docker on Mac OS. I have no problems with the pre-loaded models, even when uploading them from URL. Any idea how to fix this? Edit: Found the solution. You need to generate the checksum of your zip file with the custom model and pass it when uploading the model in the |
This is perhaps what I was doing a poor job of articulating. I don't think this issue is specific to Darwin / OSX; it happens when uploading models without a model_content_hash_value field. The field was in the list of accepted parameters, but I don't think we do a very good job of explaining that the user can calculate this field on their own. Can we perhaps change the error message to give its recipients a lead? "model content changed - please use the |
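For anyone landing here from the error message, the checksum discussed above can be computed locally before registering the model. This is a minimal Python sketch, assuming the expected value is the SHA-256 hex digest of the model zip file; the function name `sha256_of_file` and the file name in the usage comment are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of the file at `path`.

    The file is read in chunks so large model archives do not have to
    fit in memory. Assumption: the registration API expects the SHA-256
    hex digest of the zip file as its content hash.
    """
    digest = hashlib.sha256()
    with Path(path).open("rb") as f:
        # iter() with a sentinel stops when read() returns empty bytes.
        for block in iter(lambda: f.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


# Usage (hypothetical file name):
#   print(sha256_of_file("my_custom_model.zip"))
```

The resulting hex string is what would go into the model_content_hash_value field of the registration request, if the assumption about SHA-256 holds for your version.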
@dhrubo-os In Step 4 of the sequence you describe, you unloaded the model and in Step 5, you said you deployed it, but did you run _load followed by _deploy?
@juntezhang Can you create a separate issue for doing input validation on custom model registration?
The model hash after _unload/_load differed because the zip file in
I noticed that the zip file stayed and its size kept growing. I found two issues related to this. The first is that the security manager was disallowing
After I added the necessary permission to the plugin-security.policy, _unload/_load started to work. While I was looking into why the delete was failing on Mac, I came across this issue: https://issues.apache.org/jira/browse/IO-787 It says there was a regression in commons-io 2.11 (the version ml-commons is currently using) specifically related to the forceDelete method that we're using. Strangely, I don't see what code change they made to fix that in commons-io 2.12, but since we are currently on 2.11 and it has a known issue for Mac, I am upgrading commons-io to the latest, 2.15.
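To illustrate why a leftover zip would trip the hash check, here is a toy Python sketch. It is not ml-commons code; it assumes the merge step appends chunks to whatever file already exists at the target path, which would match the observation that the file stayed and kept growing. The names `merge_chunks`, `sha256_hex`, and `demo` are hypothetical.

```python
import hashlib
import os
import tempfile


def merge_chunks(chunks, target):
    # Hypothetical merge step: append every chunk to the target file.
    # If an old copy is still on disk, the new bytes land after it.
    with open(target, "ab") as out:
        for chunk in chunks:
            out.write(chunk)


def sha256_hex(path):
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()


def demo():
    chunks = [b"part-1", b"part-2"]
    expected = hashlib.sha256(b"".join(chunks)).hexdigest()
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "model.zip")
        merge_chunks(chunks, target)   # first load: file is created
        first = sha256_hex(target)
        # Simulate an unload whose delete silently failed: the file stays.
        merge_chunks(chunks, target)   # second load: bytes are appended
        second = sha256_hex(target)
        size = os.path.getsize(target)
    return expected, first, second, size
```

In this toy setup the first hash matches the expected value, while the second does not, because the file now contains the model bytes twice. Under the same assumption, a successful delete on unload restores the match.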
@austintlee Thanks, Austin, for the deep dive into this issue. After upgrading commons-io, is the issue gone?
That and the plugin-security.policy change. I am not able to repro it after my fix.
Any update on this one? I can reproduce it with Docker on Linux (AWS).
Could it be that this was fixed with 2.12 and the commons-io upgrade? I can't test it right now, because we have a dependency on 2.11.
Closing this issue. I tested on my end and don't see the problem anymore.
opensearch-ml-gpu | [2024-08-09T07:17:54,585][DEBUG][o.o.m.e.u.FileUtils ] [opensearch-ml-gpu] merge 61 files into /usr/share/opensearch/data/ml_cache/models_cache/deploy/HdPgNZEBkGu7typLkQJX/cre_pt_v0_2_0_test2.zip |
What is the bug?
Model content hash can't match original hash value
How can one reproduce the bug?
I tried with the current code base. I executed the command
./gradlew run
to test. Then I loaded the model into memory. This time it works fine.
http://localhost:9200/_plugins/_ml/models/l-bnSYcBQg5VYC5uIxWy/_load
I also generated the embedding with the loaded model and it works fine.
Now I unload the model:
http://localhost:9200/_plugins/_ml/models/l-bnSYcBQg5VYC5uIxWy/_unload
And now I try to deploy again. Then I face the error:
What is the expected behavior?
Model should load again.
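The load, unload, load sequence from the reproduction steps can be scripted roughly as follows. This is a sketch under assumptions: the endpoints accept a POST with no body, the cluster started by ./gradlew run is reachable at http://localhost:9200 without authentication, and the helper names (`ml_model_url`, `repro_steps`, `run_repro`) are made up for illustration.

```python
import urllib.request

BASE_URL = "http://localhost:9200"  # assumption: local dev cluster, no auth


def ml_model_url(model_id, action, base_url=BASE_URL):
    """Build the ml-commons model endpoint URL for an action such as _load."""
    return f"{base_url}/_plugins/_ml/models/{model_id}/{action}"


def repro_steps(model_id, base_url=BASE_URL):
    """Return the (method, url) pairs for load -> unload -> load."""
    actions = ("_load", "_unload", "_load")
    return [("POST", ml_model_url(model_id, a, base_url)) for a in actions]


def http_send(method, url):
    """Default sender: issue the request and return the response body."""
    request = urllib.request.Request(url, method=method)
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")


def run_repro(model_id, send=http_send, base_url=BASE_URL):
    """Run the three steps in order and collect each response."""
    return [send(method, url) for method, url in repro_steps(model_id, base_url)]


# Usage with the model ID from this report:
#   for body in run_repro("l-bnSYcBQg5VYC5uIxWy"):
#       print(body)
```

Note that the sketch fires the requests back to back. If loading is asynchronous in your version, you would need to wait for each load to finish before unloading, otherwise the second load may not exercise the same code path.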
What is your host/environment?
Do you have any screenshots?
If applicable, add screenshots to help explain your problem.
Do you have any additional context?
Add any other context about the problem.