Error on save PyTorch model defined in another module #332
Comments
For now images are not supported, but we are working on this in #310.
Hey, what about saving optimizer state_dict?
And what about the order of the models' state dict? It seems it is currently saved in alphabetical order.
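A minimal stdlib sketch of how alphabetical order can arise and why it usually does not affect loading. Assumption: the reordering comes from a serializer that sorts keys (PyYAML's `yaml.dump`, for example, sorts keys by default); the thread does not confirm where MLEM reorders them. Plain floats stand in for tensors, and `load_by_key` is a hypothetical helper that mimics how PyTorch's `load_state_dict` matches entries by key name rather than by position.

```python
import json
from collections import OrderedDict

# Hypothetical state dict in layer-registration order, as an nn.Module would build it.
# Plain floats stand in for tensors so the sketch runs without PyTorch.
state = OrderedDict([("conv1.weight", 0.1), ("fc.weight", 0.2), ("bn.weight", 0.3)])

# Assumption: alphabetical order comes from a serializer that sorts keys.
sorted_text = json.dumps(state, sort_keys=True)
# Without sorting, json.dumps keeps insertion order.
ordered_text = json.dumps(state)

# object_pairs_hook keeps whatever order the text was written in.
restored_sorted = json.loads(sorted_text, object_pairs_hook=OrderedDict)
restored_ordered = json.loads(ordered_text, object_pairs_hook=OrderedDict)

print(list(restored_sorted))   # ['bn.weight', 'conv1.weight', 'fc.weight']
print(list(restored_ordered))  # ['conv1.weight', 'fc.weight', 'bn.weight']


def load_by_key(expected_keys, loaded):
    """Hypothetical helper: look up each expected key by name, as key-matched loading does."""
    return {key: loaded[key] for key in expected_keys}


# Name-based lookup gives the same result whichever order the file used.
assert load_by_key(state, restored_sorted) == load_by_key(state, restored_ordered)
```

If the on-disk order matters for readability, serializing without key sorting keeps the registration order.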
Is there a reproducible example in your repo?
Yes, I prepared one example
Are there any examples with PyTorch models? Or can you describe whether the module must have methods
They are automatically generated from appropriate model methods. See mlem/tests/contrib/test_torch.py, line 127 (commit ee994a0).
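To illustrate the general idea of generating method descriptions from the model object itself, here is a minimal sketch using `inspect.signature`. It is not MLEM's implementation: `Net` is a hypothetical stand-in for a `torch.nn.Module` subclass, `describe_methods` is a hypothetical helper, and the method names `__call__` and `forward` are assumptions.

```python
import inspect


class Net:
    """Hypothetical stand-in for a torch.nn.Module subclass."""

    def __call__(self, x):
        return self.forward(x)

    def forward(self, x):
        return [2 * v for v in x]


def describe_methods(model, names=("__call__", "forward")):
    """Hypothetical helper: map each requested method the model defines to its parameter names."""
    described = {}
    for name in names:
        method = getattr(model, name, None)
        if callable(method):
            # Bound methods report their signature without `self`.
            described[name] = list(inspect.signature(method).parameters)
    return described


print(describe_methods(Net()))  # {'__call__': ['x'], 'forward': ['x']}
```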
I will take a look at your example and try to fix it though.
OK, I see now what is going on. You are trying to save the state with MLEM, and the state is a dict, which MLEM treats as a data type, not a model type. So I guess we don't support saving PyTorch models as a state dict :(
By the way, with the fix above, saving and loading as a dict does work.
But you still won't be able to deploy it, since only a MlemModel can be deployed, not a MlemData.
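Why a state dict lands on the data side can be illustrated with type-based dispatch. This is a minimal sketch using `functools.singledispatch`; the rules in the hypothetical `object_kind` function and the hypothetical `Net` class are assumptions for illustration, not how MLEM actually analyzes objects.

```python
from functools import singledispatch


class Net:
    """Hypothetical stand-in for a torch.nn.Module subclass (callable)."""

    def __call__(self, x):
        return x


@singledispatch
def object_kind(obj):
    # Default rule (assumption): callables are models, everything else is data.
    return "model" if callable(obj) else "data"


@object_kind.register
def _(obj: dict):
    # A dict is data regardless of its contents, so a state_dict of weights lands here.
    return "data"


print(object_kind(Net()))                 # model
print(object_kind({"fc.weight": [0.1]}))  # data
```

Under these sketch rules, passing the module object itself rather than its `state_dict()` is what puts it on the model side.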
Yes, sure. You need it when training stops unexpectedly and you want to resume with the same optimizer state. Most optimizers have their own training parameters, so if you want to continue from exactly the same point, you need the optimizer's state too.
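The resume argument can be checked with a toy example. This is a pure-Python sketch, not PyTorch: `MomentumSGD` is a hypothetical optimizer whose momentum buffer plays the role of the state that `torch.optim` optimizers expose through `state_dict()` and `load_state_dict()`. Resuming with the saved optimizer state reproduces the uninterrupted run exactly; resuming with a fresh optimizer does not.

```python
import copy


def grad(w):
    # Gradient of the toy loss (w - 3) ** 2.
    return 2 * (w - 3)


class MomentumSGD:
    """Hypothetical toy optimizer; its velocity buffer is internal training state."""

    def __init__(self, lr=0.1, momentum=0.9):
        self.lr, self.momentum, self.velocity = lr, momentum, 0.0

    def step(self, w):
        self.velocity = self.momentum * self.velocity + grad(w)
        return w - self.lr * self.velocity

    def state_dict(self):
        return {"velocity": self.velocity}

    def load_state_dict(self, state):
        self.velocity = state["velocity"]


def train(w, opt, steps):
    for _ in range(steps):
        w = opt.step(w)
    return w


# Uninterrupted run: 10 steps.
full = train(0.0, MomentumSGD(), 10)

# Interrupted run: 4 steps, checkpoint model and optimizer state, then 6 more steps.
opt = MomentumSGD()
w = train(0.0, opt, 4)
checkpoint = {"model": w, "optimizer": copy.deepcopy(opt.state_dict())}

resumed_opt = MomentumSGD()
resumed_opt.load_state_dict(checkpoint["optimizer"])
resumed = train(checkpoint["model"], resumed_opt, 6)

# Resuming without the optimizer state restarts momentum from zero.
fresh = train(checkpoint["model"], MomentumSGD(), 6)

print(resumed == full, fresh == full)  # True False
```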
I think my use case is similar, so I will post it here instead of opening a new issue. I have the following MRE:
import mlem
import torch
# pip install yolov5
from yolov5 import train, load
train.run(imgsz=640, data='coco128.yaml', epochs=1)
model = load('runs/train/exp/weights/best.pt')
data = torch.randn(1, 3, 640, 640)
mlem.api.save(model, 'best', sample_data=data)
and this gives me:
Any tips on how to proceed? Please tell me if you need more info. Thanks!
Can you check using MLEM from
@mike0sv, sure.
Unfortunately, it didn't change much
Hmm, "works on my machine"™. Can you please also try the #397 branch? By the way, my env:
Can you share yours?
Actually, try #453 with
It all got merged into
@mike0sv, I upgraded to v0.3.0 and now it works! Thank you for your help!
Closing since it was fixed!
I defined a custom PyTorch network. If I try to save it using MLEM, it fails.
Environment
OS: Ubuntu Linux 20.04
Python: 3.8.10
Virtual environment: venv
Python dependencies:
Code
Full code: https://github.com/ankxyz/mlem-pytorch-demo
src/utils/train.py
src/stages/train.py
Command
Error
At first I assumed the error occurs because the network is defined in a separate module. I tried moving the network into the same module where it is used, but I got the same error.
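The thread does not establish the root cause, but one general mechanism behind "defined in another module" errors is that pickle, which `torch.save` uses when saving a whole model, stores a class by reference (module path plus qualified name) rather than by value, so loading requires the defining module to be importable. This stdlib sketch shows that with `fractions.Fraction`; rewriting the module reference in the bytes to the non-existent name `fractionz` (same length, so the pickle stays well-formed) is an assumption used only to simulate a missing module.

```python
import pickle
from fractions import Fraction

obj = Fraction(3, 4)
data = pickle.dumps(obj, protocol=4)

# The pickle stores the class by reference: its module and name, not its code.
print(b"fractions" in data, b"Fraction" in data)  # True True

# Simulate a loader that cannot import the defining module.
broken = data.replace(b"fractions", b"fractionz")
try:
    pickle.loads(broken)
    error_name = None
except ImportError as exc:  # ModuleNotFoundError is a subclass of ImportError
    error_name = type(exc).__name__

print(error_name)  # ModuleNotFoundError
```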
Could you please provide an exhaustive example of using MLEM with PyTorch? It would be great to understand:
save()
if my data are images
Thanks in advance!