Model mismatch for model.load_state_dict(checkpoint['model']) when running viz_rcg.ipynb #13
Comments
And my training mode is the base one, not the large or huge one. Here are the details:
I changed the notebook's code from "model = models_mage.mage_vit_large_patch16" to "model = models_mage.mage_vit_base_patch16", because I trained the model in base mode. Now the error changes into a new one: RuntimeError Traceback (most recent call last)
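For reference, a minimal sketch of the notebook cell in question, assuming output/checkpoint-last.pth was produced by a base-size training run; the constructor keyword arguments are omitted here and should stay exactly as the notebook already passes them to the large variant:

```python
import os
import torch
import models_mage  # from the RCG repository

# Build the base (768-dim) variant so parameter shapes match a base-mode checkpoint.
# Pass the SAME keyword arguments the notebook already uses for
# mage_vit_large_patch16; they are omitted here for brevity.
model = models_mage.mage_vit_base_patch16()

checkpoint = torch.load(os.path.join('output/checkpoint-last.pth'), map_location='cpu')
model.load_state_dict(checkpoint['model'], strict=True)
model.cuda()
_ = model.eval()
```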
Did you set
I'll try it later when my GPU is available, thanks for the reply!
No -- you should simply comment out the concat_all_gather line. Your modification will return an output full of ones, because you use tensors_gather = [torch.ones_like(tensor) for _ in range(1)].
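A minimal sketch of the single-process bypass being discussed, assuming the helper follows the usual MoCo-style concat_all_gather pattern; the guard returns the local tensor untouched when no distributed process group is initialized, which is equivalent to commenting out the call at the call site:

```python
import torch
import torch.distributed as dist

@torch.no_grad()
def concat_all_gather(tensor):
    # Non-distributed run: nothing to gather, so return the local tensor as-is.
    # (Keeping torch.ones_like with range(1) but never running all_gather is
    # what leaves the output full of ones, as noted above.)
    if not (dist.is_available() and dist.is_initialized()):
        return tensor
    tensors_gather = [torch.ones_like(tensor) for _ in range(dist.get_world_size())]
    dist.all_gather(tensors_gather, tensor, async_op=False)
    return torch.cat(tensors_gather, dim=0)
```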
OK, I think that will be fine, but how do I comment it out? There is code at Line 107 in 4f1c32f
When I comment it out, the new error is
You just comment it out and it should be fine. This error is caused by use_rep, not by the commenting. You need to set
I feel sorry for repeatedly asking questions. When I follow the args, it works; the code is running now.
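For context, a hypothetical illustration of the argument-matching point above: use_rep is the only flag named in this thread, and the commented placeholders are assumptions standing in for whatever other options were passed to main_mage.py during training; those must be repeated when building the model in the notebook:

```python
# Hypothetical sketch -- only use_rep is named in the thread; the placeholders
# below stand in for whatever other options the training command used.
model = models_mage.mage_vit_base_patch16(
    use_rep=True,        # placeholder value -- set this to whatever main_mage.py used
    # rep_dim=...,       # copy from the training command if applicable
    # mask_ratio_mu=..., # copy from the training command if applicable
)
```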
No worries -- please let me know if you encounter other problems.
Hello! Thank you for your great work.
I trained rdm.pth with main_rdm.py and mage.pth with main_mage.py. When I want to visualize the generation, I encounter this problem:
RuntimeError Traceback (most recent call last)
Cell In[9], line 2
1 checkpoint = torch.load(os.path.join('output/checkpoint-last.pth'), map_location='cpu')
----> 2 model.load_state_dict(checkpoint['model'], strict=True)
3 model.cuda()
4 _ = model.eval()
RuntimeError: Error(s) in loading state_dict for MaskedGenerativeEncoderViT:
size mismatch for cls_token: copying a param with shape torch.Size([1, 1, 768]) from checkpoint, the shape in current model is torch.Size([1, 1, 1024]).
size mismatch for pos_embed: copying a param with shape torch.Size([1, 257, 768]) from checkpoint, the shape in current model is torch.Size([1, 257, 1024]).
size mismatch for mask_token: copying a param with shape torch.Size([1, 1, 768]) from checkpoint, the shape in current model is torch.Size([1, 1, 1024]).
size mismatch for decoder_pos_embed: copying a param with shape torch.Size([1, 257, 768]) from checkpoint, the shape in current model is torch.Size([1, 257, 1024]).
size mismatch for decoder_pos_embed_learned: copying a param with shape torch.Size([1, 257, 768]) from checkpoint, the shape in current model is torch.Size([1, 257, 1024]).
size mismatch for token_emb.word_embeddings.weight: copying a param with shape torch.Size([2025, 768]) from checkpoint, the shape in current model is torch.Size([2025, 1024]).
size mismatch for token_emb.position_embeddings.weight: copying a param with shape torch.Size([257, 768]) from checkpoint, the shape in current model is torch.Size([257, 1024]).
...
size mismatch for decoder_pred.weight: copying a param with shape torch.Size([768, 768]) from checkpoint, the shape in current model is torch.Size([768, 1024]).
size mismatch for mlm_layer.fc.weight: copying a param with shape torch.Size([768, 768]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
size mismatch for mlm_layer.fc.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for mlm_layer.ln.weight: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for mlm_layer.ln.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
I can't understand why I run into this problem.
I used my own dataset for training and did not use distributed training.
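As a quick check for the mismatch above, a small diagnostic sketch: the checkpoint's own cls_token width tells you which variant produced it (768 for base, 1024 for large), so the notebook must build the matching model before calling load_state_dict:

```python
import torch

# Inspect the saved weights directly: a 768-wide cls_token means the checkpoint
# came from a base-size model, 1024 from a large one.
ckpt = torch.load('output/checkpoint-last.pth', map_location='cpu')
print(ckpt['model']['cls_token'].shape)  # torch.Size([1, 1, 768]) here -> base model
```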