How to load the pretrained safetensors and continue to train? #13

JunyuanDeng · 2024-06-19T08:57:18Z

Hello, thanks for sharing your code!

I am now trying to train stage 2 with the provided vista.safetensors.

So I changed the command to the one below:

torchrun \
    --nnodes=1 \
    --nproc_per_node=8 \
    train.py \
    --base configs/training/vista_phase2_stage2.yaml \
    --finetune ${PATH_TO_STAGE1_CKPT}/vista.safetensors \
    --num_nodes 1 \
    --n_devices 8

But there are lots of missing keys like:

I expected the loss to be low when starting from the pretrained weights, but that is not what I observe:

I downloaded the sampled video "samples_mp4_epoch00_batch0000_step000001.mp4":

samples_mp4_epoch00_batch0000_step000001.mp4

What should I do to use the provided weights to start the phase 2 stage 2 training?


Little-Podi · 2024-07-29T12:42:30Z

Sorry for the trouble. I haven't verified this resuming feature yet. It seems that some weights are left random after initialization. Make sure the new weights are initialized as zeros. In addition, if there are some "unexpected" weights when loading the checkpoint, make sure all of them are remapped to "missing" weights. This can be done by renaming the keys in the state dictionary and loading the dictionary into the model again.
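The key renaming described above could be sketched roughly as follows. This is only an illustration in plain Python, not code from the repository: `remap_keys`, `diff_keys`, the prefix rule, and all key names are hypothetical, and the real mismatch between vista.safetensors and the stage 2 model has to be read from the actual missing/unexpected key lists.

```python
from collections import OrderedDict


def remap_keys(state_dict, rename_rules):
    """Return a copy of state_dict with key prefixes renamed.

    rename_rules is a list of (old_prefix, new_prefix) pairs; the first
    matching rule wins. Keys that match no rule are kept unchanged.
    """
    remapped = OrderedDict()
    for key, value in state_dict.items():
        new_key = key
        for old_prefix, new_prefix in rename_rules:
            if key.startswith(old_prefix):
                new_key = new_prefix + key[len(old_prefix):]
                break
        if new_key in remapped:
            raise KeyError(f"two checkpoint keys map to {new_key!r}")
        remapped[new_key] = value
    return remapped


def diff_keys(model_keys, checkpoint_keys):
    """Report which keys a non-strict load would call missing or unexpected."""
    model_keys, checkpoint_keys = set(model_keys), set(checkpoint_keys)
    missing = sorted(model_keys - checkpoint_keys)      # in model, not in checkpoint
    unexpected = sorted(checkpoint_keys - model_keys)   # in checkpoint, not in model
    return missing, unexpected


# Hypothetical key names and values, for illustration only.
model_keys = [
    "model.diffusion_model.conv.weight",
    "model.diffusion_model.new_layer.weight",
]
checkpoint = {"diffusion_model.conv.weight": 1.0}
rules = [("diffusion_model.", "model.diffusion_model.")]

before = diff_keys(model_keys, checkpoint)
after = diff_keys(model_keys, remap_keys(checkpoint, rules))
print("before remap:", before)
print("after remap: ", after)
```

In this toy case the "unexpected" key disappears after renaming, and the only key still missing is the one that does not exist in the checkpoint at all. With PyTorch, `model.load_state_dict(remapped, strict=False)` returns the same two lists as `missing_keys` and `unexpected_keys`; whatever remains missing there would be the newly added parameters, which are the ones the advice above suggests zero-initializing.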

zhoujiawei3 · 2024-11-07T08:45:28Z

@JunyuanDeng
Hi, have you resolved this issue? Could you please share how you did it? Thank you!

zhoujiawei3 · 2024-11-10T08:58:41Z

@Little-Podi Hi, I want to confirm: do you mean that we need to change the code so that the missing keys are initialized as zeros in this case? When I set the values of these missing keys to zero, samples_mp4_epoch00_batch0000_step000001.mp4 still looks just as strange.

jywu511 · 2024-12-25T01:55:51Z

@Little-Podi Hi, thanks a lot for sharing this great work! I ran into the same problem. Could you share the checkpoint after stage 1 so that training can be continued? Thanks a lot!
