AttributeError: 'Tensor' object has no attribute 'config' #3
Comments
It seems to be related to the version of transformers. The latest version of transformers also works, so you can give it a try.
But I have tried transformers-4.39.3, which also doesn't work.
Does it report the same error?
Yes.
It happens after the checkpoint is saved at the first checkpointing step.
It's a little weird. You can disable the checkpointing operation by setting checkpointing_steps in config.yaml, then try to load the motion embedding via the inference code. I will check it later.
Yes, I have set checkpointing_steps: 200 and max_train_steps: 200. But will this setting affect the final result? My results are as follows: "A knight in armor rides a Segway" (tmp_yhbhzx1.mp4), "A cat in armor driving a go-kart" (1.mp4)
I can't see your results.
"A toy train chugs around a roundabout tree", "A cat in armor driving a go-kart", "A knight in armor rides a Segway", "A teddy bear is riding a tricycle in Times Square"
It looks like you are not using any noise initialization strategy. The quality of video model generation strongly depends on the initial noise, which is discussed in our paper and other related literature. Since our motion embedding has very few parameters, it is not recommended to use it alone. Alternatively, if you wish to use the motion embedding purely for video customization, you will need to update config.yaml to enlarge the motion embedding by adding 320 to the dim parameter and change the loss type to BaseLoss. Note that doing so also increases the risk of overfitting.
Thanks for your recommendation. I have used the noise initialization strategy, with the training input video as the initialization video, though I am not sure whether that is reasonable. I still can't get a reasonable result. Here is my test result: https://github.com/WenshuangSong/file/blob/main/longboard-24%20(1).mp4 At the inference stage, my prompt is "A pigeon is strutting around a town square". The output doesn't seem as reasonable as the results on your project page. Did something go wrong? Thanks a lot for your reply.
I can't diagnose your errors based on the results alone. Were you able to successfully run the checkpoint steps in your training? It is recommended to follow this process completely for inference. Also, you can wait for us to release the online gradio demo if you are still having trouble with the AttributeError.
No, I can't successfully run the checkpoint steps in my training stage. So when will the online gradio demo be released? Thanks~
We will release it as soon as possible, please be patient. :)
When I run "python train.py --config ./configs/config.yaml", I get the following error:
File "/home/ubuntu/us/project/MotionInversion/train.py", line 463, in <module>
main(config)
File "/home/ubuntu/us/project/MotionInversion/train.py", line 407, in main
log_validation(
File "/home/ubuntu/us/project/MotionInversion/train.py", line 84, in log_validation
video_frames = pipeline(
File "/home/ubuntu/anaconda3/envs/sdwebui/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/home/ubuntu/anaconda3/envs/sdwebui/lib/python3.10/site-packages/diffusers/pipelines/text_to_video_synthesis/pipeline_text_to_video_synth.py", line 644, in __call__
prompt_embeds, negative_prompt_embeds = self.encode_prompt(
File "/home/ubuntu/anaconda3/envs/sdwebui/lib/python3.10/site-packages/diffusers/pipelines/text_to_video_synthesis/pipeline_text_to_video_synth.py", line 290, in encode_prompt
prompt_embeds = self.text_encoder(text_input_ids.to(device), attention_mask=attention_mask)
File "/home/ubuntu/anaconda3/envs/sdwebui/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/ubuntu/anaconda3/envs/sdwebui/lib/python3.10/site-packages/accelerate/utils/operations.py", line 581, in forward
return model_forward(*args, **kwargs)
File "/home/ubuntu/anaconda3/envs/sdwebui/lib/python3.10/site-packages/accelerate/utils/operations.py", line 569, in __call__
return convert_to_fp32(self.model_forward(*args, **kwargs))
File "/home/ubuntu/anaconda3/envs/sdwebui/lib/python3.10/site-packages/torch/amp/autocast_mode.py", line 14, in decorate_autocast
return func(*args, **kwargs)
File "/home/ubuntu/anaconda3/envs/sdwebui/lib/python3.10/site-packages/transformers/models/clip/modeling_clip.py", line 818, in forward
return_dict = return_dict if return_dict is not None else self.config.use_return_dict
AttributeError: 'Tensor' object has no attribute 'config'
My versions are diffusers==0.26.3 and transformers==4.27.4.
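Since the traceback passes through torch, accelerate, diffusers, and transformers, it may help to report all four versions together. A small sketch using only the standard library; the package list is an assumption about what is relevant here, and `installed_versions` is a hypothetical helper name:

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Return {package: version string, or None if the package is not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# Assumed list of packages that appear in the traceback above.
report = installed_versions(["torch", "accelerate", "diffusers", "transformers"])
for name, ver in report.items():
    print(f"{name}=={ver}" if ver else f"{name}: not installed")
```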
When I print "self" at line 818 of "/home/ubuntu/anaconda3/envs/sdwebui/lib/python3.10/site-packages/transformers/models/clip/modeling_clip.py",
I find that self is a CLIPTextModel when not at checkpointing_steps, as follows:
CLIPTextModel(
(text_model): CLIPTextTransformer(
(embeddings): CLIPTextEmbeddings(
(token_embedding): Embedding(49408, 1024)
(position_embedding): Embedding(77, 1024)
)
(encoder): CLIPEncoder(
(layers): ModuleList(
(0-22): 23 x CLIPEncoderLayer(
(self_attn): CLIPAttention(
(k_proj): Linear(in_features=1024, out_features=1024, bias=True)
(v_proj): Linear(in_features=1024, out_features=1024, bias=True)
(q_proj): Linear(in_features=1024, out_features=1024, bias=True)
(out_proj): Linear(in_features=1024, out_features=1024, bias=True)
)
(layer_norm1): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
(mlp): CLIPMLP(
(activation_fn): GELUActivation()
(fc1): Linear(in_features=1024, out_features=4096, bias=True)
(fc2): Linear(in_features=4096, out_features=1024, bias=True)
)
(layer_norm2): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
)
)
)
(final_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
)
)
But "self" is a tensor at checkpointing_steps, as follows:
tensor([[49406, 320, 31777, 15939, 2528, 320, 1305, 3980, 49407, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0]], device='cuda:0')
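A tensor showing up as "self" is the symptom you would expect if forward were re-attached to the model as a plain function instead of a bound method, so that the first positional argument (the token IDs) takes the place of self. Whether that is what happens at the checkpointing step here is an assumption; nothing in this thread confirms the cause. A minimal pure-Python sketch of the mechanism, where `TinyEncoder` is a hypothetical stand-in and not CLIPTextModel:

```python
import types


class TinyEncoder:
    """Hypothetical stand-in for a model that reads self.config inside forward."""

    def __init__(self):
        self.config = {"use_return_dict": True}

    def forward(self, input_ids=None):
        # Mirrors the failing line: forward needs self.config.
        return self.config["use_return_dict"], input_ids

    def __call__(self, *args, **kwargs):
        return self.forward(*args, **kwargs)


model = TinyEncoder()
token_ids = [49406, 320, 49407]

# 1. Normal case: forward is a bound method, so self is the model.
normal_result = model(token_ids)

# 2. Assumed failure mode: a plain (unbound) function is stored on the instance.
#    Python does not bind instance attributes, so token_ids is passed as self.
model.forward = TinyEncoder.forward
try:
    model(token_ids)
    error_message = None
except AttributeError as exc:
    error_message = str(exc)

# 3. Re-binding the function to the instance restores the normal behaviour.
model.forward = types.MethodType(TinyEncoder.forward, model)
fixed_result = model(token_ids)

print(error_message)
```

If this is the mechanism, the error message names the type of the input (a list in the sketch, a Tensor in the traceback above) rather than the model class.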