### Describe the bug

The `HunyuanVideoPipeline` (and, I believe, the underlying model) does not support execution with a batch size greater than 1: there are shape mismatches in the attention calculation. Setting the batch size to 2 produces the error shown in the logs below.

### Reproduction

This example is taken directly from the model card at https://huggingface.co/hunyuanvideo-community/HunyuanVideo. The only change is the added line `num_videos_per_prompt=2,`:
```python
import torch
from diffusers import HunyuanVideoPipeline, HunyuanVideoTransformer3DModel
from diffusers.utils import export_to_video

model_id = "hunyuanvideo-community/HunyuanVideo"
transformer = HunyuanVideoTransformer3DModel.from_pretrained(
    model_id, subfolder="transformer", torch_dtype=torch.bfloat16
)
pipe = HunyuanVideoPipeline.from_pretrained(model_id, transformer=transformer, torch_dtype=torch.float16)

# Enable memory savings
pipe.vae.enable_tiling()
pipe.enable_model_cpu_offload()

output = pipe(
    prompt="A cat walks on the grass, realistic",
    height=320,
    width=512,
    num_frames=61,
    num_inference_steps=30,
    num_videos_per_prompt=2,  # <--- This is the only line I changed
).frames[0]
export_to_video(output, "output.mp4", fps=15)
```
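As a stopgap (my own workaround sketch, not a fix for the underlying bug), running the pipeline once per video with `num_videos_per_prompt=1` sidesteps the batched attention path; the seeds here are illustrative values:

```python
# Workaround sketch: generate videos one at a time instead of batching,
# since only batch size 1 currently goes through attention without error.
outputs = []
for seed in (0, 1):  # two videos via two seeds -- illustrative choice
    frames = pipe(
        prompt="A cat walks on the grass, realistic",
        height=320,
        width=512,
        num_frames=61,
        num_inference_steps=30,
        num_videos_per_prompt=1,
        generator=torch.Generator("cpu").manual_seed(seed),
    ).frames[0]
    outputs.append(frames)

for i, frames in enumerate(outputs):
    export_to_video(frames, f"output_{i}.mp4", fps=15)
```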
### Logs

```
File "H:\stable-diffusion\one-trainer\venv\lib\site-packages\torch\utils\_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
File "H:\stable-diffusion\one-trainer\venv\src\diffusers\src\diffusers\pipelines\hunyuan_video\pipeline_hunyuan_video.py", line 647, in __call__
noise_pred = self.transformer(
File "H:\stable-diffusion\one-trainer\venv\lib\site-packages\torch\nn\modules\module.py", line 1736, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "H:\stable-diffusion\one-trainer\venv\lib\site-packages\torch\nn\modules\module.py", line 1747, in _call_impl
return forward_call(*args, **kwargs)
File "H:\stable-diffusion\one-trainer\venv\lib\site-packages\accelerate\hooks.py", line 170, in new_forward
output = module._old_forward(*args, **kwargs)
File "H:\stable-diffusion\one-trainer\venv\src\diffusers\src\diffusers\models\transformers\transformer_hunyuan_video.py", line 763, in forward
hidden_states, encoder_hidden_states = block(
File "H:\stable-diffusion\one-trainer\venv\lib\site-packages\torch\nn\modules\module.py", line 1736, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "H:\stable-diffusion\one-trainer\venv\lib\site-packages\torch\nn\modules\module.py", line 1747, in _call_impl
return forward_call(*args, **kwargs)
File "H:\stable-diffusion\one-trainer\venv\src\diffusers\src\diffusers\models\transformers\transformer_hunyuan_video.py", line 478, in forward
attn_output, context_attn_output = self.attn(
File "H:\stable-diffusion\one-trainer\venv\lib\site-packages\torch\nn\modules\module.py", line 1736, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "H:\stable-diffusion\one-trainer\venv\lib\site-packages\torch\nn\modules\module.py", line 1747, in _call_impl
return forward_call(*args, **kwargs)
File "H:\stable-diffusion\one-trainer\venv\src\diffusers\src\diffusers\models\attention_processor.py", line 588, in forward
return self.processor(
File "H:\stable-diffusion\one-trainer\venv\src\diffusers\src\diffusers\models\transformers\transformer_hunyuan_video.py", line 117, in __call__
hidden_states = F.scaled_dot_product_attention(
RuntimeError: The expanded size of the tensor (24) must match the existing size (2) at non-singleton dimension 1. Target sizes: [2, 24, 10496, 10496]. Tensor sizes: [2, 10496, 10496]
```
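The shapes in the error suggest (my reading of the traceback, not confirmed against the source) that the attention mask is built as a 3-D tensor `[batch, seq, seq]`, while `F.scaled_dot_product_attention` needs it to broadcast against the 4-D attention weights `[batch, heads, seq, seq]`. With batch size 1 the size-1 batch dimension broadcasts against the 24 heads and the mismatch is hidden; with batch size 2 the size-2 dimension collides with the head dimension, which matches the `2` vs `24` in the message. A minimal standalone sketch of that broadcast failure, with a tiny `seq_len` so it runs quickly:

```python
# Sketch of the assumed cause: a 3-D attention mask passed to SDPA.
# Shapes mirror the traceback (batch=2, heads=24) at a small seq_len.
import torch
import torch.nn.functional as F

batch, heads, seq_len, head_dim = 2, 24, 8, 16
q = torch.randn(batch, heads, seq_len, head_dim)
k = torch.randn(batch, heads, seq_len, head_dim)
v = torch.randn(batch, heads, seq_len, head_dim)

mask_3d = torch.ones(batch, seq_len, seq_len, dtype=torch.bool)

# Fails for batch > 1: [2, 8, 8] cannot broadcast against [2, 24, 8, 8].
try:
    F.scaled_dot_product_attention(q, k, v, attn_mask=mask_3d)
except RuntimeError as e:
    print(e)

# Works: add a head dimension so the mask is [2, 1, 8, 8] and broadcasts.
out = F.scaled_dot_product_attention(q, k, v, attn_mask=mask_3d.unsqueeze(1))
print(out.shape)  # torch.Size([2, 24, 8, 8])
```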
### System Info
- 🤗 Diffusers version: 0.33.0.dev0
- Platform: Windows-10-10.0.22631-SP0
- Running on Google Colab?: No
- Python version: 3.10.8
- PyTorch version (GPU?): 2.5.1+cu124 (True)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Huggingface_hub version: 0.26.2
- Transformers version: 4.47.0
- Accelerate version: 1.0.1
- PEFT version: not installed
- Bitsandbytes version: 0.44.1
- Safetensors version: 0.4.5
- xFormers version: 0.0.28.post3
- Accelerator: NVIDIA RTX A5000, 24564 MiB
- Using GPU in script?: NVIDIA RTX A5000
- Using distributed or parallel set-up in script?: no
### Who can help?
_No response_