Hunyuan Video Batch Size > 1 is broken again #10542

Nerogar · 2025-01-12T22:06:12Z

Describe the bug

I reported this previously in #10453, and a fix was merged in #10454. Since #10482 was merged, I get a similar error again when the batch size is greater than 1.

Reproduction

(copied from the previous issue report)

import torch
from diffusers import HunyuanVideoPipeline, HunyuanVideoTransformer3DModel
from diffusers.utils import export_to_video

model_id = "hunyuanvideo-community/HunyuanVideo"
transformer = HunyuanVideoTransformer3DModel.from_pretrained(
    model_id, subfolder="transformer", torch_dtype=torch.bfloat16
)
pipe = HunyuanVideoPipeline.from_pretrained(model_id, transformer=transformer, torch_dtype=torch.float16)

# Enable memory savings
pipe.vae.enable_tiling()
pipe.enable_model_cpu_offload()

output = pipe(
    prompt="A cat walks on the grass, realistic",
    height=320,
    width=512,
    num_frames=1,
    num_inference_steps=30,
    num_videos_per_prompt=2,
).frames[0]
export_to_video(output, "output.mp4", fps=15)

Logs

Traceback (most recent call last):
  File "H:\stable-diffusion\one-trainer\venv\lib\site-packages\torch\utils\_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "H:\stable-diffusion\one-trainer\venv\src\diffusers\src\diffusers\pipelines\hunyuan_video\pipeline_hunyuan_video.py", line 651, in __call__
    noise_pred = self.transformer(
  File "H:\stable-diffusion\one-trainer\venv\lib\site-packages\torch\nn\modules\module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "H:\stable-diffusion\one-trainer\venv\lib\site-packages\torch\nn\modules\module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "H:\stable-diffusion\one-trainer\venv\lib\site-packages\accelerate\hooks.py", line 170, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "H:\stable-diffusion\one-trainer\venv\src\diffusers\src\diffusers\models\transformers\transformer_hunyuan_video.py", line 770, in forward
    hidden_states, encoder_hidden_states = block(
  File "H:\stable-diffusion\one-trainer\venv\lib\site-packages\torch\nn\modules\module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "H:\stable-diffusion\one-trainer\venv\lib\site-packages\torch\nn\modules\module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "H:\stable-diffusion\one-trainer\venv\src\diffusers\src\diffusers\models\transformers\transformer_hunyuan_video.py", line 478, in forward
    attn_output, context_attn_output = self.attn(
  File "H:\stable-diffusion\one-trainer\venv\lib\site-packages\torch\nn\modules\module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "H:\stable-diffusion\one-trainer\venv\lib\site-packages\torch\nn\modules\module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "H:\stable-diffusion\one-trainer\venv\src\diffusers\src\diffusers\models\attention_processor.py", line 588, in forward
    return self.processor(
  File "H:\stable-diffusion\one-trainer\venv\src\diffusers\src\diffusers\models\transformers\transformer_hunyuan_video.py", line 117, in __call__
    hidden_states = F.scaled_dot_product_attention(
RuntimeError: The expanded size of the tensor (24) must match the existing size (2) at non-singleton dimension 1.  Target sizes: [2, 24, 896, 896].  Tensor sizes: [2, 1, 896]
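
The shapes in the error message can be checked by hand against the usual broadcasting rule: shapes are aligned from the right, and each mask dimension must either be 1 or equal the target dimension. Below is a minimal, dependency-free sketch of that check. The helper name `first_broadcast_conflict` is hypothetical, and reading the target as [batch, heads, query_len, key_len] is an assumption based on the sizes in the traceback. The 4-D mask shape in the second call is only an illustration of a shape that would broadcast, not necessarily what the fix in #10548 does.

```python
def first_broadcast_conflict(mask_shape, target_shape):
    """Return the index (in target_shape) of the first dimension, scanning
    from the right, where mask_shape cannot broadcast to target_shape.
    Return None if the mask is broadcastable."""
    offset = len(target_shape) - len(mask_shape)
    if offset < 0:
        raise ValueError("mask has more dimensions than the target")
    # Walk the aligned dimensions from the last one towards the first.
    for i in range(len(target_shape) - 1, offset - 1, -1):
        m = mask_shape[i - offset]
        if m != 1 and m != target_shape[i]:
            return i
    return None


# Target sizes from the error message; assumed to be
# [batch, heads, query_len, key_len].
target = (2, 24, 896, 896)

# 3-D mask from the error message: its batch dimension (2) lines up
# with the heads dimension (24) of the target, so broadcasting fails there.
print(first_broadcast_conflict((2, 1, 896), target))     # 1

# Illustrative 4-D mask with singleton heads and query dimensions.
print(first_broadcast_conflict((2, 1, 1, 896), target))  # None
```

With a batch size of 1 the same 3-D mask has shape [1, 1, 896], where every leading dimension is a singleton, which would explain why the problem only appears for batch sizes greater than 1.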

System Info

🤗 Diffusers version: 0.33.0.dev0
Platform: Windows-10-10.0.22631-SP0
Running on Google Colab?: No
Python version: 3.10.8
PyTorch version (GPU?): 2.5.1+cu124 (True)
Flax version (CPU?/GPU?/TPU?): not installed (NA)
Jax version: not installed
JaxLib version: not installed
Huggingface_hub version: 0.26.2
Transformers version: 4.47.0
Accelerate version: 1.0.1
PEFT version: not installed
Bitsandbytes version: 0.44.1
Safetensors version: 0.4.5
xFormers version: 0.0.28.post3
Accelerator: NVIDIA RTX A5000, 24564 MiB
Using GPU in script?: NVIDIA RTX A5000
Using distributed or parallel set-up in script?: no

Who can help?

@a-r-r-o-w @hlky

sayakpaul · 2025-01-13T01:48:47Z

Think we could have a more general test suite for batching in video models.

Nerogar added the bug label Jan 12, 2025

hlky mentioned this issue Jan 13, 2025

Fix batch > 1 in HunyuanVideo #10548

Merged

DN6 closed this as completed in #10548 Jan 14, 2025
