HunyuanVideoPipeline produces NaN values #10314
Comments
Transformer needs to be in bfloat16. Could you try with that?
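A minimal sketch of that suggestion (model id and API taken from the snippet later in this thread; the rest of the pipeline can stay in fp16):

import torch
from diffusers import HunyuanVideoPipeline, HunyuanVideoTransformer3DModel

# Load only the transformer in bfloat16, as suggested above.
transformer = HunyuanVideoTransformer3DModel.from_pretrained(
    "hunyuanvideo-community/HunyuanVideo", subfolder="transformer", torch_dtype=torch.bfloat16
)
pipe = HunyuanVideoPipeline.from_pretrained(
    "hunyuanvideo-community/HunyuanVideo", transformer=transformer, torch_dtype=torch.float16
)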
Same result @a-r-r-o-w
On CUDA we've seen the same issue when not using the latest PyTorch, from
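A quick way to confirm which build is actually loaded at runtime:

import torch
print(torch.__version__, torch.version.cuda)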
Thanks for the suggestion @hlky, I'll try some more combinations.
Same, I also get NaN values.
@tanshuai0219 Is this on a CUDA GPU or MPS/ROCm? I'm unable to replicate when using the transformer in bfloat16.
Yes, it's on a CUDA GPU, CUDA version: 12.4. Then I run:

import torch
model_id = "hunyuanvideo-community/HunyuanVideo"
...
output = pipe(
    ...
)
import numpy as np
print(np.array(output[0]))
export_to_video(output, "output.mp4", fps=15)

np.array(output[0]) is all zero.

output.mp4
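One way to tell genuinely black frames apart from NaNs that get clamped during PIL conversion is to request tensors instead (a sketch reusing the pipe above; `output_type="pt"` returns tensors before the uint8 cast, so NaNs survive):

import torch

frames = pipe(
    prompt="A cat walks on the grass, realistic",
    num_frames=13,
    num_inference_steps=2,
    output_type="pt",
).frames
print("any NaN:", torch.isnan(frames).any().item())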
Can you share the output of
output.mp4
here is mine:
If I upgrade transformers from 4.46.3 to 4.48.0.dev0, I get an error like:
I would recommend trying to replicate in a clean environment if you are currently in a broken state. At least 5 people have confirmed so far that upgrading torch to 2.5.1 no longer leads to black videos. We are still unsure why it doesn't work on 2.4 or below.
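(In a pip-managed environment that upgrade is typically just `pip install -U torch`, with wheels matching your CUDA/ROCm build.)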
I was not able to get a usable output with PyTorch 2.5.1 either.
Hardware: AMD Instinct MI300X
`pip freeze`

```bash
absl-py==2.1.0
accelerate==1.2.1
aiohappyeyeballs==2.4.4
aiohttp==3.11.9
aiosignal==1.3.1
amdsmi @ file:///opt/rocm-6.3.0/share/amd_smi
apex @ file:///var/lib/jenkins/apex
asgiref==3.8.1
astunparse==1.6.3
async-timeout==5.0.1
attrs==24.2.0
audioread==3.0.1
autocommand==2.2.2
backports.tarfile==1.2.0
boto3==1.19.12
botocore==1.22.12
cachetools==5.5.0
certifi==2024.8.30
cffi==1.17.1
charset-normalizer==3.4.0
click==8.1.7
colorama==0.4.6
coremltools==5.0b5
cryptography==44.0.0
Cython==3.0.11
decorator==5.1.1
Deprecated==1.2.15
-e git+https://github.com/huggingface/diffusers.git@1826a1e#egg=diffusers
dill==0.3.7
Django==5.1.4
exceptiongroup==1.2.2
execnet==2.1.1
expecttest==0.2.1
fbscribelogger==0.1.6
filelock==3.16.1
flatbuffers==2.0
frozenlist==1.5.0
fsspec==2024.10.0
future==1.0.0
geojson==2.5.0
ghstack==0.8.0
google-auth==2.36.0
google-auth-oauthlib==1.0.0
grpcio==1.68.1
huggingface-hub==0.27.1
hypothesis==5.35.1
idna==3.10
image==1.5.33
imageio==2.36.1
imageio-ffmpeg==0.5.1
importlib_metadata==8.0.0
importlib_resources==6.4.0
inflect==7.3.1
iniconfig==2.0.0
jaraco.collections==5.1.0
jaraco.context==5.3.0
jaraco.functools==4.0.1
jaraco.text==3.12.1
Jinja2==3.1.4
jmespath==0.10.0
joblib==1.4.2
junitparser==2.1.1
lark==0.12.0
lazy_loader==0.4
librosa==0.10.2.post1
lintrunner==0.12.5
llvmlite==0.38.1
lxml==5.0.0
Markdown==3.7
MarkupSafe==3.0.2
ml_dtypes==0.5.0
more-itertools==10.3.0
mpmath==1.3.0
msgpack==1.1.0
multidict==6.1.0
mypy==1.10.0
mypy-extensions==1.0.0
networkx==2.8.8
numba==0.55.2
numpy==1.21.2
oauthlib==3.2.2
onnx==1.16.1
onnxscript==0.1.0.dev20240817
opencv-python==4.10.0.84
opt-einsum==3.3.0
optionloop==1.0.7
optree==0.12.1
packaging==24.2
pillow==10.3.0
platformdirs==4.3.6
pluggy==1.5.0
ply==3.11
pooch==1.8.2
propcache==0.2.1
protobuf==3.20.2
psutil==6.1.0
pyasn1==0.6.1
pyasn1_modules==0.4.1
pycparser==2.22
PyGithub==2.3.0
Pygments==2.15.0
PyJWT==2.10.1
PyNaCl==1.5.0
pytest==7.3.2
pytest-cpp==2.3.0
pytest-flakefinder==1.1.0
pytest-rerunfailures==14.0
pytest-xdist==3.3.1
python-dateutil==2.9.0.post0
PyWavelets==1.4.1
PyYAML @ file:///croot/pyyaml_1728657952215/work
redis==5.2.0
regex==2024.11.6
requests==2.32.3
requests-oauthlib==2.0.0
rockset==1.0.3
rsa==4.9
s3transfer==0.5.2
safetensors==0.5.0
scikit-image==0.22.0
scikit-learn==1.5.2
scipy==1.10.1
sentencepiece==0.2.0
six @ file:///tmp/build/80754af9/six_1644875935023/work
sortedcontainers==2.4.0
soundfile==0.12.1
soxr==0.5.0.post1
sqlparse==0.5.2
sympy==1.13.1
tb-nightly==2.13.0a20230426
tensorboard==2.13.0
tensorboard-data-server==0.7.2
threadpoolctl==3.5.0
thriftpy2==0.5.2
tifffile==2024.9.20
tlparse==0.3.7
tokenizers==0.21.0
tomli==2.2.1
torch @ file:///var/lib/jenkins/pytorch/dist/torch-2.5.1%2Bgitabbfe77-cp310-cp310-linux_x86_64.whl#sha256=b5fecdb1e666ea7de99d5ca164c7dbe22f341f4bd07a288beeeddca65f2232be
torchvision==0.20.0a0+afc54f7
tqdm==4.67.1
transformers==4.47.1
# Editable install with no version control (triton==3.1.0)
-e /var/lib/jenkins/triton/python
typeguard==4.3.0
typing_extensions==4.12.2
unittest-xml-reporting==3.2.0
urllib3==1.26.20
Werkzeug==3.1.3
wrapt==1.17.0
xdoctest==1.1.0
yarl==1.18.3
z3-solver==4.12.2.0
zipp==3.19.2
```
Update to
tested with
@smedegaard Could you test with these changes?

diff --git a/src/diffusers/models/transformers/transformer_hunyuan_video.py b/src/diffusers/models/transformers/transformer_hunyuan_video.py
index 6cb97af9..84610471 100644
--- a/src/diffusers/models/transformers/transformer_hunyuan_video.py
+++ b/src/diffusers/models/transformers/transformer_hunyuan_video.py
@@ -713,15 +713,15 @@ class HunyuanVideoTransformer3DModel(ModelMixin, ConfigMixin, PeftAdapterMixin,
condition_sequence_length = encoder_hidden_states.shape[1]
sequence_length = latent_sequence_length + condition_sequence_length
attention_mask = torch.zeros(
- batch_size, sequence_length, sequence_length, device=hidden_states.device, dtype=torch.bool
- ) # [B, N, N]
+ batch_size, sequence_length, device=hidden_states.device, dtype=torch.bool
+ ) # [B, N]
effective_condition_sequence_length = encoder_attention_mask.sum(dim=1, dtype=torch.int) # [B,]
effective_sequence_length = latent_sequence_length + effective_condition_sequence_length
for i in range(batch_size):
- attention_mask[i, : effective_sequence_length[i], : effective_sequence_length[i]] = True
- attention_mask = attention_mask.unsqueeze(1) # [B, 1, N, N], for broadcasting across attention heads
+ attention_mask[i, : effective_sequence_length[i]] = True
+ attention_mask = attention_mask.unsqueeze(1) # [B, 1, N], for broadcasting across attention heads
# 4. Transformer blocks
if torch.is_grad_enabled() and self.gradient_checkpointing:

I was able to generate successfully on CUDA with PyTorch 2.4.1, which is also known to produce NaN.

output.mp4

cc @a-r-r-o-w There's also a small performance gain.

Code:

import torch
from diffusers import HunyuanVideoPipeline, HunyuanVideoTransformer3DModel
from diffusers.utils import export_to_video
model_id = "hunyuanvideo-community/HunyuanVideo"
transformer = HunyuanVideoTransformer3DModel.from_pretrained(
model_id, subfolder="transformer", torch_dtype=torch.bfloat16
)
pipe = HunyuanVideoPipeline.from_pretrained(model_id, transformer=transformer, torch_dtype=torch.float16).to("cuda")
pipe.vae.enable_tiling()
output = pipe(
prompt="A cat walks on the grass, realistic",
height=320,
width=512,
num_frames=61,
num_inference_steps=30,
).frames[0]
export_to_video(output, "output.mp4", fps=15)
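A plausible explanation for why the 3D mask NaNs out on some kernels: with the [B, N, N] construction, query rows past the effective sequence length are masked everywhere, and softmax over an all-masked row yields NaN. The 2D key-padding mask avoids this because every query still attends to the valid keys. A minimal sketch (toy sizes, not HunyuanVideo's; assumes SDPA's convention that True means "attend"):

import torch
import torch.nn.functional as F

batch, heads, seq, dim = 2, 4, 8, 16
q = k = v = torch.randn(batch, heads, seq, dim)

# Key-padding mask as in the patch: True = keep, False = masked out.
effective_len = torch.tensor([6, 8])
mask = torch.zeros(batch, seq, dtype=torch.bool)
for i in range(batch):
    mask[i, : effective_len[i]] = True

# Broadcast to [B, 1, 1, N]: shared across heads and query positions,
# so no attention row is ever fully masked.
out = F.scaled_dot_product_attention(q, k, v, attn_mask=mask[:, None, None, :])
assert not torch.isnan(out).any()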
@hlky Thanks for the tip. I'm afraid it didn't fix the problem for me. I added your suggested changes to transformer_hunyuan_video.py.
For clarity, here are my changes to `numpy_to_pil()` in `image_processor.py`:

# assumes module-level imports: import numpy as np; import PIL.Image;
# from PIL import Image; from typing import List
@staticmethod
def numpy_to_pil(images: np.ndarray) -> List[PIL.Image.Image]:
    """
    Convert numpy image array(s) to PIL images with validation.

    Args:
        images (np.ndarray): Image array in range [0, 1] with shape (N, H, W, C) or (H, W, C)

    Returns:
        List[PIL.Image.Image]: List of PIL images

    Raises:
        ValueError: If images contain invalid values
        TypeError: If input is not a numpy array or has invalid shape/type
    """
    if not isinstance(images, np.ndarray):
        raise TypeError(f"Expected numpy array, got {type(images)}")

    # Handle single image case
    if images.ndim == 3:
        images = images[None, ...]
    elif images.ndim != 4:
        raise ValueError(f"Expected 3D or 4D array, got {images.ndim}D")

    # Check for NaN/inf before any operations
    if np.any(np.isnan(images)):
        raise ValueError("Image array contains NaN values")
    if np.any(np.isinf(images)):
        raise ValueError("Image array contains infinite values")

    # Check value range
    min_val = np.min(images)
    max_val = np.max(images)
    if min_val < 0 or max_val > 1:
        raise ValueError(
            f"Image values must be in range [0, 1], got range [{min_val}, {max_val}]"
        )

    try:
        # Convert to uint8 (NaN/inf were already rejected above; np.isnan is
        # not defined for integer dtypes, so no post-conversion NaN check)
        images_uint8 = (images * 255).round().astype("uint8")
    except Exception as e:
        raise ValueError(f"Failed to convert to uint8: {str(e)}")

    try:
        # Convert to PIL images
        if images.shape[-1] == 1:
            pil_images = [Image.fromarray(image.squeeze(), mode="L") for image in images_uint8]
        else:
            pil_images = [Image.fromarray(image) for image in images_uint8]
        return pil_images
    except Exception as e:
        raise ValueError(f"Failed to create PIL images: {str(e)}")
Could you double-check with PR #10482? I was able to generate the following on AMD Instinct MI300X using the PR branch.

output.10.mp4
output.9.mp4
Thanks @hlky and @a-r-r-o-w, we have confirmed on our side that it produces videos after the recent patch.
Describe the bug
Running `diffusers.utils.export_to_video()` on the output of `HunyuanVideoPipeline` results in a black video. After adding some checks to `numpy_to_pil()` in `image_processor.py` I have confirmed that the output contains `NaN` values.

Reproduction
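(Presumably similar to the snippet @hlky shared earlier in the thread; the author's exact script may differ:)

import torch
from diffusers import HunyuanVideoPipeline, HunyuanVideoTransformer3DModel
from diffusers.utils import export_to_video

model_id = "hunyuanvideo-community/HunyuanVideo"
transformer = HunyuanVideoTransformer3DModel.from_pretrained(
    model_id, subfolder="transformer", torch_dtype=torch.bfloat16
)
pipe = HunyuanVideoPipeline.from_pretrained(model_id, transformer=transformer, torch_dtype=torch.float16).to("cuda")
pipe.vae.enable_tiling()
output = pipe(
    prompt="A cat walks on the grass, realistic",
    height=320,
    width=512,
    num_frames=61,
    num_inference_steps=30,
).frames[0]
export_to_video(output, "output.mp4", fps=15)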
Logs
No response
System Info
GPU: AMD MI300X
Who can help?
@DN6 @a-r-r-o-w