Update pyramid_dit_for_video_gen_pipeline.py #100

Open
Quasimondo wants to merge 2 commits into main

Conversation

Quasimondo
Contributor

Several optimizations that try to reduce memory allocations (so far only implemented for image-to-video).
Tested locally on my RTX 3090; it seemed to reduce memory leakage, so that subsequent runs were possible without the machine locking up.

Several optimizations that try to reduce memory allocations (so far only implemented for image-to-video)
@feifeiobama
Collaborator

feifeiobama commented Oct 14, 2024

Thank you for your contribution. I noticed that there are several changes in the file. Could you help me identify which are the critical ones related to memory leakage? I will merge them into the main branch.

@Quasimondo
Contributor Author

Oh yeah, I realize I should have made this in smaller steps.

There is one main improvement: the changes inside generate_i2v(), which pre-allocate the generated_latents tensor before the loop and thus avoid creating a list that then has to be concatenated.

In there I also delete a few objects after their use. I'm not sure it makes a difference, since garbage collection should take care of them, but I don't think it makes anything worse either.
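
For anyone reading along, here is a minimal sketch of the idea (shapes, names and the make_unit stand-in are illustrative, not the repo's actual code):

import torch

def make_unit(b, c, t, h, w, device, dtype):
    # Stand-in for one unit of generated latents; the real code gets this
    # from generate_one_unit() / the DiT.
    return torch.randn(b, c, t, h, w, device=device, dtype=dtype)

device, dtype = "cpu", torch.float32      # "cuda" / torch.bfloat16 in the real pipeline
B, C, H, W, t_unit, num_units = 1, 16, 32, 32, 8, 4

# Before: collect per-unit tensors in a list, then concatenate. Every unit stays
# alive until the final cat, and the cat allocates one more full-size buffer.
units = [make_unit(B, C, t_unit, H, W, device, dtype) for _ in range(num_units)]
latents_from_cat = torch.cat(units, dim=2)

# After: pre-allocate the output once and copy each unit into its temporal slice,
# releasing the per-unit tensor right away.
generated_latents = torch.empty(B, C, t_unit * num_units, H, W, device=device, dtype=dtype)
for i in range(num_units):
    unit = make_unit(B, C, t_unit, H, W, device, dtype)
    generated_latents[:, :, i * t_unit:(i + 1) * t_unit].copy_(unit)
    del unit    # not strictly required, but lets the allocator reuse the block sooner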

The other, smaller change is to sample_block_noise(), which now generates that tensor directly on the GPU. Unfortunately it has to do it in float, since "cholesky_cusolver" is not implemented for 'BFloat16'.
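
Roughly the pattern, with a placeholder covariance (not necessarily the exact one sample_block_noise uses):

import torch

def sample_correlated_noise(num_blocks, gamma, device, out_dtype=torch.bfloat16):
    # Correlated 4-dim noise per 2x2 block (placeholder covariance).
    cov = torch.eye(4, device=device) * (1 + gamma) - torch.full((4, 4), gamma, device=device)
    # Build the distribution in float32 directly on the target device: the internal
    # Cholesky factorization (cholesky_cusolver) has no BFloat16 kernel.
    dist = torch.distributions.MultivariateNormal(
        torch.zeros(4, device=device), covariance_matrix=cov
    )
    noise = dist.sample((num_blocks,))    # float32, already on the GPU
    return noise.to(out_dtype)            # cast once at the end

device = "cuda" if torch.cuda.is_available() else "cpu"
noise = sample_correlated_noise(num_blocks=1024, gamma=0.3, device=device)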

There are several places where I replaced torch.cat([xy]*2) with repeat_interleave(2, dim=0). I'm not sure that does much, but it also does not seem to hurt.
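
Quick illustration of that swap (for a batch of one the two are identical; for larger batches the row ordering differs, so any paired tensors such as prompt embeddings have to follow the same ordering):

import torch

x = torch.arange(6).reshape(2, 3)              # toy "batch" of 2 rows

cat_version = torch.cat([x] * 2, dim=0)        # rows ordered 0, 1, 0, 1
ril_version = x.repeat_interleave(2, dim=0)    # rows ordered 0, 0, 1, 1

# With a single-row batch both calls produce the same tensor:
print(torch.equal(torch.cat([x[:1]] * 2, dim=0), x[:1].repeat_interleave(2, dim=0)))  # True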

And there are one or two places where I changed a calculation to run in-place: latents.mul_(alpha).add_(noise, alpha=beta)
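
i.e. something along these lines (toy shapes, purely illustrative):

import torch

latents = torch.randn(1, 16, 8, 32, 32)
noise = torch.randn_like(latents)
alpha, beta = 0.7, 0.3

# Out-of-place version allocates temporaries for alpha * latents, beta * noise
# and the result:
#   latents = alpha * latents + beta * noise
# In-place version reuses the existing latents storage:
latents.mul_(alpha).add_(noise, alpha=beta)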

@dillfrescott

No good, sadly. Can't even make it past step 17 with this PR.

Traceback (most recent call last):
  File "text.py", line 23, in <module>
    frames = model.generate(
  File "C:\Users\cross\miniconda3\envs\pyramid\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "C:\Users\cross\Downloads\Pyramid-Flow\pyramid_dit\pyramid_dit_for_video_gen_pipeline.py", line 737, in generate
    intermed_latents = self.generate_one_unit(
  File "C:\Users\cross\miniconda3\envs\pyramid\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "C:\Users\cross\Downloads\Pyramid-Flow\pyramid_dit\pyramid_dit_for_video_gen_pipeline.py", line 288, in generate_one_unit
    noise_pred = self.dit(
  File "C:\Users\cross\miniconda3\envs\pyramid\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\Users\cross\miniconda3\envs\pyramid\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Users\cross\Downloads\Pyramid-Flow\pyramid_dit\modeling_pyramid_mmdit.py", line 479, in forward
    encoder_hidden_states, hidden_states = block(
  File "C:\Users\cross\miniconda3\envs\pyramid\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\Users\cross\miniconda3\envs\pyramid\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Users\cross\Downloads\Pyramid-Flow\pyramid_dit\modeling_mmdit_block.py", line 640, in forward
    attn_output, context_attn_output = self.attn(
  File "C:\Users\cross\miniconda3\envs\pyramid\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\Users\cross\miniconda3\envs\pyramid\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Users\cross\Downloads\Pyramid-Flow\pyramid_dit\modeling_mmdit_block.py", line 548, in forward
    hidden_states, encoder_hidden_states = self.var_len_attn(
  File "C:\Users\cross\Downloads\Pyramid-Flow\pyramid_dit\modeling_mmdit_block.py", line 308, in __call__
    stage_hidden_states = F.scaled_dot_product_attention(
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 13.68 GiB. GPU 0 has a total capacty of 23.99 GiB of which 12.46 GiB is free. Of the allocated memory 6.10 GiB is allocated by PyTorch, and 3.78 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

It's using more and more memory every step until it uses all 24 GB and I run out.
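
Side note: the max_split_size_mb hint from the error message can be tried like this (the 128 value is just an example; it only mitigates fragmentation and will not fix an actual leak):

import os
# Must be set before the first CUDA allocation, so put it at the very top of text.py.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"
import torch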

@Quasimondo
Contributor Author

Well, if you run it without the patch, does it work on your machine?

@dillfrescott

Yes, it runs with or without the patch, but in both cases it eventually runs out of memory and crashes.

Implemented the pre-allocation of generated_latents also in the generate() method
@Quasimondo
Contributor Author

Okay, it sounded like it did not work at all with the patch. Unfortunately this fix cannot work wonders: on my 24 GB card I can do 31 frames at 384p, but I cannot do 768p at all (with or without the patch).

@dillfrescott

Oh. Gotcha!
