Update pyramid_dit_for_video_gen_pipeline.py #100
base: main
Conversation
Several optimizations that try to reduce memory allocations (so far only implemented for image-to-video). Tested locally on an RTX 3090; it seemed to reduce memory leakage, so subsequent runs were possible without the machine locking up.
Thank you for your contribution. I noticed that there are several changes in the file. Could you help me identify which are the critical ones related to memory leakage? I will merge them into the main branch.
Oh yeah, I realize I should have made this in smaller steps. There is one main improvement: the changes inside generate_i2v(), which pre-allocate the generated_latents tensor before the loop, avoiding the creation of a list that then has to be concatenated. In there I also delete a few objects after their use; I'm not sure it makes a difference, since garbage collection should take care of them, but I don't think it makes things worse either.

The other, smaller change is to sample_block_noise(), which now generates its tensor directly on the GPU. Unfortunately it has to do so in float, since "cholesky_cusolver" is not implemented for 'BFloat16'.

There are also several places where I replaced torch.cat([xy]*2) with repeat_interleave(2, dim=0); I'm not sure that does much, but it doesn't seem to hurt either. And there are one or two places where I changed a calculation to run in-place.
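For context, here is a minimal sketch of the pre-allocation idea, not the exact pipeline code; the shapes, the number of units, and the denoise_unit() helper are placeholders for illustration:

```python
import torch

# Hypothetical shapes; the real generate_i2v() derives these from the model
# config and the requested number of temporal units.
batch, channels, frames_per_unit, height, width = 1, 16, 8, 48, 84
num_units, device, dtype = 4, torch.device("cuda"), torch.bfloat16

def denoise_unit():
    # Stand-in for one unit of denoised latents produced inside the loop.
    return torch.randn(batch, channels, frames_per_unit, height, width,
                       device=device, dtype=dtype)

# Before: append each unit to a list and concatenate at the end.
# torch.cat needs a second full-size buffer while it copies.
latents_list = [denoise_unit() for _ in range(num_units)]
generated_latents = torch.cat(latents_list, dim=2)

# After: allocate the full tensor once and write each unit into its slice.
generated_latents = torch.empty(batch, channels, num_units * frames_per_unit,
                                height, width, device=device, dtype=dtype)
for i in range(num_units):
    unit = denoise_unit()
    generated_latents[:, :, i * frames_per_unit:(i + 1) * frames_per_unit] = unit
    del unit  # drop the reference right away, as the PR also does
```

And a rough sketch of the sample_block_noise() change, assuming the noise comes from a 4-dimensional multivariate normal over each 2x2 spatial block (the covariance below is illustrative, not the repo's exact formula): the sampling, and the Cholesky factorization it relies on, stay in float32 on the GPU because the CUDA "cholesky_cusolver" kernel has no BFloat16 variant, and the result is cast afterwards.

```python
import torch

def sample_block_noise_gpu(bs, ch, t, h, w, gamma=0.3,
                           device="cuda", dtype=torch.bfloat16):
    blocks = bs * ch * t * (h // 2) * (w // 2)
    # Illustrative 4x4 correlation matrix coupling each 2x2 block.
    cov = (1 - gamma) * torch.eye(4, device=device) \
        + gamma * torch.ones(4, 4, device=device)
    # Built on the GPU in float32, so no host round-trip and no BF16 Cholesky.
    dist = torch.distributions.MultivariateNormal(
        torch.zeros(4, device=device), covariance_matrix=cov)
    noise = dist.sample((blocks,))  # (blocks, 4), float32, on the GPU
    return noise.to(dtype)
```

(One caveat on the torch.cat([xy]*2) versus repeat_interleave(2, dim=0) swap: with a batch dimension larger than 1 the element ordering differs, a,b,a,b versus a,a,b,b, so it is only a drop-in replacement when the batch is 1 or the consumer does not depend on ordering.)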
No good, sadly. I can't even make it past step 17 with this PR.
It's using more and more memory every step until it uses all 24 GB and I run out.
Well, if you run it without the patch, does it work on your machine?
Yes, it works with or without the patch, but in both cases it eventually runs out of memory and crashes.
Implemented the pre-allocation of generated_latents also in the generate() method
Okay, it sounded like it did not work at all with the patch. Well, unfortunately this fix cannot work wonders: on my 24 GB card I can do 31 frames at 384p, but I cannot do 768p at all (with or without the patch).
Oh. Gotcha!