
[no issue] Congratulations and a question: how to break the 64 / 61 frames? #9

Open
ibrainventures opened this issue Aug 23, 2024 · 1 comment

@ibrainventures

ibrainventures commented Aug 23, 2024

Hi,
thank you very much for your work and for making this accessible to the world.
Over the last 20 hours I have tested many generations (mostly realistic), and I am very impressed with
the results. The results from prompts dedicated to "people action" look absolutely realistic.

No morph-style artifacts, so it's great to see what can be squeezed out of the (good old) SD 1.5 on end-user hardware (okay, my 4090s rented on vast.ai are not typical for an EU end user :-) )...

I tried to understand your paper and saw the interpolation and 3D UNet related solutions / experiments.

Questions:

A) How would you estimate the possibility of > 10 sec (250+ frames) or longer generations?
B) If staying with the 64 / 61 frames -> by offloading and seamless merging / stitching pipelines?
C) By wider-stepped interpolation? - with less quality but longer "action"?

It would be great to get some feedback, and chapeau to the team! Great work!!

@MaAo
Collaborator

MaAo commented Aug 24, 2024


Thank you for your attention and recognition of our work. Here are the answers to your questions:

A) To obtain more frames in a video, you have two options:

a) Using the current 61-frame video generation model, you can iteratively extend a video by taking the end frame of the previous clip as the reference image for the next clip (a minimal sketch of this chaining idea follows after this answer).

b) We will release models in the future with more frames, such as 125 or more. However, keep in mind that these models will require more memory for inference.
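For illustration only, here is a minimal sketch of the chaining idea from a). The `generate_clip` function and its signature are hypothetical placeholders for whatever image-to-video pipeline is used, not this project's actual API:

```python
# Hypothetical iterative chaining: the last frame of each generated clip becomes
# the reference image for the next clip, and the clips are concatenated.
def extend_video(generate_clip, reference_image, prompt, num_segments=4):
    """generate_clip(reference_image, prompt) -> list of frames (placeholder signature)."""
    all_frames = []
    for _ in range(num_segments):
        clip = generate_clip(reference_image, prompt)   # e.g. 61 frames per call
        # Skip the first frame of later clips so the shared boundary frame is not duplicated.
        all_frames.extend(clip if not all_frames else clip[1:])
        reference_image = clip[-1]                      # last frame seeds the next segment
    return all_frames
```

With 61-frame segments, four passes would give roughly 61 + 3 x 60 = 241 frames, i.e. about 10 seconds at 24 fps, at the cost of possible content drift between segments.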

B) I didn't fully understand your question B). If my response does not completely address your issue, please clarify further, and I will do my best to assist you.

C) We are currently using a Video VAE, which obviates the need for frame interpolation during generation. For instance, a latent of shape (1, 4, 16, 64, 64) is decoded by the Video VAE into a video of shape (1, 3, 61, 512, 512). The temporal dimension is computed as 4n - 3, where n is the number of frames in the latent space. Our research indicates that current Video VAEs are constrained by the number of channels, so we plan to train 16-channel Video VAEs and integrate them into our project in the future.
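As a quick sanity check of the 4n - 3 relationship (plain arithmetic on the shapes mentioned above, not project code):

```python
# Temporal mapping of the Video VAE decoder: n latent frames -> 4n - 3 video frames.
def decoded_frames(latent_frames: int) -> int:
    return 4 * latent_frames - 3

latent_shape = (1, 4, 16, 64, 64)   # (batch, channels, n, height, width)
n = latent_shape[2]
print(decoded_frames(n))            # 61, matching the (1, 3, 61, 512, 512) video
print(decoded_frames(32))           # 125, the frame count mentioned for future models
```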
