It's not very clear in the paper how the time embedding affects the decoder layers. Do I understand correctly that every DDIM step involves calling all 9 decoder layers?
Am I right that the time embedding scales and shifts the transformer embeddings (https://github.com/cp3wan/DFormer/blob/main/dformer/modeling/transformer_decoder/dformer_transformer_decoder.py#L438-L442)? Is that the only place the time step is used? Are there any ablations on its influence?
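To make clear what I mean by "scale and shift", here is a minimal sketch of that kind of time conditioning. It is not the code at the link above: the function names `sinusoidal_embedding` and `modulate` are hypothetical, and the sketch assumes that a projection of the time embedding is split into a scale half and a shift half that are applied element-wise to the decoder features.

```python
import math


def sinusoidal_embedding(t, dim):
    """Embed a scalar timestep t into `dim` values (dim must be even)."""
    half = dim // 2
    # Geometrically spaced frequencies, as in standard positional encodings.
    freqs = [math.exp(-math.log(10000.0) * i / half) for i in range(half)]
    return [math.sin(t * f) for f in freqs] + [math.cos(t * f) for f in freqs]


def modulate(features, time_proj):
    """Apply x * (1 + scale) + shift, element-wise.

    `time_proj` stands in for the output of a learned projection of the
    time embedding (assumption); its first half is the scale, its second
    half is the shift.
    """
    d = len(features)
    if len(time_proj) != 2 * d:
        raise ValueError("time_proj must be twice as long as features")
    scale, shift = time_proj[:d], time_proj[d:]
    return [x * (1.0 + s) + b for x, s, b in zip(features, scale, shift)]


# With a zero projection the features pass through unchanged.
unchanged = modulate([1.0, 2.0], [0.0, 0.0, 0.0, 0.0])
```

If this matches what the decoder does, the time step would influence each layer only through this modulation, which is why I am asking about ablations.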
Given that multiple diffusion steps at inference do not improve the result, do you actually use "diffusion" at inference? If so, which time step value do you use for this 1-step process?
I found these two lines:
- DFormer/dformer/config.py, line 47 in d3eef80
- DFormer/dformer/DFormer_model.py, line 98 in d3eef80
Is the single step used only for inference, while training uses a maximum timestep of 1000?
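To make the question concrete, here is a minimal sketch of how the time steps would be chosen if the sampler follows the linspace-based schedule that is common in open-source DDIM samplers. This is an assumption on my part, not something I confirmed in DFormer, and `ddim_time_pairs` is a hypothetical name.

```python
def ddim_time_pairs(total_timesteps, sampling_steps):
    """Return (t, t_next) pairs from the noisiest step down to -1.

    Assumed schedule: sampling_steps + 1 evenly spaced points between
    -1 and total_timesteps - 1, truncated to integers, visited in
    descending order. A t_next of -1 means "fully denoised".
    """
    start, stop = -1, total_timesteps - 1
    times = [
        int(start + (stop - start) * i / sampling_steps)
        for i in range(sampling_steps + 1)
    ]
    times.reverse()
    return list(zip(times[:-1], times[1:]))


one_step = ddim_time_pairs(1000, 1)    # [(999, -1)]
four_steps = ddim_time_pairs(1000, 4)  # [(999, 749), (749, 499), (499, 249), (249, -1)]
```

Under this assumption, 1-step inference would always query the decoder at t = 999, that is, starting from pure noise, while training would sample t from the whole range up to 1000.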
Would the benefit of multi-step diffusion at inference be larger if fewer layers were used in the decoder?
Thank you!