How decoder layers take into account time embedding? #4

Open
vadimkantorov opened this issue Jul 15, 2023 · 0 comments

vadimkantorov commented Jul 15, 2023

The paper does not make it clear how the time embedding affects the decoder layers. Do I understand correctly that every DDIM step involves calling all 9 decoder layers?

Am I right that the time embedding scales and shifts the transformer embeddings? Is that the only place the time step is used? Are there any ablations on its influence? The relevant code is at https://github.com/cp3wan/DFormer/blob/main/dformer/modeling/transformer_decoder/dformer_transformer_decoder.py#L438-L442:

#biases the query
scale_shift = self.block_time_mlp(time).unsqueeze(0)
scale_shift = scale_shift.type(torch.float32)
scale_shift = torch.repeat_interleave(scale_shift, self.num_queries, dim=0)
scale, shift = scale_shift.chunk(2, dim=2)
output = output * (scale + 1) + shift
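
To make the question concrete, here is a minimal, self-contained sketch of how I read this step. The class name `TimeScaleShift`, the tensor sizes, and the SiLU + Linear layout of `block_time_mlp` are my assumptions for illustration, not taken from the repository. It relies on broadcasting instead of `repeat_interleave`, and then checks that both forms give the same result.

```python
import torch
import torch.nn as nn


class TimeScaleShift(nn.Module):
    """Hypothetical stand-in for block_time_mlp plus the scale/shift step."""

    def __init__(self, time_dim: int, hidden_dim: int):
        super().__init__()
        # Assumed layout: one activation and one linear layer producing 2*C values.
        self.block_time_mlp = nn.Sequential(
            nn.SiLU(), nn.Linear(time_dim, hidden_dim * 2)
        )

    def forward(self, output: torch.Tensor, time_emb: torch.Tensor) -> torch.Tensor:
        # output: [Q, B, C] query features; time_emb: [B, time_dim]
        scale_shift = self.block_time_mlp(time_emb).unsqueeze(0)  # [1, B, 2C]
        scale, shift = scale_shift.chunk(2, dim=2)                # each [1, B, C]
        # Broadcasting over the query dimension replaces repeat_interleave.
        return output * (scale + 1) + shift


torch.manual_seed(0)
num_queries, batch, hidden_dim, time_dim = 5, 2, 8, 16
mod = TimeScaleShift(time_dim, hidden_dim)
output = torch.randn(num_queries, batch, hidden_dim)
time_emb = torch.randn(batch, time_dim)

modulated = mod(output, time_emb)

# The same computation with the explicit repeat, as in the quoted snippet.
ss = mod.block_time_mlp(time_emb).unsqueeze(0)
ss = torch.repeat_interleave(ss, num_queries, dim=0)
scale, shift = ss.chunk(2, dim=2)
reference = output * (scale + 1) + shift

# With all MLP parameters at zero, scale = shift = 0 and the step is the identity.
with torch.no_grad():
    for p in mod.parameters():
        p.zero_()
identity = mod(output, time_emb)
```

If this reading is right, the same scale and shift are applied to every query in a batch element, so the time step can only modulate the features per channel.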

Given that multiple diffusion steps at inference do not improve the result, do you actually use "diffusion" at inference? If so, which time step value do you use for this 1-step process?

I found these two lines:

cfg.MODEL.DFORMER.SAMPLE_STEP=1

timesteps = 1000

Is the single step used only for inference, while training uses a maximum timestep of 1000?
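
For reference, this is what I would expect if the sampler builds its schedule the way a common DDIM-style implementation does: evenly spaced time steps from `timesteps - 1` down to `-1`, consumed as (current, next) pairs. The helper `ddim_time_pairs` is hypothetical and I have not confirmed that DFormer does this. Under that assumption, `SAMPLE_STEP=1` would mean a single denoising call at t = 999.

```python
def ddim_time_pairs(total_timesteps: int, sampling_timesteps: int):
    """Return (t_now, t_next) pairs for an evenly spaced DDIM-style schedule.

    Assumed construction, not taken from the DFormer code: sampling_timesteps + 1
    points evenly spaced between -1 and total_timesteps - 1, visited in
    descending order. Rounding is used here; an implementation may truncate.
    """
    step = total_timesteps / sampling_timesteps
    times = [round(-1 + i * step) for i in range(sampling_timesteps + 1)]
    times.reverse()
    return list(zip(times[:-1], times[1:]))


single_step = ddim_time_pairs(1000, 1)  # [(999, -1)]
four_steps = ddim_time_pairs(1000, 4)   # [(999, 749), (749, 499), (499, 249), (249, -1)]
```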

Would the benefit of multistep inference diffusion be larger if fewer layers were used in the decoder?

Thank you!
