Sizes of tensors must match except in dimension 1. Expected size 8 but got size 4 for tensor number 1 in the list. #182

Open
canrly opened this issue Aug 28, 2024 · 6 comments

Comments

@canrly

canrly commented Aug 28, 2024

[Debug] Generate image using aspect ratio [Instagram (1:1)] => 1024 x 1024
Start inference...
[Debug] Prompt: instagram photo, portrait photo of a woman img, colorful, perfect face, natural skin, hard shadows, film grain,
[Debug] Neg Prompt: (asymmetry, worst quality, low quality, illustration, 3d, 2d, painting, cartoons, sketch), open mouth
10

Traceback (most recent call last):
/photomaker/model.py", line 49, in fuse_fn
stacked_id_embeds = torch.cat([prompt_embeds, id_embeds], dim=-1)
last line
Sizes of tensors must match except in dimension 1. Expected size 8 but got size 4 for tensor number 1 in the list.
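
For readers decoding the message: when concatenating along the last dimension, every other dimension has to agree, and here the row counts (8 vs 4) do not. Below is a minimal sketch of that shape rule in plain Python, so it runs without PyTorch. The helper `cat_shape` is hypothetical, and the shapes `(8, 2048)` and `(4, 2048)` are assumptions inferred from the error text and the later comments, not values printed in the log.

```python
def cat_shape(shapes, dim):
    """Return the shape produced by concatenating tensors of `shapes` along `dim`.

    Hypothetical helper that mirrors the rule torch.cat enforces:
    every dimension except `dim` must have the same size.
    """
    ndim = len(shapes[0])
    dim = dim % ndim  # normalize negative dims, e.g. -1 -> last dimension
    out = list(shapes[0])
    for idx, shape in enumerate(shapes[1:], start=1):
        if len(shape) != ndim:
            raise ValueError(f"tensor number {idx} has a different number of dimensions")
        for d in range(ndim):
            if d != dim and shape[d] != out[d]:
                raise ValueError(
                    f"Sizes of tensors must match except in dimension {dim}. "
                    f"Expected size {out[d]} but got size {shape[d]} "
                    f"for tensor number {idx} in the list."
                )
        out[dim] += shape[dim]
    return tuple(out)


# Assumed shapes: 8 prompt rows vs 4 ID rows, each 2048 wide.
prompt_embeds_shape = (8, 2048)
id_embeds_shape = (4, 2048)

try:
    cat_shape([prompt_embeds_shape, id_embeds_shape], dim=-1)
    mismatch_message = None
except ValueError as err:
    mismatch_message = str(err)

print(mismatch_message)
```

With equal row counts, for example `(4, 2048)` and `(4, 2048)`, the same call returns `(4, 4096)`, so the failure comes from the differing first dimension rather than from the concatenation axis.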

@rudy2steiner

I ran into the same issue.

@channyi

channyi commented Aug 29, 2024

+1

@codewritz-yuri

+1

@hotpot-killer

same issue

@YIYANGCAI

I found out why dimension 0 of prompt_embeds is always twice that of id_embeds: it is because num_tokens = 2. Could anyone give a hint as to what this parameter corresponds to in the original paper?

According to the stacking strategy in the original paper, shouldn't prompt_embeds be computed by expanding the embedding of the class token ("man" or "woman", [1 x 2048]) to [id_num x 2048], concatenating it with id_embeds to get [id_num x 4096], and then passing the result through the MLPs? In the code's implementation, however, prompt_embeds is sliced from the original text embedding with a length of (id_num * num_tokens).
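
To make the shape arithmetic in this comment concrete, here is a small sketch in plain Python that tracks shapes only (no tensors). The function names are hypothetical, and `id_num = 4` is an assumption inferred from "got size 4" in the error message; the thread does not state how many ID images were used. `num_tokens = 2` and the width 2048 are the values quoted above.

```python
HIDDEN = 2048  # per-token embedding width quoted in the comment above


def paper_style_stacked_shape(id_num):
    """Shape after stacking, following the comment's reading of the paper."""
    class_embed = (1, HIDDEN)            # embedding of the "man" / "woman" token
    expanded = (id_num, class_embed[1])  # repeated once per ID image
    id_embeds = (id_num, HIDDEN)
    # Concatenating along the last dimension adds widths; row counts must agree.
    assert expanded[0] == id_embeds[0]
    return (id_num, expanded[1] + id_embeds[1])


def sliced_prompt_rows(id_num, num_tokens):
    """Row count of prompt_embeds when sliced with length id_num * num_tokens."""
    return id_num * num_tokens


id_num = 4      # assumption, inferred from "got size 4" in the error
num_tokens = 2  # value reported in the comment above

paper_shape = paper_style_stacked_shape(id_num)
prompt_rows = sliced_prompt_rows(id_num, num_tokens)
rows_agree = prompt_rows == id_num

print(paper_shape, prompt_rows, rows_agree)
```

Under these assumptions the sliced prompt_embeds has 8 rows against 4 rows of id_embeds, which is the same 8 vs 4 mismatch the error reports; the row counts would only agree if num_tokens were 1 or if id_embeds also carried num_tokens rows per ID image.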

@Sooplex

Sooplex commented Sep 18, 2024

image
image

The image is from the PhotoMaker V2 update.
V2 seems to differ from the paper (which may be the so-called "V1"), which makes the updated code implementation incompatible with V1.

@YIYANGCAI I found out why prompt_embeds's dim zero is always 2 times of id_embeds. This is because the num_tokens = 2.
