
How to get last-layer hidden states $H_{link}$ during testing? #11

Closed
xushilin1 opened this issue Jun 30, 2024 · 7 comments

Comments

@xushilin1

As mentioned in your paper, the Super-Link Queries are automatically added after the input embeddings of the routing token. However, during testing, the user's input prompt does not include any routing token. How can you send the Super-Link Queries to the MLLM and obtain the corresponding hidden states $H_{link}$?

@wjn922
Collaborator

wjn922 commented Jul 2, 2024

Thanks for your question.

During testing, we rely on the LLM to interpret the user's input prompt and output the appropriate routing token when needed. That's why we need to construct instruction templates for the different tasks and finetune the LLM, as specified in Sec. 3.2 (1) and Appendix E.

Here is an example for detection:
USER: Where can we locate the dog in the image?
ASSISTANT: The detection results for dog [DET] are presented.
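
A minimal sketch of how such a detection training sample might be assembled, so the LLM learns to emit the routing token in its response. The function name, template strings, and dict layout here are illustrative assumptions, not the repository's actual code:

```python
# Sketch only: the routing token [DET] appears in the target response, so the
# finetuned LLM learns to generate it by itself at test time.
DET_TOKEN = "[DET]"

def build_detection_sample(category: str) -> list[dict]:
    user_prompt = f"Where can we locate the {category} in the image?"
    target_response = f"The detection results for {category} {DET_TOKEN} are presented."
    return [
        {"role": "USER", "content": user_prompt},
        {"role": "ASSISTANT", "content": target_response},
    ]

print(build_detection_sample("dog"))
```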

@xushilin1
Author

xushilin1 commented Jul 2, 2024

During training, you input [DET] and the corresponding super-link queries $Q_{link}$ into the LLM to obtain $H_{link}$, which is then sent to the downstream decoder.

During testing, since the input prompt does not include [DET] and $Q_{link}$, how can you get $H_{link}$?

Is it correct that during training, the downstream decoders receive $H_{link}$ while during testing they receive $Q_{link}$?

Is there any inconsistency in the input of downstream decoders during training and testing?

@wjn922
Collaborator

wjn922 commented Jul 2, 2024

During testing, the LLM will output [DET], and we immediately append the $Q_{link}$ after it. Then, in the current generation step, the input_embeds will expand from [1, C] to [1 + num_embeds, C]. We can still get the last-layer hidden states $H_{link}$ during testing.

This part is the code for handling the super-link queries, which works well for both training and testing:

# NOTE: special operation for the [emb] tokens, this works well for both train and generation (use_cache=True)
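
For readers without the linked code at hand, here is a simplified sketch of that operation, assuming a HuggingFace-style causal LM that accepts inputs_embeds and returns hidden states. The names (SuperLinkWrapper, emb_token_id, num_embs, super_link_queries) are illustrative assumptions, not the repository's actual identifiers:

```python
import torch
import torch.nn as nn

class SuperLinkWrapper(nn.Module):
    """Sketch only: replaces the [EMB] placeholder embeddings with learnable
    super-link queries and gathers their last-layer hidden states."""

    def __init__(self, llm, hidden_size: int, num_embs: int, emb_token_id: int):
        super().__init__()
        self.llm = llm                      # HuggingFace-style causal LM (assumed)
        self.emb_token_id = emb_token_id    # id of the [EMB] placeholder token (assumed)
        self.num_embs = num_embs
        # Learnable super-link queries Q_link, shared across samples.
        self.super_link_queries = nn.Parameter(torch.randn(num_embs, hidden_size))

    def forward(self, input_ids, inputs_embeds, attention_mask=None):
        # Wherever an [EMB] placeholder appears, overwrite its embedding with the
        # corresponding learnable query. During generation, the same replacement is
        # applied to the positions appended right after the LLM emits [DET].
        emb_mask = input_ids == self.emb_token_id            # [B, L] bool
        num_groups = int(emb_mask.sum().item()) // self.num_embs
        inputs_embeds = inputs_embeds.clone()
        inputs_embeds[emb_mask] = self.super_link_queries.repeat(
            num_groups, 1
        ).to(inputs_embeds.dtype)

        out = self.llm(inputs_embeds=inputs_embeds,
                       attention_mask=attention_mask,
                       output_hidden_states=True)
        # H_link: last-layer hidden states at the query positions; these are the
        # features passed on to the downstream task decoder.
        h_link = out.hidden_states[-1][emb_mask]             # [num_groups * num_embs, C]
        return h_link
```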

@pangzss

pangzss commented Jul 8, 2024

During testing, the LLM will output [DET], and we immediately append the $Q_{link}$ after it. Then, in the current generation step, the input_embeds will expand from [1, C] to [1 + num_embeds, C]. We can still get the last-layer hidden states $H_{link}$ during testing.

This part is the code for handling the super-link queries, which works well for both training and testing:

# NOTE: special operation for the [emb] tokens, this works well for both train and generation (use_cache=True)

Does this mean that during training the earlier super-link embeddings do not attend to the later ones because of the causal attention mask, but during testing the different super-link embeddings attend to each other, since one forward pass is used to obtain all their hidden states?

@wjn922
Collaborator

wjn922 commented Jul 8, 2024

During both training and testing, the LLM always uses the causal mask.
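
A toy check of this point (illustrative only, arbitrary example sizes for prompt_len and num_embs): with a causal mask, appending all query embeddings in a single forward pass gives each query exactly the visibility it would have if the positions were processed one step at a time, so training and generation see the same attention pattern.

```python
import torch

prompt_len, num_embs = 5, 3
seq_len = prompt_len + num_embs
# Lower-triangular causal mask: position i may attend to positions 0..i only.
causal = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))

for i in range(num_embs):
    row = causal[prompt_len + i]
    visible = row[prompt_len:].sum().item()
    print(f"query {i} attends to the prompt and to queries 0..{visible - 1}")
# query 0 attends to the prompt and to queries 0..0
# query 1 attends to the prompt and to queries 0..1
# query 2 attends to the prompt and to queries 0..2
```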

@haofuly

haofuly commented Oct 15, 2024

Hi @wjn922,
Thanks for your detailed reply. I am not sure how the [EMB] token plays a role during training. Can you show an example illustrating the relationship between [EMB] and the emb_embeddings_det tensor? And where should we insert the emb_embeddings_det tensor into input_ids during training?
Thanks!
