How to get last-layer hidden states $H_{link}$ during testing? #11
Comments
Thanks for your question. During testing, we rely on the LLM to interpret the users' input prompts and output the different routing tokens when needed. That's why we need to construct instruction templates for different tasks and fine-tune the LLM, as specified in Sec. 3.2 (1) and Appendix E. Here is an example for detection.
During training, you will input the [DET] and corresponding super-link queries
During testing, since the input prompt does not include the [DET] and
Is it correct that during training, the downstream decoders receive
Is there any inconsistency in the input of downstream decoders during training and testing?
During testing, the LLM will output [DET], and we immediately append the
This part is the code for handling the super-link queries, which works well for both training and testing:
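As a rough illustration of the mechanism described in this reply (not the repository's actual code), the sketch below appends a block of query embeddings right after a generated [DET] token and reads the last-layer hidden states at those positions as $H_{link}$. Everything in it is a hypothetical stand-in: `toy_llm` replaces the real LLM with a causal prefix mean, `script` replaces the LLM's own token predictions, and the ids `DET_ID` and `EOS_ID`, the table `token_embed`, and `super_link_queries` are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB, DIM, NUM_QUERIES = 8, 16, 4
DET_ID, EOS_ID = 3, 0  # hypothetical token ids for [DET] and end-of-sequence

token_embed = rng.normal(size=(VOCAB, DIM))               # toy token embedding table
super_link_queries = rng.normal(size=(NUM_QUERIES, DIM))  # stand-in for the learnable queries


def toy_llm(inputs_embeds):
    """Stand-in for the LLM: one 'last-layer hidden state' per position.

    Uses a causal prefix mean, so position i depends only on positions <= i.
    """
    csum = np.cumsum(inputs_embeds, axis=0)
    counts = np.arange(1, len(inputs_embeds) + 1)[:, None]
    return np.tanh(csum / counts)


def generate(prompt_ids, script):
    """Toy decoding loop; `script` plays the role of the LLM's predicted tokens."""
    embeds = token_embed[prompt_ids]          # (T, DIM)
    out_ids, h_links = [], []
    for tok in script:
        out_ids.append(tok)
        embeds = np.vstack([embeds, token_embed[tok][None]])
        if tok == DET_ID:
            # The routing token was generated: append the queries immediately
            # and take the hidden states at their positions as H_link.
            start = len(embeds)
            embeds = np.vstack([embeds, super_link_queries])
            hidden = toy_llm(embeds)
            h_links.append(hidden[start:start + NUM_QUERIES])
        if tok == EOS_ID:
            break
    return out_ids, h_links, embeds


out_ids, h_links, embeds = generate([1, 2], [5, DET_ID, EOS_ID])

# "Training-style" pass: the same sequence, given all at once.
full = np.vstack([token_embed[[1, 2, 5, DET_ID]], super_link_queries])
h_link_train = toy_llm(full)[4:4 + NUM_QUERIES]
same_as_training = np.allclose(h_links[0], h_link_train)
```

In this toy setup the query positions see the same prefix in both cases, so the hidden states passed to a downstream decoder are identical whether [DET] came from the ground-truth sequence or from generation.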
Does this mean that during training, earlier super-link embeddings do not attend to later ones because of the causal attention mask, whereas during testing the super-link embeddings can attend to each other, since a single forward pass produces all of their hidden states?
During both training and testing, the LLM always uses the causal mask.
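To make the point about the causal mask concrete, here is a minimal sketch, assuming a single attention head with no learned projections (a simplification, not the model's real attention layer). It shows that even when all positions are processed in one forward pass, a causal mask prevents earlier positions from seeing later ones: perturbing the last query changes only the last output.

```python
import numpy as np


def causal_self_attention(x):
    """Single-head self-attention with a causal mask (no projections, for brevity)."""
    t, d = x.shape
    scores = x @ x.T / np.sqrt(d)
    future = np.triu(np.ones((t, t), dtype=bool), k=1)  # True above the diagonal
    scores = np.where(future, -np.inf, scores)          # block attention to future positions
    scores = scores - scores.max(axis=1, keepdims=True) # numerically stable softmax
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)
    return weights @ x


rng = np.random.default_rng(0)
seq = rng.normal(size=(6, 8))   # e.g. 2 prefix positions followed by 4 query positions
out = causal_self_attention(seq)

perturbed = seq.copy()
perturbed[5] += 10.0            # change only the last query embedding
out_perturbed = causal_self_attention(perturbed)

earlier_unchanged = np.allclose(out[:5], out_perturbed[:5])
last_changed = not np.allclose(out[5], out_perturbed[5])
```

Under these assumptions, a single forward pass over all queries does not let them attend to each other bidirectionally; each query still sees only the positions before it.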
Hi @wjn922,
As mentioned in your paper, the Super-Link Queries are automatically added after the input embeddings of the routing token. However, during testing, users' input prompts do not include any routing token. How can you send the Super-Link Queries to the MLLM and obtain the corresponding hidden states $H_{link}$?