Hi! The Modality Interaction Task section of the paper says that the visual modality is used as the query and the text modality as the key and value. However, in the code you provided (lines 497-499 of the “model_init.py” file), the first input to CrossAttention is 'text_tokens', and the first input to CrossAttention is used as the query. Is this an error in the provided code?
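To make the question concrete, here is a minimal sketch of single-head cross-attention in plain Python. It is not the repository's CrossAttention module: the function `cross_attention`, its argument order, and the toy inputs are assumptions for illustration, and the learned projections are left out. It only shows that the output has one vector per query token, so the output sequence length tells you which modality was actually used as the query.

```python
import math

def cross_attention(query_tokens, context_tokens):
    """Scaled dot-product cross-attention without learned projections.

    query_tokens:   list of n_q vectors (lists of d floats), used as queries.
    context_tokens: list of n_kv vectors of the same dimension d,
                    used as both keys and values.
    Returns n_q output vectors of dimension d.
    """
    d = len(query_tokens[0])
    outputs = []
    for q in query_tokens:
        # Score this query against every key, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in context_tokens]
        # Numerically stable softmax over the keys.
        max_score = max(scores)
        exps = [math.exp(s - max_score) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Weighted sum of the values (here the context tokens themselves).
        outputs.append([sum(w * v[j] for w, v in zip(weights, context_tokens))
                        for j in range(d)])
    return outputs

# Hypothetical toy inputs: 3 visual tokens and 2 text tokens, dimension 4.
visual_tokens = [[1.0, 0.0, 0.0, 0.0],
                 [0.0, 1.0, 0.0, 0.0],
                 [0.0, 0.0, 1.0, 0.0]]
text_tokens = [[0.5, 0.5, 0.0, 0.0],
               [0.0, 0.0, 0.5, 0.5]]

# Visual as query, text as key/value (what the paper describes).
visual_as_query = cross_attention(visual_tokens, text_tokens)
# Text as query, visual as key/value (what the code appears to do).
text_as_query = cross_attention(text_tokens, visual_tokens)

print(len(visual_as_query), len(text_as_query))
```

If the real module follows the same convention (first argument is the query), checking the shape of its output against the number of visual and text tokens would settle which of the two cases the code implements.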