Hello,
Thank you for this wonderful work!
The shape of the provided features is [64, 257, 1408]. I have the following questions about them:
(1) What do 257 and 1408 mean? Does 257 indicate the number of tokens per frame, and 1408 the feature dimension?
(2) Can I use only the CLS token feature of each frame when training the model and evaluating its performance? The complete features take up about 16 TB, and I don't have enough storage space to store them.
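
To make question (2) concrete, here is a minimal sketch of the reduction I have in mind. It assumes the CLS token sits at index 0 of the 257-token axis, which is the usual ViT convention but is not confirmed for these features. `extract_cls` and `cls_index` are hypothetical names, and the zero-filled array is only a stand-in for one stored feature file.

```python
import numpy as np


def extract_cls(features: np.ndarray, cls_index: int = 0) -> np.ndarray:
    """Keep one token per frame: [frames, tokens, dim] -> [frames, dim].

    cls_index=0 is an assumption (common ViT layout), not a confirmed
    property of the released features.
    """
    if features.ndim != 3:
        raise ValueError(f"expected a 3-D array, got shape {features.shape}")
    # Copy into a contiguous array so the full tensor can be freed afterwards.
    return np.ascontiguousarray(features[:, cls_index, :])


# Dummy array standing in for one video's stored features.
features = np.zeros((64, 257, 1408), dtype=np.float16)
cls_only = extract_cls(features)

print(cls_only.shape)                      # (64, 1408)
print(features.nbytes // cls_only.nbytes)  # 257
```

If that assumption holds, the CLS-only copy is 257 times smaller, so roughly 16 TB would shrink to about 62 GB.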