With regard to "But if any Pokémon enthusiasts feel like writing some captions manually please get in touch!", if there's a way to do this from a webpage on a phone, I'll do it. I can't promise I'll do a lot, but if it's easy enough to get into, perhaps other people will join in.
If each image had multiple captions (either from augmented images put through BLIP or written by actual people), then training with all of the encoded captions blended together inside the model's attention, à la MixFeat blending features in a hidden state, rather than using each caption independently, might create a more expressive model. I don't mean blending the conditioning outputs (the 77 token embeddings per caption) directly, since that would only blend tokens that sit in the same position.
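To make the distinction concrete, here is a rough sketch of one way to read this idea. It is not code from this repo or from MixFeat: the function names are made up for illustration, the attention is a toy single-head version with no learned projections, and the vectors are plain Python lists instead of real text-encoder outputs. It contrasts mixing the per-caption attention outputs with averaging the caption tokens position by position.

```python
import math
import random


def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(query, context):
    # Toy single-head attention of one query vector over context tokens.
    # Assumption: keys and values are the context vectors themselves.
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, tok)) / scale for tok in context]
    weights = softmax(scores)
    dim = len(context[0])
    return [sum(w * tok[d] for w, tok in zip(weights, context)) for d in range(dim)]


def blend_attention_outputs(query, captions, mix_weights):
    # Attend to each caption separately, then mix the resulting hidden
    # features (the "blend inside the attention" reading of the idea).
    outputs = [cross_attention(query, cap) for cap in captions]
    dim = len(outputs[0])
    return [sum(w * out[d] for w, out in zip(mix_weights, outputs)) for d in range(dim)]


def blend_tokens_by_position(captions, mix_weights):
    # The alternative argued against: average the conditioning tokens
    # position by position before they ever reach the attention layer.
    n_tokens, dim = len(captions[0]), len(captions[0][0])
    return [
        [sum(w * cap[t][d] for w, cap in zip(mix_weights, captions)) for d in range(dim)]
        for t in range(n_tokens)
    ]


def random_mix_weights(n, rng):
    # Random convex weights (positive, summing to 1), hypothetical stand-in
    # for whatever mixing schedule would be used during training.
    raw = [rng.random() + 1e-9 for _ in range(n)]
    total = sum(raw)
    return [r / total for r in raw]


# Two captions containing the same two "words" in a different order.
caption_a = [[1.0, 0.0], [0.0, 1.0]]
caption_b = [[0.0, 1.0], [1.0, 0.0]]
query = [1.0, 0.0]
mix = [0.5, 0.5]

averaged_tokens = blend_tokens_by_position([caption_a, caption_b], mix)
from_averaged = cross_attention(query, averaged_tokens)
from_blended_attention = blend_attention_outputs(query, [caption_a, caption_b], mix)
```

In this toy case the position-wise average collapses both token slots into the same vector, so the query can no longer pick out the word it matches, while mixing the attention outputs keeps that preference because each caption is attended to before blending. Whether that carries over to a real model is an open question.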