
"if any Pokémon enthusiasts feel like writing some captions manually please get in touch!" #40

Open
torridgristle opened this issue Nov 16, 2022 · 0 comments

@torridgristle

With regard to "But if any Pokémon enthusiasts feel like writing some captions manually please get in touch!": if there's a way to do this from a webpage on a phone, I'll pitch in. I can't promise I'll do a lot, but if it's easy enough to get into, perhaps other people will join in as well.

If each image had multiple captions (either from augmented images put through BLIP, or from actual people), then training with all of the encoded captions blended together within the model's attention, a la MixFeat blending features in a hidden state, might produce a more expressive model than training on each caption independently. To be clear, I don't mean blending the conditioning outputs position-wise across the 77 tokens, since that would only mix tokens that happen to share the same position.
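One way to read the idea above (a sketch, not a claim about any existing implementation): instead of averaging conditioning outputs token-by-token, concatenate every caption's encoded tokens along the sequence axis and let cross-attention select and blend features across all captions by content. The function name, tensor shapes, and the single-head attention here are illustrative assumptions, not part of any real training codebase.

```python
import torch

def cross_attend_multi_caption(query, caption_embeds):
    """Hypothetical sketch: mix several captions inside attention.

    Rather than blending the 77-token conditioning outputs by position,
    concatenate each caption's token features along the sequence axis
    and use the combined set as keys/values. The attention weights then
    blend features across captions by content, MixFeat-style mixing in
    a hidden state rather than in the raw conditioning.

    query:          (batch, q_len, dim)            image-side hidden states
    caption_embeds: (batch, n_captions, seq, dim)  encoded captions
    returns:        (batch, q_len, dim)
    """
    b, n, t, d = caption_embeds.shape
    # Flatten captions into one long key/value sequence: (b, n*t, d)
    kv = caption_embeds.reshape(b, n * t, d)
    # Single-head scaled dot-product attention for illustration
    scores = query @ kv.transpose(1, 2) / d ** 0.5   # (b, q_len, n*t)
    attn = torch.softmax(scores, dim=-1)
    return attn @ kv
```

In a real text-to-image setup the keys/values would additionally pass through the UNet's learned key/value projections; the point of the sketch is only that concatenating along the sequence axis lets attention mix caption features regardless of token position.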
