
create faces with stylegan and dalle? #27

Open
molo32 opened this issue Jan 24, 2021 · 12 comments
@molo32

molo32 commented Jan 24, 2021

amazing job! I was wondering if anyone has a colab notebook to create faces from text, with or without stylegan2, for example.

@lucidrains
Owner

@molo32 yes! it is now possible, but DALLE is not needed. by simply combining CLIP with Stylegan2, we can now summon images from the latent space of a trained generator. I will add it to my stylegan2 repository https://github.com/lucidrains/stylegan2-pytorch in due time :) focused on equivariant attention for alphafold2 at the moment

@lucidrains
Owner

@molo32 somebody else already did it :) you can try it at https://twitter.com/advadnoun/status/1353453719510163459?s=20

@powderblock

> @molo32 somebody else already did it :) you can try it at https://twitter.com/advadnoun/status/1353453719510163459?s=20

wow this is awesome!!! any idea how to use this for faces?

@lucidrains
Owner

@powderblock it'll work with any generator! the latent space has suddenly become infinitely more traversable, by way of another neural network as the guide :)

@rom1504
Contributor

rom1504 commented Jan 25, 2021

Is this done simply by generating a batch of images with a generator, ranking them with CLIP, and trying again randomly until the dot product is high enough?
Or could this instead be done by backpropagating the error, as stylegan-encoder does? (https://github.com/Puzer/stylegan-encoder)

@lucidrains
Owner

@rom1504 both!
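Both approaches can be sketched with toy stand-ins (everything below is hypothetical: `G` is a made-up linear "generator", `clip_score` a made-up cosine similarity playing the role of CLIP; a real implementation would use StyleGAN2 plus CLIP, with autograd instead of numerical gradients):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins: a linear "generator" G mapping latents to image embeddings,
# and a fixed "text embedding" for the prompt. In practice these would be
# StyleGAN2 and CLIP's image/text encoders.
LATENT_DIM, EMBED_DIM = 16, 8
G = rng.normal(size=(LATENT_DIM, EMBED_DIM))
text = rng.normal(size=EMBED_DIM)
text /= np.linalg.norm(text)

def clip_score(z):
    """Cosine similarity between the generated image embedding and the text."""
    img = z @ G
    return float(img @ text / np.linalg.norm(img))

# Approach 1: sample-and-rank. Draw a batch of random latents, score each
# with "CLIP", keep the best (and repeat until the score is high enough).
batch = rng.normal(size=(256, LATENT_DIM))
scores = np.array([clip_score(z) for z in batch])
z_ranked = batch[scores.argmax()]

# Approach 2: optimize a single latent by gradient ascent on the score,
# the way stylegan-encoder backpropagates a reconstruction error.
# Central differences stand in for autograd here.
def grad(z, eps=1e-4):
    g = np.zeros_like(z)
    for i in range(z.size):
        dz = np.zeros_like(z)
        dz[i] = eps
        g[i] = (clip_score(z + dz) - clip_score(z - dz)) / (2 * eps)
    return g

z_opt = rng.normal(size=LATENT_DIM)
start = clip_score(z_opt)
for _ in range(500):
    z_opt += 0.5 * grad(z_opt)

print(f"random start {start:+.3f}  ranked {clip_score(z_ranked):+.3f}  "
      f"optimized {clip_score(z_opt):+.3f}")
```

Ranking is embarrassingly parallel but bounded by the best sample in the batch; the gradient route can push the score much closer to its maximum from a single starting latent.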

@rom1504
Contributor

rom1504 commented Jan 25, 2021

Using a multimodal encoder trained for similarity together with a generator in that way seems like a really powerful idea.
I wonder if CLIP would also work to generate very accurate descriptions (take a picture, use a language model to generate text, backpropagate... until you get a good enough dot product with the picture).
And more generally, if we had more such multimodal encoders (text-audio, text-3D-model, ...), it seems to open the gate to generating almost anything.

@lucidrains
Owner

lucidrains commented Jan 25, 2021

@rom1504 if you follow Mario @ quasimodo on twitter, he was able to coax text out of CLIP. i think the surer thing to do is to rank text generations from existing captioning transformers, as an alternative to beam search
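That ranking idea fits in a few lines: sample several candidate captions from a captioning model, embed each alongside the image, and keep the highest-scoring one instead of the single beam-search output. Everything below is a toy stand-in: `embed` is a hypothetical hash-based encoder, not CLIP, and the candidate list stands in for sampled transformer outputs, so the "winning" caption here is arbitrary — only the selection mechanism is real.

```python
import numpy as np

EMBED_DIM = 8

def embed(text):
    # Hypothetical stand-in for a CLIP encoder: map a string to a
    # deterministic unit vector (char-sum seed avoids Python's salted hash).
    seed = sum(ord(c) for c in text)
    r = np.random.default_rng(seed)
    v = r.normal(size=EMBED_DIM)
    return v / np.linalg.norm(v)

# Pretend image embedding; in practice this comes from CLIP's image encoder.
image = embed("a portrait photo of a person")

# Candidate captions, e.g. sampled from a caption transformer instead of
# taking only the beam-search argmax.
candidates = [
    "a dog running on grass",
    "a photo of a person's face",
    "an abstract painting",
    "a bowl of fruit",
]

# Rank by CLIP-style image-text similarity and keep the best.
scores = {c: float(embed(c) @ image) for c in candidates}
best = max(scores, key=scores.get)
print(best, scores[best])
```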

@lucidrains
Owner

@rom1504 yes, multimodal is here, attention was all we need

@rom1504
Contributor

rom1504 commented Jan 25, 2021

Is that https://twitter.com/Quasimodo ? Seems to be unavailable

@lucidrains
Owner

oops, @ quasimondo
