
Unbelievable results of the PixelSnail network #58

Open
ZhanYangen opened this issue Jan 19, 2021 · 8 comments

ZhanYangen commented Jan 19, 2021

Hi,
just now I successfully ran train_pixelsnail.py on the top-level codes (size: 8×8) of my own dataset, which consists of 300,000 encoded outputs from 64×64 images containing geometrical shapes of varying sizes, rotations and counts. After merely 5 steps (batches, 32 samples per batch) in epoch 1, the accuracy reached 100%. The Spyder console displayed the following message while running:
epoch: 1; loss: 2.26627; acc: 0.99561; lr: 0.00030: 0%| | 4/9375 [00:00<21:57, 7.11it/s]
Moreover, as the number of steps goes up, the loss drops below 0.00001. Such an amazing result makes me wonder whether this network is indeed that powerful or whether something is wrong with my adjusted code. Also, if it turns out to be the former, is this model necessarily overfitted?
It is worth mentioning that the VQ-VAE-2 network also achieved amazing results. It took less than 5 minutes, specifically 2 epochs over these 300,000 samples, to get the following reconstructions (the first row is the original images, the second the reconstructions). Before this, I had already tested this dataset on a plain VAE, but its results were far blurrier than those of this network, and it took much longer to train. So, all of a sudden, this outcome is kind of hard to accept...
[image: original images (top row) and their reconstructions (bottom row)]

One more question: I'm actually not sure why there is an accuracy indicator, as if for some classifier, in this network... Does it have something to do with Section 3.3 of the paper? Or is it the accuracy with which the VQ-VAE encoder's output is assigned to the correct quantized vector?

Thanks.

ZhanYangen (Author)

Well, here is an update on my status. I trained the PixelSNAIL network on the bottom level and found that after 1 epoch the loss plateaued around 0.7, with accuracy around 80%.
Finally, the generated samples are as follows:
[image: generated samples]
The huge difference between the results on the top and bottom levels still confuses me a little...

Also, I found that it took about 5 seconds to generate 16 images, but 26 seconds to generate 200 images. Beyond this non-proportional relation between the number of images and the time cost, I wonder if there is any way to speed up the generation process, since a vanilla VAE can easily generate thousands of images in a few seconds. Or is the slow speed an innate property of PixelCNN-style models, so it is simply meant to be much slower?

rosinality (Owner)

I don't know the exact reason, but maybe the top-level codes are much more predictable or simpler. (Maybe your data has large local correlations, so higher-level abstractions aren't really needed.)

Accuracy is just the rate at which the autoregressive model correctly predicts the next token given the previous sequence. I added it just to have one more metric for model performance.
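Roughly, it is computed like this (a simplified sketch, not the exact code in train_pixelsnail.py; the names are illustrative):

```python
import torch
import torch.nn.functional as F

def prior_step_metrics(logits, target):
    # logits: [B, n_embed, H, W], predicted distribution over code indices
    # target: [B, H, W], ground-truth code indices from the VQ-VAE encoder
    loss = F.cross_entropy(logits, target)
    pred = logits.argmax(dim=1)            # most likely code at each position
    acc = (pred == target).float().mean()  # fraction of codes predicted correctly
    return loss, acc
```

Note that if the code map is nearly constant, this accuracy trivially approaches 100% and the loss approaches zero.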

As for sampling speed, it is very hard to accelerate sampling from autoregressive models. I have tried a caching mechanism for PixelSNAIL, but it is only about 2x faster.
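The reason is structural: each code must be sampled conditioned on all previously sampled ones, so an H×W code map costs H×W sequential forward passes no matter what. Batch size, on the other hand, is nearly free on a GPU, which is why 200 images don't take 12x as long as 16. A simplified sketch of the sampling loop (the cache argument stands in for the caching mechanism; exact signatures differ):

```python
import torch

@torch.no_grad()
def sample_codes(model, batch, height, width, temperature=1.0, device="cuda"):
    # Generate code indices one position at a time, in raster order. The model
    # is causal, so logits at (i, j) depend only on already-sampled positions.
    codes = torch.zeros(batch, height, width, dtype=torch.int64, device=device)
    cache = {}
    for i in range(height):
        for j in range(width):
            logits, cache = model(codes, cache=cache)  # one forward pass per code
            prob = torch.softmax(logits[:, :, i, j] / temperature, dim=1)
            codes[:, i, j] = torch.multinomial(prob, 1).squeeze(-1)
    return codes
```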

ZhanYangen (Author)

@rosinality Got it! Thanks, that's really helpful.


sbhadra2020 commented Apr 7, 2021

Extending the answer provided by @rosinality: this phenomenon is explored in depth by Gallucci et al. in the paper attached below. There are situations where the top code collapses to a single value, which makes the loss drop quickly to zero while training the top PixelSNAIL. The authors used the code by @rosinality and showed that PixelSNAIL training can be made somewhat tractable by varying n_embed and embed_dim, depending on the application. The generated images in the paper also look pretty reasonable.
vqvae2_gallucci.pdf
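A quick way to check for this failure mode is to histogram the code indices the trained encoder actually emits at the top level; if almost all the mass lands on a single index, the prior's task is trivial. A hedged sketch, assuming encode() returns the top-level indices id_t as in rosinality's VQVAE (the loader and the unpacking are assumptions):

```python
import torch
from collections import Counter

@torch.no_grad()
def top_code_usage(vqvae, loader, device="cuda"):
    # Count how often each top-level code index appears across the dataset.
    counts = Counter()
    for img, _ in loader:  # assuming (image, label) batches
        # encode() is assumed to return (quant_t, quant_b, diff, id_t, id_b)
        _, _, _, id_t, _ = vqvae.encode(img.to(device))
        counts.update(id_t.flatten().tolist())
    return counts
```

If one index dominates the counts, the top code has collapsed, and the near-zero top-level loss is expected rather than unbelievable; retraining the VQ-VAE with different n_embed / embed_dim values (the defaults in this repo are n_embed=512 and embed_dim=64, if I recall correctly) is the adjustment suggested above.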

ZhanYangen (Author)

@sbhadra2020 Thanks for your useful advice! Indeed, I also found that the top code had collapsed to a single value.

sbhadra2020

@ZhanYangen You're welcome. You previously posted some generated samples after training both the top and bottom PixelSNAIL networks, but those samples have arbitrary shapes and do not look like the geometric training dataset. Did the samples improve after the images you posted? I am curious, since I am still struggling to get reasonable results from my own PixelSNAIL training.

ZhanYangen (Author)

@sbhadra2020 It's a pity, but a few days after I posted this question it occurred to me that VQ-VAE-2 was ill-suited for further use in my research project (the details are somewhat complicated). So I switched to the original VQ-VAE model, and its generation functionality was never put to use. Hence I did not dig deeper into improving the generated images.

ZhouCX117

@rosinality @sbhadra2020 @ZhanYangen Hi, everyone! It seems I've run into the same problem. The reconstruction results are reasonable, but the sampled results are strange. So is the problem the top code? What changes should I make?

The sampled results are as follows:
[image: generated samples]
