Unbelievable results of the PixelSnail network #58
I don't know the exact reason, but maybe the top-level codes are much more predictable or simpler. (Maybe your data has large local correlations, so you don't need higher abstractions.) Accuracy is just the probability that the autoregressive model correctly predicts the next token given the previous sequence; I added it just to have more metrics for model performance. As for sampling speed, it is very hard to accelerate sampling from autoregressive models. I have tried a caching mechanism for PixelSNAIL, but it was only about 2x faster.
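To make that concrete, here is a minimal, hypothetical sketch (not the repo's actual code) of what that accuracy metric amounts to: given the prior's logits over the codebook at each position and the true code indices, it is just the fraction of positions where the argmax matches the target.

```python
import numpy as np

def next_token_accuracy(logits, targets):
    """Top-1 next-token accuracy.

    logits:  (N, num_codes) array of unnormalized scores per position.
    targets: (N,) int array of the true codebook indices.
    """
    preds = logits.argmax(axis=-1)
    return float((preds == targets).mean())

rng = np.random.default_rng(0)
logits = rng.normal(size=(100, 512))   # 512 = hypothetical codebook size
targets = logits.argmax(axis=-1)       # perfectly predictable targets
print(next_token_accuracy(logits, targets))  # 1.0 when every position is right
```

If the top code is nearly constant, this number saturates at 100% almost immediately, which matches the behavior reported above.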
@rosinality Got it! Thanks, that's really helpful.
Extending the answer provided by @rosinality, this phenomenon is well explored by Gallucci et al. in this paper (attached). There are situations where the top code can collapse to a single value, which makes the loss drop quickly to zero during training of the top PixelSNAIL. The authors used the codes by @rosinality and showed that PixelSNAIL training can be made somewhat tractable by varying n_embed and embed_dim depending on the application. The generated images in the paper also look pretty reasonable.
@sbhadra2020 Thanks for your useful advice! Indeed, I also found that the top code had collapsed to a single value.
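For anyone who wants to check for this collapse on their own data, here is a hypothetical diagnostic sketch (names and shapes are my own, not from the repo): count how many distinct codebook entries the top encoder actually uses, and compute the usage "perplexity" (exp of the entropy of the index histogram). A collapsed top code uses very few entries and has perplexity near 1.

```python
import numpy as np

def codebook_usage(codes, num_embed):
    """codes: int array of code indices (any shape); num_embed: codebook size (n_embed).

    Returns (number of distinct codes used, perplexity of the usage histogram).
    """
    counts = np.bincount(codes.ravel(), minlength=num_embed)
    probs = counts / counts.sum()
    nonzero = probs[probs > 0]
    perplexity = float(np.exp(-(nonzero * np.log(nonzero)).sum()))
    return int((counts > 0).sum()), perplexity

collapsed = np.zeros((32, 8, 8), dtype=int)        # every top code is index 0
used, ppl = codebook_usage(collapsed, num_embed=512)
print(used, ppl)  # 1 1.0 -> a fully collapsed top code
```

A healthy top code should spread over many entries; a perplexity close to 1 means the PixelSNAIL prior has essentially nothing to learn.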
@ZhanYangen You're welcome. You previously posted some generated samples after training both the top and bottom PixelSNAIL networks. However, I can see that the generated samples have arbitrary shapes and do not look like the geometric training dataset. Did you see any improvement in the generated samples after the images you posted? I am curious to know, since I am still struggling to get reasonable results from my PixelSNAIL training.
@sbhadra2020 It's a pity that, a few days after I posted this question, it occurred to me that VQ-VAE-2 seemed ill-suited for the further application in my research project; the details are somewhat complicated. So I switched to the original VQ-VAE model, and its data-generation capability was not put to use. Hence I did not dig deeper into improving the image generation.
@rosinality @sbhadra2020 @ZhanYangen Hi, everyone! It seems that I have met the same problem. The reconstruction results are reasonable, but the sampled results are strange. So is the problem the top code? What changes should I make?
Hi,
I have just successfully run train_pixelsnail.py on the top-level codes (size: 8×8) of my own dataset, which consists of 300,000 encoded outputs from images (size: 64×64) containing geometric shapes of varying sizes, rotations, and counts. After merely 5 steps (batches, 32 samples per batch) in epoch 1, the accuracy reached 100%. The Spyder console displayed the following message while running:
epoch: 1; loss: 2.26627; acc: 0.99561; lr: 0.00030: 0%| | 4/9375 [00:00<21:57, 7.11it/s]
Moreover, as the number of steps goes up, the loss drops below 0.00001. Such an amazing result makes me wonder whether this network is indeed that powerful or whether there is something wrong with my adapted code. Also, if it turns out to be the first scenario, is this model definitely overfitted?
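As a sanity check on those numbers, here is a small hypothetical illustration (my own sketch, not code from the repo): if the top codes collapse to a single value, the prior's task is trivial, so predicting that value every time yields ~100% accuracy and a cross-entropy loss near zero, exactly like the figures above.

```python
import numpy as np

def cross_entropy(probs, targets):
    """Mean negative log-likelihood.

    probs:   (N, K) predicted distributions over K codebook entries.
    targets: (N,) true code indices.
    """
    return float(-np.log(probs[np.arange(len(targets)), targets]).mean())

K = 512                                    # hypothetical codebook size
targets = np.zeros(1000, dtype=int)        # collapsed codes: always index 0
probs = np.full((1000, K), 1e-9)
probs[:, 0] = 1 - 1e-9 * (K - 1)           # model puts ~all mass on index 0
print(cross_entropy(probs, targets))       # ~0, like the observed loss
```

So a near-zero loss after a handful of steps is less a sign of a powerful model than of degenerate (near-constant) top codes.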
It is worth mentioning that the VQ-VAE-2 network also achieved amazing results. It took less than 5 minutes, specifically 2 epochs over these 300,000 samples, to get the following reconstructed images (the first row is the original image, the second the reconstruction). Before this, I had already tested this dataset on a VAE network, but its results were far blurrier than those of this network, and it took much longer to train. So, all of a sudden, this outcome is kind of hard to accept...
One more question: I'm not sure why there is an accuracy indicator of some classifier in this network... Does it have something to do with Section 3.3 of the paper? Or is it the accuracy with which the output of the VQ-VAE encoder is assigned to the right quantized vector?
Thanks.