training is very slow for DALLE #82
I use the pretrained dVAE model released by OpenAI.
I did a hyperparameter sweep optimizing for loss (not runtime). Nevertheless, you'll find the average runtime for an A100 (a pretty good GPU) for about 12 runs (1200 iterations each) here. As you can see, a lower depth will give you lower runtimes, but you may take a hit in loss as well. These are all done with a batch size of twelve. Are you using the latest release? Your experience of needing 300 (is that a typo?! how long did that take?) epochs to get anywhere doesn't match mine at all. I generally achieve a result within 28 epochs training on a shuffle of my dataset containing about a million image-text pairs. How large is CUBS? The size of your dataset may be loosely correlated, but if you're shuffling it I would hope you get reasonable reconstructions by at least 30,000 iterations. I realize that's a lot compared to other things but, well, you're running CLIP and the dVAE, all the while training a transformer. It's going to take a while if you're not on the latest GPU.
@afiaka87 As for my training, CUBS is a fine-grained dataset for bird classification, which has about 8000 images for training and 2500 for testing. Typically, I set depth=8 and heads=2 because my GPUs are not powerful enough for training a very large model. Moreover, I resize the images to 64x64 to make training faster. With this setup, an epoch costs about 3 minutes. Thanks for sharing the optimized parameters. I will try them and post the results if I succeed.
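Resizing to 64x64 helps so much because the transformer's image-token sequence shrinks quadratically with the image side length. As a rough illustration (a toy calculation, assuming a dVAE that downsamples each spatial side by a factor of 8, as OpenAI's released model does for 256x256 inputs):

```python
# Toy calculation: how many image tokens the transformer must model,
# assuming the dVAE downsamples each spatial side by 8x
# (OpenAI's released dVAE maps 256x256 pixels to a 32x32 token grid).
def image_tokens(image_size: int, downsample: int = 8) -> int:
    side = image_size // downsample
    return side * side

for size in (64, 128, 256):
    n = image_tokens(size)
    # self-attention cost grows roughly with the square of the sequence length
    print(f"{size}x{size} image -> {n} tokens (attention ~ {n * n} pairs)")
```

At 64x64 the image sequence is 16x shorter than at 256x256, which goes a long way toward explaining the 3-minute epochs.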
Are you aware of the `reversible` parameter?
Yeah, I set the 'reversible' parameter. Your suggestions are very helpful. I believe I can get some meaningful results in several days.
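For reference, the `reversible` flag trades compute for memory: layer inputs are recomputed from layer outputs during the backward pass instead of being stored. Here is a minimal pure-Python sketch of the reversible coupling idea (toy scalar functions stand in for the attention and feed-forward sub-blocks; this is an illustration of the technique, not dalle-pytorch's actual implementation):

```python
import math

# Toy reversible coupling: y1 = x1 + F(x2), y2 = x2 + G(y1).
# F and G stand in for a layer's attention / feed-forward sub-blocks.
def F(x):
    return math.tanh(x)

def G(x):
    return 0.5 * math.sin(x)

def forward(x1, x2):
    y1 = x1 + F(x2)
    y2 = x2 + G(y1)
    return y1, y2

def inverse(y1, y2):
    # Inputs are recoverable from outputs, so activations need not be
    # kept in memory between the forward and backward passes.
    x2 = y2 - G(y1)
    x1 = y1 - F(x2)
    return x1, x2

y1, y2 = forward(0.3, -1.2)
x1, x2 = inverse(y1, y2)
print(x1, x2)  # recovers 0.3, -1.2 (up to float rounding)
```

The saved activation memory lets you fit deeper models or larger batches on the same GPU, at the cost of roughly one extra forward computation per layer in the backward pass.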
Fantastic! Let me know if you find useful parameters for your configuration! |
@smallflyingpig We now have the VQGAN working! It's not a panacea, but it runs significantly faster and with significantly less memory. Please check the README for how to use it. I've confirmed this usage in another closed issue:
I am training the DALLE model with an 8-layer transformer (lr=3e-4) on the CUB dataset, but training is very slow: the loss has barely decreased after about 300 epochs. Are there any tricks for training the model? Has anyone succeeded in getting a well-trained DALLE model?