DALL-E

About

A re-implementation of DALL-E.

Model

This repository is loosely based on the original DALL-E paper by OpenAI. Instead of a GPT-2/GPT-3-style autoregressive transformer decoder, it uses the MEGABYTE-based model from lucidrains.

Method

  • Use a VQ-VAE to encode and decode images.
  • Ingest text tokens and predict VQ-VAE codes.
    • Use the MEGABYTE model (which also allows very long context lengths).
    • Encode text at the character level for now.
    • Autoregressively predict VQ-VAE codes from the text tokens.
  • CIFAR-10 results were bad, perhaps because the VQ-VAE does poorly on images below 64x64, so switching to Tiny ImageNet. (NOTE: There were issues processing the data; the bad results had nothing to do with CIFAR-10.)
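
The token layout described in the list above can be sketched roughly as follows. This is a minimal illustration, not the repository's code: the vocabulary sizes, the separator token, the zero padding, and the helper names (`encode_text`, `build_sequence`, `training_pairs`) are all assumptions made for the example.

```python
# Hypothetical layout: [text tokens | SEP | image code tokens] in one shared vocabulary.
TEXT_VOCAB = 256              # assumption: one token per UTF-8 byte (char-level text)
SEP = TEXT_VOCAB              # assumption: a single separator token after the text
CODE_OFFSET = TEXT_VOCAB + 1  # VQ-VAE code k is stored as token CODE_OFFSET + k


def encode_text(caption: str, max_len: int) -> list[int]:
    """Char-level encoding: UTF-8 bytes, truncated and then zero-padded to max_len."""
    ids = list(caption.encode("utf-8"))[:max_len]
    return ids + [0] * (max_len - len(ids))


def build_sequence(caption: str, codes: list[int], max_text_len: int = 32) -> list[int]:
    """Concatenate text tokens, a separator, and offset VQ-VAE codes."""
    text = encode_text(caption, max_text_len)
    image = [CODE_OFFSET + c for c in codes]
    return text + [SEP] + image


def training_pairs(seq: list[int], prefix_len: int) -> list[tuple[list[int], int]]:
    """Next-token (context, target) pairs for the image part of the sequence only."""
    return [(seq[:i], seq[i]) for i in range(prefix_len, len(seq))]


# Example: a caption plus four made-up VQ-VAE code indices.
seq = build_sequence("a cat", [3, 7, 1, 0])
pairs = training_pairs(seq, prefix_len=33)  # 32 text tokens + 1 separator
```

At sampling time the same layout would be used in reverse: feed the text prefix and separator, sample code tokens one at a time, subtract `CODE_OFFSET`, and pass the resulting codes to the VQ-VAE decoder.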

Datasets

Tiny ImageNet

  • Validate Tiny ImageNet captions and images (so they match up).
    • Labels needed to be sorted in the same order as the trainloader/testloader.
  • Overfit the DALL-E model on one caption-image pair.
  • Overfit the DALL-E model on one batch of caption-image pairs.
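
The label-ordering issue noted above can be illustrated with a small sketch. It assumes the loaders assign integer labels by sorting the class directory names (as folder-based loaders such as torchvision's `ImageFolder` do), and that captions come from a mapping of class id to text. The function name `captions_in_loader_order` and the two sample entries are illustrative, not taken from the repository.

```python
def captions_in_loader_order(
    wnid_to_caption: dict[str, str], listed_wnids: list[str]
) -> tuple[list[str], list[str]]:
    """Return (classes, captions) so that captions[i] describes class index i.

    Assumption: the data loader numbers classes by sorted directory name,
    so the caption list has to be sorted the same way.
    """
    class_order = sorted(listed_wnids)
    return class_order, [wnid_to_caption[w] for w in class_order]


# Illustrative entries; the ids are listed in a different order than the sorted one.
listed = ["n02124075", "n01443537"]
words = {"n01443537": "goldfish", "n02124075": "Egyptian cat"}

classes, captions = captions_in_loader_order(words, listed)
label = classes.index("n02124075")  # integer label the loader would emit
caption = captions[label]           # caption paired with that label
```

Indexing the captions in the file's listed order instead of the sorted order would pair the label for "Egyptian cat" with "goldfish", which is the kind of mismatch the validation step is meant to catch.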