🤲🏽
- [x] MLP
- [x] Token Embedding
- [x] Batch Normalization
- [x] Layer Normalization
- [x] Logistic Regression - Theory
- [x] Gradient Descent/Autograd - Theory
- [x] Byte Pair Encoding
- [x] Rotary Positional Encoding
- [ ] Quantization
- [-] LoRA Finetuning
- [-] QLoRA
- [ ] RLHF
- [ ] DPO
- [x] Attention
- [x] Decoder Transformer (GPT)
- [x] Encoder Transformer (BERT - NER/Sentiment Analysis)
- [-] Full Transformer / Cross Attention
- [x] Convolution / CNN
- [x] ResNet
- [ ] YOLO
- [-] RNN
- [-] GRU
- [-] LSTM
- [x] VAE
- [ ] GAN
- [ ] Diffusion Model
- [ ] Mamba
- [ ] KV Cache
- [x] Parallelization
- [ ] Vision Transformer
- [ ] UNet
- [ ] Scaling Laws - Theory
- [x] Profiling
- [x] Flash Attention
- [ ] black-forest-labs/flux