PyTorch Implementation of Stand-Alone Self-Attention in Vision Models
This is NOT an official implementation. Please let me know whether this implementation contains any misreadings of the original paper.
- Python +3.6
- pytorch +1.1.0
- scipy
- Pillow
- torchvision
Trained with ImageNet. (WIP: CIFAR-10, CIFAR-100)
Backbone network and parameters are based on the official torchvision ResNet and trainer example.
Trained up to 90 epochs / batch 64 on a single NVIDIA 1080Ti GPU, with SGD optimizer with a learning rate of 0.1 which is linearly warmed up for 10 epochs followed by cosine decay. (according to the SASA paper)