# SplitMixer: Fat Trimmed From MLP-like Models (arXiv)
PyTorch implementation of the SplitMixer MLP-like model for visual recognition. The core model code is in `splitmixer.py`. We trained SplitMixers on ImageNet using the timm framework, which we copied from here. For CIFAR-{10,100} training, or for standalone model definitions, please refer to the CIFAR notebook.
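For a quick look at what the model computes before opening the notebook, below is a minimal, self-contained sketch of a SplitMixer-style block in PyTorch. It illustrates the two ideas the paper combines: factorizing ConvMixer's k×k depthwise convolution into a k×1 and a 1×k convolution, and applying the pointwise (channel-mixing) convolution to only a segment of the channels. The class name `SplitMixerBlockSketch` and the `segment_frac` parameter are ours for illustration; the exact segment schedules of variants I-IV live in `splitmixer.py`.

```python
import torch
import torch.nn as nn

class SplitMixerBlockSketch(nn.Module):
    """Illustrative SplitMixer-style block (not the repo's exact code).

    Spatial mixing: factorized depthwise (k x 1 then 1 x k) convolutions
    with a residual connection, instead of ConvMixer's full k x k kernel.
    Channel mixing: a 1x1 convolution applied to one segment of the
    channels; the remaining channels pass through unchanged. The real
    SplitMixer variants rotate (and may overlap) the mixed segment
    across blocks.
    """

    def __init__(self, dim: int, kernel_size: int = 5, segment_frac: float = 0.5):
        super().__init__()
        pad = kernel_size // 2
        self.spatial = nn.Sequential(
            nn.Conv2d(dim, dim, (kernel_size, 1), padding=(pad, 0), groups=dim),
            nn.Conv2d(dim, dim, (1, kernel_size), padding=(0, pad), groups=dim),
            nn.GELU(),
            nn.BatchNorm2d(dim),
        )
        self.seg = max(1, int(dim * segment_frac))  # channels mixed in this block
        self.channel = nn.Sequential(
            nn.Conv2d(self.seg, self.seg, kernel_size=1),
            nn.GELU(),
            nn.BatchNorm2d(self.seg),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = x + self.spatial(x)                  # residual spatial mixing
        mixed = self.channel(x[:, : self.seg])   # mix only one channel segment
        return torch.cat([mixed, x[:, self.seg :]], dim=1)

if __name__ == "__main__":
    block = SplitMixerBlockSketch(dim=256, kernel_size=5)
    print(block(torch.randn(2, 256, 16, 16)).shape)  # torch.Size([2, 256, 16, 16])
```

Mixing only a fraction of the channels per block is where the parameter and FLOP savings in the tables below come from.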
Inside `pytorch-image-models`, we have made the following modifications (a usage sketch follows the list):
- added ConvMixers
- added `timm/models/splitmixer.py`
- modified `timm/models/__init__.py`
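With those files in place, the SplitMixer models should be constructable through timm's registry like any other architecture. A minimal sketch, assuming the copied `pytorch-image-models` checkout is importable as `timm`; the model name passed to `create_model` below is hypothetical, so list the registered names first and use one of those:

```python
import torch
import timm  # the copied pytorch-image-models checkout must be on your path

# Discover whatever names timm/models/splitmixer.py actually registers.
print([m for m in timm.list_models() if "splitmixer" in m.lower()])

# "splitmixer_256_8" is a placeholder, not a confirmed registry entry.
model = timm.create_model("splitmixer_256_8", num_classes=100)
logits = model(torch.randn(1, 3, 224, 224))
print(logits.shape)  # torch.Size([1, 100])
```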
### CIFAR-10, Patch Size p=2, Kernel Size k=5
| Model Name | Params (M) | FLOPs (M) | Acc (%) |
|---|---|---|---|
| ConvMixer-256/8 | 0.60 | 152.6 | 94.17 |
| SplitMixer-I 256/8 | 0.28 | 71.8 | 93.91 |
| SplitMixer-II 256/8 | 0.17 | 46.2 | 92.25 |
| SplitMixer-III 256/8 | 0.17 | 79.8 | 92.52 |
| SplitMixer-IV 256/8 | 0.31 | 79.8 | 93.38 |
### CIFAR-100, Patch Size p=2, Kernel Size k=5
| Model Name | Params (M) | FLOPs (M) | Acc (%) |
|---|---|---|---|
| ConvMixer-256/8 | 0.62 | 152.6 | 73.92 |
| SplitMixer-I 256/8 | 0.30 | 71.9 | 72.88 |
| SplitMixer-II 256/8 | 0.19 | 46.2 | 70.44 |
| SplitMixer-III 256/8 | 0.19 | 79.8 | 70.89 |
| SplitMixer-IV 256/8 | 0.32 | 79.8 | 71.75 |
### Patch Size p=7, Kernel Size k=7
| Model Name | Params (M) | FLOPs (M) | Acc (%) |
|---|---|---|---|
| ConvMixer-256/8 | 0.70 | 696 | 60.47 |
| SplitMixer-I 256/8 | 0.34 | 331 | 62.03 |
| SplitMixer-II 256/8 | 0.24 | 229 | 59.33 |
| SplitMixer-III 256/8 | 0.24 | 363 | 59.00 |
| SplitMixer-IV 256/8 | 0.37 | 363 | 61.51 |
### Patch Size p=7, Kernel Size k=7
| Model Name | Params (M) | FLOPs (M) | Acc (%) |
|---|---|---|---|
| ConvMixer-256/8 | 0.70 | 696 | 74.59 |
| SplitMixer-I 256/8 | 0.34 | 331 | 73.56 |
| SplitMixer-II 256/8 | 0.24 | 229 | 71.74 |
| SplitMixer-III 256/8 | 0.24 | 363 | 72.78 |
| SplitMixer-IV 256/8 | 0.37 | 363 | 72.92 |
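For reference, the Params (M) column can be reproduced directly from any of the model objects above, while the FLOPs column needs a profiler. A minimal sketch; the `fvcore` dependency is our choice for counting, not something this repo requires, and note that fvcore reports multiply-accumulates:

```python
import torch
from fvcore.nn import FlopCountAnalysis  # pip install fvcore

def param_count_m(model: torch.nn.Module) -> float:
    """Trainable parameters in millions, as in the Params (M) column."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad) / 1e6

def flops_m(model: torch.nn.Module, input_size=(1, 3, 32, 32)) -> float:
    """Per-image op count in millions; fvcore counts multiply-accumulates."""
    return FlopCountAnalysis(model, torch.randn(*input_size)).total() / 1e6
```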
Stay Tuned!
If you use this code in your research, please cite this project:
```bibtex
@article{borji2022splitmixer,
  title   = {SplitMixer: Fat Trimmed From MLP-like Models},
  author  = {Borji, Ali and Lin, Sikun},
  journal = {arXiv preprint arXiv:2207.10255},
  year    = {2022},
  url     = {https://arxiv.org/pdf/2207.10255.pdf}
}
```