This repository contains notebooks for testing a pre-trained Vision Transformer and a pre-trained XResnet50 on the ImageWoof dataset, using the Adam and Ranger optimizers.
The objective of this project is to clearly compare the performance of a pre-trained Vision Transformer (here, ViT-Large) and a pre-trained XResnet50 when they are fine-tuned with different optimizers (here, Adam and Ranger) on the ImageWoof dataset. This project is aimed at people who want to use state-of-the-art pre-trained vision models but have limited computational resources and depend on online environments such as Google Colab.
NB: For this project, ViT-Large was used because I (Prakash Pandey) wanted a state-of-the-art model, and because training the ViT-Huge model threw a 'CUDA Out of Memory' error even with batch size = 1 on Google Colab. ViT-Large was therefore the 'deepest' vision transformer that could be trained on Google Colab.
The ImageWoof dataset is a subset of 10 classes from Imagenet that aren't so easy to classify, since they're all dog breeds. The breeds are: Australian terrier, Border terrier, Samoyed, Beagle, Shih-Tzu, English foxhound, Rhodesian ridgeback, Dingo, Golden retriever, Old English sheepdog.
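Below is a minimal fastai sketch of one way to download ImageWoof and build the dataloaders; the image size and batch size are placeholder values and not necessarily the ones used in the notebooks.

```python
from fastai.vision.all import *

# Download ImageWoof (fastai hosts it at URLs.IMAGEWOOF).
path = untar_data(URLs.IMAGEWOOF)

# The archive ships 'train' and 'val' folders, with one sub-folder per breed.
dls = ImageDataLoaders.from_folder(
    path, train='train', valid='val',
    item_tfms=Resize(224),                          # 224x224 crops suit both models
    batch_tfms=Normalize.from_stats(*imagenet_stats),
    bs=16,                                          # small batch size for Colab GPU memory
)
```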
The Vision Transformer, introduced here, has several variants, and ViT-Large is one of them: it comprises 24 layers and 307M parameters. For this project, I have used a pre-trained ViT-Large model.
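One common way to obtain a pre-trained ViT-Large is through the timm library; the sketch below is only an illustration of that approach, and the exact checkpoint name used in the notebooks may differ.

```python
import timm

# Pre-trained ViT-Large (24 transformer layers, ~307M parameters),
# with a fresh 10-way classification head for the ImageWoof classes.
model = timm.create_model('vit_large_patch16_224', pretrained=True, num_classes=10)
print(sum(p.numel() for p in model.parameters()) / 1e6, "M parameters")
```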
This is a pre-trained Resnet50 model with tricks from the 'Bag of Tricks for Image Classification with Convolutional Neural Networks' paper. A few other tricks are used as well (see the sketch after this list):
- Mish - A new activation function that has shown fantastic results
- Self-Attention - Bringing in ideas from GANs into image classification
- MaxBlurPool - Better generalization
- Flatten + Anneal Scheduling - a flat learning rate followed by cosine annealing (Mikhail Grankin)
- Label Smoothing Cross Entropy - soft targets (were you close?) rather than a hard yes or no
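The sketch below shows how these tricks can be wired together with fastai; it is an illustration only. The epoch count and learning rate are placeholders, the pre-trained weight loading and the MaxBlurPool swap done in the notebooks are omitted, and `dls` refers to the ImageWoof dataloaders from the sketch above.

```python
from fastai.vision.all import *

# XResNet-50 with two of the tricks fastai exposes directly:
# Mish activation and self-attention.
model = xresnet50(act_cls=Mish, sa=True, n_out=10)

learn = Learner(
    dls, model,                                  # `dls` built from ImageWoof as sketched above
    loss_func=LabelSmoothingCrossEntropy(),      # soft "were you close" targets, not hard 0/1
    metrics=accuracy,
)

# "Flatten + Anneal" schedule: hold the learning rate flat, then cosine-anneal it.
learn.fit_flat_cos(5, lr=1e-3)
```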
Adam is a method for efficient stochastic optimization that only requires first-order gradients with little memory requirement. The method computes individual adaptive learning rates for different parameters from estimates of first and second moments of the gradients; the name Adam is derived from adaptive moment estimation.
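For reference, here is a minimal sketch of the update rule described above, using the default hyper-parameters from the Adam paper; the notebooks themselves rely on the library implementation rather than hand-written code.

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update: per-parameter step sizes from running moment estimates."""
    m = beta1 * m + (1 - beta1) * grad          # first moment (mean of the gradients)
    v = beta2 * v + (1 - beta2) * grad ** 2     # second moment (uncentred variance)
    m_hat = m / (1 - beta1 ** t)                # bias correction for early steps
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v
```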
Ranger is an optimizer based on two separate papers (see the sketch after this list):
- On the Variance of the Adaptive Learning Rate and Beyond (RAdam)
- Lookahead Optimizer: k steps forward, 1 step back
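fastai ships a `ranger` optimizer function that combines exactly these two ideas (RAdam wrapped in Lookahead), so switching optimizers is a one-argument change on the Learner. The sketch below assumes the `dls` and `model` objects from the earlier sketches; the epoch count and learning rate are placeholders.

```python
from fastai.vision.all import *

# Same training setup as before, but with Ranger (RAdam + Lookahead)
# instead of the default Adam.
learn = Learner(
    dls, model,
    opt_func=ranger,
    loss_func=LabelSmoothingCrossEntropy(),
    metrics=accuracy,
)
learn.fit_flat_cos(5, lr=1e-3)
```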
After fine-tuning on the ImageWoof dataset, the models achieved the following accuracies:

| Model (pre-trained) | Adam | Ranger |
| --- | --- | --- |
| ViT-Large | 81.29% | 27.28% |
| XResnet50 | 34.69% | 43.72% |

In short, the pre-trained ViT-Large model performed far better when fine-tuned with Adam than with Ranger, whereas the pre-trained XResnet50 model performed better with Ranger than with Adam.