This project implements Vision Transformer (ViT) models in PyTorch to classify images from the CIFAR-10 dataset. It fine-tunes pre-trained ViT and CaiT models on CIFAR-10, demonstrating how transformers can be adapted for image classification.
- Utilizes pre-trained Vision Transformer (ViT) and Class-Attention in Image Transformers (CaiT) models (see the loading sketch after this list).
- Supports fine-tuning of transformer models on the CIFAR-10 dataset.
- Visualizes training and validation loss, accuracy, and confusion matrices.
- Demonstrates data preprocessing and augmentation techniques for image data.
- Evaluates model performance with metrics such as F1-score, recall, accuracy, and precision.
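As a sketch of how such pre-trained models can be obtained, assuming the timm library and the example model names below (the repository's exact checkpoints may differ):

```python
import timm

# Load ImageNet-pre-trained backbones and replace the classification head
# so it outputs CIFAR-10's 10 classes. The model names are examples from
# timm's registry, not necessarily the ones used in this repository.
vit = timm.create_model("vit_base_patch16_224", pretrained=True, num_classes=10)
cait = timm.create_model("cait_xxs24_224", pretrained=True, num_classes=10)
```

Passing `num_classes=10` makes timm swap the pre-trained head for a freshly initialized 10-way classifier while keeping the backbone weights.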
- Clone the repository from GitHub.
- Navigate to the project directory.
- Install the required dependencies listed in the `requirements.txt` file.
The CIFAR-10 dataset is used, consisting of 60,000 32x32 color images in 10 different classes, with 6,000 images per class. The dataset is automatically downloaded and pre-processed for training and testing.
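A minimal sketch of this pipeline, assuming torchvision's CIFAR-10 loader and resizing the 32x32 images to the 224x224 input that the pre-trained backbones expect (the augmentations and normalization constants here are illustrative):

```python
import torch
from torchvision import datasets, transforms

# Resize to the backbone's input resolution and normalize; a random
# horizontal flip is a typical training-time augmentation.
train_tf = transforms.Compose([
    transforms.Resize(224),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
])
test_tf = transforms.Compose([
    transforms.Resize(224),
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
])

# download=True fetches and caches the dataset automatically on first use.
train_set = datasets.CIFAR10("./data", train=True, download=True, transform=train_tf)
test_set = datasets.CIFAR10("./data", train=False, download=True, transform=test_tf)

train_loader = torch.utils.data.DataLoader(train_set, batch_size=64, shuffle=True)
test_loader = torch.utils.data.DataLoader(test_set, batch_size=64, shuffle=False)
```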
The training process involves fine-tuning the pre-trained Vision Transformer models on the CIFAR-10 dataset. The models are adjusted to work with the smaller image size and class count of CIFAR-10.
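A condensed fine-tuning loop under the same assumptions (the model and loaders from the sketches above; the optimizer, learning rate, and epoch count are placeholders rather than the repository's actual hyperparameters):

```python
import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"
model = vit.to(device)  # the timm model created earlier

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

for epoch in range(3):  # placeholder epoch count
    model.train()
    running_loss = 0.0
    for images, labels in train_loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
    print(f"epoch {epoch}: mean train loss {running_loss / len(train_loader):.4f}")
```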
After training, the model's performance is evaluated on the test set of CIFAR-10. Metrics like accuracy, F1-score, recall, and precision are computed to assess the model.
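These metrics can be computed with scikit-learn, for example (a sketch reusing `model` and `test_loader` from above; macro averaging across the 10 classes is an assumption):

```python
import torch
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

model.eval()
all_preds, all_labels = [], []
with torch.no_grad():
    for images, labels in test_loader:
        logits = model(images.to(device))
        all_preds.extend(logits.argmax(dim=1).cpu().tolist())
        all_labels.extend(labels.tolist())

print("accuracy :", accuracy_score(all_labels, all_preds))
print("precision:", precision_score(all_labels, all_preds, average="macro"))
print("recall   :", recall_score(all_labels, all_preds, average="macro"))
print("f1-score :", f1_score(all_labels, all_preds, average="macro"))
```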
Results are documented through confusion matrices and through loss and accuracy plots. These visualizations help in understanding the model's performance and in identifying areas for improvement.
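A confusion matrix, for instance, can be rendered with scikit-learn and matplotlib (a sketch using the predictions collected above; the plotting style is illustrative, not necessarily the repository's):

```python
import matplotlib.pyplot as plt
from sklearn.metrics import ConfusionMatrixDisplay

class_names = ["airplane", "automobile", "bird", "cat", "deer",
               "dog", "frog", "horse", "ship", "truck"]

# Confusion matrix over the test-set predictions gathered earlier.
ConfusionMatrixDisplay.from_predictions(
    all_labels, all_preds, display_labels=class_names, xticks_rotation=45
)
plt.tight_layout()
plt.savefig("confusion_matrix.png")
```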
This project is licensed under the MIT License - see the LICENSE file for details.
- Thanks to the creators of the CIFAR-10 dataset for providing the resources necessary for training and testing the model.
- Thanks to the PyTorch and timm documentation for providing comprehensive guides and tutorials.
```bibtex
@misc{MJVisionTransformers2023,
  author       = {Mohammad Javad (MJ) Ahmadi},
  title        = {Vision Transformers},
  year         = {2023},
  publisher    = {GitHub},
  journal      = {GitHub repository},
  howpublished = {\url{https://github.com/MJAHMADEE/Vision_Transformers}}
}
```
For more information, please refer to the official repository.