This repository contains the code for HTD introduced in the following paper:
Stochastic Gradient Descent with Hyperbolic-Tangent Decay on Classification (Accepted to WACV 2019)
The learning rate schedule is a critical issue in deep neural network training. Several schedulers and methods have been proposed, including step decay, adaptive methods, cosine annealing, and cyclical schedules. This paper proposes a new scheduling method, named hyperbolic-tangent decay (HTD). We run experiments on several benchmarks: ResNet, Wide ResNet, and DenseNet on the CIFAR-10 and CIFAR-100 datasets, LSTM on the PAMAP2 dataset, and ResNet on the ImageNet and Fashion-MNIST datasets. In our experiments, HTD outperforms step decay and the cosine scheduler in nearly all cases, while requiring fewer hyperparameters than step decay and being more flexible than the cosine scheduler.
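For reference, below is a minimal sketch of a hyperbolic-tangent decay schedule following the form described in the paper; the lower and upper bounds correspond to the --tanh_begin and --tanh_end flags used in the training commands below. This is an illustrative sketch, not a copy of the code in train.py:

```python
import math

def htd_lr(epoch, total_epochs, base_lr=0.1, lower=-4.0, upper=4.0):
    """Hyperbolic-tangent decay: sweep tanh from `lower` to `upper` over
    training and map the result onto the interval [0, base_lr]."""
    progress = epoch / total_epochs
    return base_lr / 2.0 * (1.0 - math.tanh(lower + (upper - lower) * progress))
```

With lower = -4 and upper = 4, the schedule starts at roughly the base learning rate, stays nearly flat for the early epochs, and decays smoothly toward zero by the end of training.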
- (If you only want to train on the CIFAR datasets) Install TensorFlow and Keras.
- (If you want to train on ImageNet) Install Torch and the required dependencies such as cuDNN. See the instructions here for a step-by-step guide.
- Clone this repo:
git clone https://github.com/BIGBALLON/HTD.git
├─ our_Net % Our CIFAR dataset training code
├─ fb.resnet.torch % [facebook/fb.resnet.torch]
└─ DenseNet % [liuzhuang13/DenseNet]
See the following examples. To train ResNet on CIFAR-10 with the step decay scheduler, simply run:
python train.py --batch_size 128 \
--epochs 200 \
--data_set cifar10 \
--learning_rate_method step_decay \
--network resnet \
--log_path ./logs \
--network_depth 5
To use another learning rate scheduler (cos or tanh), change the --learning_rate_method flag:
python train.py --batch_size 128 \
--epochs 200 \
--data_set cifar10 \
--learning_rate_method tanh \
--network resnet \
--log_path ./logs \
--network_depth 5 \
--tanh_begin -4.0 \
--tanh_end 4.0
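If you want to reuse such a schedule in your own TensorFlow/Keras training loop (outside of train.py), the htd_lr() sketch above can be plugged into a standard LearningRateScheduler callback. This is a generic, hypothetical wiring, not part of this repo's code:

```python
from tensorflow.keras.callbacks import LearningRateScheduler

EPOCHS = 200  # matches the --epochs setting used above

# Reuse the htd_lr() sketch from earlier; lower/upper mirror --tanh_begin/--tanh_end.
htd_callback = LearningRateScheduler(
    lambda epoch: htd_lr(epoch, EPOCHS, base_lr=0.1, lower=-4.0, upper=4.0)
)

# model.fit(x_train, y_train, epochs=EPOCHS, callbacks=[htd_callback])
```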
The table below shows the results of HTD on the CIFAR datasets. Best results are shown in blue.
The character * indicates results directly obtained from the original paper.
The Torch models are trained under the same settings as in fb.resnet.torch. Best results are shown in blue.
The character * indicates results directly obtained from the original paper.
fm.bigballon at gmail.com
byshiue at gmail.com
If you use our code, please consider citing the paper as follows:
@inproceedings{hsueh2019stochastic,
title={Stochastic Gradient Descent with Hyperbolic-Tangent Decay on Classification},
author={Hsueh, Bo-Yang and Li, Wei and Wu, I-Chen},
booktitle={2019 IEEE Winter Conference on Applications of Computer Vision (WACV)},
pages={435--442},
year={2019},
organization={IEEE}
}
Please feel free to contact us if you have any questions or suggestions, or would like to discuss!