Skip to content

Repository for "Accelerating Neural Architecture Search using Performance Prediction" (ICLR Workshop 2018)

License

Notifications You must be signed in to change notification settings

MITAutoML/accelerating_nas

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

4 Commits
 
 
 
 
 
 

Repository files navigation

Neural Network Performance Prediction Dataset

This repository contains datasets for neural network performance prediction. Each CSV file in the "data" directory contains architecture details, training hyperparamters, and per-epoch validation accuracy for a variety of neural networks trained on image classification and language modeling datasets. Please cite the following paper if you use this dataset:

Accelerating Neural Architecture Search using Performance Prediction
Bowen Baker*, Otkrist Gupta*, Ramesh Raskar, Nikhil Naik
International Conference on Learning Representations (ICLR) Workshop 2018
(* denotes equal contribution)

In this paper, we showed that simple regression models can predict the final performance of partially trained model configurations using features based on network architectures, hyperparameters, and partially-trained time-series validation performance data. Using these prediction models, we implemented an early stopping method for hyperparameter optimization and neural architecture search, which obtained a speedup of a factor up to 6x. Our early stopping method can be seamlessly incorporated into both reinforcement learning-based architecture search algorithms and bandit-based hyperparameter search.

We hope that this dataset encourages further research in neural network performance prediction.

Dataset Description

1. MetaQNN CNNs - CIFAR10 (metaqnn_cifar10.csv) and MetaQNN CNN - SVHN (metaqnn_svhn.csv)

We sampled CNN architectures from the search space detailed in the MetaQNN paper by Baker et al. (2017), which allows for varying the numbers and orderings of convolution, pooling, and fully connected layers. The model depths were between 1 and 12 layers for the SVHN experiment and between 1 and 18 layers for the CIFAR-10 experiment. Each model was trained for a total of 20 epochs with the Adam optimizer with β1 = 0.9, β2 = 0.999, and ε= 10-8. The batch size was set to 128. We reduced the learning rate by a factor of 0.2 every 5 epochs. The CSV files contain the following fields:

net: This is a string description of the trained neural network. For CNNs, C(n, f, l) denotes a Convolutional layer with n filters, receptive field size f, and stride l. P(f, l) denotes a Pooling layer with receptive field size f and stride l. SM(n) denotes a softmax layer with n outputs. GAP(n) denotes a Global Average Pooling layer with n outputs.

num_params: The total number of parameters in the neural network

depth: The total number of layers in the neural network, excluding GAP and SM layers

lr: Initial learning rate

acc_n: Validation set accuracy at epoch n

2. Deep ResNets - TinyImageNet: (deep_resnets_tinyimagenet.csv)

We sampled ResNet architectures and trained them on the TinyImageNet dataset (containing 200 classes with 500 training images of 32×32 pixels) for 140 epochs. We varied depths, filter sizes and number of convolutional filter block outputs. Each network was trained for 140 epochs, using Nesterov optimizer. For all models, learning rate reduction and momentum were set to 0.1 and 0.9 respectively. The CSV files contain the following fields:

num_blocks: Number of ResNet blocks in the model. Each ResNet block is composed of three convolutional layers followed by batch normalization and summation layers. We vary the number of blocks from 2 to 18, giving us networks with depths between 14 and 110 (since depth = 6*num_blocks + 2)

num_filters: Number of filters in each convolutional layer. Sampled from {2, 3, ..., 22}

filter_size: Filter size for each convolutional layer. Sampled from {3, 5, 7}

lr: Initial learning rate

ss: Learning rate reduction step size

acc_n: Validation set accuracy at epoch n. Note that acc_0 refers to the accuracy of a randomly initialized model.

3. Deep ResNets - CIFAR10: (deep_resnets_cifar10.csv)

We sampled ResNet architectures from a search space similar to Zoph & Le (2017). Each architecture consists of 39 layers: 12 conv, a 2x2 max pool, 9 conv, a 2x2 max pool, 15 conv, and softmax. Each conv layer is followed by batch normalization and ReLU. A ResNet block contains 3 conv layers, which share the same kernel width, kernel height, and number of learnable kernels. We varied these parameters across architectures. Each network was trained on the CIFAR-10 dataset for 50 epochs using the RMSProp optimizer, with variable initial learning rate, weight decay of 10−4, and a learning rate reduction to 10−5 at epoch 30. The CSV files contain the following fields:

num_params: The total number of parameters in the neural network.

lr: Initial learning rate

acc_n: Validation set accuracy at epoch n. Note that acc_0 refers to the accuracy of a randomly initialized model.

4. CudaConvnet - CIFAR10 (cudaconvnet_cifar10.csv) and CudaConvet - SVHN (metaqnn_svhn.csv)

We trained the Cuda-Convnet architecture with varying values of initial learning rate, learning rate reduction step size, weight decay for convolutional and fully connected layers, and scale and power of local response normalization layers. We trained models with CIFAR-10 for 60 epochs and with SVHN for 12 epochs. The CSV files contain the following fields:

lr: Initial learning rate

ss: Learning rate reduction step size

cn_l2: L_2 penalty of Conv"n" layer

f4_l2: L_2 penalty for FC4 layer

weight_decay: Weight decay parameter

lrn_scale: Response normalization scale

lrn_pow: Response normalization power

acc_n: Validation set accuracy at epoch n.

5. LSTM - Penn Treebank (lstm_ptb.csv)

We sampled LSTM models and trained them on the Penn Treebank dataset for 60 epochs. Both the number of hidden layer inputs and number of LSTM cells were varied between 10 to 1400 in steps of 20. Each network was trained with SGD for 60 epochs with batch size of 50 and dropout ratio of 0.5. We used a dictionary size of 400 words to generate embeddings when vectorizing the data. The CSV files contain the following fields:

n_inputs: number of LSTM cells

n_layers: depth of neural network

base_lr: Initial learning rate

ss: Learning rate reduction step size

acc_n: Validation set perplexity at epoch n. Note that acc_0 refers to the perplexity of a randomly initialized model.

6. AlexNet - ImageNet (alexnet_imagenet.csv)

We trained the AlexNet model on the original ImageNet (ILSVRC12) dataset. To compensate for our limited computation resources, we randomly sampled 10% of dataset, trained each configuration for 500 epochs, and only varied initial learning rate and learning rate reduction step size across models.

lr: Initial learning rate

ss: Learning rate reduction step size

acc_n: Validation set accuracy at epoch n. Note that acc_0 refers to the accuracy of a randomly initialized model.

About

Repository for "Accelerating Neural Architecture Search using Performance Prediction" (ICLR Workshop 2018)

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published