PaddlePaddle · qingqing01 · Jun 15, 2018 · May 31, 2018
diff --git a/fluid/image_classification/README.md b/fluid/image_classification/README.md
@@ -1,38 +1,35 @@
-The minimum PaddlePaddle version needed for the code sample in this directory is the lastest develop branch. If you are on a version of PaddlePaddle earlier than this, [please update your installation](http://www.paddlepaddle.org/docs/develop/documentation/en/build_and_install/pip_install_en.html).
+# Image Classification and Model Zoo
+Image classification, an important field of computer vision, assigns an image to one of a set of pre-defined labels. In recent years, researchers have developed many kinds of neural networks that have greatly improved classification performance. This page introduces how to do image classification with PaddlePaddle, including [data preparation](#data-preparation), [training](#training-a-model-with-flexible-parameters), [finetuning](#finetuning), [evaluation](#evaluation) and [inference](#inference).
 
 ---
+## Table of Contents
+- [Installation](#installation)
+- [Data preparation](#data-preparation)
+- [Training a model with flexible parameters](#training-a-model-with-flexible-parameters)
+- [Finetuning](#finetuning)
+- [Evaluation](#evaluation)
+- [Inference](#inference)
+- [Supported models and performances](#supported-models-and-performances)
 
-# SE-ResNeXt for image classification
+## Installation
 
-This model built with paddle fluid is still under active development and is not
-the final version. We welcome feedbacks.
+Running the sample code in this directory requires PaddlePaddle v0.10.0 or later. If the PaddlePaddle version on your device is older than this, please follow the instructions in the [installation document](http://www.paddlepaddle.org/docs/develop/documentation/zh/build_and_install/pip_install_cn.html) to update it.
 
-## Introduction
+## Data preparation
 
-The current code support the training of [SE-ResNeXt](https://arxiv.org/abs/1709.01507) (50/152 layers).
-
-## Data Preparation
-
-1. Download ImageNet-2012 dataset
+The following example uses ImageNet classification. First, prepare the ImageNet data as follows:
 ```
-cd data/
-mkdir -p ILSVRC2012/
-cd ILSVRC2012/
-# get training set
-wget http://www.image-net.org/challenges/LSVRC/2012/nnoupb/ILSVRC2012_img_train.tar
-# get validation set
-wget http://www.image-net.org/challenges/LSVRC/2012/nnoupb/ILSVRC2012_img_val.tar
-# prepare directory
-tar xf ILSVRC2012_img_train.tar
-tar xf ILSVRC2012_img_val.tar
-
-# unzip all classes data using unzip.sh
-sh unzip.sh
+cd data/ILSVRC2012/
+sh download_imagenet2012.sh
 ```
 
-2. Download training and validation label files from [ImageNet2012 url](https://pan.baidu.com/s/1Y6BCo0nmxsm_FsEqmx2hKQ)(password:```wx99```). Untar it into workspace ```ILSVRC2012/```. The files include
+The shell script ```download_imagenet2012.sh``` prepares the data in two steps:
+
+**step-1:** Download the ImageNet-2012 dataset from the ImageNet website. The training and validation data will be downloaded into the folders "train" and "val" respectively.
 
-**train_list.txt**: training list of imagenet 2012 classification task, with each line seperated by SPACE.
+**step-2:** Download the training and validation label files. There are two label files, which contain the training and validation image labels respectively:
+
+* *train_list.txt*: label file of the ImageNet-2012 training set. Each line holds an image path and its label, separated by ```SPACE```, like:
 ```
 train/n02483708/n02483708_2436.jpeg 369
 train/n03998194/n03998194_7015.jpeg 741
@@ -41,7 +38,7 @@ train/n04596742/n04596742_3032.jpeg 909
 train/n03208938/n03208938_7065.jpeg 535
 ...
 ```
-**val_list.txt**: validation list of imagenet 2012 classification task, with each line seperated by SPACE.
+* *val_list.txt*: label file of the ImageNet-2012 validation set. Each line holds an image path and its label, separated by ```SPACE```, like:
 ```
 val/ILSVRC2012_val_00000001.jpeg 65
 val/ILSVRC2012_val_00000002.jpeg 970
@@ -50,38 +47,111 @@ val/ILSVRC2012_val_00000004.jpeg 809
 val/ILSVRC2012_val_00000005.jpeg 516
 ...
 ```
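The label files described above are simple enough to read with a few lines of Python. The following is a minimal sketch, not the reader used by this repository: the helper name `parse_label_list` is hypothetical, and it assumes every non-empty line is `<relative image path> <integer label>` as in the samples shown.

```python
import io


def parse_label_list(lines):
    """Parse lines of the form '<relative image path> <integer label>'."""
    samples = []
    for raw in lines:
        line = raw.strip()
        if not line or line == "...":
            continue  # skip blank lines and the elision marker used in the README
        # Split on the last space so the label is always the final field.
        path, label = line.rsplit(" ", 1)
        samples.append((path, int(label)))
    return samples


# In practice this would be an open file such as val_list.txt.
example = io.StringIO(
    "val/ILSVRC2012_val_00000001.jpeg 65\n"
    "val/ILSVRC2012_val_00000002.jpeg 970\n"
)
samples = parse_label_list(example)
print(samples)
```

Splitting on the last space keeps the parser tolerant of paths that happen to contain spaces, which the sample lines do not, so this is a defensive choice rather than a requirement of the format.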
-**synset_words.txt**: the semantic label of each class.
 
-## Training a model
+## Training a model with flexible parameters
 
-To start a training task, one can use command line as:
+After data preparation, one can start the training step by:
 
 ```
-python train.py --num_layers=50 --batch_size=8 --with_mem_opt=True --parallel_exe=False
+python train.py \
+       --model=SE_ResNeXt50_32x4d \
+       --batch_size=32 \
+       --total_images=1281167 \
+       --class_dim=1000 \
+       --image_shape=3,224,224 \
+       --model_save_dir=output/ \
+       --with_mem_opt=False \
+       --lr_strategy=piecewise_decay \
+       --lr=0.1
 ```
-## Finetune a model
+**parameter introduction:**
+* **model**: name of the model to use. Default: "SE_ResNeXt50_32x4d".
+* **num_epochs**: the number of epochs. Default: 120.
+* **batch_size**: the size of each mini-batch. Default: 256.
+* **use_gpu**: whether to use GPU or not. Default: True.
+* **total_images**: total number of images in the training set. Default: 1281167.
+* **class_dim**: the class number of the classification task. Default: 1000.
+* **image_shape**: input size of the network. Default: "3,224,224".
+* **model_save_dir**: the directory to save trained model. Default: "output".
+* **with_mem_opt**: whether to use memory optimization or not. Default: False.
+* **lr_strategy**: learning rate changing strategy. Default: "piecewise_decay".
+* **lr**: initial learning rate. Default: 0.1.
+* **pretrained_model**: path of the pretrained model to load. Default: None.
+* **checkpoint**: the checkpoint path to resume from. Default: None.
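To make the flags and defaults listed above concrete, here is a hypothetical `argparse` mirror of them. It is only a sketch: the actual `train.py` may define its arguments differently, and the `str2bool` helper and `build_parser` name are assumptions introduced for illustration.

```python
import argparse


def str2bool(value):
    """Interpret 'True'/'False' style strings, as in --with_mem_opt=False."""
    return str(value).lower() in ("true", "t", "1", "yes")


def build_parser():
    # Hypothetical parser that mirrors the documented flags and defaults.
    parser = argparse.ArgumentParser(description="Sketch of the documented training flags")
    parser.add_argument("--model", type=str, default="SE_ResNeXt50_32x4d")
    parser.add_argument("--num_epochs", type=int, default=120)
    parser.add_argument("--batch_size", type=int, default=256)
    parser.add_argument("--use_gpu", type=str2bool, default=True)
    parser.add_argument("--total_images", type=int, default=1281167)
    parser.add_argument("--class_dim", type=int, default=1000)
    parser.add_argument("--image_shape", type=str, default="3,224,224")
    parser.add_argument("--model_save_dir", type=str, default="output")
    parser.add_argument("--with_mem_opt", type=str2bool, default=False)
    parser.add_argument("--lr_strategy", type=str, default="piecewise_decay")
    parser.add_argument("--lr", type=float, default=0.1)
    parser.add_argument("--pretrained_model", type=str, default=None)
    parser.add_argument("--checkpoint", type=str, default=None)
    return parser


args = build_parser().parse_args(["--batch_size=32", "--with_mem_opt=True"])
# "3,224,224" is channels, height, width.
image_shape = [int(dim) for dim in args.image_shape.split(",")]
print(args.batch_size, args.with_mem_opt, image_shape)
```

Boolean flags are parsed through a helper because `argparse` with `type=bool` would treat the string `"False"` as true.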
+
+**data reader introduction:**
+
+The data reader is defined in ```reader.py```. In the [training stage](#training-a-model-with-flexible-parameters), random crop and flipping are used, while center crop is used in the [evaluation](#evaluation) and [inference](#inference) stages. Supported data augmentation includes:
+* rotation
+* color jitter
+* random crop
+* center crop
+* resize
+* flipping
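The difference between the training-time and evaluation-time crops listed above comes down to how the crop box is chosen. The sketch below only computes crop coordinates and flips a row of pixels; the function names are hypothetical, and the real `reader.py` may add scale or aspect-ratio jitter that is not shown here.

```python
import random


def center_crop_box(width, height, size):
    """Return (left, top, right, bottom) of a size x size crop centred in the image."""
    left = (width - size) // 2
    top = (height - size) // 2
    return left, top, left + size, top + size


def random_crop_box(width, height, size, rng=random):
    """Return a size x size crop box at a uniformly random valid position."""
    left = rng.randint(0, width - size)  # randint is inclusive on both ends
    top = rng.randint(0, height - size)
    return left, top, left + size, top + size


def flip_row(row):
    """Horizontally flip one row of pixels."""
    return row[::-1]


print(center_crop_box(256, 256, 224))  # (16, 16, 240, 240)
print(random_crop_box(256, 256, 224, random.Random(0)))
```

The center crop is deterministic, which is why it suits evaluation and inference, while the random crop and flip give the network a slightly different view of each image every epoch.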
+
+## Finetuning
+
+Finetuning adapts the weights of a pretrained model to a specific task by loading the pretrained weights before training. After setting ```path_to_pretrain_model``` to the pretrained model path, one can finetune a model as:
 ```
-python train.py --num_layers=50 --batch_size=8 --with_mem_opt=True --parallel_exe=False --pretrained_model="pretrain/96/"
+python train.py \
+       --model=SE_ResNeXt50_32x4d \
+       --pretrained_model=${path_to_pretrain_model} \
+       --batch_size=32 \
+       --total_images=1281167 \
+       --class_dim=1000 \
+       --image_shape=3,224,224 \
+       --model_save_dir=output/ \
+       --with_mem_opt=True \
+       --lr_strategy=piecewise_decay \
+       --lr=0.1
 ```
-TBD
-## Inference
+
+## Evaluation
+Evaluation measures the performance of a trained model. One can get top-1/top-5 accuracy by running the following command:
 ```
-python infer.py --num_layers=50 --batch_size=8 --model='model/90' --test_list=''
+python eval.py \
+       --model=SE_ResNeXt50_32x4d \
+       --batch_size=32 \
+       --class_dim=1000 \
+       --image_shape=3,224,224 \
+       --with_mem_opt=True \
+       --pretrained_model=${path_to_pretrain_model}
 ```
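For readers unfamiliar with the metric, top-k accuracy counts a prediction as correct when the true label is among the k highest-scoring classes. The following is a plain-Python sketch of that definition, not the code in `eval.py`; the function name and the toy scores are assumptions made for illustration.

```python
def topk_accuracy(scores, labels, k):
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    hits = 0
    for row, label in zip(scores, labels):
        # Class indices sorted from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda i: row[i], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)


# Three samples, three classes (toy data).
scores = [
    [0.1, 0.7, 0.2],
    [0.5, 0.3, 0.2],
    [0.2, 0.3, 0.5],
]
labels = [1, 1, 0]
top1 = topk_accuracy(scores, labels, 1)
top2 = topk_accuracy(scores, labels, 2)
print(top1, top2)
```

With 1000 ImageNet classes the same function would be called with `k=1` and `k=5` to obtain the two numbers reported in the table of supported models.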
-TBD
 
-## Results
-
-The SE-ResNeXt-50 model is trained by starting with learning rate ```0.1``` and decaying it by ```0.1``` after each ```10``` epoches. Top-1/Top-5 Validation Accuracy on ImageNet 2012 is listed in table.
-
-|model | [original paper(Fig.5)](https://arxiv.org/abs/1709.01507) | Pytorch | Paddle fluid
-|- | :-: |:-: | -:
-|SE-ResNeXt-50 | 77.6%/- | 77.71%/93.63% | 77.42%/93.50%
+## Inference
+Inference is used to get prediction scores or image features from a trained model.
+```
+python infer.py \
+       --model=SE_ResNeXt50_32x4d \
+       --batch_size=32 \
+       --class_dim=1000 \
+       --image_shape=3,224,224 \
+       --with_mem_opt=True \
+       --pretrained_model=${path_to_pretrain_model}
+```
 
+## Supported models and performances
 
+Unless otherwise noted, models are trained with an initial learning rate of ```0.1```, which is decayed by a factor of ```0.1``` every ```30``` epochs. The available top-1/top-5 validation accuracy on ImageNet 2012 is listed in the table. Pretrained models can be downloaded by clicking the corresponding model names.
 
-## Released models
-|model | Baidu Cloud
+|model | top-1/top-5 accuracy
 |- | -:
-|SE-ResNeXt-50 | [url]()
-TBD
+|[AlexNet](http://paddle-imagenet-models.bj.bcebos.com/alexnet_model.tar) | 57.21%/79.72%
+|VGG11 | -
+|VGG13 | -
+|VGG16 | -
+|VGG19 | -
+|GoogleNet | -
+|InceptionV4 | -
+|MobileNet | -
+|[ResNet50](http://paddle-imagenet-models.bj.bcebos.com/resnet_50_model.tar) | 76.63%/93.10%
+|ResNet101 | -
+|ResNet152 | -
+|[SE_ResNeXt50_32x4d](http://paddle-imagenet-models.bj.bcebos.com/se_resnext_50_model.tar) | 78.33%/93.96%
+|SE_ResNeXt101_32x4d | -
+|SE_ResNeXt152_32x4d | -
+|DPN68 | -
+|DPN92 | -
+|DPN98 | -
+|DPN107 | -
+|DPN131 | -
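The learning-rate schedule described in this section (start at 0.1, multiply by 0.1 every 30 epochs) can be written out as iteration boundaries and values. This is a sketch under stated assumptions: the function names are hypothetical, the defaults for `total_images`, `num_epochs` and `lr` are taken from the parameter list, and how `train.py` actually builds its `piecewise_decay` schedule is not shown in this diff.

```python
import math


def piecewise_decay_schedule(total_images, batch_size, base_lr,
                             epochs_per_step=30, num_epochs=120, gamma=0.1):
    """Return (boundaries in iterations, learning-rate values) for step decay."""
    steps_per_epoch = int(math.ceil(total_images / float(batch_size)))
    decay_epochs = range(epochs_per_step, num_epochs, epochs_per_step)
    boundaries = [steps_per_epoch * epoch for epoch in decay_epochs]
    values = [base_lr * gamma ** i for i in range(len(boundaries) + 1)]
    return boundaries, values


def lr_at(iteration, boundaries, values):
    """Look up the learning rate in effect at a given iteration."""
    for boundary, value in zip(boundaries, values):
        if iteration < boundary:
            return value
    return values[-1]


boundaries, values = piecewise_decay_schedule(1281167, 256, 0.1)
print(boundaries)  # [150150, 300300, 450450]
print(lr_at(0, boundaries, values), lr_at(200000, boundaries, values))
```

With 1281167 images and a batch size of 256 there are 5005 iterations per epoch, so the rate drops at iterations 150150, 300300 and 450450. Changing `--batch_size` or `--total_images` shifts these boundaries, which is presumably why both are exposed as flags.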
diff --git a/fluid/image_classification/README_cn.md b/fluid/image_classification/README_cn.md
@@ -0,0 +1,156 @@
+
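+# Image Classification and Model Zoo
+Image classification is an important field of computer vision; its goal is to assign an image to one of a set of pre-defined labels. Recently, many researchers have proposed different kinds of neural networks and greatly improved the performance of classification algorithms. This page introduces how to do image classification with PaddlePaddle, including [data preparation](#data-preparation), [training](#training-a-model), [finetuning](#finetuning), [evaluation](#evaluation) and [inference](#inference).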
+
+---
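+## Table of Contents
+- [Installation](#installation)
+- [Data preparation](#data-preparation)
+- [Training a model](#training-a-model)
+- [Finetuning](#finetuning)
+- [Evaluation](#evaluation)
+- [Inference](#inference)
+- [Supported models and performances](#supported-models-and-performances)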
+
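+## Installation
+
+Running the sample code in this directory requires PaddlePaddle v0.10.0 or later. If the PaddlePaddle in your environment is older than this version, please update PaddlePaddle following the instructions in the [installation document](http://www.paddlepaddle.org/docs/develop/documentation/zh/build_and_install/pip_install_cn.html).
+
+## Data preparation
+
+An example for the ImageNet classification task is given below. First, prepare the data as follows: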
+```
+cd data/ILSVRC2012/
+sh download_imagenet2012.sh
+```
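+The ```download_imagenet2012.sh``` script prepares the data in the following two steps:
+
+**Step 1:** Download the ImageNet-2012 image data from the official ImageNet website. The training and validation sets are downloaded into the "train" and "val" directories respectively.
+
+**Step 2:** Download the label files for the training and validation sets. The following two files contain the labels of the images in the training and validation sets respectively:
+
+* *train_list.txt*: label file of the ImageNet-2012 training set. Each line uses a "space" to separate the image path from its label, for example: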
+```
+train/n02483708/n02483708_2436.jpeg 369
+train/n03998194/n03998194_7015.jpeg 741
+train/n04523525/n04523525_38118.jpeg 884
+train/n04596742/n04596742_3032.jpeg 909
+train/n03208938/n03208938_7065.jpeg 535
+...
+```
+* *val_list.txt*: label file of the ImageNet-2012 validation set. Each line uses a "space" to separate the image path from its label, for example:
+```
+val/ILSVRC2012_val_00000001.jpeg 65
+val/ILSVRC2012_val_00000002.jpeg 970
+val/ILSVRC2012_val_00000003.jpeg 230
+val/ILSVRC2012_val_00000004.jpeg 809
+val/ILSVRC2012_val_00000005.jpeg 516
+...
+```
+
+## Training a model
+
+Once the data is prepared, training can be started as follows:
+```
+python train.py \
+       --model=SE_ResNeXt50_32x4d \
+       --batch_size=32 \
+       --total_images=1281167 \
+       --class_dim=1000 \
+       --image_shape=3,224,224 \
+       --model_save_dir=output/ \
+       --with_mem_opt=False \
+       --lr_strategy=piecewise_decay \
+       --lr=0.1
+```
+**Parameter description:**
+* **model**: name of the model to use. Default: "SE_ResNeXt50_32x4d".
+* **num_epochs**: the number of epochs. Default: 120.
+* **batch_size**: the size of each mini-batch. Default: 256.
+* **use_gpu**: whether to use GPU or not. Default: True.
+* **total_images**: total number of images in the training set. Default: 1281167.
+* **class_dim**: the class number of the classification task. Default: 1000.
+* **image_shape**: input size of the network. Default: "3,224,224".
+* **model_save_dir**: the directory to save trained model. Default: "output".
+* **with_mem_opt**: whether to use memory optimization or not. Default: False.
+* **lr_strategy**: learning rate changing strategy. Default: "piecewise_decay".
+* **lr**: initial learning rate. Default: 0.1.
+* **pretrained_model**: path of the pretrained model to load. Default: None.
+* **checkpoint**: the checkpoint path to resume from. Default: None.
+
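+**Data reader description:**
+
+The data reader is defined in ```reader.py```. In the [training stage](#training-a-model), the default augmentations are random crop and horizontal flipping, while center crop is the default in the [evaluation](#evaluation) and [inference](#inference) stages. Currently supported data augmentation methods are:
+* rotation
+* color jitter
+* random crop
+* center crop
+* resize
+* horizontal flipping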
+
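+## Finetuning
+
+Finetuning means adjusting the parameters of an already-trained model on a specific task. After setting ```path_to_pretrain_model```, a model can be finetuned with the following command: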
+```
+python train.py \
+       --model=SE_ResNeXt50_32x4d \
+       --pretrained_model=${path_to_pretrain_model} \
+       --batch_size=32 \
+       --total_images=1281167 \
+       --class_dim=1000 \
+       --image_shape=3,224,224 \
+       --model_save_dir=output/ \
+       --with_mem_opt=True \
+       --lr_strategy=piecewise_decay \
+       --lr=0.1
+```
+
+## Evaluation
+Evaluation computes performance metrics for a trained model. Running the following command gives a model's top-1/top-5 accuracy:
+```
+python eval.py \
+       --model=SE_ResNeXt50_32x4d \
+       --batch_size=32 \
+       --class_dim=1000 \
+       --image_shape=3,224,224 \
+       --with_mem_opt=True \
+       --pretrained_model=${path_to_pretrain_model}
+```
+
+## Inference
+Inference obtains a model's prediction scores or image features:
+```
+python infer.py \
+       --model=SE_ResNeXt50_32x4d \
+       --batch_size=32 \
+       --class_dim=1000 \
+       --image_shape=3,224,224 \
+       --with_mem_opt=True \
+       --pretrained_model=${path_to_pretrain_model}
+```
+
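+## Supported models and performances
+
+The table lists the kinds of neural networks supported under the "models" directory and gives the top-1/top-5 accuracy on the ImageNet-2012 validation set for the models that have finished training. Pretrained models can be downloaded by clicking the corresponding model names.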
+
+|model | top-1/top-5 accuracy
+|- | -:
+|[AlexNet](http://paddle-imagenet-models.bj.bcebos.com/alexnet_model.tar) | 57.21%/79.72%
+|VGG11 | -
+|VGG13 | -
+|VGG16 | -
+|VGG19 | -
+|GoogleNet | -
+|InceptionV4 | -
+|MobileNet | -
+|[ResNet50](http://paddle-imagenet-models.bj.bcebos.com/resnet_50_model.tar) | 76.63%/93.10%
+|ResNet101 | -
+|ResNet152 | -
+|[SE_ResNeXt50_32x4d](http://paddle-imagenet-models.bj.bcebos.com/se_resnext_50_model.tar) | 78.33%/93.96%
+|SE_ResNeXt101_32x4d | -
+|SE_ResNeXt152_32x4d | -
+|DPN68 | -
+|DPN92 | -
+|DPN98 | -
+|DPN107 | -
+|DPN131 | -
diff --git a/fluid/image_classification/__init__.py b/fluid/image_classification/__init__.py