Added Detailed RNN results (#77)
* Modified RNN benchmark document

* Added minor comments

* Fixed broken image link
karan6181 authored and sandeep-krishnamurthy committed Jun 15, 2018
1 parent 8edb8bf commit bd97766
Showing 3 changed files with 53 additions and 19 deletions.
52 changes: 33 additions & 19 deletions benchmark/README.md
@@ -94,50 +94,64 @@ Example Usage:

We used the benchmark scripts from the official [TensorFlow Benchmark](https://github.com/tensorflow/benchmarks/tree/keras-benchmarks/scripts/keras_benchmarks) repo and modified them slightly for our use case.

Set `lstm_synthetic` as the `models` parameter in `run_<backend-type>_backend.sh`.
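
For orientation, a minimal stand-in for such a synthetic run might look like the sketch below. This is not the keras_benchmarks script itself; the input shape, hidden size, and sample count are placeholder assumptions.

```python
import time

import numpy as np
from keras.layers import LSTM, Dense
from keras.models import Sequential

# Placeholder dimensions -- assumptions for illustration, not the
# values used by the keras_benchmarks scripts.
num_samples, timesteps, features, classes = 10000, 50, 32, 10

x = np.random.random((num_samples, timesteps, features))
y = np.random.randint(0, classes, size=(num_samples, 1))

model = Sequential()
# unroll=True matches the setting reported in the results below.
model.add(LSTM(128, input_shape=(timesteps, features), unroll=True))
model.add(Dense(classes, activation='softmax'))
model.compile(loss='sparse_categorical_crossentropy', optimizer='rmsprop')

epochs = 4
start = time.time()
model.fit(x, y, batch_size=128, epochs=epochs)
print('Time per epoch: %.1fs' % ((time.time() - start) / epochs))
```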

### Nietzsche Dataset

We used the official Keras LSTM example script [lstm_text_generation.py](https://github.com/keras-team/keras/blob/master/examples/lstm_text_generation.py) and modified it slightly for our use case.

Set `lstm_nietzsche` as the `models` parameter in `run_<backend-type>_backend.sh`.
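
Condensed, that example slices the corpus into fixed-length character windows, one-hot encodes them, and fits a small LSTM. A rough sketch along the lines of the upstream script (with `unroll=True` added, matching the setup described in the results below; exact hyperparameters may differ):

```python
import io

import numpy as np
from keras.layers import LSTM, Dense
from keras.models import Sequential
from keras.utils.data_utils import get_file

path = get_file('nietzsche.txt',
                origin='https://s3.amazonaws.com/text-datasets/nietzsche.txt')
text = io.open(path, encoding='utf-8').read().lower()
chars = sorted(set(text))
char_indices = {c: i for i, c in enumerate(chars)}

maxlen, step = 40, 3  # window length and stride, as in the upstream example
sentences = [text[i: i + maxlen] for i in range(0, len(text) - maxlen, step)]
next_chars = [text[i + maxlen] for i in range(0, len(text) - maxlen, step)]

# One-hot encode: x is (samples, maxlen, vocab), y is (samples, vocab).
x = np.zeros((len(sentences), maxlen, len(chars)), dtype=bool)
y = np.zeros((len(sentences), len(chars)), dtype=bool)
for i, sentence in enumerate(sentences):
    for t, char in enumerate(sentence):
        x[i, t, char_indices[char]] = 1
    y[i, char_indices[next_chars[i]]] = 1

model = Sequential()
model.add(LSTM(128, input_shape=(maxlen, len(chars)), unroll=True))
model.add(Dense(len(chars), activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='rmsprop')
model.fit(x, y, batch_size=128, epochs=1)
```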

### WikiText-2 Dataset

We used the official WikiText-2 character-level dataset from this [link](https://einstein.ai/research/the-wikitext-long-term-dependency-language-modeling-dataset).

The `lstm_text_generation.py` script uses the dataset hosted on an S3 bucket at this [link](https://s3.amazonaws.com/research.metamind.io/wikitext/wikitext-2-raw-v1.zip) (the raw character-level WikiText-2 data).

Set `lstm_wikitext2` as the `models` parameter in `run_<backend-type>_backend.sh`.
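
A sketch of how that archive could be loaded for character-level training follows; the internal archive layout (`wikitext-2-raw/wiki.train.raw`) is an assumption, not verified against the benchmark scripts.

```python
import io
import zipfile

from keras.utils.data_utils import get_file

# Download the raw character-level archive referenced above.
path = get_file(
    'wikitext-2-raw-v1.zip',
    origin='https://s3.amazonaws.com/research.metamind.io/wikitext/'
           'wikitext-2-raw-v1.zip')

# Assumed archive layout: wikitext-2-raw/wiki.{train,valid,test}.raw
with zipfile.ZipFile(path) as archive:
    with archive.open('wikitext-2-raw/wiki.train.raw') as f:
        text = io.TextIOWrapper(f, encoding='utf-8').read()

print('Corpus length: %d characters' % len(text))
# Windowing and one-hot encoding then proceed exactly as in the
# Nietzsche sketch above.
```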

### RNN Benchmark Results

Here, we list the results on the Synthetic, Nietzsche, and WikiText-2 datasets using a Sequential (LSTM) model on Amazon AWS C5.18xLarge (CPU), C5.xLarge (CPU), and P3.8xLarge (1 and 4 GPU) instances with the MXNet and TensorFlow backends. The batch size is 128, and `unroll=True` is set on the recurrent layer. For more details about the instance configurations, please refer to [P3](https://aws.amazon.com/ec2/instance-types/p3/) and [C5](https://aws.amazon.com/ec2/instance-types/c5/).
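
The 4-GPU rows use data-parallel training. One way to express this in Keras 2.1.5 with the TensorFlow backend is `multi_gpu_model`, sketched below; whether these benchmarks used this utility or a backend-specific mechanism (keras-mxnet exposes its own multi-GPU path) is an assumption.

```python
from keras.layers import LSTM, Dense
from keras.models import Sequential
from keras.utils import multi_gpu_model

# Placeholder shapes -- assumptions for illustration only.
timesteps, features, classes = 40, 60, 60

model = Sequential()
model.add(LSTM(128, input_shape=(timesteps, features), unroll=True))
model.add(Dense(classes, activation='softmax'))

# Replicate the model across 4 GPUs; each batch of 128 is split into
# 4 sub-batches of 32, one per device, and gradients are merged on CPU.
parallel_model = multi_gpu_model(model, gpus=4)
parallel_model.compile(loss='categorical_crossentropy', optimizer='rmsprop')
# parallel_model.fit(x, y, batch_size=128, epochs=4)
```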

For more detailed benchmark results, please refer to [RNN results.](benchmark_result/RNN_result.md)

| Framework/Library | Version |
| :----------------- | :------- |
| Keras | 2.1.5 |
| MXNet | 1.1.0 |
| TensorFlow | 1.7.0 |
| CUDA | 9.0.176 |

#### 1. Synthetic Dataset

![lstm_Synthetic_128.png](benchmark_result/lstm_Synthetic_128.png)

| Instance | GPUs | MXNet Backend<br />Speed/Epoch | TensorFlow Backend<br />Speed/Epoch |
| :----------- | :---- | :------------------------------ | :----------------------------------- |
| C5.18xLarge | 0 | 24s 485us/step | 14s 284us/step |
| P3.8xLarge | 1 | 13s 261us/step | 12s 249us/step |
| P3.8xLarge | 4 | 12s 240us/step | 21s 430us/step |

#### 2. Nietzsche Dataset

![lstm_Nietzsche_128.png](benchmark_result/lstm_Nietzsche_128.png)

| Instance | GPUs | MXNet Backend<br />Speed/Epoch | TensorFlow Backend<br />Speed/Epoch |
| :----------- | :---- | :------------------------------ | :----------------------------------- |
| C5.18xLarge | 0 | 78s 389us/step | 55s 273us/step |
| P3.8xLarge | 1 | 52s 262us/step | 51s 252us/step |
| P3.8xLarge | 4 | 47s 235us/step | 87s 435us/step |

#### 3. WikiText-2 Dataset

![lstm_Wikitext2_128.png](benchmark_result/lstm_Wikitext2_128.png)

| Instance | GPUs | MXNet Backend<br />Speed/Epoch | TensorFlow Backend<br />Speed/Epoch |
| :----------- | :---- | :------------------------------ | :----------------------------------- |
| C5.18xLarge | 0 | 1345s 398us/step | 875s 259us/step |
| P3.8xLarge | 1 | 868s 257us/step | 817s 242us/step |
| P3.8xLarge | 4 | 775s 229us/step | 1468s 434us/step |

## Credits

Synthetic Data scripts modified from …
20 changes: 20 additions & 0 deletions benchmark/benchmark_result/RNN_result.md
@@ -1,5 +1,10 @@
# Detailed RNN Benchmark Results

Below are the GPU memory usage results from running the LSTM model on the Synthetic, Nietzsche, and WikiText-2 character-level datasets.

![MemoryConsumption.png](MemoryConsumption.png)

Note: All the data in the performance diagrams shown below is taken from the rows with `Unroll Type = TRUE`.
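
The Memory (MiB) columns in the tables below report GPU memory during training. One way such numbers can be sampled (assuming `nvidia-smi` is available on the instance; the exact collection method is not recorded here):

```python
import subprocess

def gpu_memory_used_mib():
    """Return used memory per GPU in MiB, as reported by nvidia-smi."""
    out = subprocess.check_output([
        'nvidia-smi',
        '--query-gpu=memory.used',
        '--format=csv,noheader,nounits',
    ]).decode()
    return [int(line) for line in out.strip().splitlines()]

# e.g. [1848, 0, 0, 0] on a 4-GPU instance with one GPU in use
print(gpu_memory_used_mib())
```
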
## Synthetic Dataset

### Configuration
@@ -17,6 +22,11 @@

### Results

| Batch size 32 | Batch size 128 |
| :------------------------------------------------------- | :----------------------------------------------------------- |
| ![lstm_Synthetic_32.png](lstm_Synthetic_32.png) | ![lstm_Synthetic_128.png](lstm_Synthetic_128.png) |


| Instance | GPUs | Backend | Batch size | Data Set | Training Method | Speed/Epoch (Lower is better) | Unroll Type | No. of samples | Memory(MiB) |
| ----------- | ---- | ---------- | ---------- | --------- | ---------------- | ----------------------------- | ----------- | -------------- | ----------- |
| C5.18xLarge | 0 | MXNet | 32 | Synthetic | fit() | 50s 1ms/step | TRUE | 50000 | 0 |
@@ -65,6 +75,11 @@

### Results

| Batch size 32 | Batch size 128 |
| :------------------------------------------------------- | :----------------------------------------------------------- |
| ![lstm_Nietzsche_32.png](lstm_Nietzsche_32.png) | ![lstm_Nietzsche_128.png](lstm_Nietzsche_128.png) |


| Instance | GPUs | Backend | Batch size | Data Set | Training Method | Speed/Epoch (Lower is better) | Unroll Type | No. of samples | Memory(MiB) |
| ----------- | ---- | ---------- | ---------- | --------- | ---------------- | ----------------------------- | ----------- | -------------- | ----------- |
| C5.18xLarge | 0 | MXNet | 32 | Nietzsche | fit() | 226s 1ms/step | TRUE | 200285 | 0 |
@@ -113,6 +128,11 @@

### Results

| Batch size 32 | Batch size 128 |
| :------------------------------------------------------- | :----------------------------------------------------------- |
| ![lstm_Wikitext2_32.png](lstm_Wikitext2_32.png) | ![lstm_Wikitext2_128.png](lstm_Wikitext2_128.png) |


| Instance | GPUs | Backend | Batch size | Data Set | Training Method | Speed/Epoch (Lower is better) | Unroll Type | No. of samples | Memory(MiB) |
| ----------- | ---- | ---------- | ---------- | ---------- | ---------------- | ----------------------------- | ----------- | -------------- | ----------- |
| C5.18xLarge | 0 | MXNet | 32 | WikiText-2 | fit() | 3530s 1ms/step | TRUE | 1562175 | 0 |
Binary file removed benchmark/benchmark_result/rnn_mxnet_dataset.png