Optimize the rowwise add function. #7047

qingqing01 · 2017-12-26T14:08:04Z

This PR optimizes the rowwise add function. The change mainly targets the broadcast in Eigen. Timings before and after the optimization are as follows.

Experiment environment:
- config: 3 stacked LSTM network with a hidden size of 64
- 2 epochs

Total time for 2 epochs:
- CPU: 345.54137s -> 304.96511s, about 11.7% less.
- GPU: 89.72162s -> 89.22058s. This optimization does not meaningfully change the execution time on GPU.
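
For context, judging from the kernel signature reviewed in this thread, rowwise add adds a vector of length `width` to every row of a `height` x `width` matrix. A minimal plain-loop sketch of that semantics follows, assuming row-major storage and `float` data. `RowwiseAddReference` is a hypothetical name; this is neither the Eigen broadcast expression nor the code in this PR.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical reference helper (not part of Paddle): adds `bias`, a vector
// of length `width`, to every row of the row-major `height x width` matrix
// stored in `input`, and returns the result as a new vector.
std::vector<float> RowwiseAddReference(const std::vector<float>& input,
                                       const std::vector<float>& bias,
                                       std::size_t height, std::size_t width) {
  std::vector<float> output(height * width);
  for (std::size_t h = 0; h < height; ++h) {
    for (std::size_t w = 0; w < width; ++w) {
      // Element (h, w) of the matrix lives at flat offset h * width + w.
      output[h * width + w] = input[h * width + w] + bias[w];
    }
  }
  return output;
}
```

A plain loop like this avoids building a broadcast expression, which may be part of why the CPU time improved, though the thread does not show the final CPU code.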

chengduoZH · 2017-12-27T01:48:09Z

paddle/operators/math/math_function.cu

+template <typename T>
+__global__ void RowwiseAddKernel(const T* a, const T* b, T* c, int64_t height,
+                                 int64_t width) {
+  int64_t num = height * width;


num can be passed in as a parameter instead of being recomputed by every thread.

Done. Thank you!
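
A rough host-side sketch of this suggestion, written in plain C++ so it can run without a GPU: the caller computes `num = height * width` once and passes it in, and each emulated thread walks a grid-stride loop. `RowwiseAddThread` and `RowwiseAddLaunch` are hypothetical names, and `thread_id` / `total_threads` stand in for `blockIdx.x * blockDim.x + threadIdx.x` and `blockDim.x * gridDim.x`.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Host-side stand-in for one GPU thread of a grid-stride kernel.
// `num` arrives as a parameter, so the thread does not multiply
// height * width itself.
void RowwiseAddThread(const float* a, const float* b, float* c, int64_t width,
                      int64_t num, int64_t thread_id, int64_t total_threads) {
  for (int64_t i = thread_id; i < num; i += total_threads) {
    c[i] = a[i] + b[i % width];  // b is indexed by the column of element i
  }
}

// Hypothetical launcher: computes num once and hands it to every "thread".
std::vector<float> RowwiseAddLaunch(const std::vector<float>& a,
                                    const std::vector<float>& b,
                                    int64_t height, int64_t width) {
  const int64_t num = height * width;  // computed once by the caller
  const int64_t total_threads = 4;     // arbitrary value for the emulation
  std::vector<float> c(static_cast<std::size_t>(num));
  for (int64_t t = 0; t < total_threads; ++t) {
    RowwiseAddThread(a.data(), b.data(), c.data(), width, num, t,
                     total_threads);
  }
  return c;
}
```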

chengduoZH · 2017-12-27T02:01:29Z

paddle/operators/math/math_function.cu

+  for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < num;
+       i += blockDim.x * gridDim.x) {
+    int h = i / width;
+    int w = i % width;


Integer modulo (%) and division operations are expensive on GPU hardware.
The division can probably be replaced by a multiplication, and the modulo (%) by a multiplication and a subtraction.

Done. Thank you!
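
One way to read this suggestion, sketched in plain C++ with hypothetical helper names. `SplitIndex` keeps a single division and derives the column by multiply and subtract. `SplitIndexReciprocal` also replaces the division with a multiplication by a precomputed `1.0f / width`; because float rounding can leave that estimate off by one near row boundaries, the sketch adds a one-step correction. The correction is only sufficient while the indices are small enough for the float estimate to stay within one of the true quotient, which is an assumption of this sketch and not something the thread states.

```cpp
// Splits flat index i into (row h, column w) with one integer division.
// The column is obtained by multiply and subtract instead of a second `%`.
inline void SplitIndex(int i, int width, int* h, int* w) {
  *h = i / width;
  *w = i - *h * width;
}

// Variant that estimates the row with a reciprocal multiply, then fixes an
// off-by-one estimate using the remainder. Assumes i is small enough that
// the float estimate is within one of the exact quotient.
inline void SplitIndexReciprocal(int i, int width, float inv_width, int* h,
                                 int* w) {
  int q = static_cast<int>(i * inv_width);  // estimate of i / width
  int r = i - q * width;                    // remainder implied by q
  if (r < 0) {                              // estimate was one too large
    --q;
    r += width;
  } else if (r >= width) {                  // estimate was one too small
    ++q;
    r -= width;
  }
  *h = q;
  *w = r;
}

// Checks both variants against `/` and `%` for every index below max_index.
bool MatchesIntegerDivision(int max_index, int width) {
  const float inv_width = 1.0f / static_cast<float>(width);
  for (int i = 0; i < max_index; ++i) {
    int h1, w1, h2, w2;
    SplitIndex(i, width, &h1, &w1);
    SplitIndexReciprocal(i, width, inv_width, &h2, &w2);
    if (h1 != i / width || w1 != i % width) return false;
    if (h2 != i / width || w2 != i % width) return false;
  }
  return true;
}
```

Whether the reciprocal form is actually faster on a given GPU would need to be measured; the first variant is the safer of the two because it stays exact for every `int` index.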

chengduoZH · 2017-12-27T06:58:09Z

paddle/operators/math/math_function.cu

+__global__ void RowwiseAddKernel(const T* a, const T* b, T* c, int64_t width,
+                                 int64_t num) {
+  T tmp = 1.0 / width;
+  for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < num;


It would be better to change the type of num to int. Otherwise, the loop condition compares an int with an int64_t.

Done. Thanks!
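
If `num` is narrowed to `int` so that it matches the loop index, the caller presumably needs to make sure `height * width` fits. A minimal sketch of such a check in plain C++ follows; `NumElementsAsInt` is a hypothetical helper, not something shown in this PR.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper: writes height * width to *num as an int and returns
// true, or returns false (leaving *num untouched) when a dimension is
// negative or the product does not fit in an int.
bool NumElementsAsInt(int64_t height, int64_t width, int* num) {
  if (height < 0 || width < 0) return false;
  const int64_t max_int = std::numeric_limits<int>::max();
  // height * width <= max_int  <=>  height <= max_int / width (width > 0).
  if (width != 0 && height > max_int / width) return false;
  *num = static_cast<int>(height * width);
  return true;
}
```

With a check like this on the host side, the kernel can take `int num` and compare it against an `int` index without a mixed-width comparison.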

chengduoZH

LGTM

qingqing01 added 2 commits December 26, 2017 05:46

Optimize the rowwise add function.

32d881b

Resume CPU implementation.

41372de

chengduoZH reviewed Dec 27, 2017

qingqing01 force-pushed the rowwise_add branch from 01bc012 to 5c94725 Compare December 27, 2017 03:07

chengduoZH reviewed Dec 27, 2017

Update the CUDA kernel.

1936738

qingqing01 force-pushed the rowwise_add branch from 58efe9b to 1936738 Compare December 27, 2017 09:33

chengduoZH approved these changes Dec 27, 2017

qingqing01 merged commit 95da78a into PaddlePaddle:develop Dec 27, 2017

qingqing01 deleted the rowwise_add branch November 14, 2019 05:26
