Refine for_range #8152
Conversation
paddle/platform/for_range.h
Outdated
if (block_size < 1024) {
  int size = 1;
  while (size < block_size) size <<= 1;
  block_size = size;
}
- The warp size on an Nvidia GPU is 32. If limit_ is less than 1024, round the block size up to a multiple of 32 with the following calculation (see the sketch after this comment):
if (block_size < 1024) {
  block_size = ((block_size + 31) >> 5) << 5;
}
- The following function, from line 44 to line 48, also needs to be updated:
template <typename Function>
__global__ static void ForRangeElemwiseOpGridIsOne(Function func) {
size_t idx = static_cast<size_t>(threadIdx.x);
func(idx);
}
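For concreteness, here is a minimal host-side sketch of what the suggested rounding produces for a few block sizes. The helper name RoundUpToWarpMultiple and the sample values are illustrative assumptions, not part of the PR.
#include <cassert>

// Round block_size up to the next multiple of the warp size (32),
// as in the suggestion above.
inline int RoundUpToWarpMultiple(int block_size) {
  return ((block_size + 31) >> 5) << 5;
}

int main() {
  assert(RoundUpToWarpMultiple(1) == 32);
  assert(RoundUpToWarpMultiple(100) == 128);
  assert(RoundUpToWarpMultiple(1000) == 1024);
  assert(RoundUpToWarpMultiple(1024) == 1024);  // already at the 1024 cap
  return 0;
}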
I am not sure this optimization is necessary.
If the number of threads is not divisible by the warp size, there will be inactive threads. However, even if we make the number of threads divisible by the warp size, there will still be inactive threads: since all threads run the same code, some threads will not satisfy the if statement and will just wait for the other threads.
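To make the point concrete, here is a minimal sketch of a guarded kernel in the style of the code above; the kernel name and the numbers in the comment are illustrative assumptions, not taken from for_range.h.
// Threads whose index falls outside [0, limit) skip the body and simply wait
// at the end of the kernel. With limit = 100 and block_size = 100, the last
// warp has 4 active lanes (indices 96..99) and 28 idle lanes. Padding
// block_size to 128 launches 28 extra threads, but the guard deactivates
// exactly those lanes, so each warp does the same amount of useful work
// either way.
template <typename Function>
__global__ void GuardedForRange(Function func, int limit) {
  int idx = blockIdx.x * blockDim.x + threadIdx.x;
  if (idx < limit) {
    func(idx);
  }
}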
@qingqing01 If so, ForRangeElemwiseOpGridIsOne should be removed.
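A sketch of what the launch path could look like without the grid-size-one special case; the launcher name, stream parameter, and 1024-thread cap are assumptions for illustration, and the kernel mirrors the bounds-checked ForRangeElemwiseOp discussed above rather than the exact code in for_range.h.
// Bounds-checked kernel: safe for any grid size, including grid_size == 1.
template <typename Function>
__global__ void ForRangeElemwiseOp(Function func, int limit) {
  int idx = blockIdx.x * blockDim.x + threadIdx.x;
  if (idx < limit) {
    func(idx);
  }
}

// Because of the idx < limit guard, the same launch covers the case where
// only one block is needed, so a dedicated ForRangeElemwiseOpGridIsOne
// kernel becomes unnecessary.
template <typename Function>
void LaunchForRange(Function func, int limit, cudaStream_t stream) {
  constexpr int kMaxThreads = 1024;  // assumed per-block thread cap
  int block_size = limit <= kMaxThreads ? limit : kMaxThreads;
  int grid_size = (limit + block_size - 1) / block_size;
  ForRangeElemwiseOp<<<grid_size, block_size, 0, stream>>>(func, limit);
}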
@reyoung It seems reasonable, but I have not seen any documentation directing users to set block_size in this way.
I have not run a benchmark, so I cannot tell you which is better.
… feature/refine_for_range
I do not think this PR is useful, considering that it does not save any computation or SMs in CUDA.
fix #7081
Don't merge. This needs further discussion and analysis.