Accelerate AddTakeGrad + Support Sorting #153

sxjscience · 2016-07-14T14:56:23Z

Add a range operation to MShadow, similar to Python's range.
Accelerate the original AddTakeGrad using Torch's kernel, which makes the backward pass of the embedding layer much faster. See apache/mxnet#2612 (Slow speed of the AddTakeGradKernel in embedding layer). I will open a PR on the MXNet side once this is merged.
Add two sort functions, SortByKey and VectorizedSort. The GPU versions are implemented with Thrust. Sorting for half_t is not implemented yet and may be added later.
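A minimal host-side C++ sketch of the sort-based idea behind a faster embedding backward pass, as commonly done in Torch's LookupTable backward (this is an illustration, not the MShadow API or this PR's kernel): sort the token indices together with their original positions (a SortByKey-style step), so equal indices become contiguous, then let each run of equal indices accumulate into its embedding row. On a GPU, each run can be owned by one worker, which avoids atomic adds on frequently hit rows. The names SortByKeySketch and AddTakeGradSortedSketch are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical SortByKey-style helper: stable-sorts `keys` ascending and
// applies the same permutation to `values`.
template <typename K, typename V>
void SortByKeySketch(std::vector<K>* keys, std::vector<V>* values) {
  std::vector<std::size_t> perm(keys->size());
  std::iota(perm.begin(), perm.end(), 0);
  std::stable_sort(perm.begin(), perm.end(), [&](std::size_t a, std::size_t b) {
    return (*keys)[a] < (*keys)[b];
  });
  std::vector<K> sorted_keys(keys->size());
  std::vector<V> sorted_values(values->size());
  for (std::size_t i = 0; i < perm.size(); ++i) {
    sorted_keys[i] = (*keys)[perm[i]];
    sorted_values[i] = (*values)[perm[i]];
  }
  keys->swap(sorted_keys);
  values->swap(sorted_values);
}

// Hypothetical embedding backward: dst[index[i], :] += src[i, :] for every i.
// dst is vocab x dim and src is n x dim, both row-major.
void AddTakeGradSortedSketch(std::vector<float>* dst, const std::vector<float>& src,
                             const std::vector<int>& index, std::size_t dim) {
  std::vector<int> keys(index);
  std::vector<std::size_t> pos(index.size());
  std::iota(pos.begin(), pos.end(), 0);
  SortByKeySketch(&keys, &pos);
  // Equal indices are now contiguous: each run updates exactly one dst row.
  std::size_t i = 0;
  while (i < keys.size()) {
    std::size_t j = i;
    float* row = dst->data() + static_cast<std::size_t>(keys[i]) * dim;
    while (j < keys.size() && keys[j] == keys[i]) {
      const float* grad = src.data() + pos[j] * dim;
      for (std::size_t d = 0; d < dim; ++d) row[d] += grad[d];
      ++j;
    }
    i = j;
  }
}
```

The stable sort keeps gradients for the same row in their original order, so the accumulation order per row is deterministic.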

piiswrong · 2016-07-14T20:01:36Z

mshadow/cuda/tensor_gpu-inl.cuh

@@ -6,6 +6,11 @@
 */
 #ifndef MSHADOW_CUDA_TENSOR_GPU_INL_CUH_
 #define MSHADOW_CUDA_TENSOR_GPU_INL_CUH_
+#include <thrust/device_ptr.h>


Is Thrust installed with CUDA by default? Does this complicate compiling and linking? Do we need to update the config and CMake files?

Yes, Thrust is included in the CUDA toolkit. Since it is a header-only template library, there is nothing to install or link separately: http://docs.nvidia.com/cuda/thrust/#axzz4EPrET26A . (Thrust seems to have been part of the CUDA toolkit for a long time, possibly since CUDA 5: http://developer.download.nvidia.com/compute/cuda/5_0/rel/docs/CUDA_Toolkit_Release_Notes_And_Errata.txt).

Revise type of RangeExp
Add repeat option to range
Add new backward kernel for embedding + Add sort functions.
Fix format
Fix header
Fix Lint + Doc
Turn off the sorting support for CUDA < 7.0
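To make the range semantics concrete, here is a minimal host-side reference sketch. It assumes the value count follows Python's range, ceil((stop - start) / step), and that the repeat option repeats each value `repeat` times consecutively (the way MXNet's arange documents its repeat argument); both are assumptions, not taken from this PR. RangeSketch is a hypothetical name, not MShadow's RangeExp.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical reference for a range expression with a repeat option:
// element k of the output is start + step * (k / repeat).
template <typename DType>
std::vector<DType> RangeSketch(DType start, DType stop, DType step = 1, int repeat = 1) {
  std::vector<DType> out;
  if (step == 0 || repeat <= 0) return out;
  // Number of distinct values, as in Python's range: ceil((stop - start) / step).
  const double span = static_cast<double>(stop - start) / static_cast<double>(step);
  const long count = span > 0 ? static_cast<long>(std::ceil(span)) : 0;
  out.reserve(static_cast<std::size_t>(count) * static_cast<std::size_t>(repeat));
  for (long i = 0; i < count; ++i) {
    const DType value = static_cast<DType>(start + step * static_cast<DType>(i));
    // Assumed repeat semantics: each value appears `repeat` times in a row.
    for (int r = 0; r < repeat; ++r) out.push_back(value);
  }
  return out;
}
```

For example, RangeSketch(0, 3, 1, 2) gives 0, 0, 1, 1, 2, 2, and a negative step such as RangeSketch(5, 0, -2) gives 5, 3, 1, matching Python's range(5, 0, -2).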

sxjscience · 2016-07-15T15:17:21Z

@piiswrong I decided to use LOG(FATAL) in the end.
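The thread does not say exactly where LOG(FATAL) ended up. One plausible reading, given the commit "Turn off the sorting support for CUDA < 7.0", is that the Thrust-based sort is compiled in only for CUDA 7.0 and newer and otherwise fails loudly at run time. A minimal sketch of that pattern follows, assuming CUDA_VERSION (defined by cuda.h as 1000 * major + 10 * minor) as the version macro. FatalSketch is a hypothetical throwing stand-in for LOG(FATAL), and the other *Sketch names are also hypothetical.

```cpp
#include <stdexcept>
#include <string>

// CUDA_VERSION comes from cuda.h (7.0 -> 7000). In a host-only build it is
// undefined, so the guard selects the unsupported branch.
#if defined(CUDA_VERSION) && CUDA_VERSION >= 7000
#define SKETCH_GPU_SORT_SUPPORTED 1
#else
#define SKETCH_GPU_SORT_SUPPORTED 0
#endif

// Hypothetical stand-in for LOG(FATAL): throws instead of logging, so the
// sketch has no dmlc dependency.
[[noreturn]] inline void FatalSketch(const std::string& msg) {
  throw std::runtime_error(msg);
}

inline bool GpuSortSupportedSketch() { return SKETCH_GPU_SORT_SUPPORTED == 1; }

// Hypothetical GPU sort entry point.
inline void SortByKeyGpuSketch() {
#if SKETCH_GPU_SORT_SUPPORTED
  // The Thrust-based sort (e.g. a stable sort by key) would be called here.
#else
  FatalSketch("SortByKey on GPU requires CUDA >= 7.0");
#endif
}
```

Failing at run time rather than at compile time keeps the rest of the library buildable on older toolkits while making any attempt to sort there an explicit error.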

sxjscience · 2016-07-16T06:27:38Z

@piiswrong Is it OK to merge now?

piiswrong reviewed Jul 14, 2016

Add range operator

32ffa76


sxjscience force-pushed the acc_addgrad_add_sort branch from 9b12a5c to 32ffa76 July 15, 2016 15:10

sxjscience merged commit 867be36 into dmlc:master Jul 16, 2016

sxjscience mentioned this pull request Nov 7, 2019

[Numpy] Fix collect_params().zero_grad() in gluon numpy interface apache/mxnet#16716 (merged)
