Add a lookup table op and a CUDA helper. #3620
1. Finish the lookup table CPU and GPU kernels. 2. Add some CUDA helpers. 3. Add some math functors.
paddle/framework/CMakeLists.txt
@@ -55,5 +55,6 @@ cc_library(paddle_pybind SHARED
recurrent_op
uniform_random_op
gaussian_random_op
lookup_table_op
The lookup_table_op and other ops listed here in framework/CMakeLists.txt are targets defined in operators/CMakeLists.txt. This implies a cyclic dependency: framework and operators depend on each other. Can we remove this cyclic dependency?
It seems that the solution is to move the pybind target into a third package, say paddle/pybind, in parallel with paddle/framework and paddle/operators.
I will do it.
namespace functor {

template <typename T>
struct Set<platform::CPUPlace, T> {
The name Set here is a little confusing: it could refer to either a data container or the assignment operation. If it is the latter, a class name should be a noun, so we might want Setter instead of Set.
Set is not needed. Use Eigen::Vector's operator=.
@reyoung Done.
@@ -0,0 +1,5 @@
if(WITH_GPU)
I see that we are creating a subdirectory and a namespace to hold a new class, Set. Do we plan to add many more functors in the near future? If so, I think it is OK; otherwise, creating a subdirectory looks like overkill.
We now have a subdirectory, paddle/operators/math, but it uses plain functions rather than functors (function objects). I notice that TensorFlow and WarpCTC use functors for common math functions, and functors are also used in Paddle's function subdirectory. Functors have some advantages: they can hold state, and they are more convenient than plain functions to specialize for different data types (discussed with @hedaoyuan). There is a lot of duplicated code in operators/math/math_function.cc and operators/math/math_function.cu for the float and double types.
So maybe it's better to move math/math_function to functor/math_functor.
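To illustrate the two advantages mentioned in this comment (state and per-type specialization), here is a minimal sketch in plain C++. The names CPUPlace and ScaleFunctor are hypothetical stand-ins chosen for this example; they are not Paddle's actual classes, and the sketch only assumes the "template on place and data type" pattern visible in the Set<platform::CPUPlace, T> diff above.

```cpp
#include <vector>

// Hypothetical tag type standing in for a device place (assumption, not Paddle's class).
struct CPUPlace {};

// Primary template: declared once, specialized per place.
template <typename Place, typename T>
struct ScaleFunctor;

// One partial specialization covers float, double, and any other arithmetic T,
// so no per-type copy of the function body is needed.
template <typename T>
struct ScaleFunctor<CPUPlace, T> {
  // The scale factor is state carried by the functor object.
  explicit ScaleFunctor(T alpha) : alpha_(alpha) {}

  // Multiplies every element of data by the stored factor, in place.
  void operator()(std::vector<T>* data) const {
    for (T& v : *data) v *= alpha_;
  }

 private:
  T alpha_;
};
```

A caller would construct the functor once, for example ScaleFunctor<CPUPlace, double> scale(3.0), and then apply it to as many buffers as needed with scale(&buffer).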
We have some internal tensor operations in our code. Currently these live in the paddle/operators/math directory as global functions. If functors make the code cleaner, we can use functors instead of functions (just as TensorFlow does).
@@ -0,0 +1,55 @@
/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserve.
cuda_helper.h => cuda.h ?
CUDA also has a cuda.h header. In caffe2, the file with a similar function is called common_gpu.h, and in TensorFlow it is called cuda_kernel_helper.h.
namespace paddle {
namespace platform {
I don't think this header is necessary. An element-wise operator can easily be written with thrust, a header-only library provided by the CUDA toolkit.
This header defines atomicAdd for the float and double types. The code for the double type is a little complex. I think we need a common file for basic, commonly used CUDA utilities.
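The complexity mentioned for the double case usually comes from emulating the addition with a compare-and-swap retry loop when no native instruction is available. The sketch below shows that retry-loop idea as a host-side analogue in standard C++ using std::atomic; it is not the code from this PR, and the name AtomicAddDouble is hypothetical. A device version would typically apply the same loop with CUDA's atomicCAS on the 64-bit integer representation of the double.

```cpp
#include <atomic>

// Atomically adds value to *target using a compare-and-swap retry loop.
// Returns the value stored before the addition, following atomicAdd's convention.
// Hypothetical helper for illustration only.
inline double AtomicAddDouble(std::atomic<double>* target, double value) {
  double old_value = target->load();
  // On failure, compare_exchange_weak writes the current contents of *target
  // into old_value, so each retry recomputes the sum from fresh data.
  while (!target->compare_exchange_weak(old_value, old_value + value)) {
  }
  return old_value;
}
```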
@reyoung I removed the Set functor and CUDA_1D_KERNEL_LOOP.
if (alpha == static_cast<T>(0)) {
memset(YData, 0, N * sizeof(T));
} else {
framework::EigenVector<T, Eigen::RowMajor, Eigen::DenseIndex>::Flatten(*Y)
It is OK to always use Eigen's setConstant in the header of the operator kernel. memset is not always faster than std::copy or other carefully implemented setConstant or Copy methods, so there is no need to special-case zero.
Ok, I'll modify the code.
Please refer to https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/operators/fill_zeros_like_op.h#L29
This works for both CPU and GPU:
t.device(EigenDevice) = t.constant(T(scalar));
And Set can be a function in operators/math_function.h
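As a rough illustration of the suggestion to drop the zero special case and expose Set as a plain function, the sketch below uses std::fill from the standard library in place of Eigen so that it compiles on its own. SetConstant is a hypothetical name for this example, not the function that ended up in math_function.h.

```cpp
#include <algorithm>
#include <cstddef>

// Fills n elements starting at data with alpha.
// Zero and non-zero values take the same code path; there is no memset branch.
// Hypothetical helper for illustration only.
template <typename T>
void SetConstant(T* data, std::size_t n, T alpha) {
  std::fill(data, data + n, alpha);
}
```

With Eigen, the body would instead be the single device-aware assignment quoted in the comment above, which is what lets one definition serve both CPU and GPU.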
Fix #3621
Add a lookup table operator, which is used as an embedding in RNN networks.
SparseRow tensor. And the TableProjection in Paddle also uses a dense matrix to represent the table by default. SparseRow will be supported in the future.
Add some CUDA helpers.
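For readers unfamiliar with the operator, the following is a minimal CPU sketch of what a lookup table (embedding) op computes, assuming a dense row-major table and a dense gradient as the description suggests. The function names and signatures are hypothetical and do not reflect the kernels in this PR; ids are assumed to be valid row indices, and no bounds checking is done.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Forward pass: output row i is a copy of table row ids[i].
// table is row-major with width columns per row.
template <typename T>
std::vector<T> LookupTableForward(const std::vector<T>& table,
                                  std::size_t width,
                                  const std::vector<std::int64_t>& ids) {
  std::vector<T> out(ids.size() * width);
  for (std::size_t i = 0; i < ids.size(); ++i) {
    const std::size_t row = static_cast<std::size_t>(ids[i]);
    for (std::size_t j = 0; j < width; ++j) {
      out[i * width + j] = table[row * width + j];
    }
  }
  return out;
}

// Backward pass: returns a dense gradient with the table's shape (rows x width).
// Rows that appear several times in ids accumulate all of their gradients.
template <typename T>
std::vector<T> LookupTableBackward(const std::vector<T>& out_grad,
                                   std::size_t rows, std::size_t width,
                                   const std::vector<std::int64_t>& ids) {
  std::vector<T> table_grad(rows * width, T(0));
  for (std::size_t i = 0; i < ids.size(); ++i) {
    const std::size_t row = static_cast<std::size_t>(ids[i]);
    for (std::size_t j = 0; j < width; ++j) {
      table_grad[row * width + j] += out_grad[i * width + j];
    }
  }
  return table_grad;
}
```

The accumulation in the backward pass is presumably why the GPU kernel needs an atomic add: when ids contains the same row more than once, several threads may update that gradient row concurrently.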