-
Notifications
You must be signed in to change notification settings - Fork 5.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add the argsort operator #11174
Add the argsort operator #11174
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW, maybe thrust::sort
can be used for GPU implementation.
https://thrust.github.io/doc/group__sorting.html#ga1099d781e06c43805be06a918f7b7499
paddle/fluid/operators/argsort_op.cc
Outdated
"Output(Indices) of ArgsortOp should not be null."); | ||
|
||
auto in_dims = ctx->GetInputDim("X"); | ||
int axis = static_cast<int>(ctx->Attrs().Get<int>("axis")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove static_cast<int>()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
paddle/fluid/operators/argsort_op.cc
Outdated
AddInput("X", "(Tensor) The input of Argsort op."); | ||
AddOutput("Out", "(Tensor) The sorted tensor of Argsort op."); | ||
AddOutput("Indices", | ||
"(Tensor) The indices of a tensor giving the sorted order."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Give the shape for Out
and Indices
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
paddle/fluid/operators/argsort_op.h
Outdated
auto* input = ctx.Input<framework::Tensor>("X"); | ||
auto* output = ctx.Output<framework::Tensor>("Out"); | ||
auto* indices = ctx.Output<framework::Tensor>("Indices"); | ||
int axis = static_cast<int>(ctx.Attr<int>("axis")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove static_cast<int>()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
paddle/fluid/operators/argsort_op.cc
Outdated
PADDLE_ENFORCE(axis >= 0 || axis == -1, | ||
"Attr(axis) %d of ArgsortOp must be nonnegative or equal to " | ||
"-1.", | ||
axis); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If axis < 0, we can re-set the axis = in_dims.size() + axis
? not limited to -1 for the negative value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
paddle/fluid/operators/argsort_op.h
Outdated
out_data[index] = in_vec[j].first; | ||
idx_data[index] = in_vec[j].second; | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Line 40-73 can be changed to be more efficient and save memory used.
int64_t part_dims_prod = input->numel() / in_dims[axis];
int64_t step = 1;
for (int64_t i = in_dims.size()-1; i > axis; --i) step *= in_dims[i];
std::vector<int64_t> org_index_vec(in_dims.size());
std::vector<int64_t> idx_vec(in_dims.size());
idx_vec[axis] = 0;
for (int64_t i = 0; i < part_dims_prod; ++i) {
for (int64_t dim = in_dims.size() - 1; dim >= 0; --dim) {
if (dim != axis) {
idx_vec[dim] = idx % in_dims[dim];
idx /= in_dims[dim];
}
}
int64_t start_index = idx_vec[0];
for (int64_t dim = 1; dim < in_dims.size(); ++dim) {
start_index = start_index * in_dims[dim] + idx_vec[dim];
}
for (int64_t j = 0; j < in_dims.size(); ++j) {
org_index_vec[j] = start_index + j*step;
}
std::sort(
org_index_vec.begin(), org_index_vec.end(),
[in_data](int64_t idx1, int64_t idx2) {
return in_data[idx1] < in_data[idx2];
});
for (size_t j = 0; j < org_index_vec.size(); ++j) {
int64_t org_index = org_index_vec[j];
int64_t ret_index = start_index + j*step;
out_data[ret_index] = in_data[org_index];
idx_data[ret_index] = org_index;
}
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! It is a good idea to only sort the index, and I made the change. Please take a look.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Excellent!
@reyoung Yes, we can use |
paddle/fluid/operators/argsort_op.cu
Outdated
|
||
auto stream = reinterpret_cast<const platform::CUDADeviceContext&>( | ||
ctx.device_context()) | ||
.stream(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
auto stream = ctx.cuda_device_context().stream();
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
paddle/fluid/operators/argsort_op.cu
Outdated
int64_t* med_ids) { | ||
int64_t index = threadIdx.x + blockDim.x * blockIdx.x; | ||
if (index < n) { | ||
const int max_rank = 9; // Max rank of a tensor allow in Fluid |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Move this constant variable before line 19.
const int kMaxRank = 6;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean outside the kernel function? Then done.
class TestArgsortOp(OpTest): | ||
def setUp(self): | ||
self.init_axis() | ||
x = np.random.random((2, 3, 4, 5)).astype("float32") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This unit testing has no gradient checking. so, better to use large shape here to coverage more case, since PADDLE_CUDA_NUM_THREADS
is large.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
python/paddle/fluid/layers/tensor.py
Outdated
@@ -442,6 +443,56 @@ def argmax(x, axis=0): | |||
return out | |||
|
|||
|
|||
def argsort(input, axis=-1): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-
Need unit testing in https://github.com/PaddlePaddle/Paddle/blob/develop/python/paddle/fluid/tests/unittests/test_layers.py
-
Better add
name
.
def argsort(input, axis=-1, name=None):
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All done, thx!
paddle/fluid/operators/argsort_op.cu
Outdated
|
||
auto stream = reinterpret_cast<const platform::CUDADeviceContext&>( | ||
ctx.device_context()) | ||
.stream(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
python/paddle/fluid/layers/tensor.py
Outdated
@@ -442,6 +443,56 @@ def argmax(x, axis=0): | |||
return out | |||
|
|||
|
|||
def argsort(input, axis=-1): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
class TestArgsortOp(OpTest): | ||
def setUp(self): | ||
self.init_axis() | ||
x = np.random.random((2, 3, 4, 5)).astype("float32") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
paddle/fluid/operators/argsort_op.cu
Outdated
int64_t* med_ids) { | ||
int64_t index = threadIdx.x + blockDim.x * blockIdx.x; | ||
if (index < n) { | ||
const int max_rank = 9; // Max rank of a tensor allow in Fluid |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean outside the kernel function? Then done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
Add the argsort operator
Resolve #11399