-
Notifications
You must be signed in to change notification settings - Fork 5.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add ctc edit distance operator #5300
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- About the op's name, I'd like to name it to
edit_distance_op
, because it is not limited to ctc. - Need to support
batch_size
, the inputs should beLoDTensor
. - Need to call path2String to remove the uncared symbols link
blank
.
"hypothesis string"); | ||
AddInput("X2", | ||
"(2-D tensor with shape [N x 1]) The indices " | ||
"for reference string."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- As I know, the inputs of this evaluator are predicted result and ground truth, please use more meaningful names.
- What is the meaning of
M
andN
? Please give more explanation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
"(bool, default false) Indicated whether " | ||
"normalize the Output(Out) by the length of reference " | ||
"string (X2).") | ||
.SetDefault(false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In ctc, the blank
should be removed from the predicted result. In attention, the eos
and sos
should be removed. Maybe here, we need an attribute named uncare
of type std::vector<int>
, to set the elements that need to be removed.
It is not implemented in CTCErrorEvaluator.cpp, but is a real need from ocr.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with that the operator needs to accept some uncared input tokens. But it would make the code dirty allowing for the CUDA kernel. How about implementing an independent op to remove the uncared tokens?
.SetDefault(false); | ||
AddOutput("Out", | ||
"(2-D tensor with shape [1 x 1]) " | ||
"The output distance of CTCEditDistance operator."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
batch_size
should be supported in this op, the output's shape should be [batch_size, 1]
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
} else { | ||
framework::Tensor dist_t; | ||
dist_t.Resize({m + 1, n + 1}); | ||
dist_t.mutable_data<T>(ctx.GetPlace()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to initialize the dist_t to 0, as https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/gserver/evaluators/CTCErrorEvaluator.cpp#L90
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is not necessary because filling the first row and column of this matrix is enough.
|
||
auto m = x1_t->numel(); | ||
auto n = x2_t->numel(); | ||
T distance = 0.0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The old codes count the substitution
, deletion
, insertion
error, why not implement it in the op?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Usually these counts seem useless.
void InferShape(framework::InferShapeContext *ctx) const override { | ||
PADDLE_ENFORCE(ctx->HasInput("X1"), "Input(X1) shouldn't be null."); | ||
PADDLE_ENFORCE(ctx->HasInput("X2"), "Input(X2) shouldn't be null."); | ||
PADDLE_ENFORCE(ctx->HasOutput("Out"), "Output(Out) shouldn't be null."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need to check the shape of inputs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Update by following most the comments. Thanks!
void InferShape(framework::InferShapeContext *ctx) const override { | ||
PADDLE_ENFORCE(ctx->HasInput("X1"), "Input(X1) shouldn't be null."); | ||
PADDLE_ENFORCE(ctx->HasInput("X2"), "Input(X2) shouldn't be null."); | ||
PADDLE_ENFORCE(ctx->HasOutput("Out"), "Output(Out) shouldn't be null."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
"hypothesis string"); | ||
AddInput("X2", | ||
"(2-D tensor with shape [N x 1]) The indices " | ||
"for reference string."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
"(bool, default false) Indicated whether " | ||
"normalize the Output(Out) by the length of reference " | ||
"string (X2).") | ||
.SetDefault(false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with that the operator needs to accept some uncared input tokens. But it would make the code dirty allowing for the CUDA kernel. How about implementing an independent op to remove the uncared tokens?
.SetDefault(false); | ||
AddOutput("Out", | ||
"(2-D tensor with shape [1 x 1]) " | ||
"The output distance of CTCEditDistance operator."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
|
||
auto m = x1_t->numel(); | ||
auto n = x2_t->numel(); | ||
T distance = 0.0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Usually these counts seem useless.
} else { | ||
framework::Tensor dist_t; | ||
dist_t.Resize({m + 1, n + 1}); | ||
dist_t.mutable_data<T>(ctx.GetPlace()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is not necessary because filling the first row and column of this matrix is enough.
paddle/operators/edit_distance_op.cc
Outdated
"The indices for hypothesis strings."); | ||
AddInput("Refs", | ||
"(2-D LoDTensor, 2nd dim. equal to 1) " | ||
"The indices for reference strings."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should tell the users the type of Hyps
and Refs
is int
.
2-D LoDTensor <int>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
paddle/operators/edit_distance_op.h
Outdated
n); | ||
distance[num] = distance[num] / n; | ||
} | ||
out[num] = distance[num]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems, distance
can be variable in T type, not a std::vector<T>
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
} | ||
SetOutput<T><<<1, 1, 0, stream>>>(out + num, dist, m, n, normalized); | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The GPU implementation may be less efficient, it may be slower than CPU implementation. The for loop in line 97 also can be paralleled. But you can not change it in this PR. We can optimize it in the future when necessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. There should be a lot efficiency improvement for the batch input
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated. Thanks
paddle/operators/edit_distance_op.cc
Outdated
"The indices for hypothesis strings."); | ||
AddInput("Refs", | ||
"(2-D LoDTensor, 2nd dim. equal to 1) " | ||
"The indices for reference strings."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
} | ||
SetOutput<T><<<1, 1, 0, stream>>>(out + num, dist, m, n, normalized); | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. There should be a lot efficiency improvement for the batch input
paddle/operators/edit_distance_op.h
Outdated
n); | ||
distance[num] = distance[num] / n; | ||
} | ||
out[num] = distance[num]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
Resolve #4744