-
Notifications
You must be signed in to change notification settings - Fork 5.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feature/fc converter #11043
Feature/fc converter #11043
Conversation
943a90c
to
5871e04
Compare
framework::OpDesc op_desc(op, nullptr, nullptr); | ||
PADDLE_ENFORCE_EQ(op_desc.Input("X").size(), 1); | ||
PADDLE_ENFORCE_EQ(op_desc.Input("Y").size(), 1); // Y is a weight | ||
PADDLE_ENFORCE_EQ(op_desc.Output("Out").size(), 1); // Y is a weight |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
61行的注释错了。
for (int h = 0; h < shape.h(); ++h) { | ||
for (int w = 0; w < shape.w(); ++w) { | ||
odata[h * ostrides.h() + w * ostrides.w()] = | ||
idata[h * ostrides.h() + w * ostrides.w()]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
从函数实现看,odata[i] = idata[i],所以还需要转么?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
从 TF 里拷贝的,试了下,很奇怪,这个函数好像必须要用。还有一个 Reorder4的函数,后面用到的时候我再仔细看下。
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
后面写Reorder4的时候,可以这里再优化下:
for (int h = 0; h < shape.h(); ++h) {
for (int w = 0; w < shape.w(); ++w) {
int index = h * ostrides.h() + w * ostrides.w();
odata[index] = idata[index];
namespace inference { | ||
namespace tensorrt { | ||
|
||
template <typename T> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reorder2和ReorderCKtoKC和函数功能和参数含义,请加一下注释。
PADDLE_ENFORCE_NOT_NULL(Y_v); | ||
auto* Y_t = Y_v->GetMutable<framework::LoDTensor>(); | ||
// This may trigger a CPU->GPU copy. | ||
// TODO(Superjomn) use some smarter mutable_data. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Y_v从scope里读取的话,可以一开始就在gpu环境下,这里就不用拷贝了。
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
更新了 comment
void DeclParamVar(const std::string& name, const nvinfer1::Dims& dims) { | ||
DeclVar(name, dims); | ||
} | ||
|
||
void DeclOutputVar(const std::string& name, const nvinfer1::Dims& dims) { | ||
DeclVar(name, dims); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DeclParamVar和DeclOutputVar是一模一样的,需要封两个函数么?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
用的时候的确需要不同语义,这里区分下 program 和 output, 不然代码里两处都是 DeclVar
,看不出区别
|
||
ASSERT_FALSE(op_desc_->OutputArgumentNames().empty()); | ||
for (const auto& output : op_desc_->OutputArgumentNames()) { | ||
std::vector<float> fluid_out; | ||
std::vector<float> trt_out(200); | ||
std::vector<float> trt_out(200, 2008.); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
请注释说下200和2008的含义。
@@ -31,8 +31,10 @@ void paddle::operators::TensorRTEngineKernel<DeviceContext, T>::Prepare( | |||
auto max_workspace = context.Attr<int>("max_workspace"); | |||
engine_.reset(new inference::tensorrt::TensorRTEngine( | |||
max_batch_, max_workspace, nullptr)); | |||
// TODO(Superjomn) parameters should be passed be analysised and passed from | |||
// outside. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
34-35行注释请更新下,有两个be?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
for (int h = 0; h < shape.h(); ++h) { | ||
for (int w = 0; w < shape.w(); ++w) { | ||
odata[h * ostrides.h() + w * ostrides.w()] = | ||
idata[h * ostrides.h() + w * ostrides.w()]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
后面写Reorder4的时候,可以这里再优化下:
for (int h = 0; h < shape.h(); ++h) {
for (int w = 0; w < shape.w(); ++w) {
int index = h * ostrides.h() + w * ostrides.w();
odata[index] = idata[index];
No description provided.