init Inference top APIs #10549
Conversation
> @@ -0,0 +1,27 @@
> # Embed Paddle Inference in Your Application
>
> Paddle inference offers the APIs in `C` and `C++` languages.
Is it necessary to split this into C and C++? Currently there is only a C++ API; can we document only the C++ API for now?
OK. We can add a C API separately, probably in another PR.
If the C API is not needed for now, let's not write about it yet.
> One can easily deploy a model trained by Paddle following the steps as below:
Paddle -> PaddlePaddle.
> ## Optimize the native Fluid Model
>
> The native model obtained from the training phase needs to be optimized before deployment.
We take the model produced by save_inference_model at the end of training; that call inserts the feed and fetch ops and applies some pruning optimizations. A raw training-phase model has no feed and fetch ops and simply cannot run.
The strategies 1, 2, and 3 mentioned here should already be applied when save_inference_model is called.
Should this section cover only the additional optimization strategies, e.g. third-party engines, operator fusion, etc.?
Right, this part only explains why the tool is necessary.
```cpp
const std::vector<std::vector<int>>& input_shapes,
const std::vector<std::vector<int>>& output_shapes,
const std::vector<std::vector<float>>& input_data,
std::vector<std::vector<float>>* output_data);
```
This interface no longer works for NLP workloads. Consider using LoDTensor directly in the interface.
Users' data formats vary endlessly, so it is more reasonable to let users convert their data to LoDTensor themselves. We can provide some conversion tools or functions, but the Run interface should stick to LoDTensor:
```cpp
bool Run(const std::vector<LoDTensor>& input,
         std::vector<LoDTensor>* output);
```
The input and output shape arguments are not needed; the feed and fetch ops already contain that information. See:
Paddle/paddle/fluid/inference/tests/test_helper.h, lines 93 to 96 in 4c8ff72:
```cpp
void TestInference(const std::string& dirname,
                   const std::vector<paddle::framework::LoDTensor*>& cpu_feeds,
                   const std::vector<paddle::framework::LoDTensor*>& cpu_fetchs,
                   const int repeat = 1, const bool is_combined = false) {
```
The unit tests already wrap this quite cleanly.
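To make the proposal concrete, here is a minimal caller sketch assuming the LoDTensor-based signature suggested above; the free `Run` function and the toy data are illustrative assumptions, not code from this PR:

```cpp
#include <vector>

#include "paddle/fluid/framework/lod_tensor.h"
#include "paddle/fluid/platform/place.h"

using paddle::framework::LoDTensor;

// Assumed to exist with the signature suggested in the review above.
bool Run(const std::vector<LoDTensor>& input,
         std::vector<LoDTensor>* output);

int main() {
  // The caller converts its own data format into a LoDTensor,
  // per the review discussion.
  LoDTensor x;
  x.Resize(paddle::framework::make_ddim({1, 4}));
  float* buf = x.mutable_data<float>(paddle::platform::CPUPlace());
  for (int i = 0; i < 4; ++i) buf[i] = 0.1f * i;  // toy input values

  std::vector<LoDTensor> inputs = {x};
  std::vector<LoDTensor> outputs;  // shapes are determined by the fetch ops
  return Run(inputs, &outputs) ? 0 : 1;
}
```

Note that no shape arguments are passed; as the comment above says, the feed and fetch ops carry that information.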
We also need to consider multi-threaded inference here; a `const int thread_nums` parameter should be added.
There is no multi-threading inside the library; multi-threading means external threads calling into the inference library.
```cpp
class Predictor {
 public:
  struct Attr;
```
Attr -> Network?
It is not Network; it is attribute.
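For illustration only, here is a hypothetical sketch of how these fragments might assemble, with Attr read as per-predictor attributes; the fields inside Attr are invented placeholders, not the PR's actual design:

```cpp
#include <string>
#include <vector>

#include "paddle/fluid/framework/lod_tensor.h"

// Hypothetical assembly of the API fragments quoted in this PR.
class Predictor {
 public:
  struct Attr {
    std::string model_dir;  // placeholder: path to the optimized model
    bool use_gpu{false};    // placeholder: device selection
  };

  explicit Predictor(const Attr& attr);

  // LoDTensor-based Run, per the earlier review suggestion.
  bool Run(const std::vector<paddle::framework::LoDTensor>& input,
           std::vector<paddle::framework::LoDTensor>* output);
};
```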
```cpp
kAnakin,             // Use Anakin for inference.
kTensorRT,           // Use TensorRT for inference.
kAutoMixedAnakin,    // Automatically mix Fluid with Anakin.
kAutoMixedTensorRT,  // Automatically mix Fluid with TensorRT.
```
- kAutoMixedAnakin and kAutoMixedTensorRT can be removed; kAnakin should already cover kAutoMixedAnakin.
- kNone should be further split into a CPU mode and a GPU mode.
- Does MKLDNN belong under kNone, or should it be listed separately?
It does not cover it. Here kTensorRT means using TensorRT for the whole graph; the subgraph case is a separate switch, kAutoMixedTensorRT.
For users, the subgraph vs. whole-graph distinction is a bit too complex. If they choose TensorRT, they simply understand it as optimizing with TensorRT; whether the optimization runs on a subgraph or the whole graph (and whole-graph is just a special case of subgraph) should be an internal implementation detail.
Some of the features listed here are not supported yet; they are included only so that downstream teams know we are working on them.
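As a sketch of where this thread seems to converge (drop the AutoMixed variants, split the native mode by device, hide the subgraph/whole-graph choice), the enum might end up looking like this; the names are assumptions, not what this PR contains:

```cpp
// Illustrative alternative to the enum quoted above, following the
// review suggestions; none of these names come from the PR itself.
enum class EngineKind {
  kNativeCPU,  // native Fluid executor on CPU (MKLDNN placement undecided)
  kNativeGPU,  // native Fluid executor on GPU
  kAnakin,     // optimize with Anakin; subgraph vs. whole graph is internal
  kTensorRT,   // optimize with TensorRT; subgraph vs. whole graph is internal
};
```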
> - Memory reuse for native Fluid executor;
> - Translate the model storage format to some third-party engine's, so that the inference API can utilize the engine for acceleration;
>
> We have an official tool to do the optimization, call `paddle_inference_optimize --help` for more information.
Is paddle_inference_optimize a binary or a Python script? For example, `python paddle_inference_optimize src_model_dir dst_model_dir --inference_optimize_method=2` would mean applying the second optimization strategy.
Either a binary or a script.
Let's kick off this thing. It's in contrib, just as an experiment for now.
With a README.md containing some description/plan of how to use the APIs.