Add design doc of inference API for fluid. #7315
doc/design/inference.md
Outdated
@@ -0,0 +1,105 @@
# Design Doc: Inferencer

In fluid, a nueral network is represented as a protobuf message [ProgramDesc](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/program.md), the python wrapper of which is `Program`.
fluid => Fluid
This mistake appears in many places in this document.
Done.
doc/design/inference.md
Outdated
@@ -0,0 +1,105 @@
# Design Doc: Inferencer

In fluid, a nueral network is represented as a protobuf message [ProgramDesc](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/program.md), the python wrapper of which is `Program`.
Please be aware that Fluid doesn't represent a network at all. The protobuf message represents the program, not the network.
There is no network in Fluid. However, `neural network` seems to be an established phrase in deep learning. I'll think about a better expression.
doc/design/inference.md
Outdated
@@ -0,0 +1,105 @@
# Design Doc: Inferencer

In fluid, a nueral network is represented as a protobuf message [ProgramDesc](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/program.md), the python wrapper of which is `Program`.
python => Python
This mistake appears in many other places in the document.
Done.
doc/design/inference.md
Outdated
# Design Doc: Inferencer

In fluid, a nueral network is represented as a protobuf message [ProgramDesc](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/program.md), the python wrapper of which is `Program`.
Given a `ProgramDesc`, it can be run on any execution environment.
Runtime => runtime
Done.
doc/design/inference.md
Outdated
Given a `ProgramDesc`, it can be run on any execution environment.
In fluid, we call the execution environment `Runtime`, which includes `Place`, `Scope` and `Executor`.

## Representation of the Inference Network
Again, here it is the inference program, not the inference network.
doc/design/inference.md
Outdated
act='softmax') | ||
``` | ||
|
||
After training for serval passes, the parameters can be saved use `fluid.io.save_inference_model`, which will save the binary proto string of the network at the same time. |
pass => epoch
Done.
doc/design/inference.md
Outdated
act='softmax') | ||
``` | ||
|
||
After training for serval passes, the parameters can be saved use `fluid.io.save_inference_model`, which will save the binary proto string of the network at the same time. |
Don't use the passive voice, which is strongly discouraged in English writing, unless it is really necessary.
Will do in following commits.
doc/design/inference.md
Outdated
Given a `inference_program`, it is easy to derive a `load_program` which is composed of `load_op` and is responsible for initializing all the parameter variables in `inference_program`. `load_program` will be executed once and `inference_program` will be executed as many times as you need.

To summerize, a inferencer should:
a inferencer -> an inference module
Done.
doc/design/inference.md
Outdated
In fluid, the execution environment is composed of three key concepts: `Place`, `Scope` and `Executor`.

There are two types of Place in current framework, `CPUPlace` for CPU and `CUDAPlace` for CUDA GPU. `Scope` is independent to `Place`. Given the place, you need to define a `Executor`, and run the `Executor` among the `Scope`.
a Executor -> an Executor
Done.
Thanks for doing this. I have added my review comments. Most of it is rephrasing and typos.
We should decide on a new term for Inferencer. I have proposed Inference Engine in my review, but I am open to other suggestions.
doc/design/inference.md
Outdated
@@ -0,0 +1,105 @@
# Design Doc: Inferencer
Inferencer => Let's decide on a new term for this, maybe Inference Engine ?
I am replacing this with Inference Engine in my review right now, but if people decide on something else, we can replace it later.
I vote for Inference Engine.
I am thinking about it and will improve it in a following commit.
doc/design/inference.md
Outdated
@@ -0,0 +1,105 @@
# Design Doc: Inferencer

In fluid, a nueral network is represented as a protobuf message [ProgramDesc](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/program.md), the python wrapper of which is `Program`.
typo "nueral" => neural
Done.
doc/design/inference.md
Outdated
# Design Doc: Inferencer

In fluid, a nueral network is represented as a protobuf message [ProgramDesc](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/program.md), the python wrapper of which is `Program`.
Given a `ProgramDesc`, it can be run on any execution environment.
on => inside
Done.
doc/design/inference.md
Outdated
act='softmax') | ||
``` | ||
|
||
After training for serval passes, the parameters can be saved use `fluid.io.save_inference_model`, which will save the binary proto string of the network at the same time. |
typo "serval" => several
"use" => using the method
Done.
doc/design/inference.md
Outdated
exe) | ||
``` | ||
|
||
The saved model contains everything of the inference network, including all operators and variables. Thus, the `inference_program` should be initilized by the model file or a pre-loaded buffer. |
"everything of the" => everything required by the
Done.
doc/design/inference.md
Outdated
- Members:
- the pointer of the `inference_program`
- the pointer of the `load_program`
- vectors of string to record the `feed_var_names` and `fetch_var_names`
to record the => to store the
Done.
doc/design/inference.md
Outdated
- the pointer of the `inference_program`
- the pointer of the `load_program`
- vectors of string to record the `feed_var_names` and `fetch_var_names`
- the pointer of current `Runtime`
see above.
Done.
doc/design/inference.md
Outdated
- `Run`, to run the inference based on the current runtime.
- `SetRuntime`, to set the current runtime. When the runtime is set, the `load_program` will be run once to load parameters from files.
- Utility interfaces:
- `GetFeed/FetchVarNames`, to help users to debug.
to help users to debug => to help users debug
Done.
doc/design/inference.md
Outdated
- `SetRuntime`, to set the current runtime. When the runtime is set, the `load_program` will be run once to load parameters from files.
- Utility interfaces:
- `GetFeed/FetchVarNames`, to help users to debug.
- `GetFeed/FetchVarShape`, to help users to verify the size of input and output data.
to help users to verify => to help users verify
Done.
doc/design/inference.md
Outdated
### Issues

- Normally, all fetching variables' names should be written in the ProgramDesc and read from file. If users want to add some extra fetching variables for debug, or for some other use, they need to regenerate the file again. Do we need to allow user to append extra fetching variables?
for debug => for debugging purposes
extra fetching => extra fetch
Done.
LGTM! Apart from the English corrections suggested by Kavya and Yi, the first draft of the design doc looks good and is a good starting point. Thank you for the great work.
Thanks for the PR, this is helpful.
doc/design/inference.md
Outdated
```python | ||
fluid.io.save_inference_model( | ||
"./inference_model/", ["x"], [predict], | ||
exe) |
- exe->executor?
- Can you give a source code link to `save_inference_model` (since we don't have io in http://www.paddlepaddle.org/docs/develop/documentation/zh/api/v2/fluid.html), in order to explain the meaning of each parameter? Or give a short explanation here? Then we can understand where `feed_var_names` and `fetch_var_names` in members come from.
Will do in following commits.
doc/design/inference.md
Outdated
## Support of Switching Runtime

In fluid, the execution environment is composed of three key concepts: `Place`, `Scope` and `Executor`.
Can you add some links to `Place`, `Scope` and `Executor`, if users want to know more details of these three key concepts?
Done. For `Scope` and `Executor`, I added links to the design docs. For `Place`, there is no design doc, so I added a link to the C++ header file.
Thanks for all of your reviews. I fixed all the typos and applied the English corrections.
After discussing with @qingqing01, we may introduce the concept of `ProgramBuilder`, which will support developing the transpiler. I'll update the design doc as soon as possible.
Several issues are listed here to remind me:
- There is no network in Fluid.
- No passive voice in design doc.
- Rename `Inferencer` to `Inference Engine`.
…amBuilder, and rename Inferencer to InferenceEngine.
Thanks very much for adding more details. This is helpful, I have 2 questions as of now. Will probably have more tomorrow :)
doc/design/inference.md
Outdated
- It is possible to support online optimization of the inference program.
We will design an inference transpiler to do offline optimization for inference, which produce an optimized inference `ProgramDesc` for a given `ProgramDesc`. However, some optimization can be done online, such as
- changing the layout from `NCHW` to `NHWC`
- merging the computation of batch normalization layer to the front fc layer or conv layer
@Xreki : Can you explain this merging of computation for batch norm (may not be needed for this doc)?
Oh nice. Thank you.
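For readers following the batch-norm question above: one common reading of "merging the computation of batch normalization layer to the front fc layer" is folding the normalization's fixed inference-time statistics into the fc layer's weights and bias. The sketch below only illustrates that arithmetic in plain Python. It is not code from Paddle; the names `fold_batch_norm`, `fc` and `batch_norm` are made up for the example, and the `eps` default is an assumption.

```python
import math


def fc(x, weights, bias):
    """Fully connected layer: weights[j][i] connects input i to output j."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def batch_norm(y, gamma, beta, mean, var, eps=1e-5):
    """Inference-time batch norm using fixed (moving) statistics."""
    return [g * (v - m) / math.sqrt(s + eps) + b
            for v, g, b, m, s in zip(y, gamma, beta, mean, var)]


def fold_batch_norm(weights, bias, gamma, beta, mean, var, eps=1e-5):
    """Return (weights, bias) of an fc layer equal to fc followed by batch norm."""
    folded_w, folded_b = [], []
    for j, row in enumerate(weights):
        scale = gamma[j] / math.sqrt(var[j] + eps)
        folded_w.append([w * scale for w in row])
        folded_b.append((bias[j] - mean[j]) * scale + beta[j])
    return folded_w, folded_b


# Illustrative values only (not taken from any real model).
x = [1.0, -2.0, 0.5]
W = [[0.2, -0.1, 0.4], [0.7, 0.3, -0.5]]
b = [0.1, -0.2]
gamma, beta = [1.5, 0.8], [0.05, -0.3]
mean, var = [0.3, -0.1], [0.9, 1.7]

reference = batch_norm(fc(x, W, b), gamma, beta, mean, var)
folded_W, folded_b = fold_batch_norm(W, b, gamma, beta, mean, var)
folded = fc(x, folded_W, folded_b)
max_diff = max(abs(a - r) for a, r in zip(folded, reference))
```

Because the folded layer reproduces the two-step result up to rounding, the batch-norm op can be dropped from the inference program. The same idea should carry over to a conv layer by scaling each output channel's filter.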
doc/design/inference.md
Outdated
To summarize, an inferencer module should:
- be initialized from files or from buffers
- be composed of two `ProgramDesc`s, namely the `inference_program` and `load_program`
In the first design, `ProgramBuilder` contains all the elements memtioned above, and is instanced by protobuf message of the `main_program`. Other members `startup_program`, `feed_var_names` and `fetch_var_names` will also be derived in the constructor.
Will this protobuf of `main_program` (which will be used to instantiate the `ProgramBuilder`) have feed, fetch ops added to the original program-desc?
@sidgoyal78 your questions are welcome. There may be some weak points in this design doc; please point them out. Any proposal will be appreciated and helpful to me.
doc/design/inference.md
Outdated
}; | ||
``` | ||
|
||
In the first design, `ProgramBuilder` contains all the elements memtioned above, and is instanced by protobuf message of the `main_program`. Other members `startup_program`, `feed_var_names` and `fetch_var_names` will also be derived in the constructor. |
I think the `main_program` should have `feed_op`s and `fetch_op`s, or we'll need to clone the `main_program` and insert `feed_op`s and `fetch_op`s into the copy in `Run()`, like in the Python implementation. I think that is redundant.
However, where the `feed_op`s and `fetch_op`s come from depends on the storage format. They may be inserted in the C++ code or initialized from the protobuf message file.
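To make the trade-off in this comment concrete, here is a rough sketch that models a program as a plain list of op dictionaries. It is an assumption-laden illustration, not Fluid's data structures: `with_feed_fetch`, `run_cloning` and the dictionary layout are hypothetical.

```python
import copy


def with_feed_fetch(ops, feed_names, fetch_names):
    """Return a new op list with feed ops in front and fetch ops at the end."""
    feeds = [{"type": "feed", "out": name} for name in feed_names]
    fetches = [{"type": "fetch", "in": name} for name in fetch_names]
    return feeds + copy.deepcopy(ops) + fetches


# Hypothetical main_program stored WITHOUT feed/fetch ops.
bare_program = [{"type": "fc", "in": "x", "out": "predict"}]


def run_cloning(ops, feed_names, fetch_names):
    """Option 1: clone and insert feed/fetch ops on every call."""
    return with_feed_fetch(ops, feed_names, fetch_names)


# Option 2: insert the ops once (when saving or loading) and reuse the result.
stored_program = with_feed_fetch(bare_program, ["x"], ["predict"])

per_run_program = run_cloning(bare_program, ["x"], ["predict"])
```

Both options end up executing the same op sequence. The difference is only whether the copy happens once or on every run, which is the redundancy the comment points at.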
It looks to me that this design separates inference from training -- I don't see the necessity of having startup and main programs for inference as there are for training.
Please make sure that we can write an online training program, which means a training program can also provide the inference serving at the same time.
doc/design/inference.md
Outdated
@@ -0,0 +1,178 @@
# Design Doc: InferenceEngine
InferenceEngine => Inference Engine
Done.
Thank you so much for revising the design doc. I have added a few comments for certain parts. I might have a few design questions too; I will post them in a separate review, so it isn't too cluttered.
doc/design/inference.md
Outdated
@@ -0,0 +1,178 @@
# Design Doc: InferenceEngine

The main goal of inference API is easy to use.
The main goal of inference API is easy to use. => The main goal of an inference API is to make it easy to use.
Done.
doc/design/inference.md
Outdated
# Design Doc: InferenceEngine

The main goal of inference API is easy to use.
In Fluid, a neural network is represented as a protobuf message [ProgramDesc](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/program.md), the Python wrapper of which is [Program](https://github.com/PaddlePaddle/Paddle/blob/develop/python/paddle/v2/fluid/framework.py).
protobuf message => protobuf message called
the Python wrapper of which is => the Python wrapper for which is a
Done.
doc/design/inference.md
Outdated
The main goal of inference API is easy to use.
In Fluid, a neural network is represented as a protobuf message [ProgramDesc](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/program.md), the Python wrapper of which is [Program](https://github.com/PaddlePaddle/Paddle/blob/develop/python/paddle/v2/fluid/framework.py).
Given a [inference program](#inference-program), it can run inside any execution environment.
a => an
it can run inside => it can be executed inside
Done.
doc/design/inference.md
Outdated
The main goal of inference API is easy to use.
In Fluid, a neural network is represented as a protobuf message [ProgramDesc](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/program.md), the Python wrapper of which is [Program](https://github.com/PaddlePaddle/Paddle/blob/develop/python/paddle/v2/fluid/framework.py).
Given a [inference program](#inference-program), it can run inside any execution environment.
In Fluid, we call the execution environment runtime, which includes [Place](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/platform/place.h), [Scope](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/scope.md) and [Executor](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/executor.md).
environment runtime => environment a runtime
which includes => which includes a
[Scope] => a [Scope]
[Executor] => an [Executor]
Done.
doc/design/inference.md
Outdated
## Inference Program

A simple inference program may be defined in Python API as:
may be => can be
as => as the
Done.
doc/design/inference.md
Outdated
1. An `InferenceEngine` can be constructed by a `ProgramBuilder`.
1. An `InferenceEngine` also holds pointer to the current `Runtime`. Users can call `SetRuntime()` to set the current runtime, and the `startup_program` will be run once to initialize parameters for this runtime.
1. After setting the current runtime, users can call `Run()` to run the inference program as many times as they required.
required => require
Done.
doc/design/inference.md
Outdated
1. An `InferenceEngine` can be constructed by a `ProgramBuilder`.
1. An `InferenceEngine` also holds pointer to the current `Runtime`. Users can call `SetRuntime()` to set the current runtime, and the `startup_program` will be run once to initialize parameters for this runtime.
1. After setting the current runtime, users can call `Run()` to run the inference program as many times as they required.
1. Data structure, [framework::Tensor](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/tensor.md) and [framework::LoDTensor](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/lod_tensor.md), are used in user codes to feed input data and fetch output data.
user codes => user implementation
Done.
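The lifecycle described in the quoted list (construct from a `ProgramBuilder`, set the runtime once, then run repeatedly) can be modeled with a small Python mock. This is a sketch under stated assumptions, not the proposed C++ API: the classes below are hypothetical stand-ins, programs are modeled as plain callables, and `set_runtime`/`run` mirror `SetRuntime()`/`Run()` only by name.

```python
class Runtime:
    """Hypothetical stand-in for the doc's runtime; only a place and a scope."""

    def __init__(self, place):
        self.place = place
        self.scope = {}  # variable name -> value


class ProgramBuilder:
    """Hypothetical holder of the startup and main programs (as callables)."""

    def __init__(self, startup_program, main_program):
        self.startup_program = startup_program  # callable(scope)
        self.main_program = main_program        # callable(scope, feed) -> dict


class InferenceEngine:
    def __init__(self, builder):
        self.builder = builder
        self.runtime = None

    def set_runtime(self, runtime):
        # The startup program runs once per runtime to initialize parameters.
        self.runtime = runtime
        self.builder.startup_program(runtime.scope)

    def run(self, feed):
        if self.runtime is None:
            raise RuntimeError("call set_runtime() before run()")
        return self.builder.main_program(self.runtime.scope, feed)


startup_calls = []


def startup(scope):
    startup_calls.append(1)
    scope["w"] = 2.0  # toy parameter


def main(scope, feed):
    return {"y": scope["w"] * feed["x"]}


engine = InferenceEngine(ProgramBuilder(startup, main))
engine.set_runtime(Runtime("CPU"))
outputs = [engine.run({"x": v})["y"] for v in (1.0, 2.0, 3.0)]
```

In this model the parameters live in the runtime's scope, so switching to another runtime means calling `set_runtime` again, which re-runs the startup program for the new scope.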
doc/design/inference.md
Outdated
|
||
### Example | ||
|
||
Here is the simplest example to use `InferenceEngine` to build a inference program directly from file and run on a single CPU. |
a inference => an inference
Done.
doc/design/inference.md
Outdated
Runtime runtime("CPU"); | ||
|
||
InferenceEngine engine(builder); | ||
// Set the runtime, in which the startup_program will be ran to initialize parameters for the runtime |
ran => run
Done.
doc/design/inference.md
Outdated
// Set the runtime, in which the startup_program will be ran to initialize parameters for the runtime | ||
engine.SetRuntime(&runtime); | ||
|
||
// Run the main_program many times |
many => multiple
Done.
Thanks to @kavyasrinet for correcting the English again.
doc/design/inference.md
Outdated
@@ -0,0 +1,178 @@ | |||
# Design Doc: InferenceEngine |
Done.
doc/design/inference.md
Outdated
@@ -0,0 +1,178 @@ | |||
# Design Doc: InferenceEngine | |||
|
|||
The main goal of inference API is easy to use. |
Done.
doc/design/inference.md
Outdated
# Design Doc: InferenceEngine | ||
|
||
The main goal of inference API is easy to use. | ||
In Fluid, a neural network is represented as a protobuf message [ProgramDesc](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/program.md), the Python wrapper of which is [Program](https://github.com/PaddlePaddle/Paddle/blob/develop/python/paddle/v2/fluid/framework.py). |
Done.
doc/design/inference.md
Outdated
|
||
The main goal of inference API is easy to use. | ||
In Fluid, a neural network is represented as a protobuf message [ProgramDesc](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/program.md), the Python wrapper of which is [Program](https://github.com/PaddlePaddle/Paddle/blob/develop/python/paddle/v2/fluid/framework.py). | ||
Given a [inference program](#inference-program), it can run inside any execution environment. |
Done.
doc/design/inference.md
Outdated
The main goal of inference API is easy to use. | ||
In Fluid, a neural network is represented as a protobuf message [ProgramDesc](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/program.md), the Python wrapper of which is [Program](https://github.com/PaddlePaddle/Paddle/blob/develop/python/paddle/v2/fluid/framework.py). | ||
Given a [inference program](#inference-program), it can run inside any execution environment. | ||
In Fluid, we call the execution environment runtime, which includes [Place](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/platform/place.h), [Scope](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/scope.md) and [Executor](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/executor.md). |
Done.
doc/design/inference.md
Outdated
|
||
1. An `InferenceEngine` can be constructed by a `ProgramBuilder`. | ||
1. An `InferenceEngine` also holds pointer to the current `Runtime`. Users can call `SetRuntime()` to set the current runtime, and the `startup_program` will be run once to initialize parameters for this runtime. | ||
1. After setting the current runtime, users can call `Run()` to run the inference program as many times as they required. |
Done.
doc/design/inference.md
Outdated
1. An `InferenceEngine` can be constructed by a `ProgramBuilder`. | ||
1. An `InferenceEngine` also holds pointer to the current `Runtime`. Users can call `SetRuntime()` to set the current runtime, and the `startup_program` will be run once to initialize parameters for this runtime. | ||
1. After setting the current runtime, users can call `Run()` to run the inference program as many times as they required. | ||
1. Data structure, [framework::Tensor](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/tensor.md) and [framework::LoDTensor](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/lod_tensor.md), are used in user codes to feed input data and fetch output data. |
Done.
doc/design/inference.md
Outdated
|
||
### Example | ||
|
||
Here is the simplest example to use `InferenceEngine` to build a inference program directly from file and run on a single CPU. |
Done.
doc/design/inference.md
Outdated
Runtime runtime("CPU"); | ||
|
||
InferenceEngine engine(builder); | ||
// Set the runtime, in which the startup_program will be ran to initialize parameters for the runtime |
Done.
doc/design/inference.md
Outdated
// Set the runtime, in which the startup_program will be ran to initialize parameters for the runtime | ||
engine.SetRuntime(&runtime); | ||
|
||
// Run the main_program many times |
Done.
For an inference system where there is no training program and the inference program is initialized from file, there needs a
I updated the PR. I think the API can easily be extended into a common C++ API that supports online training and inference in the future.
…aining in C++ API.
ca97606 to baf2802
…eki/Paddle into core_inference_api_design_doc
There are three ways to define an inference program.
- **Case 1**, split from a training program. A training program can provide inference serving at the same time, in which case the inference program is part of the training program and all the parameters have already been set correctly. No extra `startup_program` is needed for this kind of inference, and the need for a separate `main_program` for inference may be removed in the future, depending on the implementation of `Executor.Run()`.
- **Case 2**, write an inference program directly using the API. In this case, parameters are stored in files.
- **Case 3**, read a pre-trained inference program from file. In this case, both the `ProgramDesc` and the parameters are stored in files. We can get a complete `ProgramDesc` straightaway, and keeping both a `main_program` and a `startup_program` makes it possible to perform some online optimization (discussed [below](#introduction-of-program-builder)).
Is this saved inference ProgramDesc exactly the same as the training ProgramDesc (in which case we can let the user specify the pruning target and the feed/fetch var names on the C++ side), or is it obtained after `prune`, `inference_optimize`, and prepending/appending feed/fetch operators to the training ProgramDesc (since we don't want to change framework.proto to add new fields to ProgramDesc, we directly prepend/append feed/fetch before saving the model)?
I think we can support both inference and training `ProgramDesc`.
- If supporting inference `ProgramDesc`, then we need to prepend/append `feed_op` and `fetch_op` in `fluid.io.save_inference_model`.
- If supporting training `ProgramDesc`, we can call the `operator()(std::vector<std::string>& feed_var_names, std::vector<std::string>& fetch_var_names)` to get an inference program, and users need to specify the feed var names and fetch var names.
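To make the "prepend/append feed and fetch operators" idea concrete, here is a minimal sketch that works on a plain list of op descriptions. The `OpDesc` struct, the `AddFeedFetchOps` helper, and the `"feed"`/`"fetch"` holder names are hypothetical simplifications chosen for illustration; they are not the `framework.proto` messages or the behaviour of `fluid.io.save_inference_model`.

```cpp
#include <string>
#include <vector>

// Hypothetical, simplified op description: a type plus input/output names.
struct OpDesc {
  std::string type;
  std::vector<std::string> inputs;
  std::vector<std::string> outputs;
};

// Returns a copy of `ops` with one "feed" op per feed variable placed in
// front and one "fetch" op per fetch variable placed at the end.
std::vector<OpDesc> AddFeedFetchOps(
    const std::vector<OpDesc>& ops,
    const std::vector<std::string>& feed_var_names,
    const std::vector<std::string>& fetch_var_names) {
  std::vector<OpDesc> result;
  result.reserve(ops.size() + feed_var_names.size() + fetch_var_names.size());
  for (const auto& name : feed_var_names) {
    // "feed" as the input holder name is an assumption of this sketch.
    result.push_back({"feed", {"feed"}, {name}});
  }
  result.insert(result.end(), ops.begin(), ops.end());
  for (const auto& name : fetch_var_names) {
    // "fetch" as the output holder name is an assumption of this sketch.
    result.push_back({"fetch", {name}, {"fetch"}});
  }
  return result;
}
```

Either option discussed above ends with a program of this shape; the difference is only whether the wrapping happens on the Python side before saving or on the C++ side after loading.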
- **Case 3**, read a pre-trained inference program from file. In this case, both the `ProgramDesc` and the parameters are stored in files. We can get a complete `ProgramDesc` straightaway, and keeping both a `main_program` and a `startup_program` makes it possible to perform some online optimization (discussed [below](#introduction-of-program-builder)).

In this design doc, we mainly detail the interfaces for **Case 3**.
- The protobuf message of the `main_program` is saved using the `fluid.io.save_inference_model` method. Thus, it can be initialized from a file or from a pre-loaded buffer.
@Xreki: I discussed the protobuf message with @kexinzhao and wrote up our thoughts here: https://github.com/sidgoyal78/paddle_notes/blob/master/inference.md
Can you please take a look and say which of the approaches described you prefer?
Great. I see you posted this in #7580. I'll have a look.
@sidgoyal78 @kexinzhao, do you have any thoughts on the design doc itself? I would appreciate some suggestions.
I think the ProgramBuilder class is necessary, since it is analogous to the Program class in Python (though maybe we can think of a better name). The same goes for the Resolver class: I think it is necessary, although a better name could be found, maybe MetaExecutor or something similar.
After discussing with @kexinzhao , it seems that maybe Runtime class could be avoided, and we could just get away with Builder and Resolver classes.
A few naming suggestions:
ProgramBuilder -> ProgramMaker / ProgramFactory
ProgramResolver -> ProgramRunner
Other: ProgramBuilder -> InferenceEngineInitializer
ProgramResolver -> InferenceEngineRunner
@sidgoyal78 Thanks very much. I introduced `Runtime` so that users only need to know about `Runtime` and do not need to care about `Place`, `Executor` and `Scope`. However, we can remove `Runtime` and use the core concepts directly, just like in Python.
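For readers weighing the two options, the convenience that `Runtime` was meant to provide can be sketched as a thin bundle over the three core concepts. All of the types below are hypothetical mocks written for this illustration (the real `Place`, `Scope` and `Executor` in Paddle are far richer); the sketch only shows that `Runtime` adds no behaviour of its own, which is the argument for dropping it.

```cpp
#include <map>
#include <string>

// Hypothetical mocks of the three core concepts.
struct Place {
  std::string device;  // e.g. "CPU"
};

struct Scope {
  std::map<std::string, float> vars;  // variable name -> value
};

struct Executor {
  explicit Executor(const Place& p) : place(p) {}
  Place place;  // the device this executor would run on
};

// Runtime as a pure bundle: it owns one Place, one Scope and one Executor
// and wires them together so the user constructs a single object.
struct Runtime {
  explicit Runtime(const std::string& device)
      : place{device}, scope(), executor(place) {}
  Place place;        // declared first so `executor` can be built from it
  Scope scope;
  Executor executor;
};
```

Without `Runtime`, user code would construct the `Place`, `Scope` and `Executor` itself, as the Python API does.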
bf0823c to 46bbd7d
46bbd7d to 339f4ed
Fix #7314