Feature/memory transplier #11386
Closed
dzhwinter wants to merge 20 commits into PaddlePaddle:develop from dzhwinter:feature/memory_transplier
Since you haven't replied for a long time, we have closed this issue/PR.
A re-implementation of the memory transpiler. Main changes:
Add an in-place cache strategy.
We provide the Reuse tag in OpProto, and the memory transpiler reuses the memory if the op can run in place.
Trigger more memory savings, not only for memory blocks of equal shape.
Currently, our memory transpiler only reuses memory blocks of equal shape. I could not figure out why the previous implementation does not fully support reusing bigger memory blocks. If a memory block is bigger than we need, we can also reuse it.
https://github.com/PaddlePaddle/Paddle/blob/develop/python/paddle/fluid/transpiler/memory_optimization_transpiler.py#L189
https://github.com/PaddlePaddle/Paddle/blob/develop/python/paddle/fluid/transpiler/memory_optimization_transpiler.py#L252
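The idea of reusing a block that is at least as large as needed can be sketched as follows. This is only an illustration, not the code in this PR: the helper names `num_elements` and `find_reusable`, the `(name, shape, dtype)` pool layout, and the best-fit choice are all assumptions, and shapes are assumed to be fully known (no -1 dimensions).

```python
from functools import reduce
from operator import mul


def num_elements(shape):
    """Number of elements in a fully known shape (hypothetical helper)."""
    return reduce(mul, shape, 1)


def find_reusable(pool, needed_shape, dtype):
    """Return the name of the smallest dead block that can hold needed_shape.

    pool is a list of (name, shape, dtype) tuples for variables that are no
    longer live. An equal-shape block is just the special case where the
    element counts match exactly.
    """
    needed = num_elements(needed_shape)
    candidates = [
        (num_elements(shape), name)
        for name, shape, pool_dtype in pool
        if pool_dtype == dtype and num_elements(shape) >= needed
    ]
    if not candidates:
        return None
    # Best fit: the smallest block that is still large enough.
    return min(candidates)[1]


pool = [
    ("a", (32, 64), "float32"),
    ("b", (32, 256), "float32"),
    ("c", (32, 128), "int64"),
]
print(find_reusable(pool, (32, 128), "float32"))  # b
print(find_reusable(pool, (32, 512), "float32"))  # None
```

Choosing the smallest block that fits is one possible policy; it keeps larger blocks available for later, larger requests.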
Compute the liveness set with a new algorithm.
In our transpiler, an important step is computing the variable liveness set. However, when the op count grows (for example, SE-ResNeXt-152 has 4000 ops), the previous reach-fixpoint algorithm takes a long time to converge. This PR uses the worklist algorithm, which converges faster than the reach-fixpoint algorithm.
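For reference, a worklist-based backward liveness analysis can be sketched like this. It is a minimal, generic version of the textbook algorithm, not the implementation in this PR; the function name `liveness` and the `ops` / `successors` input format are assumptions made for the example.

```python
from collections import deque


def liveness(ops, successors):
    """Backward liveness analysis with a worklist.

    ops: list of (uses, defs) pairs of sets, one pair per op.
    successors: successors[i] lists the ops that may run right after op i.
    Returns (live_in, live_out), each a list of sets indexed by op.
    """
    n = len(ops)
    predecessors = [[] for _ in range(n)]
    for i, succs in enumerate(successors):
        for s in succs:
            predecessors[s].append(i)

    live_in = [set() for _ in range(n)]
    live_out = [set() for _ in range(n)]
    # Seed with every op, last op first, since information flows backward.
    worklist = deque(reversed(range(n)))
    queued = [True] * n

    while worklist:
        i = worklist.popleft()
        queued[i] = False
        uses, defs = ops[i]
        live_out[i] = set().union(*(live_in[s] for s in successors[i]))
        new_in = uses | (live_out[i] - defs)
        if new_in != live_in[i]:
            live_in[i] = new_in
            # Only the predecessors of a changed op need to be revisited.
            for p in predecessors[i]:
                if not queued[p]:
                    queued[p] = True
                    worklist.append(p)
    return live_in, live_out


# t1 = mul(x, w); t2 = add(t1, b); out = relu(t2)
ops = [
    ({"x", "w"}, {"t1"}),
    ({"t1", "b"}, {"t2"}),
    ({"t2"}, {"out"}),
]
successors = [[1], [2], []]
live_in, live_out = liveness(ops, successors)
print(sorted(live_in[0]))   # ['b', 'w', 'x']
print(sorted(live_out[1]))  # ['t2']
```

In the example, `t1` is absent from `live_out[1]`, so its memory is free for reuse after the second op. Unlike a full fixpoint sweep that revisits every op on each iteration, the worklist only revisits ops whose successors changed.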
The SSA-form graph optimized liveness algorithm.
Since our memory strategy relies heavily on the variable liveness range algorithm, one idea is to generate more precise variable liveness information.
https://hal.inria.fr/inria-00558509v1/document
I followed this paper and tried the SSA-graph-based liveness algorithm in
[DO NOT REVIEW]Feature/inplace ssa graph #11385
However, I found some issues while implementing this algorithm.
First, the more precise SSA-form liveness set rests on the assumption that loops and if conditions occur everywhere, because loops and if conditions are converted to phi variables, which dominate the later analysis. Without them, it works the same as the normal liveness analysis algorithm.
Second, I did a forward analysis of our test program by hand (that is, I computed the variable live periods) and found that the result is the same as that of the normal liveness analysis algorithm.
Third, in most SSA-graph applications, the SSA program is an intermediate representation of the user program. It is not easy to convert an SSA-form graph back to a program desc, which I need to do after the memory transpile finishes.
In conclusion, I implemented the worklist-based liveness algorithm instead of the SSA-graph one.