memory optimization for dynamic RNN #8041
I agree with you. This PR saves a large amount of memory in while_grad, so we will merge it early.
Please fix the remaining issues later.
@@ -205,6 +208,8 @@ class WhileGradOp : public framework::OperatorBase {
sum_op->Run(cur_scope, dev_place);
cur_scope.Rename(new_inside_name, inside_grad_name);
}
dev_ctx.Wait();
const_cast<framework::Scope &>(scope).DeleteScope(&cur_scope);
Nice catch!
This fix will solve the troublesome OOM problem.
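To illustrate why deleting each step scope during the backward pass lowers peak memory, here is a minimal sketch. It uses a hypothetical `ToyScope` class, not Paddle's real `framework::Scope`, and it assumes that each step scope holds one forward buffer and receives one gradient buffer during the backward pass. The names `ToyScope`, `LiveFloats`, and `PeakLiveFloats` are invented for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <list>
#include <memory>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for a framework scope; NOT Paddle's real class.
class ToyScope {
 public:
  // Create a child scope owned by this scope.
  ToyScope& NewScope() {
    kids_.emplace_back(new ToyScope());
    return *kids_.back();
  }

  // Destroy one child scope, releasing every variable it holds.
  void DeleteScope(ToyScope* kid) {
    kids_.remove_if(
        [kid](const std::unique_ptr<ToyScope>& p) { return p.get() == kid; });
  }

  // Allocate a named buffer of `n` floats in this scope.
  void Var(const std::string& name, std::size_t n) {
    vars_[name].assign(n, 0.0f);
  }

  // Total floats held by this scope and all of its children.
  std::size_t LiveFloats() const {
    std::size_t total = 0;
    for (const auto& kv : vars_) total += kv.second.size();
    for (const auto& kid : kids_) total += kid->LiveFloats();
    return total;
  }

 private:
  std::unordered_map<std::string, std::vector<float>> vars_;
  std::list<std::unique_ptr<ToyScope>> kids_;
};

// Simulate a forward pass that creates one step scope per time step, then a
// backward pass that visits the step scopes in reverse order.
// Returns the peak number of live floats observed.
std::size_t PeakLiveFloats(std::size_t steps, std::size_t floats_per_var,
                           bool delete_each_step) {
  ToyScope parent;
  std::vector<ToyScope*> step_scopes;
  for (std::size_t i = 0; i < steps; ++i) {
    ToyScope& s = parent.NewScope();
    s.Var("x", floats_per_var);  // forward buffer kept for the backward pass
    step_scopes.push_back(&s);
  }
  std::size_t peak = parent.LiveFloats();
  for (auto it = step_scopes.rbegin(); it != step_scopes.rend(); ++it) {
    (*it)->Var("x@GRAD", floats_per_var);  // gradient buffer for this step
    peak = std::max(peak, parent.LiveFloats());
    if (delete_each_step) {
      // On a real device, pending asynchronous work would have to finish
      // first; that is the role of dev_ctx.Wait() in the diff above.
      parent.DeleteScope(*it);
    }
  }
  return peak;
}
```

In this toy model, `PeakLiveFloats(4, 10, false)` returns 80, because every step scope keeps both buffers until the end, while `PeakLiveFloats(4, 10, true)` returns 50, because only one gradient buffer is alive at a time.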
paddle/framework/scope.h
Outdated
@@ -65,6 +65,8 @@ class Scope {
/// Drop all kids scopes belonged to this scope.
void DropKids();

void EraseVars(std::vector<std::string>& var_names);
Since every variable is released when the scope is destructed, in my humble view this interface should never be needed.
LGTM!
@@ -205,6 +208,8 @@ class WhileGradOp : public framework::OperatorBase {
sum_op->Run(cur_scope, dev_place);
cur_scope.Rename(new_inside_name, inside_grad_name);
}
dev_ctx.Wait();
What is the cost of adding this dev_ctx.Wait()?
I have not run detailed tests yet. But if we do not delete the step scope, we cannot train a larger RNN model because it runs out of memory.
I have tested this in the machine translation demo.
In the first training batch, memory usage dropped from 5728446208 to 1115982080, a saving of 80.5%.
It seems that the delete_var operator does not contribute to this result. I will remove it.
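The 80.5% figure can be checked with a few lines of arithmetic. The helper below is a hypothetical illustration and not part of the PR; it just computes the fraction saved from the two measurements quoted in the comment.

```cpp
#include <cassert>
#include <cmath>

// Fraction of memory saved when usage drops from `before` to `after`.
// Both arguments must use the same unit; `before` must be non-zero.
double SavedFraction(double before, double after) {
  return 1.0 - after / before;
}
```

Calling `SavedFraction(5728446208.0, 1115982080.0)` gives roughly 0.805, which matches the reported saving.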