Add model average optimizer for fluid #9082
Conversation
1. Rename inputs and outputs 2. Add some comments
"accumulating sums of parameter values with the same shape as " | ||
"input(param)."); | ||
AddInput("in_num_accumulates", | ||
"Input(Tensor): The accumulating times of current window with " |
Tensor<int64_t>
Done.
AverageAccumulatesOpMaker(OpProto* proto, OpAttrChecker* op_checker)
    : OpProtoAndCheckerMaker(proto, op_checker) {
  AddInput("param",
           "Input(Tensor or LoDTensor): The parameter to be accumulated.");
Input(Tensor or LoDTensor) -> (Tensor or LoDTensor)
By convention there is no "Input" before the "(", see:
https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/fluid/operators/mul_op.cc#L79
The same applies below.
Done.
AddInput("param", | ||
"Input(Tensor or LoDTensor): The parameter to be accumulated."); | ||
AddInput("in_sum_1", | ||
"Input(Tensor or LoDTensor): A tensor used to store the parameter " |
Now, maybe all the inputs and outputs are Tensor.
Done.
AddComment(R"DOC(
AverageAccumulates Operator.
Accumulate the sum of parameter within a sliding window. The size of the sliding window is determined by 'average_window', 'max_average_window' and 'min_average_window'.
Need more details here to show how the average is computed.
Done.
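As an illustration of the scheme discussed above, here is a simplified single-buffer sketch in Python (not the operator's actual multi-buffer implementation): a running sum and count of recent parameter values are kept, the window length is bounded by min_average_window, max_average_window and average_window * num_updates, and the averaged parameter is the sum divided by the count. All names and numeric defaults below are illustrative.

```python
import numpy as np

def average_accumulate_step(param, sum_, num_accumulates, num_updates,
                            average_window=0.15,
                            min_average_window=10000,
                            max_average_window=20000):
    # Accumulate the current parameter value into the window.
    sum_ = sum_ + param
    num_accumulates += 1
    num_updates += 1
    # If the window has grown too long, discard the accumulated history and
    # start a fresh window (the real operator shifts sums between several
    # buffers instead of using a single one).
    if (num_accumulates >= min_average_window and
            num_accumulates >= min(max_average_window,
                                   num_updates * average_window)):
        sum_ = np.zeros_like(param)
        num_accumulates = 0
    return sum_, num_accumulates, num_updates

def averaged_param(param, sum_, num_accumulates):
    # Averaged parameter used at test time; fall back to the raw value
    # when nothing has been accumulated yet.
    return sum_ / num_accumulates if num_accumulates > 0 else param
```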
using EigenVector = framework::EigenVector<T, MajorType, IndexType>;

template <typename DeviceContext>
void getAccumulators(const framework::ExecutionContext& ctx,
getAccumulators -> GetAccumulators
Done.
int64_t& old_num_accumulates); | ||
|
||
template <typename DeviceContext> | ||
void setAccumulators(const framework::ExecutionContext& ctx, |
setAccumulators -> SetAccumulators
Done.
 public:
  void Compute(const framework::ExecutionContext& ctx) const override {
    // It is used to avoid loss of precision
    static const int64_t kMaxNumAccumulates = 16384;
Is there any reference paper for kMaxNumAccumulates = 16384?
It seems that 16384 is an experimental value. There are no reference papers.
Excellent work!!
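A small, illustrative sketch (not the kernel code) of why the partial sum is periodically flushed: adding a small value to a very large running sum loses floating-point precision, so after kMaxNumAccumulates additions the partial sum is moved into a second buffer and restarted. Names below are illustrative.

```python
import numpy as np

kMaxNumAccumulates = 16384  # experimental value, no reference paper

def accumulate(param, sum_1, sum_2, num_updates):
    # Keep a short-lived partial sum in sum_1 so each addition stays
    # comparable in magnitude to the incoming parameter values.
    sum_1 = sum_1 + param
    num_updates += 1
    if num_updates % kMaxNumAccumulates == 0:
        # Move the partial sum to a different buffer to avoid loss of
        # precision due to too many sums.
        sum_2 = sum_2 + sum_1
        sum_1 = np.zeros_like(sum_1)
    return sum_1, sum_2, num_updates
```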
"before this batch with shape [1]."); | ||
|
||
AddAttr<float>("average_window", | ||
"The rate of average window size relative to num_updates."); |
Set 0. as the default value here.
Done.
AddAttr<float>("average_window", | ||
"The rate of average window size relative to num_updates."); | ||
AddAttr<int64_t>("max_average_window", "Maximum size of average window."); | ||
AddAttr<int64_t>("min_average_window", "Minimu size of average window."); |
Set 10000L as the default value for min_average_window?
Done.
out_sum_2_tensor.device(place) = in_sum_2_tensor;
out_sum_3_tensor.device(place) = in_sum_3_tensor;
if (num_updates % kMaxNumAccumulates == 0) {
  out_sum_2_tensor.device(place) = in_sum_2_tensor + in_sum_1_tensor;
Add a comment before line 87:
Move the sum to a different buffer to avoid loss of precision due to too many sums.
Done.
if (num_accumulates >= min_average_window &&
    num_accumulates >= std::min<int64_t>(max_average_window,
                                         num_updates * average_window)) {
  out_sum_3_tensor.device(place) = in_sum_1_tensor + in_sum_2_tensor;
Add a comment before line 94:
Now the average window is too long, discard the old sum.
Done.
python/paddle/fluid/optimizer.py
Outdated
self._append_average_accumulate_op(param)

def _add_average_apply_op(self, block, param_grad):
    param = block.clone_variable(param_grad[0])
Why use clone here? Looking at the implementation of clone, the Variable's name and stored content (Tensor) are the same, so why is a clone needed? Could the original Variable be used directly?
When an op runs InferShape, it looks up its input variables in the current block, so the clone_variable function is needed to clone a copy of the variable desc into the current block and to set variable.block to the current block. Otherwise, InferShape reports an "Input not found" error.
""" | ||
assert isinstance(var, Variable) | ||
return self.create_var( | ||
name=var.name, |
My understanding is that the "cloned" var and the input var are two separate pieces of storage; since the names are the same here, this looks more like "sharing" the same var.
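To make the discussion concrete, here is a toy sketch (not fluid's real data structures or API) of why the variable desc has to be registered in the block that owns the op: input lookup during InferShape happens by name inside that block, even though the underlying storage is shared. All class and variable names below are hypothetical.

```python
class Block(object):
    def __init__(self):
        self.vars = {}  # variable name -> variable description

    def clone_variable(self, var):
        # Register a desc with the same name and metadata in *this* block;
        # the runtime storage is still resolved by name, so nothing is copied.
        cloned = dict(var)
        self.vars[cloned["name"]] = cloned
        return cloned

    def infer_shape(self, input_names):
        # An op's InferShape resolves its inputs inside the block that owns
        # the op, hence the desc must be present in that block.
        for name in input_names:
            if name not in self.vars:
                raise KeyError("Input %s not found in current block" % name)

main_block, apply_block = Block(), Block()
param = {"name": "fc_0.w_0", "shape": [128, 64]}
main_block.vars[param["name"]] = param

apply_block.clone_variable(param)      # without this line, the call below fails
apply_block.infer_shape(["fc_0.w_0"])  # succeeds: the desc is visible here
```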
AddAttr<float>("average_window",
               "The rate of average window size relative to num_updates.");
AddAttr<int64_t>("max_average_window", "Maximum size of average window.");
Please update the comment here to tell users to manually set this to the total number of mini-batches in one pass/epoch.
Done.
python/paddle/fluid/optimizer.py
Outdated
model_average.apply()
for data in test_reader():
    exe.run(inference_program...)
model_average.restore(exe)
You could use the "with model_average.apply()" syntax to hide the call to model_average.restore.
Done. Thx.
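With the context-manager syntax added in the later commit, usage would look roughly like the sketch below. exe, test_reader and inference_program are the same placeholders used in the docstring snippet above, and the exact apply() signature is an assumption.

```python
# Hedged usage sketch of the with-syntax; not the final docstring.
with model_average.apply(exe):
    # Inside the block the parameters hold their averaged values.
    for data in test_reader():
        exe.run(inference_program)
# On exiting the block the original parameter values are restored, so an
# explicit model_average.restore(exe) call is no longer needed.
```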
"shape [1]."); | ||
AddInput("in_num_updates", | ||
"Input(Tensor): The total number of batches used by trainning " | ||
"before this batch with shape [1]."); |
When the three scalars in_num_accumulates, in_old_num_accumulates, and in_num_updates are initialized with fill_constant, you can use the force_cpu attribute so that these scalars always stay on the CPU; then no copy is needed during GPU computation.
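A hedged illustration of this suggestion (fluid's fill_constant accepts a force_cpu argument; whether it is appropriate here is what the thread goes on to discuss, and the variable names below only mirror the operator's inputs):

```python
import paddle.fluid as fluid

# Keep the scalar counters on the CPU so GPU kernels do not copy them back
# and forth; names are illustrative, not the PR's actual initialization code.
in_num_accumulates = fluid.layers.fill_constant(
    shape=[1], dtype='int64', value=0, force_cpu=True)
in_old_num_accumulates = fluid.layers.fill_constant(
    shape=[1], dtype='int64', value=0, force_cpu=True)
in_num_updates = fluid.layers.fill_constant(
    shape=[1], dtype='int64', value=0, force_cpu=True)
```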
Understood, let's keep it as it is for now. It would be better to support variables like Variable<int/float> as op inputs.
1. Implement 'with model_average.apply()' syntax 2. Init apply_program and restore_program in __init__ function of ModelAverage
Since the model average feature uses a lot of memory, we need to support do_average_in_cpu in a follow-up PR.
Please create an issue for these two problems before merging this PR.
params_grads: A list of parameter-grad variable pairs.
average_window_rate: The rate of average window.
min_average_window: The minimum size of average window.
max_average_window: The maximum size of average window.
The user documentation needs refinement; it should tell users how to set average_window_rate, min_average_window, max_average_window, and so on.
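A hedged sketch of how these arguments might be set, based only on the argument list quoted above (the exact constructor signature is an assumption, and the heuristic of setting max_average_window to the number of mini-batches in one pass/epoch echoes the earlier review comment):

```python
import paddle.fluid as fluid

# params_grads is assumed to be the parameter/gradient list produced while
# building the training program; the numeric values are only illustrative.
batches_per_pass = 1250                      # mini-batches in one pass/epoch
model_average = fluid.optimizer.ModelAverage(
    params_grads,
    average_window_rate=0.15,                # window size as a fraction of num_updates
    min_average_window=10000,
    max_average_window=batches_per_pass)
```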
fix #9172
The results of some experiments are attached in #9172.