Merge with latest develop branch. Optimizer lib2 #2386
Conversation
paddle/optimizer/CMakeLists.txt
Outdated
add_library(optimizer STATIC ${OPITMIZER_SRCS})
add_dependencies(optimizer gen_proto_cpp)

add_simple_unittest(Tensor_test)
Let's use all lowercase for filenames (tensor_test.cpp instead of Tensor_test.cpp). Some filesystems are not case sensitive.
fix done
paddle/optimizer/Tensor.h
Outdated
return data_[idx];
}
// TODO: replace with tensorshape
size_t size() const { return this->width_; }
Since height_ is already a member variable, I think you implemented a 2-D tensor, so this should be return this->width_ * this->height_;
it should be. fix done
paddle/optimizer/Tensor.h
Outdated
};

// TODO(zhihong): design problem of dynamic datatype, need to fix it
typedef TensorT<real> Tensor;
I do not know whether real means float or double. Doing a Google search does not answer my question. Maybe for clarity, we need to change all real in this PR to float? (Since currently we are working with 32-bit floating point parameters.)
The real type is a project-wide macro type (https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/utils/Common.h#L30) which is determined when the project is compiled. That is not a good choice here, since our element type comes from the caller's data and is only known at runtime. This needs to be fixed.
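For reference, the usual pattern is a compile-time switch. A minimal sketch of that idiom (the macro name here is illustrative, not necessarily the exact one Paddle uses; see the Common.h link above):

// Sketch: "real" is fixed at build time, not at runtime.
#ifdef PADDLE_TYPE_DOUBLE
typedef double real;   // 64-bit build
#else
typedef float real;    // default 32-bit build
#endif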
Let me explain this part more clearly.
In the last PR I implemented Tensor as a template class, but that is the wrong way: a template class is instantiated for specific types at compile time (explaining the reason in a meeting would be better).
In fact, a tensor should represent a block of contiguous memory and support dynamic types at runtime, e.g. https://github.com/caffe2/caffe2/blob/master/caffe2/core/tensor.h#L643
just like an enhanced any type in the Boost library. We can merge this PR and enrich the tensor library in the future. At this moment we only support compile-time data types here.
OK, that's fine. Can you use float instead of real? Because the parameter is supposed to be a general data store, it should not depend on how real is defined in Paddle.
After this PR we need to work on another PR to add tensors of the required data types.
MATCH_ENUM_TYPE(int32_t, PADDLE_ELEMENT_TYPE_INT32);
MATCH_ENUM_TYPE(uint32_t, PADDLE_ELEMENT_TYPE_UINT32);
MATCH_ENUM_TYPE(int64_t, PADDLE_ELEMENT_TYPE_INT64);
MATCH_ENUM_TYPE(uint64_t, PADDLE_ELEMENT_TYPE_UINT64);
// only below is implemented, we need to implement other types in a follow up PR.
MATCH_ENUM_TYPE(float, PADDLE_ELEMENT_TYPE_FLOAT32);
MATCH_ENUM_TYPE(double, PADDLE_ELEMENT_TYPE_FLOAT64);
ok, fix done.
paddle/optimizer/Tensor_test.cpp
Outdated
using namespace paddle::optimizer;

TEST(Tensor, indexer) {
  real* ptr = new real[3];
Need to release the ptr.
Fixed.
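For illustration, a minimal version of the fixed test could look like this (assuming the Tensor(ptr, size) constructor does not take ownership of the buffer, and with the element type written as float per the discussion above):

#include "gtest/gtest.h"

TEST(Tensor, indexer) {
  float* ptr = new float[3];
  Tensor t(ptr, 3);
  for (size_t i = 0; i < 3; ++i) t[i] = static_cast<float>(i);
  EXPECT_EQ(t[2], 2.0f);
  delete[] ptr;  // release the buffer allocated with new[]
}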
namespace paddle {
namespace optimizer {

void AdadeltaOptimizer::set_weight(Tensor* p) {
Why is the content of p not used in this function?
fix done
accum_gradient = new Tensor(gptr, size);
}

void AdagradOptimizer::update(const Tensor* gradient) {
We need to follow Google C++ code style for C++ function names: https://google.github.io/styleguide/cppguide.html#Function_Names
fix done.
proto/OptimizerConfig.proto
Outdated
required string lr_policy = 11;
optional ConstLr const_lr = 12;
optional LinearLr linear_lr = 15;
optional uint64 num_sample_passed = 13 [default = 0];
Why is this part of the optimizer config?
It should be removed. Fixed.
Tensor *parameter_;

// learning rate policy
BaseLr *lr_policy;
We should follow Google C++ style for class member variable naming: https://google.github.io/styleguide/cppguide.html#Variable_Comments
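Per that guide, non-static data members take a trailing underscore, e.g.:

class ParameterOptimizer {
 private:
  Tensor* parameter_;   // member variables end with "_"
  BaseLr* lr_policy_;   // rather than "lr_policy"
};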
proto/OptimizerConfig.proto
Outdated
// lr_policy , type : string
// ConstLr = 0;
// LinearLr = 1;
required string lr_policy = 11;
Maybe enum rather than string is more appropriate?
same as above
proto/OptimizerConfig.proto
Outdated
// Adadelta = 2;
// Adagrad = 3;
// Adam = 4;
required string optimizer_name = 1;
Maybe enum is better than string?
Rewrote it. Done.
paddle/optimizer/Tensor.h
Outdated
@@ -0,0 +1,49 @@
#ifndef PADDLE_OPTIMIZER_TENSOR_H_
PaddlePaddle uses #pragma once.
We will change that and follow the Google C++ style:
https://google.github.io/styleguide/cppguide.html#The__define_Guard
In a formal discussion, we decided to use #pragma once (https://github.com/PaddlePaddle/cpp-primer-digest). So we can first change the digest and then change our code style.
paddle/optimizer/serialization.h
Outdated
@@ -0,0 +1,36 @@
#ifndef PADDLE_OPTIMIZER_SERIALIZARION_H
Wouldn't it be better not to put too many things into one PR?
Agreed. But since this PR is almost finished, it adds a simple serialization feature for saving checkpoints, which will be developed further in a separate, dedicated PR.
namespace paddle {
namespace optimizer {

void AdadeltaOptimizer::set_weight(Tensor* p) {
Need Google style function names. Please replace all C++ function names with CamelCase.
https://google.github.io/styleguide/cppguide.html#Function_Names
"Accessors and mutators (get and set functions) may be named like variables. These often correspond to actual member variables, but this is not required. For example, int count() and void set_count(int count)."
Since this function modifies the parameter member of the optimizer, in my mind it falls under this accessor/mutator rule?
Thanks, sorry!
void AdadeltaOptimizer::set_weight(Tensor* p) {
parameter_ = p;
size_t size = p->size();
accum_gradient_ = new Tensor(size);
Should we free the previous accum_gradient_, accum_delta_, and update_delta_ if they are not NULL? Same for the other set_weight functions.
In my mind, an optimizer instance maps to one parameter and we never reuse the instance, even when restarting from a checkpoint. In that situation we should not need a pointer guard.
In my opinion, the contract should be the API, not an oral or documentation contract like "we never reuse the instance". An oral contract does not prevent a client from reusing the instance, and people can easily forget it; a documentation contract can easily get out of date, or many people will never look at the document.
A good way to prevent users from reusing the instance is through the API: remove set_weight and put weight initialization into the constructor.
If we provide the API, we need to make sure it is correct in all use cases, not just how we "intend" to use it; our intention could easily change over time. I would suggest we just remove the set_weight API.
That's true in real life. Thanks a lot! Fixed.
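For reference, a minimal sketch of the constructor-injection shape discussed here (signatures are illustrative; the actual PR code may differ in detail):

class AdadeltaOptimizer : public ParameterOptimizer {
public:
  AdadeltaOptimizer(Tensor* parameter, LrPolicy* lr,
                    double rho, double epsilon, double decay)
      : ParameterOptimizer(parameter, lr),
        rho_(rho), epsilon_(epsilon), decay_(decay),
        accum_gradient_(new Tensor(parameter->size())),
        accum_delta_(new Tensor(parameter->size())),
        update_delta_(new Tensor(parameter->size())) {}
  // No set_weight(): the weight is bound for the lifetime of the optimizer,
  // so the instance cannot be silently re-pointed at another parameter.
private:
  double rho_, epsilon_, decay_;
  Tensor* accum_gradient_;
  Tensor* accum_delta_;
  Tensor* update_delta_;
};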
paddle/optimizer/lr_policy.h
Outdated
}

protected:
double learning_rate;
Why is learning_rate private in LinearLr but protected here? Consider changing it to private.
fix done.
const paddle_element_type data_type,
const void* grad_buffer,
int num_bytes) {
// TOOD(zhihong): datatype not work. need to add the runtime datatype
Yes! Let's fix it in a follow-up PR.
agreed.
paddle/optimizer/optimizer.cc
Outdated
int paddle_optimizer_get_state(paddle_optimizer* o, const char* state) {
state = o->impl->SerializeState();
return PADDLE_SUCCESS;
We need to return the length of the state here.
Fixed. By the way, I changed the get_weights interface too.
config.linear_lr().lr_decay_a(),
config.linear_lr().lr_decay_b());
// default
return nullptr;
SGDOptimizer does not check whether lr_policy_ is nullptr. In fact, if this returned new ConstLr(0.001); there would be no such problem.
Also, we need to print a warning log saying that the learning rate policy was not found and the default learning rate policy is used. Otherwise, if users think they have set a learning rate policy but actually have not, it is impossible to debug.
👍, fix done.
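A sketch of the fallback-with-warning behavior (the constructors and proto fields below follow what is visible in the diff and are partly assumed; LOG(WARNING) is the glog-style macro used elsewhere in Paddle):

// Never return nullptr from the factory; warn and fall back instead.
LrPolicy* CreateLrPolicy(const OptimizerConfig& config) {
  if (config.lr_policy() == "ConstLr") {
    return new ConstLr(config.const_lr().learning_rate());
  }
  if (config.lr_policy() == "LinearLr") {
    return new LinearLr(config.linear_lr().learning_rate(),
                        config.linear_lr().lr_decay_a(),
                        config.linear_lr().lr_decay_b());
  }
  LOG(WARNING) << "unrecognized lr_policy '" << config.lr_policy()
               << "', using default ConstLr(0.001)";
  return new ConstLr(0.001);
}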
lr);
}
// default
return new SGDOptimizer(config.sgd().momentum(),
When the code reaches this point, config.optimizer() != OptimizerConfig::SGD, so config.sgd().momentum(), config.sgd().decay(), and config.sgd().nesterov() hold meaningless values. It seems
return new SGDOptimizer(0, 0, false);
would be more reasonable?
proto2 supports defaults, so the default values are all written in the protobuf. Fixed; later these will all be moved into the optimizer constructors.
config.adadelta().decay(),
lr);
}
// default
Please print a warning log when the user has not selected any optimizer.
fix done.
virtual void set_weight(Tensor *parameter);

protected:
OptimizerConfig config_;
Is this still used?
fix done.
return parameter_->get_buffer();
}

void ParameterOptimizer::set_weight(Tensor *p) { parameter_ = p; }
Need to free the previous parameter_ if not NULL.
Btw, removing this function and putting the initialization in the constructor simplifies things: child classes no longer need to implement it. More code = potentially more bugs. I am fine either way.
Fixed. The C_API interfaces have been merged into one.
paddle/optimizer/Tensor_test.cpp
Outdated
@@ -0,0 +1,20 @@
#include <iostream>
Maybe we should add a copyright message to these new files?
The copyright declaration can be added when we merge these modules together. I noticed that the Go modules all need that header section added as well.
paddle/optimizer/sgd_optimizer.h
Outdated
@@ -0,0 +1,35 @@
#ifndef PADDLE_SGD_OPTIMIZER_H_
#pragma once
fix done.
proto/OptimizerConfig.proto
Outdated
// match old training state with format parser
required string version = 100;
repeated TensorProto data = 1;
repeated double hyperparam = 3;
Maybe we should follow a uniform naming style: hyperparam ==> hyper_param?
hyperparameter is one word, not two; it is named hyperparam for short.
Got it~~
proto/OptimizerConfig.proto
Outdated
optional LinearLr linear_lr = 13;

// common config of optimizer
optional double clipnorm = 101;
Would clip_norm and clip_value be better?
Maybe a comment about the usage of these two members would also be good.
OK, changed the naming style.
@@ -0,0 +1,113 @@
syntax = "proto2";

option optimize_for = LITE_RUNTIME;
I don't really understand what option optimize_for = LITE_RUNTIME is for. Could you explain it, including where the name LITE_RUNTIME comes from?
protobuf provides this option setting as an optimization. lite_runtime is the lightweight version: it includes different headers, effectively swapping in a different implementation, and the lite version has a few extra serialization interface methods. See here for details: https://developers.google.com/protocol-buffers/docs/proto
I see, so it is a protobuf feature. Thanks for the explanation~~
paddle/optimizer/Tensor.h
Outdated
T* data_;
};

// TODO(zhihong): design problem of dynamic datatype, need to fix it
It seems that when porting "majel" to PaddlePaddle, we already included boost/variant.hpp for the "single value multiple type" container.
Agreed, 👍. Either we can wait for the majel porting work to finish, or implement another one with typeid reflection. It is a follow-up question.
I see.
@@ -0,0 +1,108 @@
#include "parameter_optimizer.h"
Shall we put the unit-test files into a "test" folder?
I remember WangYi's suggestion that tests should be named XXX_test.cpp and placed in the same folder. But I can't find the link....
LGTM~
paddle/optimizer/optimizer.cc
Outdated
int paddle_optimizer_get_state(paddle_optimizer* o, const char** state) {
*state = o->impl->SerializeState();
return strlen(*state);
strlen does not work here. The state is a binary byte array, not a C string; it could contain 0 bytes in the body.
Hmm, it is indeed a byte vector. I wrote it this way because the code below used a C string, so I did not accumulate the length:
https://github.com/PaddlePaddle/Paddle/pull/2386/files#diff-14071fab2b21853a7bc47f69021ba1f1R46
Fixed by accumulating the length.
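A sketch of the length-aware shape of this C API after the fix (it assumes SerializeState reports its byte length through an out-parameter, matching the SerializeState(int* state_len) signature discussed later in this thread):

int paddle_optimizer_get_state(paddle_optimizer* o, const char** state) {
  int state_len = 0;
  *state = o->impl->SerializeState(&state_len);  // fills the byte length
  // Return the byte count, not strlen(): the state may contain '\0' bytes.
  return state_len;
}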
virtual ~ParameterOptimizer() { delete parameter_; };

static ParameterOptimizer *Create(const std::string &config_proto);
virtual const char *SerializeState();
Should we make SerializeState and DeSerializeState pure virtual? If a derived class does not have state, it can provide an empty implementation. Otherwise a derived class that needs an implementation may forget to provide one.
good point! fix done.
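In other words, something along these lines (a sketch only; the real interface also carries a state length, as discussed later in this thread):

class ParameterOptimizer {
public:
  virtual ~ParameterOptimizer() {}
  // Pure virtual: every derived optimizer must override these, even if the
  // override is empty because the optimizer has no state of its own.
  virtual const char* SerializeState() = 0;
  virtual void DeserializeState(const std::string& state) = 0;
};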
paddle/optimizer/sgd_optmizer.cc
Outdated
<< "expected : " << kOptimizerVersion << "get : " << state.version(); | ||
|
||
ProtoToTensor(state.data(0), parameter_); | ||
if (state.data_size() == 2) { |
I think the whole point of introducing proto is that we do not have to define the serialization/deserialization logic ourselves. Here we use proto but still hand-write that logic; writing code that could be auto-generated introduces more chances for bugs.
For example: there are only two TensorProtos here, parameter_ and momentums_, and this line (if (state.data_size() == 2) {) says that if the second TensorProto exists, parse it as momentums. But if there were three TensorProtos, parameter_, momentums_, foo, how would we express having parameter_ and foo but no momentums_?
I suggest defining it like this: delete OptimizerState and add:
message SGDOptimizerState {
  optional TensorProto parameter_ = 1;
  optional TensorProto momentums_ = 2;
  optional double momentum = 3;
  optional double decay = 4;
  optional bool nesterov = 5;
}
The reason I used a repeated field is that I planned to use a template trick to implement addField/getField, providing archive/unfolding functionality and avoiding defining every parameter with optional. After trying to write it, I found the implementation is not easy and ended up with a rather odd version... I had planned to do that in a later PR.
Writing it the way you suggest is indeed simpler. Fixed.
proto/OptimizerConfig.proto
Outdated
message OptimizerState {
// match old training state with format parser
required string version = 100;
The protobuf serialization format already supports forward and backward compatibility, so defining an extra version field is a bit strange. If we define a version, how do we maintain multiple versions later? If the maintenance rules are the same as proto's default rules, why not let protobuf handle forward and backward compatibility?
I was considering the case where an optimizer does or does not have a feature like gradient clipping: even though the parsing format is the same, the states cannot be reused, hence the version field. But it can indeed be removed.
Fixed.
proto/OptimizerConfig.proto
Outdated
}
required DataType data_type = 1;
repeated bytes content = 2;
optional uint64 size = 3;
It seems content and size are redundant with each other?
yep, fix done.
proto/OptimizerConfig.proto
Outdated
message ConstLr {
// learninRate Policy
required double learning_rate = 1 [default = 1.0];
There is a lot of debate about whether to use required in proto, and proto3 removed required entirely. For our use case either would work. Here is some background: https://capnproto.org/faq.html#how-do-i-make-a-field-required-like-in-protocol-buffers
Great material, I learned something! Given that, removing it seems better.
Fixed.
paddle/optimizer/Tensor.h
Outdated
TensorT(size_t size) : height_(1), width_(size) { data_ = new T[size]; }
TensorT(T* data, size_t size) : height_(1), width_(size), data_(data) {}
TensorT(T* data, size_t h, size_t w) : height_(h), width_(w), data_(data_) {}
TensorT(const TensorT& t)
Can you explain to me what ":" does here? Sorry, I am not too familiar with it and don't know what keyword to search for.
This is an initializer list in C++, which is the idiomatic way to initialize members in C++.
Please check here for details: http://en.cppreference.com/w/cpp/language/direct_initialization
The ":" after the parentheses is a C++ initialization mechanism and is not the same concept as the constructor body. The relationship between the initializer list and the constructor body is analogous to Python's __new__ and __init__: the initializer list is completed before the constructor body (the part inside the braces) runs.
In Chinese it is called 初始化列表; in English it is the constructor initializer list.
1. The initializer list is completed before any function body executes.
2. The order in which members are initialized is determined by the order of the member declarations.
There are also restrictions for non-POD members:
https://stackoverflow.com/questions/5816218/difference-between-initializer-and-default-initializer-list-in-c
https://stackoverflow.com/questions/9903248/initializing-fields-in-constructor-initializer-list-vs-constructor-body
It is generally recommended to initialize all non-static members this way.
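A toy example of the constructor initializer list, for reference:

class TensorT {
public:
  // The part after ':' is the constructor initializer list; it initializes
  // the members (in declaration order) before the constructor body runs.
  TensorT(float* data, size_t size)
      : height_(1), width_(size), data_(data) {
    // constructor body executes after the members are initialized
  }
private:
  size_t height_;
  size_t width_;
  float* data_;
};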
paddle/optimizer/Tensor.h
Outdated
TensorT(T* data, size_t size) : height_(1), width_(size), data_(data) {}
TensorT(T* data, size_t h, size_t w) : height_(h), width_(w), data_(data_) {}
TensorT(const TensorT& t)
: TensorT(1, t.size(), 0, t.get_buffer(), false, false) {}
I guess this is the copy constructor: it creates a new tensor copied from the old tensor, and they share the same buffer.
It seems that when the two tensors get destroyed, they will try to destroy the same buffer twice.
👍 fix done.
state.set_momentum(rho_);
state.set_decay(decay_);
// can be used when memory alignment to system
*state_len += CalStateSize(parameter_,
Why += and not =? Also, the comment above could be more descriptive :)
It can be =. I thought we might add other options later; += is a more general way to represent an accumulation relationship.
I see, that's a good point. However, if I were a user and saw the function signature const char* AdadeltaOptimizer::SerializeState(int* state_len), I would think state_len means the length of the state returned by this function, not including any other length.
Besides, people may use it like this:
int state_len;
foo->SerializeState(&state_len);
The result is likely to be wrong, since state_len is not initialized.
👍! That's right, it is a potential bug here. Thanks!
paddle/optimizer/serialization.h
Outdated
}
}

static void TensorToProto(const Tensor& tensor, TensorProto* proto) {
Can you add unit tests for TensorToProto and ProtoToTensor?
Will add unit tests; to be fixed.
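A possible round-trip test sketch (it assumes the TensorToProto/ProtoToTensor signatures shown in the diff above; the exact construction of Tensor and TensorProto may differ):

TEST(Serialization, TensorProtoRoundTrip) {
  Tensor t(3);
  for (size_t i = 0; i < t.size(); ++i) t[i] = static_cast<float>(i);
  TensorProto proto;
  TensorToProto(t, &proto);       // serialize into the proto message
  Tensor restored(3);
  ProtoToTensor(proto, &restored);  // parse it back
  for (size_t i = 0; i < t.size(); ++i) EXPECT_EQ(t[i], restored[i]);
}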
proto/OptimizerConfig.proto
Outdated
repeated bytes content = 2;
}

message OptimizerState {
Can we create one specific OptimizerState for each Optimizer? Like AdamOptimizerState, ...
Same answer as above. It is easy to implement and understand, but I think there is a more proper way to do it.
return state.SerializeAsString().c_str();
}

void AdadeltaOptimizer::DeSerializeState(const std::string& str) {
DeSerializeState -> DeserializeState, deserialize is a single word.
..... my poor English. fix done.
OptimizerState state;
state.ParseFromString(str);
lr_policy_->set(state.learning_rate());
num_sample_passed_ = state.num_sample_passed();
epsilon_, rho_, and decay_ are missing here.
Maybe, to avoid potential errors like this, we could do the following (just a suggestion):
class AdadeltaOptimizer {
private:
  AdadeltaOptimizerState state_;
};

AdadeltaOptimizer::Update(...) {
  // directly use state_.learning_rate()
}

// Clear the proto tensors when not in use, to save memory.
AdadeltaOptimizer::ClearStateTensors() {
  state_.mutable_parameter()->Clear();
  state_.mutable_accum_gradient()->Clear();
  state_.mutable_accum_delta()->Clear();
  state_.mutable_update_delta()->Clear();
}

AdadeltaOptimizer::SerializeState() {
  TensorToProto(*parameter_, state_.mutable_parameter());
  TensorToProto(*accum_gradient_, state_.mutable_accum_gradient());
  TensorToProto(*accum_delta_, state_.mutable_accum_delta());
  TensorToProto(*update_delta_, state_.mutable_update_delta());
  auto r = state_.SerializeAsString().c_str();
  ClearStateTensors();
  return r;
}

AdadeltaOptimizer::DeserializeState(const std::string& str) {
  state_.ParseFromString(str);
  ProtoToTensor(state_.parameter(), parameter_);
  ProtoToTensor(state_.accum_gradient(), accum_gradient_);
  ProtoToTensor(state_.accum_delta(), accum_delta_);
  ProtoToTensor(state_.update_delta(), update_delta_);
  ClearStateTensors();
}
Firstly, epsilon_, rho_, and decay_ were not omitted by carelessness. Consider that we have to call the create_with_config function with the config, which already contains these three parameters, and they are constant during the training process. Do we really need to save them when we already have to save the config proto?
If we want the training state to be self-contained, these constant hyperparameters should be there.
Secondly, in that format it is obviously easier to read, but it needs a config/state pair for every optimizer, which leads to copy/pasting the config and Serialize/Deserialize code many times. We should find a more elegant and idiomatic way to do that. The idea goes the same way as https://github.com/caffe2/caffe2/blob/master/caffe2/core/qtensor_serialization.h
- I see, they are constants initialized by the constructor, and the constructor arguments are already saved by OptimizerConfig. My bad, sorry!
- "which leads to copy/pasting the config and Serialize/Deserialize code many times": I understand the config protos could have duplicated fields (e.g., probably every config proto has a learning rate field). I would still argue that having one config proto per optimizer makes it easy to understand a specific optimizer's arguments. Please take a look at this example: we are currently using a single config proto, LayerConfig, for all layers. My opinion is that it is too hard for anyone to read and understand what parameters each layer needs. You mentioned the Serialize/Deserialize code would need to be copied/pasted many times; I think with the approach of having one config for all optimizers, we still need to write the Serialize/Deserialize code for each class?
I see.
The config proto, e.g. SGDConfig, only contains the constant hyperparameters, and the state proto only contains the rest. Combining them creates an optimizer with its state, and separating each optimizer's state out keeps it readable. I thought this would be better than adding a state member to each optimizer class.
Fixed.
config.mutable_const_lr()->set_learning_rate(0.1);

ParameterOptimizer* opt =
ParameterOptimizer::Create(config.SerializeAsString());
The test case does not compile; it needs to be updated.
fix done.
const char* AdadeltaOptimizer::SerializeState(int* state_len) {
AdadeltaOptimizerState state;
state.set_learning_rate(lr_policy_->LearningRate(num_sample_passed_));
It seems there is no need to store learning_rate here; the learning rate is the policy's state. We can put a TODO here to store the policy state and add it later.
that's fine.
TensorToProto(*parameter_, state.mutable_parameter());
TensorToProto(*accum_gradient_, state.mutable_accum_gradient());
*state_len = CalStateSize(parameter_, accum_gradient_);
It seems we could use:
auto str = state.SerializeAsString();
*state_len = str.length();
return str.c_str();
There is a small issue here: SerializeAsString returns a basic_string.
auto str = state.SerializeAsString();
auto &str = state.SerializeAsString(); // wrong, cannot go this way
The first form performs a copy construction of a large buffer, which has a cost. But it is more readable, so I changed it.
Fixed.
Let's always choose code clarity over code performance. And only optimize performance when necessary.
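One way to keep this both clear and safe is to hold the serialized bytes in a member so the returned pointer stays valid (an illustrative variation only; serialized_state_ is a hypothetical std::string member, not code from this PR):

const char* AdadeltaOptimizer::SerializeState(int* state_len) {
  AdadeltaOptimizerState state;
  TensorToProto(*parameter_, state.mutable_parameter());
  TensorToProto(*accum_gradient_, state.mutable_accum_gradient());
  // Serialize once and keep the bytes alive in a member, so the returned
  // pointer stays valid and the length comes straight from the string.
  serialized_state_ = state.SerializeAsString();
  *state_len = serialized_state_.size();
  return serialized_state_.c_str();
}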
void AdagradOptimizer::DeserializeState(const std::string& str) {
AdagradOptimizerState state;
state.ParseFromString(str);
lr_policy_->set(state.learning_rate());
In the future we need to let lr_policy serialize its own state, and here we would call lr_policy_->DeserializeState(lr_state). We can put a TODO here.
lr_policy should take the serialized state and deserialize it itself, rather than the optimizer setting values on lr_policy (the optimizer does not know what state lr_policy has). I suggest deleting this line.
agree. fix done.
paddle/optimizer/adagrad_optimizer.h
Outdated
double decay)
: ParameterOptimizer(parameter, lr), epsilon_(epsilon), decay_(decay) {
size_t size = parameter->size();
if (accum_gradient_) delete accum_gradient_;
This is the constructor, so there is no need to check accum_gradient_.
fix done.
int num_bytes) {
// TOOD(zhihong): datatype not work. need to add the runtime datatype
auto grad_type = reinterpret_cast<const float*>(grad_buffer);
Tensor* gradient = new Tensor(const_cast<float*>(grad_type), num_bytes);
The incoming buffer is const void*, which indicates that this function may neither modify nor free it. Here the cast removes the const qualifier, and then the Tensor takes ownership of the buffer and will free it when it is destructed. We should not free a buffer we do not own.
This was done to avoid copying the memory, but it does indeed have that problem. Switched to shared_ptr.
Fixed.
Got it. Not letting Tensor own a directly passed pointer indeed solves the problem.
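One way to express "use but do not own" is a shared_ptr with a no-op deleter (an illustrative sketch; WrapExternalBuffer is a hypothetical helper, not code from this PR):

// Wrap a caller-owned gradient buffer without taking ownership: the custom
// deleter does nothing, so the Tensor never frees memory it does not own.
std::shared_ptr<float> WrapExternalBuffer(const void* grad_buffer) {
  return std::shared_ptr<float>(
      const_cast<float*>(reinterpret_cast<const float*>(grad_buffer)),
      [](float*) { /* owned by the caller, do not delete */ });
}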
*/
ParameterOptimizer(Tensor *parameter, LrPolicy *lr)
: parameter_(parameter), lr_policy_(lr), num_sample_passed_(0) {}
virtual ~ParameterOptimizer() { delete parameter_; };
It seems we also need delete lr_policy_;
fix done.
paddle/optimizer/sgd_optimizer.h
Outdated
if (momentum_ != 0.0) {
size_t size = parameter->size();
// TODO: fix it with align aware allocator bind to Tensor
if (momentums_) delete momentums_;
We should not delete momentums_ in the constructor; at that point momentums_ has not been initialized yet and could hold any value.
fix done.
public:
SGDOptimizer(Tensor* parameter, LrPolicy* lr, double m, double d, bool n)
: ParameterOptimizer(parameter, lr),
momentums_(nullptr),
Could this be written as momentums_(new Tensor(size)), here?
Here we need to check whether momentum_ is 0.0 before creating the buffer; I rewrote the other places.
paddle/optimizer/Tensor.h
Outdated
TensorT(CpuMemHandlePtr handle, size_t size)
: height_(1),
width_(size),
data_(reinterpret_cast<T*>(handle->getBuf())) {}
Here, nothing keeps a reference to std::make_shared<CpuMemoryHandle>(size * sizeof(float)), so the buffer is released as soon as the statement finishes executing.
I thought about whether we could write it like this, as a reference:
template <class T>
class TensorT {
public:
  TensorT(size_t size)
      : TensorT(std::shared_ptr<T>(new T[size], std::default_delete<T[]>()), size) {}
  TensorT(std::shared_ptr<T> data, size_t size)
      : height_(1), width_(size), data_ptr_(data), data_(nullptr) {}
  TensorT(T* data, size_t size) : TensorT(data, 1, size) {}
  TensorT(T* data, size_t h, size_t w) : height_(h), width_(w), data_(data) {}
  virtual ~TensorT() {}
  T* get_buffer() {
    auto data = data_;
    if (data == nullptr) {
      data = data_ptr_.get();
    }
    return data;
  }
  T& operator[](const size_t idx) {
    auto data = get_buffer();
    CHECK(idx >= 0 && idx < this->width_) << "out of index range";
    return data[idx];
  }
  const T& operator[](const size_t idx) const {
    auto data = data_ != nullptr ? data_ : data_ptr_.get();
    CHECK(idx >= 0 && idx < this->width_) << "out of index range";
    return data[idx];
  }
  // TODO: replace with tensorshape
  size_t size() const { return this->width_ * this->height_; }

protected:
  size_t height_;
  size_t width_;
  std::shared_ptr<T> data_ptr_;  // managed data
  T* data_;                      // unmanaged data
};
That's right, I made a mistake here: the Tensor has to hold the shared_ptr to extend the buffer's lifetime.
Fixed it; data_ is now initialized from data_ptr_.
LGTM, thanks for being so patient and modifying the PR so many times!
It is an independent module which will be used by the pserver through cgo calls. This makes it easier to update the optimizer library. This pull request also resolves the conflict between the develop branch and #2190.