Merge with latest develop branch. Optimizer lib2 #2386
Conversation
paddle/optimizer/CMakeLists.txt
Outdated
add_library(optimizer STATIC ${OPITMIZER_SRCS})
add_dependencies(optimizer gen_proto_cpp)

add_simple_unittest(Tensor_test)
Let's use all lowercase for filenames (tensor_test.cpp instead of Tensor_test.cpp). Some filesystems are not case sensitive.
fix done
paddle/optimizer/Tensor.h
Outdated
return data_[idx];
}
// TODO: replace with tensorshape
size_t size() const { return this->width_; }
Since height_ is already a member variable, I think you implemented a 2-D tensor, so this should be return this->width_ * this->height_;
it should be. fix done
paddle/optimizer/Tensor.h
Outdated
};

// TODO(zhihong): design problem of dynamic datatype, need to fix it
typedef TensorT<real> Tensor;
I do not know whether real means float or double. Doing a Google search does not answer my question. Maybe for clarity, we need to change all real in this PR to float? (Since currently we are working with 32-bit floating point parameters.)
The real type is a project-wide macro type (https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/utils/Common.h#L30) which is determined when the project is compiled. That is not a good choice here, since our element type comes from the caller's data and is only known at runtime. This needs to be fixed.
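For reference, the usual pattern is a compile-time switch. A minimal sketch of that idiom (the macro name here is illustrative, not necessarily the exact one Paddle uses; see the Common.h link above):

// Sketch: "real" is fixed at build time, not at runtime.
#ifdef PADDLE_TYPE_DOUBLE
typedef double real;   // 64-bit build
#else
typedef float real;    // default 32-bit build
#endif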
Let me explain this part more clearly.
In the last PR I implemented Tensor as a template class, but that is the wrong way: a template class is instantiated for specific types at compile time (explaining the reason in a meeting would be better).
In fact, a tensor should represent a block of contiguous memory and support dynamic types at runtime, e.g. https://github.com/caffe2/caffe2/blob/master/caffe2/core/tensor.h#L643
just like an enhanced any type in the Boost library. We can merge this PR and enrich the tensor library in the future. At this moment we only support compile-time data types here.
OK, that's fine. Can you use float instead of real? Because the parameter is supposed to be a general data store, it should not depend on how real is defined in Paddle.
After this PR we need to work on another PR to add tensors of the required data types.
MATCH_ENUM_TYPE(int32_t, PADDLE_ELEMENT_TYPE_INT32);
MATCH_ENUM_TYPE(uint32_t, PADDLE_ELEMENT_TYPE_UINT32);
MATCH_ENUM_TYPE(int64_t, PADDLE_ELEMENT_TYPE_INT64);
MATCH_ENUM_TYPE(uint64_t, PADDLE_ELEMENT_TYPE_UINT64);
// only below is implemented, we need to implement other types in a follow up PR.
MATCH_ENUM_TYPE(float, PADDLE_ELEMENT_TYPE_FLOAT32);
MATCH_ENUM_TYPE(double, PADDLE_ELEMENT_TYPE_FLOAT64);
ok, fix done.
paddle/optimizer/Tensor_test.cpp
Outdated
using namespace paddle::optimizer;

TEST(Tensor, indexer) {
  real* ptr = new real[3];
Need to release the ptr.
Fixed.
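For illustration, a minimal version of the fixed test could look like this (assuming the Tensor(ptr, size) constructor does not take ownership of the buffer, and with the element type written as float per the discussion above):

#include "gtest/gtest.h"

TEST(Tensor, indexer) {
  float* ptr = new float[3];
  Tensor t(ptr, 3);
  for (size_t i = 0; i < 3; ++i) t[i] = static_cast<float>(i);
  EXPECT_EQ(t[2], 2.0f);
  delete[] ptr;  // release the buffer allocated with new[]
}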
namespace paddle {
namespace optimizer {

void AdadeltaOptimizer::set_weight(Tensor* p) {
Why is the content of p not used in this function?
fix done
accum_gradient = new Tensor(gptr, size);
}

void AdagradOptimizer::update(const Tensor* gradient) {
We need to follow Google C++ code style for C++ function names: https://google.github.io/styleguide/cppguide.html#Function_Names
fix done.
proto/OptimizerConfig.proto
Outdated
required string lr_policy = 11;
optional ConstLr const_lr = 12;
optional LinearLr linear_lr = 15;
optional uint64 num_sample_passed = 13 [default = 0];
Why is this part of the optimizer config?
It should be removed. Fixed.
Tensor *parameter_;

// learning rate policy
BaseLr *lr_policy;
We should follow Google C++ style for class member variable naming: https://google.github.io/styleguide/cppguide.html#Variable_Comments
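Per that guide, non-static data members take a trailing underscore, e.g.:

class ParameterOptimizer {
 private:
  Tensor* parameter_;   // member variables end with "_"
  BaseLr* lr_policy_;   // rather than "lr_policy"
};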
proto/OptimizerConfig.proto
Outdated
// lr_policy , type : string
// ConstLr = 0;
// LinearLr = 1;
required string lr_policy = 11;
Maybe enum rather than string is more appropriate?
same as above
proto/OptimizerConfig.proto
Outdated
// Adadelta = 2;
// Adagrad = 3;
// Adam = 4;
required string optimizer_name = 1;
Maybe enum is better than string?
Rewrote it. Done.
paddle/optimizer/Tensor.h
Outdated
@@ -0,0 +1,49 @@
#ifndef PADDLE_OPTIMIZER_TENSOR_H_
PaddlePaddle uses #pragma once.
We will change that and follow the Google C++ style:
https://google.github.io/styleguide/cppguide.html#The__define_Guard
In a formal discussion, we decided to use #pragma once (https://github.com/PaddlePaddle/cpp-primer-digest). So we can first change the digest and then change our code style.
paddle/optimizer/serialization.h
Outdated
@@ -0,0 +1,36 @@
#ifndef PADDLE_OPTIMIZER_SERIALIZARION_H
Wouldn't it be better not to put too many things into one PR?
Agreed. But since this PR is almost finished, it adds a simple serialization feature for saving checkpoints, which will be developed further in a separate, dedicated PR.
namespace paddle {
namespace optimizer {

void AdadeltaOptimizer::set_weight(Tensor* p) {
Need Google style function names. Please replace all C++ function names with CamelCase.
https://google.github.io/styleguide/cppguide.html#Function_Names
"Accessors and mutators (get and set functions) may be named like variables. These often correspond to actual member variables, but this is not required. For example, int count() and void set_count(int count)."
Since this function modifies the parameter member of the optimizer, in my mind it falls under this accessor/mutator rule?
Thanks, sorry!
void AdadeltaOptimizer::set_weight(Tensor* p) {
parameter_ = p;
size_t size = p->size();
accum_gradient_ = new Tensor(size);
Should we free the previous accum_gradient_, accum_delta_, and update_delta_ if they are not NULL? Same for the other set_weight functions.
In my mind, an optimizer instance maps to one parameter and we never reuse the instance, even when restarting from a checkpoint. In that situation we should not need a pointer guard.
In my opinion, the contract should be the API, not an oral or documentation contract like "we never reuse the instance". An oral contract does not prevent a client from reusing the instance, and people can easily forget it; a documentation contract can easily get out of date, or many people will never look at the document.
A good way to prevent users from reusing the instance is through the API: remove set_weight and put weight initialization into the constructor.
If we provide the API, we need to make sure it is correct in all use cases, not just how we "intend" to use it; our intention could easily change over time. I would suggest we just remove the set_weight API.
That's true in real life. Thanks a lot! Fixed.
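For reference, a minimal sketch of the constructor-injection shape discussed here (signatures are illustrative; the actual PR code may differ in detail):

class AdadeltaOptimizer : public ParameterOptimizer {
public:
  AdadeltaOptimizer(Tensor* parameter, LrPolicy* lr,
                    double rho, double epsilon, double decay)
      : ParameterOptimizer(parameter, lr),
        rho_(rho), epsilon_(epsilon), decay_(decay),
        accum_gradient_(new Tensor(parameter->size())),
        accum_delta_(new Tensor(parameter->size())),
        update_delta_(new Tensor(parameter->size())) {}
  // No set_weight(): the weight is bound for the lifetime of the optimizer,
  // so the instance cannot be silently re-pointed at another parameter.
private:
  double rho_, epsilon_, decay_;
  Tensor* accum_gradient_;
  Tensor* accum_delta_;
  Tensor* update_delta_;
};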
paddle/optimizer/lr_policy.h
Outdated
}

protected:
double learning_rate;
Why is learning_rate private in LinearLr but protected here? Consider changing it to private.
fix done.
const paddle_element_type data_type,
const void* grad_buffer,
int num_bytes) {
// TOOD(zhihong): datatype not work. need to add the runtime datatype
Yes! Let's fix it in a follow-up PR.
agreed.
paddle/optimizer/optimizer.cc
Outdated
int paddle_optimizer_get_state(paddle_optimizer* o, const char* state) {
state = o->impl->SerializeState();
return PADDLE_SUCCESS;
We need to return the length of the state here.
Fixed. By the way, I changed the get_weights interface too.
config.linear_lr().lr_decay_a(),
config.linear_lr().lr_decay_b());
// default
return nullptr;
SGDOptimizer does not check whether lr_policy_ is nullptr. In fact, if this returned new ConstLr(0.001); there would be no such problem.
Also, we need to print a warning log saying that the learning rate policy was not found and the default learning rate policy is used. Otherwise, if users think they have set a learning rate policy but actually have not, it is impossible to debug.
👍, fix done.
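A sketch of the fallback-with-warning behavior (the constructors and proto fields below follow what is visible in the diff and are partly assumed; LOG(WARNING) is the glog-style macro used elsewhere in Paddle):

// Never return nullptr from the factory; warn and fall back instead.
LrPolicy* CreateLrPolicy(const OptimizerConfig& config) {
  if (config.lr_policy() == "ConstLr") {
    return new ConstLr(config.const_lr().learning_rate());
  }
  if (config.lr_policy() == "LinearLr") {
    return new LinearLr(config.linear_lr().learning_rate(),
                        config.linear_lr().lr_decay_a(),
                        config.linear_lr().lr_decay_b());
  }
  LOG(WARNING) << "unrecognized lr_policy '" << config.lr_policy()
               << "', using default ConstLr(0.001)";
  return new ConstLr(0.001);
}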
lr);
}
// default
return new SGDOptimizer(config.sgd().momentum(),
When the code reaches this point, config.optimizer() != OptimizerConfig::SGD, so config.sgd().momentum(), config.sgd().decay(), and config.sgd().nesterov() hold meaningless values. It seems
return new SGDOptimizer(0, 0, false);
would be more reasonable?
proto2 supports defaults, so the default values are all written in the protobuf. Fixed; later these will all be moved into the optimizer constructors.
config.adadelta().decay(),
lr);
}
// default
Please print a warning log when the user has not selected any optimizer.
fix done.
virtual void set_weight(Tensor *parameter);

protected:
OptimizerConfig config_;
Is this still used?
fix done.
return parameter_->get_buffer();
}

void ParameterOptimizer::set_weight(Tensor *p) { parameter_ = p; }
Need to free the previous parameter_ if not NULL.
Btw, removing this function and putting the initialization in the constructor simplifies things: child classes no longer need to implement it. More code = potentially more bugs. I am fine either way.
Fixed. The C_API interfaces have been merged into one.
paddle/optimizer/Tensor_test.cpp
Outdated
@@ -0,0 +1,20 @@
#include <iostream>
Maybe we should add a copyright message to these new files?
The copyright declaration can be added when we merge these modules together. I noticed that the Go modules all need that header section added as well.
paddle/optimizer/sgd_optimizer.h
Outdated
@@ -0,0 +1,35 @@
#ifndef PADDLE_SGD_OPTIMIZER_H_
#pragma once
fix done.
proto/OptimizerConfig.proto
Outdated
// match old training state with format parser
required string version = 100;
repeated TensorProto data = 1;
repeated double hyperparam = 3;
Maybe we should follow a uniform naming style: hyperparam ==> hyper_param?
hyperparameter is one word, not two; it is named hyperparam for short.
Got it~~
proto/OptimizerConfig.proto
Outdated
optional LinearLr linear_lr = 13;

// common config of optimizer
optional double clipnorm = 101;
Would clip_norm and clip_value be better?
Maybe a comment about the usage of these two members would also be good.
OK, changed the naming style.
@@ -0,0 +1,113 @@
syntax = "proto2";

option optimize_for = LITE_RUNTIME;
I don't really understand what option optimize_for = LITE_RUNTIME is for. Could you explain it, including where the name LITE_RUNTIME comes from?
protobuf provides this option setting as an optimization. lite_runtime is the lightweight version: it includes different headers, effectively swapping in a different implementation, and the lite version has a few extra serialization interface methods. See here for details: https://developers.google.com/protocol-buffers/docs/proto
I see, so it is a protobuf feature. Thanks for the explanation~~
paddle/optimizer/Tensor.h
Outdated
T* data_;
};

// TODO(zhihong): design problem of dynamic datatype, need to fix it
It seems that when porting "majel" to PaddlePaddle, we already included boost/variant.hpp for the "single value multiple type" container.
Agreed, 👍. Either we can wait for the majel porting work to finish, or implement another one with typeid reflection. It is a follow-up question.
I see.
@@ -0,0 +1,108 @@
#include "parameter_optimizer.h"
Shall we put the unit-test files into a "test" folder?
I remember WangYi's suggestion that tests should be named XXX_test.cpp and placed in the same folder. But I can't find the link....
LGTM~
paddle/optimizer/optimizer.cc
Outdated
int paddle_optimizer_get_state(paddle_optimizer* o, const char** state) {
*state = o->impl->SerializeState();
return strlen(*state);
strlen does not work here. The state is a binary byte array, not a C string; it could contain 0 bytes in the body.
Hmm, it is indeed a byte vector. I wrote it this way because the code below used a C string, so I did not accumulate the length:
https://github.com/PaddlePaddle/Paddle/pull/2386/files#diff-14071fab2b21853a7bc47f69021ba1f1R46
Fixed by accumulating the length.
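A sketch of the length-aware shape of this C API after the fix (it assumes SerializeState reports its byte length through an out-parameter, matching the SerializeState(int* state_len) signature discussed later in this thread):

int paddle_optimizer_get_state(paddle_optimizer* o, const char** state) {
  int state_len = 0;
  *state = o->impl->SerializeState(&state_len);  // fills the byte length
  // Return the byte count, not strlen(): the state may contain '\0' bytes.
  return state_len;
}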
virtual ~ParameterOptimizer() { delete parameter_; };

static ParameterOptimizer *Create(const std::string &config_proto);
virtual const char *SerializeState();
Should we make SerializeState and DeSerializeState pure virtual? If a derived class does not have state, it can provide an empty implementation. Otherwise a derived class that needs an implementation may forget to provide one.
good point! fix done.
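In other words, something along these lines (a sketch only; the real interface also carries a state length, as discussed later in this thread):

class ParameterOptimizer {
public:
  virtual ~ParameterOptimizer() {}
  // Pure virtual: every derived optimizer must override these, even if the
  // override is empty because the optimizer has no state of its own.
  virtual const char* SerializeState() = 0;
  virtual void DeserializeState(const std::string& state) = 0;
};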
paddle/optimizer/sgd_optmizer.cc
Outdated
<< "expected : " << kOptimizerVersion << "get : " << state.version(); | ||
|
||
ProtoToTensor(state.data(0), parameter_); | ||
if (state.data_size() == 2) { |
I think the whole point of introducing proto is that we do not have to define the serialization/deserialization logic ourselves. Here we use proto but still hand-write that logic; writing code that could be auto-generated introduces more chances for bugs.
For example: there are only two TensorProtos here, parameter_ and momentums_, and this line (if (state.data_size() == 2) {) says that if the second TensorProto exists, parse it as momentums. But if there were three TensorProtos, parameter_, momentums_, foo, how would we express having parameter_ and foo but no momentums_?
I suggest defining it like this: delete OptimizerState and add:
message SGDOptimizerState {
  optional TensorProto parameter_ = 1;
  optional TensorProto momentums_ = 2;
  optional double momentum = 3;
  optional double decay = 4;
  optional bool nesterov = 5;
}
The reason I used a repeated field is that I planned to use a template trick to implement addField/getField, providing archive/unfolding functionality and avoiding defining every parameter with optional. After trying to write it, I found the implementation is not easy and ended up with a rather odd version... I had planned to do that in a later PR.
Writing it the way you suggest is indeed simpler. Fixed.
proto/OptimizerConfig.proto
Outdated
message OptimizerState {
// match old training state with format parser
required string version = 100;
The protobuf serialization format already supports forward and backward compatibility, so defining an extra version field is a bit strange. If we define a version, how do we maintain multiple versions later? If the maintenance rules are the same as proto's default rules, why not let protobuf handle forward and backward compatibility?
I was considering the case where an optimizer does or does not have a feature like gradient clipping: even though the parsing format is the same, the states cannot be reused, hence the version field. But it can indeed be removed.
Fixed.
proto/OptimizerConfig.proto
Outdated
}
required DataType data_type = 1;
repeated bytes content = 2;
optional uint64 size = 3;
It seems content and size are redundant with each other?
yep, fix done.
proto/OptimizerConfig.proto
Outdated
message ConstLr {
// learninRate Policy
required double learning_rate = 1 [default = 1.0];
There is a lot of debate about whether to use required in proto, and proto3 removed required entirely. For our use case either would work. Here is some background: https://capnproto.org/faq.html#how-do-i-make-a-field-required-like-in-protocol-buffers
Great material, I learned something! Given that, removing it seems better.
Fixed.
paddle/optimizer/Tensor.h
Outdated
TensorT(size_t size) : height_(1), width_(size) { data_ = new T[size]; }
TensorT(T* data, size_t size) : height_(1), width_(size), data_(data) {}
TensorT(T* data, size_t h, size_t w) : height_(h), width_(w), data_(data_) {}
TensorT(const TensorT& t)
Can you explain to me what ":" does here? Sorry, I am not too familiar with it and don't know what keyword to search for.
This is an initializer list in C++, which is the idiomatic way to initialize members in C++.
Please check here for details: http://en.cppreference.com/w/cpp/language/direct_initialization
The ":" after the parentheses is a C++ initialization mechanism and is not the same concept as the constructor body. The relationship between the initializer list and the constructor body is analogous to Python's __new__ and __init__: the initializer list is completed before the constructor body (the part inside the braces) runs.
In Chinese it is called 初始化列表; in English it is the constructor initializer list.
1. The initializer list is completed before any function body executes.
2. The order in which members are initialized is determined by the order of the member declarations.
There are also restrictions for non-POD members:
https://stackoverflow.com/questions/5816218/difference-between-initializer-and-default-initializer-list-in-c
https://stackoverflow.com/questions/9903248/initializing-fields-in-constructor-initializer-list-vs-constructor-body
It is generally recommended to initialize all non-static members this way.
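A toy example of the constructor initializer list, for reference:

class TensorT {
public:
  // The part after ':' is the constructor initializer list; it initializes
  // the members (in declaration order) before the constructor body runs.
  TensorT(float* data, size_t size)
      : height_(1), width_(size), data_(data) {
    // constructor body executes after the members are initialized
  }
private:
  size_t height_;
  size_t width_;
  float* data_;
};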
paddle/optimizer/Tensor.h
Outdated
TensorT(T* data, size_t size) : height_(1), width_(size), data_(data) {}
TensorT(T* data, size_t h, size_t w) : height_(h), width_(w), data_(data_) {}
TensorT(const TensorT& t)
: TensorT(1, t.size(), 0, t.get_buffer(), false, false) {}
I guess this is the copy constructor: it creates a new tensor copied from the old tensor, and they share the same buffer.
It seems that when the two tensors get destroyed, they will try to destroy the same buffer twice.
👍 fix done.
state.set_momentum(rho_);
state.set_decay(decay_);
// can be used when memory alignment to system
*state_len += CalStateSize(parameter_,
Why += and not =? Also, the comment above could be more descriptive :)
It can be =. I thought we might add other options later; += is a more general way to represent an accumulation relationship.
I see, that's a good point. However, if I were a user and saw the function signature const char* AdadeltaOptimizer::SerializeState(int* state_len), I would think state_len means the length of the state returned by this function, not including any other length.
Besides, people may use it like this:
int state_len;
foo->SerializeState(&state_len);
The result is likely to be wrong, since state_len is not initialized.
👍! That's right, it is a potential bug here. Thanks!
paddle/optimizer/serialization.h
Outdated
}
}

static void TensorToProto(const Tensor& tensor, TensorProto* proto) {
Can you add unit tests for TensorToProto and ProtoToTensor?
Will add unit tests; to be fixed.
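A possible round-trip test sketch (it assumes the TensorToProto/ProtoToTensor signatures shown in the diff above; the exact construction of Tensor and TensorProto may differ):

TEST(Serialization, TensorProtoRoundTrip) {
  Tensor t(3);
  for (size_t i = 0; i < t.size(); ++i) t[i] = static_cast<float>(i);
  TensorProto proto;
  TensorToProto(t, &proto);       // serialize into the proto message
  Tensor restored(3);
  ProtoToTensor(proto, &restored);  // parse it back
  for (size_t i = 0; i < t.size(); ++i) EXPECT_EQ(t[i], restored[i]);
}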
proto/OptimizerConfig.proto
Outdated
repeated bytes content = 2;
}

message OptimizerState {
Can we create one specific OptimizerState for each Optimizer? Like AdamOptimizerState, ...
Same answer as above. It is easy to implement and understand, but I think there is a more proper way to do it.
return state.SerializeAsString().c_str();
}

void AdadeltaOptimizer::DeSerializeState(const std::string& str) {
DeSerializeState -> DeserializeState, deserialize is a single word.
..... my poor English. fix done.
OptimizerState state;
state.ParseFromString(str);
lr_policy_->set(state.learning_rate());
num_sample_passed_ = state.num_sample_passed();
epsilon_, rho_, and decay_ are missing here.
Maybe, to avoid potential errors like this, we could do the following (just a suggestion):
class AdadeltaOptimizer {
private:
  AdadeltaOptimizerState state_;
};

AdadeltaOptimizer::Update(...) {
  // directly use state_.learning_rate()
}

// Clear the proto tensors when not in use, to save memory.
AdadeltaOptimizer::ClearStateTensors() {
  state_.mutable_parameter()->Clear();
  state_.mutable_accum_gradient()->Clear();
  state_.mutable_accum_delta()->Clear();
  state_.mutable_update_delta()->Clear();
}

AdadeltaOptimizer::SerializeState() {
  TensorToProto(*parameter_, state_.mutable_parameter());
  TensorToProto(*accum_gradient_, state_.mutable_accum_gradient());
  TensorToProto(*accum_delta_, state_.mutable_accum_delta());
  TensorToProto(*update_delta_, state_.mutable_update_delta());
  auto r = state_.SerializeAsString().c_str();
  ClearStateTensors();
  return r;
}

AdadeltaOptimizer::DeserializeState(const std::string& str) {
  state_.ParseFromString(str);
  ProtoToTensor(state_.parameter(), parameter_);
  ProtoToTensor(state_.accum_gradient(), accum_gradient_);
  ProtoToTensor(state_.accum_delta(), accum_delta_);
  ProtoToTensor(state_.update_delta(), update_delta_);
  ClearStateTensors();
}
Firstly, epsilon_, rho_, and decay_ were not omitted by carelessness. Consider that we have to call the create_with_config function with the config, which already contains these three parameters, and they are constant during the training process. Do we really need to save them when we already have to save the config proto?
If we want the training state to be self-contained, these constant hyperparameters should be there.
Secondly, in that format it is obviously easier to read, but it needs a config/state pair for every optimizer, which leads to copy/pasting the config and Serialize/Deserialize code many times. We should find a more elegant and idiomatic way to do that. The idea goes the same way as https://github.com/caffe2/caffe2/blob/master/caffe2/core/qtensor_serialization.h
- I see, they are constants initialized by the constructor, and the constructor arguments are already saved by OptimizerConfig. My bad, sorry!
- "which leads to copy/pasting the config and Serialize/Deserialize code many times": I understand the config protos could have duplicated fields (e.g., probably every config proto has a learning rate field). I would still argue that having one config proto per optimizer makes it easy to understand a specific optimizer's arguments. Please take a look at this example: we are currently using a single config proto, LayerConfig, for all layers. My opinion is that it is too hard for anyone to read and understand what parameters each layer needs. You mentioned the Serialize/Deserialize code would need to be copied/pasted many times; I think with the approach of having one config for all optimizers, we still need to write the Serialize/Deserialize code for each class?
I see.
The config proto, e.g. SGDConfig, only contains the constant hyperparameters, and the state proto only contains the rest. Combining them creates an optimizer with its state, and separating each optimizer's state out keeps it readable. I thought this would be better than adding a state member to each optimizer class.
Fixed.
config.mutable_const_lr()->set_learning_rate(0.1);

ParameterOptimizer* opt =
ParameterOptimizer::Create(config.SerializeAsString());
The test case does not compile; it needs to be updated.
fix done.
const char* AdadeltaOptimizer::SerializeState(int* state_len) {
AdadeltaOptimizerState state;
state.set_learning_rate(lr_policy_->LearningRate(num_sample_passed_));
It seems there is no need to store learning_rate here; the learning rate is the policy's state. We can put a TODO here to store the policy state and add it later.
that's fine.
TensorToProto(*parameter_, state.mutable_parameter());
TensorToProto(*accum_gradient_, state.mutable_accum_gradient());
*state_len = CalStateSize(parameter_, accum_gradient_);
It seems we could use:
auto str = state.SerializeAsString();
*state_len = str.length();
return str.c_str();
There is a small issue here: SerializeAsString returns a basic_string.
auto str = state.SerializeAsString();
auto &str = state.SerializeAsString(); // wrong, cannot go this way
The first form performs a copy construction of a large buffer, which has a cost. But it is more readable, so I changed it.
Fixed.
Let's always choose code clarity over code performance. And only optimize performance when necessary.
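One way to keep this both clear and safe is to hold the serialized bytes in a member so the returned pointer stays valid (an illustrative variation only; serialized_state_ is a hypothetical std::string member, not code from this PR):

const char* AdadeltaOptimizer::SerializeState(int* state_len) {
  AdadeltaOptimizerState state;
  TensorToProto(*parameter_, state.mutable_parameter());
  TensorToProto(*accum_gradient_, state.mutable_accum_gradient());
  // Serialize once and keep the bytes alive in a member, so the returned
  // pointer stays valid and the length comes straight from the string.
  serialized_state_ = state.SerializeAsString();
  *state_len = serialized_state_.size();
  return serialized_state_.c_str();
}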
void AdagradOptimizer::DeserializeState(const std::string& str) {
AdagradOptimizerState state;
state.ParseFromString(str);
lr_policy_->set(state.learning_rate());
In the future we need to let lr_policy serialize its own state, and here we would call lr_policy_->DeserializeState(lr_state). We can put a TODO here.
lr_policy should take the serialized state and deserialize it itself, rather than the optimizer setting values on lr_policy (the optimizer does not know what state lr_policy has). I suggest deleting this line.
agree. fix done.
paddle/optimizer/adagrad_optimizer.h
Outdated
double decay)
: ParameterOptimizer(parameter, lr), epsilon_(epsilon), decay_(decay) {
size_t size = parameter->size();
if (accum_gradient_) delete accum_gradient_;
This is the constructor, so there is no need to check accum_gradient_.
fix done.
int num_bytes) {
// TOOD(zhihong): datatype not work. need to add the runtime datatype
auto grad_type = reinterpret_cast<const float*>(grad_buffer);
Tensor* gradient = new Tensor(const_cast<float*>(grad_type), num_bytes);
The incoming buffer is const void*, which indicates that this function may neither modify nor free it. Here the cast removes the const qualifier, and then the Tensor takes ownership of the buffer and will free it when it is destructed. We should not free a buffer we do not own.
This was done to avoid copying the memory, but it does indeed have that problem. Switched to shared_ptr.
Fixed.
Got it. Not letting Tensor own a directly passed pointer indeed solves the problem.
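One way to express "use but do not own" is a shared_ptr with a no-op deleter (an illustrative sketch; WrapExternalBuffer is a hypothetical helper, not code from this PR):

// Wrap a caller-owned gradient buffer without taking ownership: the custom
// deleter does nothing, so the Tensor never frees memory it does not own.
std::shared_ptr<float> WrapExternalBuffer(const void* grad_buffer) {
  return std::shared_ptr<float>(
      const_cast<float*>(reinterpret_cast<const float*>(grad_buffer)),
      [](float*) { /* owned by the caller, do not delete */ });
}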
*/
ParameterOptimizer(Tensor *parameter, LrPolicy *lr)
: parameter_(parameter), lr_policy_(lr), num_sample_passed_(0) {}
virtual ~ParameterOptimizer() { delete parameter_; };
It seems we also need delete lr_policy_;
fix done.
paddle/optimizer/sgd_optimizer.h
Outdated
if (momentum_ != 0.0) {
size_t size = parameter->size();
// TODO: fix it with align aware allocator bind to Tensor
if (momentums_) delete momentums_;
We should not delete momentums_ in the constructor; at that point momentums_ has not been initialized yet and could hold any value.
fix done.
public:
SGDOptimizer(Tensor* parameter, LrPolicy* lr, double m, double d, bool n)
: ParameterOptimizer(parameter, lr),
momentums_(nullptr),
Could this be written as momentums_(new Tensor(size)), here?
Here we need to check whether momentum_ is 0.0 before creating the buffer; I rewrote the other places.
paddle/optimizer/Tensor.h
Outdated
TensorT(CpuMemHandlePtr handle, size_t size)
: height_(1),
width_(size),
data_(reinterpret_cast<T*>(handle->getBuf())) {}
Here, nothing keeps a reference to std::make_shared<CpuMemoryHandle>(size * sizeof(float)), so the buffer is released as soon as the statement finishes executing.
I thought about whether we could write it like this, as a reference:
template <class T>
class TensorT {
public:
  TensorT(size_t size)
      : TensorT(std::shared_ptr<T>(new T[size], std::default_delete<T[]>()), size) {}
  TensorT(std::shared_ptr<T> data, size_t size)
      : height_(1), width_(size), data_ptr_(data), data_(nullptr) {}
  TensorT(T* data, size_t size) : TensorT(data, 1, size) {}
  TensorT(T* data, size_t h, size_t w) : height_(h), width_(w), data_(data) {}
  virtual ~TensorT() {}
  T* get_buffer() {
    auto data = data_;
    if (data == nullptr) {
      data = data_ptr_.get();
    }
    return data;
  }
  T& operator[](const size_t idx) {
    auto data = get_buffer();
    CHECK(idx >= 0 && idx < this->width_) << "out of index range";
    return data[idx];
  }
  const T& operator[](const size_t idx) const {
    auto data = data_ != nullptr ? data_ : data_ptr_.get();
    CHECK(idx >= 0 && idx < this->width_) << "out of index range";
    return data[idx];
  }
  // TODO: replace with tensorshape
  size_t size() const { return this->width_ * this->height_; }

protected:
  size_t height_;
  size_t width_;
  std::shared_ptr<T> data_ptr_;  // managed data
  T* data_;                      // unmanaged data
};
That's right, I made a mistake here: the Tensor has to hold the shared_ptr to extend the buffer's lifetime.
Fixed it; data_ is now initialized from data_ptr_.
LGTM, thanks for being so patient and modifying the PR so many times!
It is an independent module which will be used by the pserver through cgo calls. This makes it easier to update the optimizer library. This pull request also resolves the conflict between the develop branch and #2190.