Adam op #4403
Conversation
moment1_out = beta1 * moment1 + (1 − beta1) * grad
moment2_out = beta2 * moment2 + (1 − beta2) * grad * grad
moment1_hat = moment1_out / (1 - beta1^t)
moment2_hat = moment2_out / (1 - beta2^t)
param_out = param - learning_rate * moment1_hat / (sqrt(moment2_hat) + epsilon)

Passed its own unit test.
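For reference, a minimal scalar sketch that follows these update equations verbatim (the function and variable names are illustrative, not taken from this PR; the operator itself works on tensors):

#include <cmath>

// One scalar Adam step, following the update equations above verbatim.
// m1 and m2 are the running first/second moment estimates; t is the
// 1-based timestep.
float AdamStep(float param, float grad, float &m1, float &m2, float lr,
               float beta1, float beta2, float epsilon, int t) {
  m1 = beta1 * m1 + (1 - beta1) * grad;
  m2 = beta2 * m2 + (1 - beta2) * grad * grad;
  float m1_hat = m1 / (1 - std::pow(beta1, t));  // bias-corrected 1st moment
  float m2_hat = m2 / (1 - std::pow(beta2, t));  // bias-corrected 2nd moment
  return param - lr * m1_hat / (std::sqrt(m2_hat) + epsilon);
}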
using framework::OperatorWithKernel::OperatorWithKernel;

 protected:
  void InferShape(const framework::InferShapeContext &ctx) const override {
This needs to change to the new InferShapeContextBase.
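A rough sketch of what that change could look like, assuming the newer context type exposes GetInputDim/SetOutputDim-style accessors (the accessor and input/output names here are assumptions, not taken from this PR):

// Hypothetical sketch only: accessor and input/output names are assumed.
void InferShape(framework::InferShapeContextBase *ctx) const override {
  auto param_dims = ctx->GetInputDim("param");
  PADDLE_ENFORCE_EQ(param_dims, ctx->GetInputDim("grad"),
                    "param and grad must have the same dimensions");
  ctx->SetOutputDim("param_out", param_dims);
}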
    PADDLE_ENFORCE_EQ(ctx.Input<Tensor>("param")->dims(),
                      ctx.Input<Tensor>("grad")->dims(),
                      "Two input of Adam Op's dimension must be same.");
I think we should have a better error message here than the generic "Two input of Adam Op's dimension must be same." It would be good to have something like: the dimensions of the gradient and the moments should be the same.
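For example, one possible wording along those lines (the message text is only a suggestion):

    PADDLE_ENFORCE_EQ(ctx.Input<Tensor>("param")->dims(),
                      ctx.Input<Tensor>("grad")->dims(),
                      "The dimensions of the gradient must be the same as "
                      "the dimensions of the parameter and the moment "
                      "estimates.");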
    float beta1_to_t = std::pow(beta1, t);
    float beta2_to_t = std::pow(beta2, t);
    auto m1_hat = m1_o / (1 - beta1_to_t);
We can increase the efficiency of this computation by changing the order of the operations. Please refer to the modified computation just before Section 2.1 of the Adam paper (https://arxiv.org/abs/1412.6980).
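For reference, a scalar sketch of that reordering (names are illustrative, not the ones in this PR): the bias corrections are folded into an effective step size, so the bias-corrected moments never need to be materialized.

#include <cmath>

// Reordered Adam step as suggested just before Section 2.1 of the paper.
// Note that epsilon is added to sqrt(m2) directly here, which differs
// slightly from adding it to the bias-corrected sqrt(m2_hat).
float AdamStepReordered(float param, float grad, float &m1, float &m2,
                        float lr, float beta1, float beta2, float epsilon,
                        int t) {
  m1 = beta1 * m1 + (1 - beta1) * grad;
  m2 = beta2 * m2 + (1 - beta2) * grad * grad;
  // Effective learning rate with both bias corrections folded in.
  float lr_t = lr * std::sqrt(1 - std::pow(beta2, t)) / (1 - std::pow(beta1, t));
  return param - lr_t * m1 / (std::sqrt(m2) + epsilon);
}

This saves per-element work, since lr_t depends only on the scalar hyperparameters and the timestep rather than on every element of the moment tensors.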
AddAttr<float>("beta1", "exponential decay for the first moment"); | ||
AddAttr<float>("beta2", "exponential decay for the second moment"); | ||
AddComment(R"DOC( | ||
|
I think we should also cite the Adam Paper here.
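For instance, the DOC block could include the reference directly (the wording here is only a suggestion):

    AddComment(R"DOC(
Adam optimizer.

Computes adaptive per-parameter learning rates from estimates of the first
and second moments of the gradients.

Reference: Kingma, D. P. and Ba, J. L., "Adam: A Method for Stochastic
Optimization", ICLR 2015. https://arxiv.org/abs/1412.6980
)DOC");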
template <typename T, int MajorType = Eigen::RowMajor,
          typename IndexType = Eigen::DenseIndex>
using EigenScalar = framework::EigenScalar<T, MajorType, IndexType>;
Are we using this?
Reminder: should the learning rate be an attribute or a tensor variable?
I discussed this with @reyoung, and as per his suggestion, the learning rate and time step should be inputs and not attributes. I am currently trying to figure out whether we can pass a float/int as an input instead of a tensor with shape (1,).
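If the learning rate does end up as a shape-(1,) tensor input, the kernel could read the scalar out of it along these lines (a sketch; the input name and exact accessor usage are assumptions):

    // Hypothetical: read a scalar learning rate from a shape-(1,) tensor input.
    auto *lr_tensor = ctx.Input<framework::Tensor>("learning_rate");
    float lr = lr_tensor->data<float>()[0];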
fix #4377