Add polynomial_decay and piecewise_decay #8013

Merged
Changes from all commits
41 commits
9115017  init polynomial_decay (jacquesqiao, Jan 31, 2018)
b591ac7  test polynomial_decay (jacquesqiao, Jan 31, 2018)
7d09fe6  complete polynomial_decay (jacquesqiao, Jan 31, 2018)
8652da8  Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into… (jacquesqiao, Jan 31, 2018)
e804f06  fix conditional block op (jacquesqiao, Feb 1, 2018)
9ee8f77  init scalar-switch-case-op (jacquesqiao, Feb 1, 2018)
9ae65c4  switch op can compile (jacquesqiao, Feb 2, 2018)
3d5b807  complete forward switch_op (jacquesqiao, Feb 2, 2018)
0b4e4c9  add GetMatchCaseIndex (jacquesqiao, Feb 2, 2018)
7af4dda  add switch_grad_op (jacquesqiao, Feb 2, 2018)
5a659e8  Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into… (jacquesqiao, Feb 5, 2018)
bdfb835  init switch Python API (jacquesqiao, Feb 5, 2018)
33fcaed  add test_switch (jacquesqiao, Feb 5, 2018)
83e1bc9  support set block list in python (jacquesqiao, Feb 5, 2018)
5fe5936  fix scope problem (jacquesqiao, Feb 5, 2018)
942bdcb  complete test (jacquesqiao, Feb 5, 2018)
9d1385b  optimize test (jacquesqiao, Feb 5, 2018)
410db57  optimize test (jacquesqiao, Feb 5, 2018)
511cb49  rm backward part (jacquesqiao, Feb 5, 2018)
2af2a18  clear grad op (jacquesqiao, Feb 5, 2018)
d0f2928  Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into… (jacquesqiao, Feb 5, 2018)
e91c85d  Merge branch 'impl-scalar-switch-case-op' of ssh://github.com/jacques… (jacquesqiao, Feb 5, 2018)
1e6f229  polynomial_decay use switch op (jacquesqiao, Feb 5, 2018)
33079d9  revert conditional_block_op and reshape_op (jacquesqiao, Feb 5, 2018)
04e8a23  add piecewise_decay and test (jacquesqiao, Feb 6, 2018)
c29a1cc  Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into… (jacquesqiao, Feb 6, 2018)
7d86a0c  fix piecewise_decay (jacquesqiao, Feb 6, 2018)
061f0b1  Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into… (jacquesqiao, Feb 6, 2018)
835dc2f  try to use condition op for switch (jacquesqiao, Feb 6, 2018)
d3e148f  can work (jacquesqiao, Feb 6, 2018)
60a45f8  clean old code (jacquesqiao, Feb 6, 2018)
7b69b0b  Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into… (jacquesqiao, Feb 6, 2018)
0217065  revert (jacquesqiao, Feb 6, 2018)
8f02fdf  rm switch_op.cc (jacquesqiao, Feb 6, 2018)
06d87f9  optimize code (jacquesqiao, Feb 6, 2018)
edacca8  add attr is_scalar_condition for condition_block_op (jacquesqiao, Feb 6, 2018)
c2d3207  fix comment (jacquesqiao, Feb 6, 2018)
59a814a  Merge branch 'impl-scalar-switch-case-op-with-condition-op' of ssh://… (jacquesqiao, Feb 7, 2018)
7fd322a  Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into… (jacquesqiao, Feb 7, 2018)
31aa827  fix comment (jacquesqiao, Feb 7, 2018)
003ed1a  add export (jacquesqiao, Feb 7, 2018)
31 changes: 31 additions & 0 deletions python/paddle/v2/fluid/layers/control_flow.py
@@ -38,6 +38,7 @@
    'array_write',
    'create_array',
    'less_than',
    'equal',
    'array_read',
    'shrink_memory',
    'array_length',
@@ -975,6 +976,36 @@ def less_than(x, y, cond=None, **ignored):
    return cond


def equal(x, y, cond=None, **ignored):
Review comment (Member):
Equal is an element-wise operator, and it can also be overridden in Python:

    def __eq__(self):
        ...

Review comment (@QiJune, Feb 8, 2018):
I find that our equal is not like Python's __eq__: Python's __eq__ returns a bool, but our equal returns a vector.
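
(For illustration, a minimal NumPy sketch of this distinction; NumPy stands in for the layer's element-wise semantics, and the snippet is not part of the PR.)

```python
import numpy as np

label = np.array([1, 2, 3])
limit = np.array([1, 0, 3])

# Element-wise equality, as the `equal` layer computes: one bool per element.
print(np.equal(label, limit))  # [ True False  True]

# Python scalar __eq__: a single bool.
print(1 == 1)  # True
```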

"""
**equal**

This layer returns the truth value of :math:`x == y` elementwise.

Args:
x(Variable): First operand of *equal*
y(Variable): Second operand of *equal*
cond(Variable|None): Optional output variable to store the result of *equal*

Returns:
Variable: The tensor variable storing the output of *equal*.

Examples:
.. code-block:: python

less = fluid.layers.equal(x=label, y=limit)
"""
    helper = LayerHelper("equal", **locals())
    if cond is None:
        cond = helper.create_tmp_variable(dtype='bool')
        cond.stop_gradient = True

    helper.append_op(
        type='equal', inputs={'X': [x],
                              'Y': [y]}, outputs={'Out': [cond]})
    return cond


def array_read(array, i):
    """This function performs the operation to read the data in as an
    LOD_TENSOR_ARRAY.
102 changes: 100 additions & 2 deletions python/paddle/v2/fluid/learning_rate_decay.py
@@ -15,7 +15,10 @@
import layers
from framework import Variable

__all__ = [
    'exponential_decay', 'natural_exp_decay', 'inverse_time_decay',
    'polynomial_decay', 'piecewise_decay'
]
"""
When training a model, it's often useful to decay the
learning rate during the training process; this is called
@@ -101,7 +104,7 @@ def inverse_time_decay(learning_rate,
    ```python
    if staircase:
        decayed_learning_rate = learning_rate / (1 + decay_rate * floor(global_step / decay_step))
    else:
        decayed_learning_rate = learning_rate / (1 + decay_rate * global_step / decay_step)
    ```
    Args:
@@ -123,3 +126,98 @@ def inverse_time_decay(learning_rate,
    div_res = layers.floor(x=div_res)

    return learning_rate / (1 + decay_rate * div_res)


def polynomial_decay(learning_rate,
                     global_step,
                     decay_steps,
                     end_learning_rate=0.0001,
                     power=1.0,
                     cycle=False):
"""Applies polynomial decay to the initial learning rate.

```python
if cycle:
decay_steps = decay_steps * ceil(global_step / decay_steps)
else:
global_step = min(global_step, decay_steps)
decayed_learning_rate = (learning_rate - end_learning_rate) *
(1 - global_step / decay_steps) ^ power +
end_learning_rate
```
Args:
learning_rate: A scalar float32 value or a Variable. This
will be the initial learning rate during training
global_step: A Variable that record the training step.
Review comment (Member):
So, where is the global_step variable created?

Review comment (@jacquesqiao, author):
It should be created outside the optimizer, like:

    global_step = fluid.layers.create_global_var(shape=[1], value=0, dtype='float32', force_cpu=True)
    sgd_optimizer = fluid.optimizer.SGD(
        learning_rate=fluid.learning_rate_decay.exponential_decay(
            learning_rate=0.0001,
            global_step=global_step,
            decay_steps=100000,
            decay_rate=0.5,
            staircase=True),
        global_step=global_step)

Review comment (Member):
If the global_step variable is created with force_cpu, should decay_steps also be forced to CPU?
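
(By analogy, a hypothetical sketch of wiring up polynomial_decay the same way; the decay arguments come from the signature in this diff, while the surrounding optimizer pattern follows the author's example above. Not from the PR itself.)

```python
global_step = fluid.layers.create_global_var(
    shape=[1], value=0, dtype='float32', force_cpu=True)
sgd_optimizer = fluid.optimizer.SGD(
    learning_rate=fluid.learning_rate_decay.polynomial_decay(
        learning_rate=0.01,
        global_step=global_step,
        decay_steps=100000,
        end_learning_rate=0.0001,
        power=1.0,
        cycle=False),
    global_step=global_step)
```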

        decay_steps: A Python `int32` number.
        end_learning_rate: A Python `float` number.
        power: A Python `float` number.
        cycle: Boolean. If set to True, decay the learning rate every decay_steps.

    Returns:
        The decayed learning rate
    """
    if not isinstance(global_step, Variable):
        raise ValueError("global_step is required for polynomial_decay.")

    if cycle:
        div_res = layers.ceil(x=(global_step / decay_steps))
        zero_var = layers.fill_constant(shape=[1], dtype='float32', value=0.0)
        one_var = layers.fill_constant(shape=[1], dtype='float32', value=1.0)

        with layers.Switch() as switch:
Review comment (Member):
So, our switch operator only supports a scalar condition now? And our if-else operator supports a vector condition.

Review comment (@jacquesqiao, author, Feb 8, 2018):
Yes, the current switch operator only supports a scalar condition. Because it uses conditional_block, it will be easy to support tensor input in the future.
            with switch.case(layers.equal(x=global_step, y=zero_var)):
                layers.assign(input=one_var, output=div_res)
        decay_steps = decay_steps * div_res
    else:
        decay_steps_var = layers.fill_constant(
            shape=[1], dtype='float32', value=float(decay_steps))
        global_step = layers.elementwise_min(x=global_step, y=decay_steps_var)

    return (learning_rate - end_learning_rate) * \
           ((1 - global_step / decay_steps) ** power) + end_learning_rate
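
(As a sanity check on the schedule above, a plain-Python sketch that mirrors the reference implementation in the test file later in this diff; not part of the library code.)

```python
import math

def poly_decay(lr, step, decay_steps, end_lr=0.0001, power=1.0, cycle=False):
    # Pure-Python version of the pseudo-code in the docstring above.
    if cycle:
        div = math.ceil(step / float(decay_steps))
        if div == 0:
            div = 1  # step 0 still belongs to the first cycle
        decay_steps = decay_steps * div
    else:
        step = min(step, decay_steps)
    return (lr - end_lr) * ((1 - float(step) / decay_steps) ** power) + end_lr

for step in [0, 2, 5, 7]:
    print(step, poly_decay(1.0, step, decay_steps=5, cycle=True))
# step 7 falls in the second cycle: decay_steps stretches to 10,
# so the rate climbs back to ~0.3 instead of staying at end_lr.
```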


def piecewise_decay(global_step, boundaries, values):
"""Applies piecewise decay to the initial learning rate.

```python
boundaries = [10000, 20000]
values = [1.0, 0.5, 0.1]

if step < 10000:
learning_rate = 1.0
elif step >= 10000 and step < 20000:
learning_rate = 0.5
else:
learning_rate = 0.1
```
"""

    if len(values) - len(boundaries) != 1:
        raise ValueError("len(values) - len(boundaries) should be 1")

    if not isinstance(global_step, Variable):
        raise ValueError("global_step is required for piecewise_decay.")

    lr = layers.create_global_var(
        shape=[1],
        value=0.0,
        dtype='float32',
        persistable=True,
        name="learning_rate")

    with layers.Switch() as switch:
        for i in range(len(boundaries)):
            boundary_val = layers.fill_constant(
                shape=[1], dtype='float32', value=float(boundaries[i]))
            value_var = layers.fill_constant(
                shape=[1], dtype='float32', value=float(values[i]))
            with switch.case(layers.less_than(global_step, boundary_val)):
                layers.assign(value_var, lr)
        last_value_var = layers.fill_constant(
            shape=[1], dtype='float32', value=float(values[len(values) - 1]))
        with switch.default():
            layers.assign(last_value_var, lr)

    return lr
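
(Likewise, a plain-Python sketch of the boundary semantics, matching the docstring example above; note the strict `<` comparison, so a step equal to a boundary takes the next value. Not part of the library code.)

```python
def piecewise(step, boundaries, values):
    # One more value than boundaries; the final value is the default case.
    assert len(values) == len(boundaries) + 1
    for boundary, value in zip(boundaries, values):
        if step < boundary:
            return value
    return values[-1]

boundaries, values = [10000, 20000], [1.0, 0.5, 0.1]
for step in (0, 9999, 10000, 20000):
    print(step, piecewise(step, boundaries, values))
# 0 -> 1.0, 9999 -> 1.0, 10000 -> 0.5, 20000 -> 0.1
```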
93 changes: 66 additions & 27 deletions python/paddle/v2/fluid/tests/test_learning_rate_decay.py
@@ -15,6 +15,8 @@
import unittest

import math
import copy

import paddle.v2.fluid.framework as framework
import paddle.v2.fluid as fluid
import paddle.v2.fluid.layers as layers
@@ -54,21 +56,37 @@ def inverse_time_decay(learning_rate,
    return learning_rate / (1 + decay_rate * temp)


def polynomial_decay(learning_rate,
                     global_step,
                     decay_steps,
                     end_learning_rate=0.0001,
                     power=1.0,
                     cycle=False):
    if cycle:
        div = math.ceil(global_step / float(decay_steps))
        if div == 0:
            div = 1
        decay_steps = decay_steps * div
    else:
        global_step = min(global_step, decay_steps)
    return (learning_rate - end_learning_rate) * \
        ((1 - float(global_step) / float(decay_steps)) ** power) + end_learning_rate


def piecewise_decay(global_step, boundaries, values):
    assert len(boundaries) + 1 == len(values)
    for i in range(len(boundaries)):
        if global_step < boundaries[i]:
            return values[i]
    return values[len(values) - 1]


class TestLearningRateDecay(unittest.TestCase):
    def check_decay(self, python_decay_fn, fluid_decay_fn, kwargs):
        global_step = layers.create_global_var(
            shape=[1], value=0.0, dtype='float32', persistable=True)

        decayed_lr = fluid_decay_fn(global_step=global_step, **kwargs)
        layers.increment(global_step, 1.0)

        place = fluid.CPUPlace()
@@ -79,31 +97,52 @@ def check_decay(self, python_decay_fn, fluid_decay_fn, staircase):
            step_val, lr_val = exe.run(fluid.default_main_program(),
                                       feed=[],
                                       fetch_list=[global_step, decayed_lr])
            python_decayed_lr = python_decay_fn(global_step=step, **kwargs)
            self.assertAlmostEqual(python_decayed_lr, lr_val[0])

    def test_decay(self):
        common_kwargs_true = {
            "learning_rate": 1.0,
            "decay_steps": 5,
            "decay_rate": 0.5,
            "staircase": True
        }
        common_kwargs_false = copy.deepcopy(common_kwargs_true)
        common_kwargs_false["staircase"] = False

        decay_fns = [
            (exponential_decay, lr_decay.exponential_decay, common_kwargs_true),
            (exponential_decay, lr_decay.exponential_decay,
             common_kwargs_false),
            (natural_exp_decay, lr_decay.natural_exp_decay, common_kwargs_true),
            (natural_exp_decay, lr_decay.natural_exp_decay,
             common_kwargs_false),
            (inverse_time_decay, lr_decay.inverse_time_decay,
             common_kwargs_true),
            (inverse_time_decay, lr_decay.inverse_time_decay,
             common_kwargs_false),
            (polynomial_decay, lr_decay.polynomial_decay, {
                "learning_rate": 1.0,
                "decay_steps": 5,
                "cycle": True
            }),
            (polynomial_decay, lr_decay.polynomial_decay, {
                "learning_rate": 1.0,
                "decay_steps": 5,
                "cycle": False
            }),
            (piecewise_decay, lr_decay.piecewise_decay, {
                "boundaries": [3, 6, 9],
                "values": [0.1, 0.2, 0.3, 0.4]
            }),
        ]

        for py_decay_fn, fluid_decay_fn, kwargs in decay_fns:
            print("decay_fn=" + py_decay_fn.__name__ + " kwargs=" + str(kwargs))
            main_program = framework.Program()
            startup_program = framework.Program()
            with framework.program_guard(main_program, startup_program):
                self.check_decay(py_decay_fn, fluid_decay_fn, kwargs)


if __name__ == '__main__':