Add some dist-training robust cases into fluid benchmark test #11207

velconia · 2018-06-05T14:01:04Z

2. add learning rate decay feature into fluid benchmark test
3. add L1&L2 regularization feature into fluid benchmark test
4. add error clipping feature into fluid benchmark test
5. add gradient clipping feature into fluid benchmark test

typhoonzero · 2018-06-05T15:01:10Z

benchmark/fluid/models/machine_translation.py

@@ -26,6 +26,10 @@
 import paddle.fluid.core as core
 import paddle.fluid.framework as framework
 from paddle.fluid.executor import Executor
+from models.model_base import get_decay_learning_rate


model_base is not uploaded?

Thanks for the review. I added the benchmark/fluid/models/model_base.py file in the next commit.


typhoonzero · 2018-06-06T15:05:28Z

benchmark/fluid/models/machine_translation.py


    # clone from default main program
    inference_program = fluid.default_main_program().clone()

-    optimizer = fluid.optimizer.Adam(learning_rate=args.learning_rate)
+    # set gradient clip
+    set_gradient_clip(args.gradient_clip_method, args.gradient_clip_norm)


Is there a way to disable these settings when the args are empty?

If clip_method in the args is None, these settings are disabled. If the user does NOT specify --gradient_clip_method, the arg defaults to None.

The code looks like this:

    def set_gradient_clip(clip_method, clip_norm=1.):
        if not clip_method:
            return None
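The behaviour described in this reply can be sketched as follows. This is a minimal illustration, not the PR's actual code: `parse_args`, the `'global_norm'` value, and the returned dictionary are hypothetical stand-ins, and the real helper would configure the framework's clipping rather than return a description of it.

```python
import argparse


def parse_args(argv=None):
    parser = argparse.ArgumentParser()
    # default=None means the attribute is None when the flag is omitted.
    parser.add_argument('--gradient_clip_method', type=str, default=None)
    parser.add_argument('--gradient_clip_norm', type=float, default=1.0)
    return parser.parse_args(argv)


def set_gradient_clip(clip_method, clip_norm=1.0):
    # An unset (None) or empty method disables clipping entirely.
    if not clip_method:
        return None
    # Hypothetical stand-in: only report what would be configured.
    return {'method': clip_method, 'clip_norm': clip_norm}


if __name__ == '__main__':
    args = parse_args()
    print(set_gradient_clip(args.gradient_clip_method,
                            args.gradient_clip_norm))
```

Because the guard is `if not clip_method`, an empty string disables clipping in the same way as an omitted flag.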

typhoonzero · 2018-06-08T02:41:51Z

I'm currently thinking we can test all these cases with unit tests rather than CE. Running e2e tests in CE may take a lot of time.

velconia · 2018-06-08T06:10:41Z

Actually, unit tests already cover most of these features. However, what we need is:

Running a program in a distributed environment, e.g., running learning rate decay with the parallel executor on 2 parameter servers and 2 trainers, which unit tests cannot cover.
These tests do not actually run to completion; we run only a few iterations each time, so they will not cost much time.

velconia · 2018-06-08T06:12:12Z

So I think putting these features in the fluid benchmark and adding about 6-7 cases to CE is a good choice; each case will cost around 30s.
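As a rough check on that estimate, the total CE time can be computed directly. This is only back-of-the-envelope arithmetic using the numbers quoted in this comment; the function name is hypothetical, and scheduling or start-up overhead is not counted.

```python
SECONDS_PER_CASE = 30  # "around 30s each case", as quoted above


def ce_budget_seconds(num_cases, seconds_per_case=SECONDS_PER_CASE):
    # Total wall time if the cases run one after another.
    return num_cases * seconds_per_case


for n in (6, 7):
    print(n, "cases:", ce_budget_seconds(n), "seconds")
```

Run sequentially, 6-7 cases come to roughly 3 to 3.5 minutes.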


typhoonzero · 2018-06-10T02:01:09Z

benchmark/fluid/args.py

+        choices=[],
+        help='Error clipping method, not allowed yet')
+    parser.add_argument(
+        '--error_clip_min',


Can we remove clipping and the other optimization configs from the arguments? It might be cleaner to leave these settings to the model configs. Thanks!


typhoonzero · 2018-06-11T07:11:57Z

python/paddle/fluid/tests/unittests/test_listen_and_serv_op.py

            try:
                # the listen_and_serv_op would touch a file which contains the listen port
                # on the /tmp directory until it was ready to process all the RPC call.
                os.stat("/tmp/paddle.%d.port" % pid)
                return
            except os.error:
-                retry_times -= 1
+                retry_times -= sleep_time


This change does not seem to belong to the current PR. Also, retry_times seems to be the total number of attempts, not a time.

I renamed retry_times to start_left_time to indicate that it is the time left for the pserver to start. This change is needed to pass the CI.
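The distinction raised in this exchange, a time budget rather than an attempt count, can be sketched as below. This is a self-contained illustration and not the test's actual code: `wait_for_file` and its default values are hypothetical, and only the name `start_left_time` is taken from the reply above.

```python
import os
import time


def wait_for_file(path, start_left_time=5.0, sleep_time=0.1):
    """Poll until `path` exists or the time budget (in seconds) runs out."""
    while start_left_time > 0:
        if os.path.exists(path):
            return True
        time.sleep(sleep_time)
        # Charge the sleep interval against the budget, so the variable
        # holds remaining seconds rather than remaining attempts.
        start_left_time -= sleep_time
    return False
```

With a count-based variable, the total wait would change whenever `sleep_time` changes; with a time budget, the upper bound on waiting stays roughly fixed.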

typhoonzero · 2018-06-11T08:10:42Z

Ref #11213 for adding unit tests

typhoonzero

LGTM!

velconia added 2 commits June 5, 2018 21:51

Add some document to README.md under benchmark/fluid/ repo

3bd8f9e

typhoonzero reviewed Jun 5, 2018


velconia added 3 commits June 6, 2018 11:39

Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into…

2da0ef7

… benchmark

Add model_base.py

3bf93b3

Fix bugs in test_listen_and_serv_op

8041e8d

typhoonzero reviewed Jun 6, 2018


velconia added 6 commits June 8, 2018 15:18

1. remove args out of fluid_benchmark.py

4dd0ded

2. remove lr_decay, regularization, clipping out of fluid_benchmark.py

Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into…

7e0afd5

… benchmark

Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into…

e67392e

… benchmark

add async_mode description to doc and remove the clipping description…

9c2e68d

… out

for restart build

d11e2bf

to restart build

2da70cc

typhoonzero reviewed Jun 10, 2018


velconia added 3 commits June 11, 2018 11:21

remove optimization args from args.py

95cbb43

1. remove optimization from models

e140844

2. fix bug in test_listen_and_serv_op

Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into…

4779338

… benchmark

typhoonzero reviewed Jun 11, 2018


velconia added 2 commits June 11, 2018 15:16

change the name retry_times to left_time

0a90eee

change retry_times to the pserver start left time

c950d22

typhoonzero approved these changes Jun 11, 2018


typhoonzero merged commit 1cfd3cb into PaddlePaddle:develop Jun 11, 2018
