Memory optimization for fit a line demo #7321

QiJune · 2018-01-08T12:50:40Z

This PR includes:

Control flow graph construction (#7316): simple control flow graph construction, without if/while operators
Variable liveness analysis based on the control flow graph (#7317): variable liveness analysis
Memory reuse policy based on liveness analysis (#7318): simple memory reuse policy, without inplace attribute support
Training benchmark on some examples (#7320): a simple demo on fit_a_line
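
For readers unfamiliar with the second item, liveness analysis on a program without if/while operators can be sketched roughly as below. This is only an illustration, not the code in this PR: the `(name, inputs, outputs)` tuples are a hypothetical stand-in for op descriptions, and the variable names are made up.

```python
from typing import List, Set, Tuple

# Hypothetical stand-in for an op description: (op name, input vars, output vars).
Op = Tuple[str, List[str], List[str]]


def liveness(ops: List[Op]) -> Tuple[List[Set[str]], List[Set[str]]]:
    """Backward dataflow over a straight-line op sequence (no if/while).

    live_out[i] = live_in[i + 1]
    live_in[i]  = use[i] | (live_out[i] - def[i])
    """
    n = len(ops)
    live_in: List[Set[str]] = [set() for _ in range(n)]
    live_out: List[Set[str]] = [set() for _ in range(n)]
    # Walk from the last op to the first so each successor is already solved.
    for i in range(n - 1, -1, -1):
        _, uses, defs = ops[i]
        if i + 1 < n:
            live_out[i] = set(live_in[i + 1])
        live_in[i] = set(uses) | (live_out[i] - set(defs))
    return live_in, live_out


# A made-up forward pass loosely shaped like fit_a_line.
ops: List[Op] = [
    ("mul", ["x", "w"], ["tmp_0"]),
    ("add", ["tmp_0", "b"], ["tmp_1"]),
    ("square_error", ["tmp_1", "y"], ["cost"]),
    ("mean", ["cost"], ["avg"]),
]
live_in, live_out = liveness(ops)

# Variables whose last use is op i: live before the op, not live after it.
dead_after = [sorted(live_in[i] - live_out[i]) for i in range(len(ops))]
```

Here `dead_after[i]` lists the variables that are no longer needed once op `i` has run, which makes them candidates for reuse. A real policy would presumably also have to exclude parameters and other persistent variables, which this sketch does not model.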

With the optimization enabled, new variables hit the cache memory pool:

hit cache !!!! pool index is 1, var name is fc_0.tmp_1@GRAD, cache var name is square_error_cost_0.tmp_1@GRAD, var shape is [-1L, 1L]
elementwise_add_grad
hit cache !!!! pool index is 0, var name is fc_0.b_0@GRAD, cache var name is mean_0.tmp_0@GRAD, var shape is [1L]
hit cache !!!! pool index is 0, var name is fc_0.tmp_0@GRAD, cache var name is square_error_cost_0.tmp_1, var shape is [-1L, 1L]
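
The log lines above suggest that a new variable reuses a pooled variable of the same shape. A minimal sketch of that kind of lookup follows. It is an assumption about the general idea rather than the transpiler's actual code: `find_reusable` and `plan_reuse` are hypothetical helpers, the pool contents are arranged by hand to mirror the first two log lines, and dtype checks are ignored.

```python
from typing import Dict, List, Optional, Tuple

Shape = Tuple[int, ...]
PoolEntry = Tuple[str, Shape]  # (variable name, shape)


def find_reusable(pool: List[PoolEntry], shape: Shape) -> Optional[int]:
    """Return the index of the first pooled variable with an identical shape."""
    for index, (_, cached_shape) in enumerate(pool):
        if cached_shape == shape:
            return index
    return None


def plan_reuse(new_vars: List[PoolEntry], pool: List[PoolEntry]) -> Dict[str, str]:
    """Map each new variable to a pooled variable when the shapes match exactly.

    A matched entry is removed from the pool so it is not handed out twice.
    """
    renames: Dict[str, str] = {}
    for name, shape in new_vars:
        index = find_reusable(pool, shape)
        if index is not None:
            cached_name, _ = pool.pop(index)
            renames[name] = cached_name
    return renames


# Pool state chosen by hand to mirror the log; the real pool state is not shown there.
pool: List[PoolEntry] = [
    ("mean_0.tmp_0@GRAD", (1,)),
    ("square_error_cost_0.tmp_1@GRAD", (-1, 1)),
]
renames = plan_reuse(
    [("fc_0.tmp_1@GRAD", (-1, 1)), ("fc_0.b_0@GRAD", (1,))],
    pool,
)
```

With this hand-built pool, the first variable matches at pool index 1 and the second at index 0, the same indices the log reports.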

Peak memory use within a batch drops from 102400 bytes to 90112 bytes, a saving of 12288 bytes (12%).

Memory use before optimization in a batch

I0108 20:34:28.472046 2497127360 executor.cc:114] Memory used 36864
I0108 20:34:28.472112 2497127360 executor.cc:114] Memory used 49152
I0108 20:34:28.472159 2497127360 executor.cc:114] Memory used 53248
I0108 20:34:28.472302 2497127360 executor.cc:114] Memory used 57344
I0108 20:34:28.472363 2497127360 executor.cc:114] Memory used 61440
I0108 20:34:28.472487 2497127360 executor.cc:114] Memory used 65536
I0108 20:34:28.472534 2497127360 executor.cc:114] Memory used 69632
I0108 20:34:28.472662 2497127360 executor.cc:114] Memory used 73728
I0108 20:34:28.472720 2497127360 executor.cc:114] Memory used 77824
I0108 20:34:28.472798 2497127360 executor.cc:114] Memory used 81920
I0108 20:34:28.472895 2497127360 executor.cc:114] Memory used 86016
I0108 20:34:28.472966 2497127360 executor.cc:114] Memory used 90112
I0108 20:34:28.473053 2497127360 executor.cc:114] Memory used 98304
I0108 20:34:28.473188 2497127360 executor.cc:114] Memory used 102400
I0108 20:34:28.473237 2497127360 executor.cc:114] Memory used 102400
I0108 20:34:28.473333 2497127360 executor.cc:114] Memory used 102400
I0108 20:34:28.473363 2497127360 executor.cc:126] Memory used 102400
I0108 20:34:28.473496 2497127360 executor.cc:130] Memory used after deleting local scope 36864

Memory use after optimization in a batch

I0108 20:35:11.611282 2497127360 executor.cc:114] Memory used 36864
I0108 20:35:11.611460 2497127360 executor.cc:114] Memory used 49152
I0108 20:35:11.611562 2497127360 executor.cc:114] Memory used 53248
I0108 20:35:11.611773 2497127360 executor.cc:114] Memory used 57344
I0108 20:35:11.612010 2497127360 executor.cc:114] Memory used 61440
I0108 20:35:11.612197 2497127360 executor.cc:114] Memory used 65536
I0108 20:35:11.612269 2497127360 executor.cc:114] Memory used 69632
I0108 20:35:11.612385 2497127360 executor.cc:114] Memory used 73728
I0108 20:35:11.612460 2497127360 executor.cc:114] Memory used 77824
I0108 20:35:11.612624 2497127360 executor.cc:114] Memory used 81920
I0108 20:35:11.612782 2497127360 executor.cc:114] Memory used 86016
I0108 20:35:11.612910 2497127360 executor.cc:114] Memory used 86016
I0108 20:35:11.613070 2497127360 executor.cc:114] Memory used 86016
I0108 20:35:11.613468 2497127360 executor.cc:114] Memory used 90112
I0108 20:35:11.613684 2497127360 executor.cc:114] Memory used 90112
I0108 20:35:11.613921 2497127360 executor.cc:114] Memory used 90112
I0108 20:35:11.614055 2497127360 executor.cc:126] Memory used 90112
I0108 20:35:11.614233 2497127360 executor.cc:130] Memory used after deleting local scope 36864
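
The peak values can be read off the two logs by eye, but a small script makes the comparison repeatable. The sketch below assumes the glog line format shown above. `memory_samples` and `peak` are hypothetical helper names, and only the last few lines of each log are embedded as sample input.

```python
import re
from typing import List

# Matches per-op samples such as "executor.cc:114] Memory used 90112".
# The "Memory used after deleting local scope ..." line does not match.
MEMORY_LINE = re.compile(r"\] Memory used (\d+)$")


def memory_samples(log_text: str) -> List[int]:
    """Extract the byte counts from every 'Memory used N' line."""
    samples = []
    for line in log_text.splitlines():
        match = MEMORY_LINE.search(line.strip())
        if match:
            samples.append(int(match.group(1)))
    return samples


def peak(log_text: str) -> int:
    """Largest memory sample seen during the batch."""
    return max(memory_samples(log_text))


before_log = """
I0108 20:34:28.473053 2497127360 executor.cc:114] Memory used 98304
I0108 20:34:28.473188 2497127360 executor.cc:114] Memory used 102400
I0108 20:34:28.473363 2497127360 executor.cc:126] Memory used 102400
I0108 20:34:28.473496 2497127360 executor.cc:130] Memory used after deleting local scope 36864
"""
after_log = """
I0108 20:35:11.613070 2497127360 executor.cc:114] Memory used 86016
I0108 20:35:11.613468 2497127360 executor.cc:114] Memory used 90112
I0108 20:35:11.614055 2497127360 executor.cc:126] Memory used 90112
I0108 20:35:11.614233 2497127360 executor.cc:130] Memory used after deleting local scope 36864
"""

saved = peak(before_log) - peak(after_log)
saved_percent = 100.0 * saved / peak(before_log)
print(f"saved {saved} bytes ({saved_percent:.0f}%)")
```

Both runs return to 36864 bytes after the local scope is deleted, so the difference is in peak use during the batch rather than in what remains allocated afterwards.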

reyoung · 2018-01-08T13:03:32Z

python/paddle/v2/fluid/memory_optimization_transpiler.py

+        # print(block_size)
+
+        # TODO(qijun) handle Program with if/while operators
+        self.global_block = program_desc.block(0)


We can just optimize memory within each block and not optimize memory across blocks. That may be enough.

wangkuiyi · 2018-01-09T19:23:39Z

paddle/framework/executor.cc

@@ -116,6 +115,7 @@ void Executor::Run(const ProgramDesc& pdesc, Scope* scope, int block_id,
  for (auto& op_desc : block.AllOps()) {
    auto op = paddle::framework::OpRegistry::CreateOp(*op_desc);
    VLOG(3) << op->DebugStringEx(local_scope);
+    VLOG(3) << "Memory used " << memory::memory_usage(place_);


Do we need a little more detail in this log message, e.g., “before ...” or “after ...”?

QiJune · 2018-01-10

This PR mainly provides a demo to show the result, and these logs are really only for debugging. A cleaner PR focused on the memory optimization transpiler, #7356, has already been merged, so I will close this PR.

QiJune added 6 commits January 5, 2018 16:51

init and support plain control flow graph

4c3a858

add get_diff helper method

8b5be7c

add general memory optimize policy

7b3cef8

test memory optimization transpiler in fit a line demo

c1e6e4f

add memory usage method and add memory log in executor

eeb6ab5

merge baidu/develop

364193b

QiJune requested review from reyoung, wangkuiyi, dzhwinter and tonyyang-svail January 8, 2018 12:51

reyoung reviewed Jan 8, 2018

fix ci

4a215d3

QiJune mentioned this pull request Jan 9, 2018

add simple memory optimization transpiler #7356

Merged

wangkuiyi approved these changes Jan 9, 2018

QiJune closed this Jan 10, 2018
