[Dy2St][PIR] Run `test_break_continue` in sequential run mode #63287

SigureMo · 2024-04-07T09:53:16Z

PR Category

Execute Infrastructure

PR Types

Others

Description

Enable PIR mode for the TestOptimBreakInWhile case of the test_break_continue unit test. The case has to run with FLAGS_new_executor_sequential_run.

Without FLAGS_new_executor_sequential_run the case still fails; see the problem analysis below.

PIR Dy2St ideal-state unit test rollout verification task list (Phase 2) 🥳 #60131

PCard-66972

Problem analysis

Symptom

Remove FLAGS_new_executor_sequential_run and run the following command to get the full log:

GLOG_v=10 FLAGS_print_ir=True python test/dygraph_to_static/test_break_continue.py TestOptimBreakInWhile.test_transformed_static_result__ast_pir > break-continue.log 2>&1

The full log is in break-continue.log.

The problematic code is:

import paddle


def test_optim_break_in_while(x):
    x = paddle.to_tensor(x)
    i = paddle.tensor.fill_constant(shape=[1], dtype='int32', value=0)
    while i < 10:
        if i > 5:
            break
            x += 10086  # unreachable: placed after break
        x += i
        i += 1
    return x

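Reproducible only on CPU (apparently only on Linux). The symptom is a wrong result: the expected result is 15 (the sum of 0..5), but the actual result varies from run to run and is usually greater than 15.

Adding a print before x += i produces the following output: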

Variable: var
  - lod: {}
  - place: Place(cpu)
  - shape: [1]
  - layout: NCHW
  - dtype: int32
  - data: [0]
Variable: var
  - lod: {}
  - place: Place(cpu)
  - shape: [1]
  - layout: NCHW
  - dtype: int32
  - data: [1]
Variable: var
  - lod: {}
  - place: Place(cpu)
  - shape: [1]
  - layout: NCHW
  - dtype: int32
  - data: [2]
Variable: var
  - lod: {}
  - place: Place(cpu)
  - shape: [1]
  - layout: NCHW
  - dtype: int32
  - data: [4]
Variable: var
  - lod: {}
  - place: Place(cpu)
  - shape: [1]
  - layout: NCHW
  - dtype: int32
  - data: [5]
Variable: var
  - lod: {}
  - place: Place(cpu)
  - shape: [1]
  - layout: NCHW
  - dtype: int32
  - data: [6]

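The expected sequence is 0 1 2 3 4 5, but the actual one is 0 1 2 4 5 6 (random; the next run may differ).

In other words, the printed i is sometimes already i + 1.

The problem only occurs in the default multi-threaded run; with FLAGS_new_executor_sequential_run=true or FLAGS_enable_pir_in_executor_trace_run=true the result is correct. The hypothesis is therefore that dependency analysis fails to detect the dependency between the two statements, so i += 1 can execute before x += i.

The program after lowering (key parts only):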
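The hypothesis can be illustrated with a deterministic toy model in plain Python (a sketch, not Paddle's executor). It assumes x starts at 0 and that `i += 1` writes into the same storage that `x += i` reads; `reordered_iters` is a hypothetical knob naming the iterations in which the scheduler runs `i += 1` first.

```python
def run_loop(reordered_iters=frozenset()):
    """Toy model of test_optim_break_in_while with x starting at 0.

    `reordered_iters` (hypothetical) holds the loop-counter values for which
    `i += 1` runs before `x += i`. Returns (values of i read by `x += i`, final x).
    """
    i_buf = [0]  # storage shared by the reader (x += i) and the writer (i += 1)
    x = 0
    observed = []
    while i_buf[0] < 10:
        if i_buf[0] > 5:
            break
        i_now = i_buf[0]
        if i_now in reordered_iters:
            i_buf[0] = i_now + 1      # i += 1 runs first, writing the shared storage
            observed.append(i_buf[0])
            x += i_buf[0]             # x += i now reads i + 1
        else:
            observed.append(i_buf[0])
            x += i_buf[0]             # x += i reads i, as in sequential mode
            i_buf[0] = i_now + 1
    return observed, x


print(run_loop())            # ([0, 1, 2, 3, 4, 5], 15)
print(run_loop({3, 4, 5}))   # ([0, 1, 2, 4, 5, 6], 18)
```

With `reordered_iters={3, 4, 5}` the model reproduces the `0 1 2 4 5 6` trace above and returns 18 instead of 15.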

IR after lowering = {
    # ...
    (%1) = "full(phi_kernel)" () {dtype:(pd_op.DataType)int32,kernel_key:<backend:CPU|layout:Undefined(AnyLayout)|dtype:int32>,kernel_name:"full",op_name:"pd_op.full",place:(pd_op.Place)Place(undefined:0),shape:(pd_op.IntArray)[1],stop_gradient:[true],value:(Float)0} : () -> cpu_tensor<1xi32> # this is i
    # ...
    (%9, %10, %11) = "pd_op.while"(cond=%8, inputs=%4, %1, %0) {
    ^%arg_0, %arg_1, %arg_2
        # ...
        (%17, %18) = pd_op.if (%16) {} -> cpu_tensor<1xi32>, cpu_tensor<1xi64>{
            # these two ops correspond to x += i
            (%19) = "cast(phi_kernel)" (%arg_1) {dtype:(pd_op.DataType)int64,kernel_key:<backend:CPU|layout:NCHW|dtype:int32>,kernel_name:"cast",op_name:"pd_op.cast",stop_gradient:[true]} : (cpu_tensor<1xi32>) -> cpu_tensor<1xi64>
            (%20) = "add(phi_kernel)" (%arg_2, %19) {kernel_key:<backend:CPU|layout:NCHW|dtype:int64>,kernel_name:"add",op_name:"pd_op.add",stop_gradient:[true]} : (undefined_tensor<1xi64>, cpu_tensor<1xi64>) -> cpu_tensor<1xi64>
            # these two ops correspond to i += 1
            (%21) = "full(phi_kernel)" () {dtype:(pd_op.DataType)float32,kernel_key:<backend:CPU|layout:Undefined(AnyLayout)|dtype:float32>,kernel_name:"full",op_name:"pd_op.full",place:(pd_op.Place)Place(cpu),shape:(pd_op.IntArray)[1],stop_gradient:[true],value:(Float)1} : () -> cpu_tensor<1xf32>
            (%22) = "scale(phi_kernel)" (%arg_1, %21) {bias:(Float)1,bias_after_scale:true,kernel_key:<backend:CPU|layout:NCHW|dtype:int32>,kernel_name:"scale",op_name:"pd_op.scale",stop_gradient:[true]} : (cpu_tensor<1xi32>, cpu_tensor<1xf32>) -> cpu_tensor<1xi32>
            () = "cf.yield" (%22, %20) {} : (cpu_tensor<1xi32>, cpu_tensor<1xi64>) ->
        } else {
            () = "cf.yield" (%arg_1, %arg_2) {} : (cpu_tensor<1xi32>, undefined_tensor<1xi64>) ->
        }
        # ...
        () = "cf.yield" (%26, %14, %17, %18) {} : (cpu_tensor<1xb>, cpu_tensor<b>, cpu_tensor<1xi32>, cpu_tensor<1xi64>) ->
    }
    () = "builtin.shadow_output" (%11) {output_name:"output_0"} : (undefined_tensor<1xi64>) ->
}

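We can see that %1 -> %arg_1. Although i + 1 produces a new Value %22, %22 flows through %17 -> %10 and is then fed back as a block argument, sharing its data with %arg_1. So (%22) = "scale(phi_kernel)" (%arg_1, %21) is effectively (%arg_1) = "scale(phi_kernel)" (%arg_1, %21), which behaves like an inplace operation.

The Program, however, does not express this relationship. As a result, the analysis of the four ops producing %19, %20, %21 and %22 does not find that the last two ops (which effectively write %arg_1) must wait for the first two (which read %arg_1). In the multi-threaded case they can therefore run in the reverse order, which the log confirms: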
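The aliasing can be sketched with a tiny Python model (the `Var` class is hypothetical, not Paddle's `Variable`). It assumes that after `cf.yield` the block argument shares storage with %22, and that scale writes into that existing storage instead of allocating fresh memory; under those assumptions a later scale visibly changes %arg_1, just as an in-place op would.

```python
class Var:
    """Minimal stand-in for a runtime variable: a handle to a 1-element buffer."""

    def __init__(self, value=None):
        self.buf = [value]

    def share_data_with(self, other):
        # Make this handle point at the other variable's storage.
        self.buf = other.buf

    def read(self):
        return self.buf[0]

    def write(self, value):
        # Assumption: the kernel reuses the existing storage of its output.
        self.buf[0] = value


def trace_block_arg():
    """Return the values of %arg_1 observed across two loop iterations."""
    arg_1 = Var(0)       # i enters the loop as 0 (from %1 = full)
    out_22 = Var()       # scale's output %22, i.e. i + 1
    seen = []

    # Iteration 0: scale writes into %22's own, not yet shared, storage.
    out_22.write(arg_1.read() + 1)
    seen.append(arg_1.read())        # %arg_1 is unchanged

    # cf.yield: the next iteration's block argument shares data with %22.
    arg_1.share_data_with(out_22)
    seen.append(arg_1.read())        # %arg_1 now reads i + 1

    # Iteration 1: scale writes i + 1 into the now shared storage...
    out_22.write(arg_1.read() + 1)
    seen.append(arg_1.read())        # ...so %arg_1 changes under any later reader
    return seen


print(trace_block_arg())  # [0, 1, 2]
```

The last entry shows that a cast scheduled after scale in iteration 1 would read 2 instead of 1.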

======================== The network executed by pir interpreter ========================
{
    (%0) = "cast(phi_kernel)" (%arg_0) {dtype:(pd_op.DataType)int64,kernel_key:<backend:CPU|layout:NCHW|dtype:int32>,kernel_name:"cast",op_name:"pd_op.cast",stop_gradient:[true]} : (cpu_tensor<1xi32>) -> cpu_tensor<1xi64>
    (%1) = "add(phi_kernel)" (%arg_1, %0) {kernel_key:<backend:CPU|layout:NCHW|dtype:int64>,kernel_name:"add",op_name:"pd_op.add",stop_gradient:[true]} : (undefined_tensor<1xi64>, cpu_tensor<1xi64>) -> cpu_tensor<1xi64>
    (%2) = "full(phi_kernel)" () {dtype:(pd_op.DataType)float32,kernel_key:<backend:CPU|layout:Undefined(AnyLayout)|dtype:float32>,kernel_name:"full",op_name:"pd_op.full",place:(pd_op.Place)Place(cpu),shape:(pd_op.IntArray)[1],stop_gradient:[true],value:(Float)1} : () -> cpu_tensor<1xf32>
    (%3) = "scale(phi_kernel)" (%arg_0, %2) {bias:(Float)1,bias_after_scale:true,kernel_key:<backend:CPU|layout:NCHW|dtype:int32>,kernel_name:"scale",op_name:"pd_op.scale",stop_gradient:[true]} : (cpu_tensor<1xi32>, cpu_tensor<1xf32>) -> cpu_tensor<1xi32>
    () = "cf.yield" (%3, %1) {} : (cpu_tensor<1xi32>, cpu_tensor<1xi64>) ->
}

======================== The instruction executed by pir interpreter ========================
{outputs} =  instruction_name[idx] ({inputs})
0: ( 23 )  = pd_op.cast ( 12 )
1: ( 24 )  = pd_op.add ( 23 )  ( 13 )
2: ( 25 )  = pd_op.full
3: ( 26 )  = pd_op.scale ( 25 )  ( 12 )
---------------------------var_id -> var_name -> variable*---------------------------
0 -> _jst.0.x.0 -> 0x127431d70
1 -> 0x1339f0000648973463091791_inner_var_1 -> 0x12742bcd0
2 -> 0x1339f0000648973463091791_inner_var_2 -> 0x12742c160
3 -> 0x1339f0000648973463091791_inner_var_3 -> 0x127429970
4 -> 0x1339f0000648973463091791_inner_var_4 -> 0x12742be40
5 -> 0x1339f0000648973463091791_inner_var_5 -> 0x12742cf80
6 -> 0x1339f0000648973463091791_inner_var_6 -> 0x12742bdc0
7 -> 0x1339f0000648973463091791_inner_var_7 -> 0x127432320
8 -> 0x1339f0000648973463091791_inner_var_8 -> 0x12742e050
9 -> 0x1339f0000648973463091791_inner_var_9 -> 0x127429cd0
10 -> output_0 -> 0x12742c780
11 -> 0x16c4528d0648973465007541body_block_arg_0 -> 0x12741df40
12 -> 0x16c4528d0648973465012958body_block_arg_1 -> 0x12741cc70
13 -> 0x16c4528d0648973465021500body_block_arg_2 -> 0x12741cb00
14 -> 0x16b26c800648973465049958_inner_var_14 -> 0x12741b9e0
15 -> 0x16b26c800648973465049958_inner_var_15 -> 0x12741b8f0
16 -> 0x16b26c800648973465049958_inner_var_16 -> 0x12741ba60
17 -> 0x16b26c800648973465049958_inner_var_17 -> 0x12741ab80
18 -> 0x16b26c800648973465049958_inner_var_18 -> 0x12741aa70
19 -> 0x16b26c800648973465049958_inner_var_19 -> 0x12741a3f0
20 -> 0x16b26c800648973465049958_inner_var_20 -> 0x127419ae0
21 -> 0x16b26c800648973465049958_inner_var_21 -> 0x1274199d0
22 -> 0x16b26c800648973465049958_inner_var_22 -> 0x127418b60
23 -> 0x16b2a0600648973468709083_inner_var_23 -> 0x1275feb20
24 -> 0x16b2a0600648973468709083_inner_var_24 -> 0x1275fe6f0
25 -> 0x16b2a0600648973468709083_inner_var_25 -> 0x1275fdc90
26 -> 0x16b2a0600648973468709083_inner_var_26 -> 0x1275fe2c0


======================= The dependency of all instruction ========================
id -> down_stream_id
0 -> 1
2 -> 3

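Only the dependency between the first two ops (0 -> 1) and the dependency between the last two ops (2 -> 3) were detected; the dependency of the last two ops on the first two was not. Instructions 2 and 3 (i += 1) can therefore run before instructions 0 and 1 (x += i), which produces the wrong result.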
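To make the missing edge concrete, here is a toy dependency builder in Python (a hypothetical sketch, not Paddle's implementation) run on the instruction table from the log above. It tracks read-after-write and write-after-read edges per storage id. Treating every var id as its own storage reproduces the logged `0 -> 1` and `2 -> 3`; the hypothetical `alias={26: 12}` input encodes the assumption that var 26 (scale's output) shares storage with var 12 (`body_block_arg_1`), and with it the missing `0 -> 3` edge appears.

```python
from collections import defaultdict

# Instruction table from the log above: idx -> (output var ids, input var ids).
INSTRS = {
    0: ([23], [12]),      # pd_op.cast  (x += i)
    1: ([24], [23, 13]),  # pd_op.add   (x += i)
    2: ([25], []),        # pd_op.full  (i += 1)
    3: ([26], [25, 12]),  # pd_op.scale (i += 1)
}


def build_deps(instrs, alias=None):
    """Toy dependency builder: RAW and WAR edges per storage id.

    `alias` (hypothetical input) maps a var id to the var id whose storage it
    shares; without it, every var id is its own storage.
    """
    alias = alias or {}

    def storage(var_id):
        return alias.get(var_id, var_id)

    deps = defaultdict(set)
    last_writer = {}              # storage id -> idx of the latest writer
    readers = defaultdict(list)   # storage id -> idxs that read it since that write
    for idx in sorted(instrs):
        outs, ins = instrs[idx]
        for v in ins:
            s = storage(v)
            if s in last_writer:
                deps[last_writer[s]].add(idx)   # read-after-write
            readers[s].append(idx)
        for v in outs:
            s = storage(v)
            for r in readers[s]:
                if r != idx:
                    deps[r].add(idx)            # write-after-read
            last_writer[s] = idx
            readers[s] = []
    return {k: sorted(v) for k, v in deps.items()}


print(build_deps(INSTRS))                  # {0: [1], 2: [3]}
print(build_deps(INSTRS, alias={26: 12}))  # {0: [1, 3], 2: [3]}
```

Once the aliasing is visible to the analysis, scale (3) must wait for cast (0), which is the ordering that sequential run mode enforces implicitly.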

paddle-bot · 2024-04-07T09:53:21Z

Your PR has been submitted. Thanks for your contribution!
Please wait for the result of CI firstly. See Paddle CI Manual for details.

…Paddle#63287)

[Dy2St][PIR] Run test_break_continue in sequential run mode

c378b8f

SigureMo requested a review from gouzil April 7, 2024 11:21

gouzil approved these changes Apr 7, 2024


SigureMo merged commit de0cd61 into PaddlePaddle:develop Apr 8, 2024
30 checks passed

SigureMo deleted the dy2st/enable-test-break-continue-in-sequential-run-mode branch April 8, 2024 06:12

SigureMo mentioned this pull request Apr 8, 2024

PIR Dy2St ideal-state unit test rollout verification task list (Phase 2) 🥳 #60131

Closed

co63oc pushed a commit to co63oc/Paddle that referenced this pull request Apr 9, 2024

[Dy2St][PIR] Run test_break_continue in sequential run mode (Paddle…

8881059

…Paddle#63287)

co63oc pushed a commit to co63oc/Paddle that referenced this pull request Apr 10, 2024

[Dy2St][PIR] Run test_break_continue in sequential run mode (Paddle…

6ea42ad

…Paddle#63287)
