[Dy2St][PIR] Run test_break_continue in sequential run mode #63287

Conversation


@SigureMo SigureMo commented Apr 7, 2024

PR Category

Execute Infrastructure

PR Types

Others

Description

Enable the PIR mode of the test_break_continue unit test case TestOptimBreakInWhile; it needs to run with FLAGS_new_executor_sequential_run.

Without FLAGS_new_executor_sequential_run the case still fails to run correctly; the problem analysis is given below.

PCard-66972

Problem analysis

Symptom

Remove FLAGS_new_executor_sequential_run and run the following command to get the full log:

GLOG_v=10 FLAGS_print_ir=True python test/dygraph_to_static/test_break_continue.py TestOptimBreakInWhile.test_transformed_static_result__ast_pir > break-continue.log 2>&1

See break-continue.log for the full log.

The problematic code is:

import paddle

def test_optim_break_in_while(x):
    x = paddle.to_tensor(x)
    i = paddle.tensor.fill_constant(shape=[1], dtype='int32', value=0)
    while i < 10:
        if i > 5:
            break
            x += 10086  # intentionally unreachable; exercises dead code after break
        x += i
        i += 1
    return x

Only reproducible on CPU (seemingly on Linux). The symptom is a wrong result: the expected value is 15 (the sum of 0..5), but the actual result is random and usually greater than 15.
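For reference, calling the function above directly in dygraph (eager) mode reproduces the expected value; the zero-valued input below is an assumption, chosen so that the sum matches the expected 15:

import numpy as np

# Hypothetical eager-mode call; the loop accumulates 0 + 1 + ... + 5 into x.
out = test_optim_break_in_while(np.zeros([1], dtype='int32'))
print(out)  # expected: a tensor holding [15]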

Adding a print before x += i (sketched below) gives the following output:
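The instrumented version looks roughly like this (a sketch; the exact form and placement of the added print are assumed):

import paddle

def test_optim_break_in_while(x):
    x = paddle.to_tensor(x)
    i = paddle.tensor.fill_constant(shape=[1], dtype='int32', value=0)
    while i < 10:
        if i > 5:
            break
            x += 10086
        print(i)  # dy2st turns this into a Print op, which emits the Variable dumps below
        x += i
        i += 1
    return x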

Variable: var
  - lod: {}
  - place: Place(cpu)
  - shape: [1]
  - layout: NCHW
  - dtype: int32
  - data: [0]
Variable: var
  - lod: {}
  - place: Place(cpu)
  - shape: [1]
  - layout: NCHW
  - dtype: int32
  - data: [1]
Variable: var
  - lod: {}
  - place: Place(cpu)
  - shape: [1]
  - layout: NCHW
  - dtype: int32
  - data: [2]
Variable: var
  - lod: {}
  - place: Place(cpu)
  - shape: [1]
  - layout: NCHW
  - dtype: int32
  - data: [4]
Variable: var
  - lod: {}
  - place: Place(cpu)
  - shape: [1]
  - layout: NCHW
  - dtype: int32
  - data: [5]
Variable: var
  - lod: {}
  - place: Place(cpu)
  - shape: [1]
  - layout: NCHW
  - dtype: int32
  - data: [6]

The expected sequence is 0 1 2 3 4 5, but the actual output is 0 1 2 4 5 6 (random; another run may give a different sequence).

In other words, the printed i sometimes already holds the result of i + 1.

The problem only appears in the default multi-threaded run; with FLAGS_new_executor_sequential_run=true or FLAGS_enable_pir_in_executor_trace_run=true everything works. The guess is therefore that the dependency analysis fails to detect the dependency between the two statements, allowing i += 1 to execute before x += i.
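For example, either of the following invocations (a sketch that reuses the reproduction command above; the flags are passed as environment variables in the same way as FLAGS_print_ir) runs without the error:

FLAGS_new_executor_sequential_run=true python test/dygraph_to_static/test_break_continue.py TestOptimBreakInWhile.test_transformed_static_result__ast_pir
FLAGS_enable_pir_in_executor_trace_run=true python test/dygraph_to_static/test_break_continue.py TestOptimBreakInWhile.test_transformed_static_result__ast_pir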

The lowered program is as follows (key parts only):

IR after lowering = {
    # ...
    (%1) = "full(phi_kernel)" () {dtype:(pd_op.DataType)int32,kernel_key:<backend:CPU|layout:Undefined(AnyLayout)|dtype:int32>,kernel_name:"full",op_name:"pd_op.full",place:(pd_op.Place)Place(undefined:0),shape:(pd_op.IntArray)[1],stop_gradient:[true],value:(Float)0} : () -> cpu_tensor<1xi32> # this is i
    # ...
    (%9, %10, %11) = "pd_op.while"(cond=%8, inputs=%4, %1, %0) {
    ^%arg_0, %arg_1, %arg_2
        # ...
        (%17, %18) = pd_op.if (%16) {} -> cpu_tensor<1xi32>, cpu_tensor<1xi64>{
            # these two ops correspond to x += i
            (%19) = "cast(phi_kernel)" (%arg_1) {dtype:(pd_op.DataType)int64,kernel_key:<backend:CPU|layout:NCHW|dtype:int32>,kernel_name:"cast",op_name:"pd_op.cast",stop_gradient:[true]} : (cpu_tensor<1xi32>) -> cpu_tensor<1xi64>
            (%20) = "add(phi_kernel)" (%arg_2, %19) {kernel_key:<backend:CPU|layout:NCHW|dtype:int64>,kernel_name:"add",op_name:"pd_op.add",stop_gradient:[true]} : (undefined_tensor<1xi64>, cpu_tensor<1xi64>) -> cpu_tensor<1xi64>
            # these two ops correspond to i += 1
            (%21) = "full(phi_kernel)" () {dtype:(pd_op.DataType)float32,kernel_key:<backend:CPU|layout:Undefined(AnyLayout)|dtype:float32>,kernel_name:"full",op_name:"pd_op.full",place:(pd_op.Place)Place(cpu),shape:(pd_op.IntArray)[1],stop_gradient:[true],value:(Float)1} : () -> cpu_tensor<1xf32>
            (%22) = "scale(phi_kernel)" (%arg_1, %21) {bias:(Float)1,bias_after_scale:true,kernel_key:<backend:CPU|layout:NCHW|dtype:int32>,kernel_name:"scale",op_name:"pd_op.scale",stop_gradient:[true]} : (cpu_tensor<1xi32>, cpu_tensor<1xf32>) -> cpu_tensor<1xi32>
            () = "cf.yield" (%22, %20) {} : (cpu_tensor<1xi32>, cpu_tensor<1xi64>) ->
        } else {
            () = "cf.yield" (%arg_1, %arg_2) {} : (cpu_tensor<1xi32>, undefined_tensor<1xi64>) ->
        }
        # ...
        () = "cf.yield" (%26, %14, %17, %18) {} : (cpu_tensor<1xb>, cpu_tensor<b>, cpu_tensor<1xi32>, cpu_tensor<1xi64>) ->
    }
    () = "builtin.shadow_output" (%11) {output_name:"output_0"} : (undefined_tensor<1xi64>) ->
}

We can see %1 -> %arg_1. Although i + 1 produces a new Value %22, %22 -> %17 -> %10 is later fed back as a block argument and shares data with %arg_1 again. That is, (%22) = "scale(phi_kernel)" (%arg_1, %21) is effectively (%arg_1) = "scale(phi_kernel)" (%arg_1, %21), similar to an inplace operation.
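A toy analogy of this share-data behaviour (plain NumPy, not Paddle internals; the variable names are only illustrative):

import numpy as np

arg_1 = np.array([3], dtype=np.int32)  # plays the role of %arg_1 (the loop variable i)
out_22 = arg_1                         # %22 ends up sharing storage with %arg_1 via the yield / block-arg loop
out_22[...] = arg_1 + 1                # the scale result is written into the shared buffer
print(arg_1)                           # [4] -- any other reader of %arg_1 now sees the incremented value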

However, the Program does not express this relation, so when the four lines producing %19, %20, %21, %22 are analyzed, no dependency of the last two OPs on the first two is found. Under multi-threaded execution their order can therefore be reversed, which the log confirms:

======================== The network executed by pir interpreter ========================
{
    (%0) = "cast(phi_kernel)" (%arg_0) {dtype:(pd_op.DataType)int64,kernel_key:<backend:CPU|layout:NCHW|dtype:int32>,kernel_name:"cast",op_name:"pd_op.cast",stop_gradient:[true]} : (cpu_tensor<1xi32>) -> cpu_tensor<1xi64>
    (%1) = "add(phi_kernel)" (%arg_1, %0) {kernel_key:<backend:CPU|layout:NCHW|dtype:int64>,kernel_name:"add",op_name:"pd_op.add",stop_gradient:[true]} : (undefined_tensor<1xi64>, cpu_tensor<1xi64>) -> cpu_tensor<1xi64>
    (%2) = "full(phi_kernel)" () {dtype:(pd_op.DataType)float32,kernel_key:<backend:CPU|layout:Undefined(AnyLayout)|dtype:float32>,kernel_name:"full",op_name:"pd_op.full",place:(pd_op.Place)Place(cpu),shape:(pd_op.IntArray)[1],stop_gradient:[true],value:(Float)1} : () -> cpu_tensor<1xf32>
    (%3) = "scale(phi_kernel)" (%arg_0, %2) {bias:(Float)1,bias_after_scale:true,kernel_key:<backend:CPU|layout:NCHW|dtype:int32>,kernel_name:"scale",op_name:"pd_op.scale",stop_gradient:[true]} : (cpu_tensor<1xi32>, cpu_tensor<1xf32>) -> cpu_tensor<1xi32>
    () = "cf.yield" (%3, %1) {} : (cpu_tensor<1xi32>, cpu_tensor<1xi64>) ->
}

======================== The instruction executed by pir interpreter ========================
{outputs} =  instruction_name[idx] ({inputs})
0: ( 23 )  = pd_op.cast ( 12 )
1: ( 24 )  = pd_op.add ( 23 )  ( 13 )
2: ( 25 )  = pd_op.full
3: ( 26 )  = pd_op.scale ( 25 )  ( 12 )
---------------------------var_id -> var_name -> variable*---------------------------
0 -> _jst.0.x.0 -> 0x127431d70
1 -> 0x1339f0000648973463091791_inner_var_1 -> 0x12742bcd0
2 -> 0x1339f0000648973463091791_inner_var_2 -> 0x12742c160
3 -> 0x1339f0000648973463091791_inner_var_3 -> 0x127429970
4 -> 0x1339f0000648973463091791_inner_var_4 -> 0x12742be40
5 -> 0x1339f0000648973463091791_inner_var_5 -> 0x12742cf80
6 -> 0x1339f0000648973463091791_inner_var_6 -> 0x12742bdc0
7 -> 0x1339f0000648973463091791_inner_var_7 -> 0x127432320
8 -> 0x1339f0000648973463091791_inner_var_8 -> 0x12742e050
9 -> 0x1339f0000648973463091791_inner_var_9 -> 0x127429cd0
10 -> output_0 -> 0x12742c780
11 -> 0x16c4528d0648973465007541body_block_arg_0 -> 0x12741df40
12 -> 0x16c4528d0648973465012958body_block_arg_1 -> 0x12741cc70
13 -> 0x16c4528d0648973465021500body_block_arg_2 -> 0x12741cb00
14 -> 0x16b26c800648973465049958_inner_var_14 -> 0x12741b9e0
15 -> 0x16b26c800648973465049958_inner_var_15 -> 0x12741b8f0
16 -> 0x16b26c800648973465049958_inner_var_16 -> 0x12741ba60
17 -> 0x16b26c800648973465049958_inner_var_17 -> 0x12741ab80
18 -> 0x16b26c800648973465049958_inner_var_18 -> 0x12741aa70
19 -> 0x16b26c800648973465049958_inner_var_19 -> 0x12741a3f0
20 -> 0x16b26c800648973465049958_inner_var_20 -> 0x127419ae0
21 -> 0x16b26c800648973465049958_inner_var_21 -> 0x1274199d0
22 -> 0x16b26c800648973465049958_inner_var_22 -> 0x127418b60
23 -> 0x16b2a0600648973468709083_inner_var_23 -> 0x1275feb20
24 -> 0x16b2a0600648973468709083_inner_var_24 -> 0x1275fe6f0
25 -> 0x16b2a0600648973468709083_inner_var_25 -> 0x1275fdc90
26 -> 0x16b2a0600648973468709083_inner_var_26 -> 0x1275fe2c0


======================= The dependency of all instruction ========================
id -> down_stream_id
0 -> 1
2 -> 3

Only the dependency between the first two OPs (0 and 1, i.e. 0 -> 1) and between the last two OPs (2 and 3, i.e. 2 -> 3) is detected; there is no dependency between the two groups. It is therefore possible for 2 and 3 (i += 1) to run before 0 and 1 (x += i), which corrupts the result.
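A toy sketch of the dependency building (illustrative Python, not the real interpreter code; instruction and variable ids are taken from the log above):

# Explicit reads/writes of the four instructions, as listed in the log:
instructions = {
    0: {'reads': {12}, 'writes': {23}},      # pd_op.cast   (part of x += i)
    1: {'reads': {23, 13}, 'writes': {24}},  # pd_op.add    (part of x += i)
    2: {'reads': set(), 'writes': {25}},     # pd_op.full   (part of i += 1)
    3: {'reads': {25, 12}, 'writes': {26}},  # pd_op.scale  (part of i += 1)
}

def build_deps(instrs):
    # Only read-after-write / write-after-write edges on explicitly listed variables.
    deps = []
    ids = sorted(instrs)
    for a in ids:
        for b in ids:
            if a < b and instrs[a]['writes'] & (instrs[b]['reads'] | instrs[b]['writes']):
                deps.append((a, b))
    return deps

print(build_deps(instructions))  # [(0, 1), (2, 3)] -- no edge between the two groups

# If the analysis also knew that instruction 3 effectively writes var 12 in place
# (because %22 shares data with %arg_1), a write-after-read edge 0 -> 3 would be added,
# forcing the scale to wait until the cast has consumed the old value of i.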

paddle-bot bot commented Apr 7, 2024

Your PR has been submitted. Thanks for your contribution!
Please wait for the result of CI firstly. See Paddle CI Manual for details.

@SigureMo SigureMo requested a review from gouzil April 7, 2024 11:21
@SigureMo SigureMo merged commit de0cd61 into PaddlePaddle:develop Apr 8, 2024
30 checks passed
@SigureMo SigureMo deleted the dy2st/enable-test-break-continue-in-sequential-run-mode branch April 8, 2024 06:12
co63oc pushed a commit to co63oc/Paddle that referenced this pull request Apr 9, 2024
co63oc pushed a commit to co63oc/Paddle that referenced this pull request Apr 10, 2024