[Dy2St][PIR] Run test_break_continue in sequential run mode #63287
PR Category
Execute Infrastructure
PR Types
Others
Description
Enable PIR mode for the `TestOptimBreakInWhile` case of the `test_break_continue` unit test. The case has to run with `FLAGS_new_executor_sequential_run` enabled; without that flag it still fails (see the problem analysis below).
PCard-66972
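As a rough illustration of running something with the flag switched on, the sketch below sets `FLAGS_new_executor_sequential_run` in the environment of a child process. It assumes the flag is picked up from a `FLAGS_*` environment variable, which is not confirmed by this PR. The helper names `env_with_flag` and `run_with_flag` are hypothetical, and the child command only echoes the variable rather than running the real test.

```python
import os
import subprocess
import sys

FLAG = "FLAGS_new_executor_sequential_run"


def env_with_flag(name, value, base_env=None):
    """Return a copy of the environment with one boolean flag set (hypothetical helper)."""
    env = dict(os.environ if base_env is None else base_env)
    env[name] = "true" if value else "false"
    return env


def run_with_flag(cmd, name=FLAG, value=True):
    """Run `cmd` in a child process that sees the flag in its environment."""
    return subprocess.run(
        cmd,
        env=env_with_flag(name, value),
        capture_output=True,
        text=True,
        check=True,
    )


if __name__ == "__main__":
    # Stand-in child command: it only echoes the flag value.
    # A real run would launch the unit test file here instead.
    probe = [sys.executable, "-c", f"import os; print(os.environ['{FLAG}'])"]
    print(run_with_flag(probe).stdout.strip())
```

Setting the flag per child process keeps the sequential mode scoped to this one case, so other tests still exercise the default multi-threaded executor.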
Problem analysis
Symptoms
Remove `FLAGS_new_executor_sequential_run` and run the command below to get the full log. The full log is in break-continue.log.
The problematic code is:
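The problem reproduces only on CPU (apparently on Linux) and shows up as a wrong result. The expected result is 15 (the sum of `0..5`), but the actual result is random and in most runs is greater than 15.
Adding a print before `x += i` gives the following. The expected output is `0 1 2 3 4 5`, but the actual output is `0 1 2 4 5 6` (this is random; another run may print something different). The printed `i` is randomly the value of `i + 1`.
The problem occurs only in the default multi-threaded run. With either `FLAGS_new_executor_sequential_run=true` or `FLAGS_enable_pir_in_executor_trace_run=true` it does not occur, so the likely cause is that dependency analysis fails to find the dependency between the two statements, which lets `i += 1` execute before `x += i`.
The lowered program is as follows (key parts only):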
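As can be seen, `%1` -> `%arg_1`. Although `i + 1` is a new Value `%22`, the chain `%22` -> `%17` -> `%10` is later passed back as a block argument and shares data with `%arg_1` again. In other words, `(%22) = "scale(phi_kernel)" (%arg_1, %21)` is effectively `(%arg_1) = "scale(phi_kernel)" (%arg_1, %21)`, which behaves like an inplace operation.
The Program, however, does not express this relationship. As a result, the analysis of the four lines that produce `%19`, `%20`, `%21`, and `%22` does not find that the last two OPs depend on the first two, so in a multi-threaded run they may execute in the opposite order. The log confirms this:
Only the dependency between the first two OPs (`0 1`), `0 -> 1`, and the dependency between the last two OPs (`2 3`), `2 -> 3`, were found. No dependency between the last two OPs and the first two was found, so `2 3` (`i += 1`) may execute before `0 1` (`x += i`), which produces the wrong result.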
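The reasoning above can be replayed with a small toy model in plain Python. This is only a sketch, not Paddle's dependency builder. Of the four ops, only `(%22) = scale(%arg_1, %21)` is taken from the description; the op kinds and inputs of the other three, the name `%arg_0` for `x`, the loop bound, and modeling `scale` as "add `%21`" are all assumptions made for illustration. The model derives edges from read/write hazards, enumerates the orders those edges allow, and runs the loop under each order.

```python
from itertools import permutations

# Hypothetical stand-ins for the four ops that produce %19..%22.
OPS = [
    {"name": "cast",  "reads": ["%arg_1"],        "writes": ["%19"]},  # op 0
    {"name": "add",   "reads": ["%arg_0", "%19"], "writes": ["%20"]},  # op 1: x += i
    {"name": "full",  "reads": [],                "writes": ["%21"]},  # op 2
    {"name": "scale", "reads": ["%arg_1", "%21"], "writes": ["%22"]},  # op 3: i += 1
]


def build_deps(ops, alias=None):
    """Add edge (a, b) for a < b on a RAW, WAR, or WAW hazard after alias resolution."""
    alias = alias or {}

    def resolve(names):
        return {alias.get(n, n) for n in names}

    edges = set()
    for a in range(len(ops)):
        for b in range(a + 1, len(ops)):
            ra, wa = resolve(ops[a]["reads"]), resolve(ops[a]["writes"])
            rb, wb = resolve(ops[b]["reads"]), resolve(ops[b]["writes"])
            if (wa & rb) or (ra & wb) or (wa & wb):
                edges.add((a, b))
    return edges


def legal_orders(n, edges):
    """All execution orders of n ops that respect every dependency edge."""
    return [p for p in permutations(range(n))
            if all(p.index(a) < p.index(b) for a, b in edges)]


def run_loop(order):
    """Run the loop body in the given op order; %22 shares storage with %arg_1."""
    mem = {"%arg_0": 0, "%arg_1": 0}  # x and i

    def op0(): mem["%19"] = mem["%arg_1"]
    def op1(): mem["%20"] = mem["%arg_0"] + mem["%19"]
    def op2(): mem["%21"] = 1
    def op3(): mem["%arg_1"] = mem["%arg_1"] + mem["%21"]  # effectively inplace

    kernels = [op0, op1, op2, op3]
    while mem["%arg_1"] <= 5:  # assumed loop bound: accumulate 0..5
        for idx in order:
            kernels[idx]()
        mem["%arg_0"] = mem["%20"]  # carry x into the next iteration
    return mem["%arg_0"]


if __name__ == "__main__":
    no_alias = build_deps(OPS)
    with_alias = build_deps(OPS, {"%22": "%arg_1"})
    print(sorted(no_alias))    # [(0, 1), (2, 3)]
    print(sorted(with_alias))  # [(0, 1), (0, 3), (2, 3)]
    print(sorted({run_loop(o) for o in legal_orders(4, no_alias)}))    # [15, 21]
    print(sorted({run_loop(o) for o in legal_orders(4, with_alias)}))  # [15]
```

Without alias information the model finds only `0 -> 1` and `2 -> 3`, so the order `2 3 0 1` is allowed and the loop accumulates `i + 1` instead of `i`. Once `%22` is treated as an alias of `%arg_1`, a write-after-read edge `0 -> 3` appears and every allowed order yields 15.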