Output of roberta models varies between NNPA compiles #2789

Closed
cjvolzka opened this issue Apr 5, 2024 · 1 comment · Fixed by #2794

Comments


cjvolzka commented Apr 5, 2024

After compiling roberta-sequence-classification-9.onnx and roberta-base-11.onnx for NNPA and running the generated .so, the model output can vary slightly between compiles.

For example, compiling and running roberta-sequence-classification-9 repeatedly may yield

1st: [[ 0.455078, -0.35498 ]]
2nd: [[ 0.457520, -0.355957 ]]
3rd: [[ 0.458984, -0.357422 ]]
...

The values are always very close, but I wouldn't expect any differences at all.
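This kind of drift is what our tolerance test flags. As a minimal sketch (a hypothetical helper, not part of the attached test client), counting the compile-and-run cycles whose output differs from the first one beyond an absolute tolerance:

```python
# Hypothetical helper, not part of the attached test client: count the
# compile-and-run cycles whose output drifts from the first one.

def count_drifting_runs(runs, a_tol=1e-6):
    """Return how many runs differ from the first run by more than a_tol."""
    ref = runs[0]
    return sum(
        1 for run in runs[1:]
        if any(abs(x - y) > a_tol for x, y in zip(run, ref))
    )

runs = [
    [0.455078, -0.355980],  # 1st compile + run
    [0.457520, -0.355957],  # 2nd
    [0.458984, -0.357422],  # 3rd
]
print(count_drifting_runs(runs))  # → 2
```

With a stable build, every recompile would reproduce the first run's output and the count would be 0.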

Notes:

  • I don't seem to see the issue when compiling for CPU.
  • Multiple runs of the same .so file yield consistent results. I only encounter the issue when I recompile and then re-run the model.
  • I use the same version of onnx-mlir and the same .onnx files for all tests (see the test-roberta.sh script).
  • I don't think this is from a recent change. I've noticed random failures in our tolerance test in the past, but I couldn't reproduce the issue when I investigated because I would re-run the same .so for extended periods and never observe it. Since I couldn't reproduce it, I assumed I just needed to loosen our tolerance test as commits accumulated over time. It seems to be happening more often lately, so I tried both recompiling and re-running the model and found I can reproduce the issue readily.
  • I'm attaching roberta-out.zip with a script and our s390x test client which will show the issue. To run:
    • Extract .zip to a folder
    • Download models from the onnx model zoo and put in the same folder
    • run ./test-roberta.sh roberta-sequence-classification-9
      • Notice that the actual column's output varies between compile-and-run cycles. (Note: the exp column holds the expected CPU values, based on the example values in the onnx model zoo.)
    • run ./test-roberta.sh roberta-base-11
      • There are many more output values for this model, so it's easier to look for lines like "Min passing r_tol:" and "Max absolute difference (Min passing a_tol for r_tol to be 0)" and notice that they keep shifting. Those are the minimum a_tol and r_tol values for the test to pass that run.
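Those "min passing" metrics can be understood with a small sketch (a hypothetical reimplementation, using the first two sample outputs from above as stand-ins for expected and actual):

```python
# Hypothetical reimplementation of the "min passing" metrics: the smallest
# a_tol / r_tol for which every actual value is within tolerance of expected.

def min_passing_tolerances(actual, expected):
    a_tol = max(abs(a - e) for a, e in zip(actual, expected))
    r_tol = max(abs(a - e) / abs(e) for a, e in zip(actual, expected))
    return a_tol, r_tol

expected = [0.455078, -0.355980]  # stand-in for the exp column
actual = [0.457520, -0.355957]    # stand-in for one compile's actual column
a_tol, r_tol = min_passing_tolerances(actual, expected)
print(f"Min passing a_tol: {a_tol:.6f}, min passing r_tol: {r_tol:.6f}")
```

Because the actual column shifts with every recompile, these minimum tolerances shift with it, which is why the test fails intermittently at any fixed threshold.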

cjvolzka commented Apr 8, 2024

I tried to track this down. I don't have the exact commit, but it becomes much worse (and easily reproducible) somewhere after ebac513.

At ebac513 I got it to happen twice in 70 tests. The next four commits had an issue that caused the model to fail to compile. At b7e981d, the next commit that compiles, the issue is readily reproducible and the values shift slightly on almost every test iteration.

I'm still tracking down the first commit that shows the issue. This is what I've tested so far; "..." means there are commits between the two shown that I haven't tested.

b7e981d - almost every iteration different
708faf5 - (Compile failure)
74b00e3 - (Compile failure)
cc4f31d - (Compile failure)
2fac346 - (Compile failure)
ebac513 - two out of 70 iterations different
...
de3ebd3 - 5 out of 159 iterations different
...
7316d28 - 19 out of 94 iterations different
...
cd7576e - 6 out of 93 iterations different
...
a70c43a - 1 out of 40 iterations different
...
ec91b39 - 1 out of 61 iterations different
d2f4797 - 1 out of 26 iterations different
38b16b0 - 1 out of 43 iterations different
dea431d - stable for 200 iterations
...
c55a536 - stable for 39 iterations
...
4e1c970 - (4.0.0) stable for over 200 iterations
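The per-commit iteration counts above come from repeated compile-and-run cycles; a minimal harness for that, sketched in Python (the command and model name are placeholders for the attached script, not real paths):

```python
# Sketch of a stability harness; the command below is a placeholder for
# whatever recompiles and reruns the model (e.g. the attached script).
import subprocess

def distinct_outputs(cmd, iterations):
    """Run `cmd` repeatedly and collect the distinct stdout values.
    A stable build should produce exactly one distinct output."""
    seen = set()
    for _ in range(iterations):
        result = subprocess.run(cmd, capture_output=True, text=True)
        seen.add(result.stdout)
    return seen

if __name__ == "__main__":
    # Placeholder invocation: each iteration recompiles and reruns the model.
    outputs = distinct_outputs(
        ["./test-roberta.sh", "roberta-sequence-classification-9"], 40)
    print(f"{len(outputs)} distinct output(s) over 40 iterations")
```

Under this framing, "stable for N iterations" above means the set of distinct outputs stayed at size 1 for all N cycles.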

cjvolzka pushed a commit to cjvolzka/onnx-mlir that referenced this issue Apr 11, 2024
…s and some rules in zhigh-to-onnx pass

The onnx-to-zhigh pass has two phases: 1) converting multiple onnx ops into a single zhigh op, and 2) converting a single onnx op into a single zhigh op, where the second phase uses DimAnalysis. (Patterns in the first phase currently do not use DimAnalysis.)

The problem is that DimAnalysis is currently run before the first phase. This is incorrect because the first phase may change the IR, so the information from DimAnalysis is stale by the time the second phase runs. The correct position for DimAnalysis is just before the second phase.

Other than that, this PR slightly changes the rules in the zhigh-to-onnx pass so that, for binary ops, one input (instead of two) coming from a stick op is enough to trigger the rule that converts a zhigh op back to an onnx op.

Resolves onnx#2789

---------

Signed-off-by: Tung D. Le <[email protected]>
(cherry picked from commit 80a63f2)
Signed-off-by: Charles Volzka <[email protected]>
cjvolzka added a commit that referenced this issue Apr 11, 2024
…s and some rules in zhigh-to-onnx pass (#2797)

The onnx-to-zhigh pass has two phases: 1) converting multiple onnx ops into a single zhigh op, and 2) converting a single onnx op into a single zhigh op, where the second phase uses DimAnalysis. (Patterns in the first phase currently do not use DimAnalysis.)

The problem is that DimAnalysis is currently run before the first phase. This is incorrect because the first phase may change the IR, so the information from DimAnalysis is stale by the time the second phase runs. The correct position for DimAnalysis is just before the second phase.

Other than that, this PR slightly changes the rules in the zhigh-to-onnx pass so that, for binary ops, one input (instead of two) coming from a stick op is enough to trigger the rule that converts a zhigh op back to an onnx op.

Resolves #2789

---------


(cherry picked from commit 80a63f2)

Signed-off-by: Tung D. Le <[email protected]>
Signed-off-by: Charles Volzka <[email protected]>
Co-authored-by: Tung D. Le <[email protected]>
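The phase-ordering problem described in the commit message can be illustrated with a toy sketch (purely illustrative Python, not onnx-mlir code): an analysis computed before an IR-mutating phase no longer describes the ops the next phase sees.

```python
# Illustrative only, not onnx-mlir code: an analysis run before an
# IR-mutating phase goes stale for the phase that follows.

def dim_analysis(ops):
    """Toy stand-in for DimAnalysis: map each op name to its output rank."""
    return {op["name"]: op["rank"] for op in ops}

def phase1_fuse(ops):
    """Toy stand-in for phase 1: fuse the first two ops into one new op."""
    fused = {"name": "zhigh.Fused", "rank": ops[0]["rank"]}
    return [fused] + ops[2:]

ops = [{"name": "onnx.MatMul", "rank": 3},
       {"name": "onnx.Add", "rank": 3},
       {"name": "onnx.Relu", "rank": 3}]

stale = dim_analysis(ops)   # the bug: analysis runs before phase 1
ops = phase1_fuse(ops)
fresh = dim_analysis(ops)   # the fix: analysis runs just before phase 2

print("zhigh.Fused" in stale, "zhigh.Fused" in fresh)  # → False True
```

Phase 2 consulting the stale map would make decisions about ops that no longer exist, which matches the nondeterministic conversion behavior seen between compiles.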