Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

i#4940: Fix the wrong aflags estimation in drreg #4941

Merged
merged 33 commits into from
Jul 15, 2021

Conversation

JerryYouxin
Copy link
Contributor

Fix issue #4940 to make sure the functionality of drreg_reserve_aflags/drreg_unreserve_aflags and drreg_aflags_are_dead for conditional instructions like cmovcc and sbb.

@JerryYouxin JerryYouxin changed the title Fix issue#4940 i#4940 Fix the drreg forward analysis estimation failure of arithemetic flags for conditional instructions Jun 8, 2021
Copy link
Contributor

@abhinav92003 abhinav92003 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for filing the issue and the PR. The CI tests shows some failures due to formatting. Please fix those and rerun.

ext/drreg/drreg.c Outdated Show resolved Hide resolved
ext/drreg/drreg.c Outdated Show resolved Hide resolved
@abhinav92003
Copy link
Contributor

It seems that this PR doesn't run our CI workflows automatically, because this is @JerryYouxin's first PR:

After a contributor has at least one pull request merged into a project's repository, any future pull requests from that contributor's fork will automatically run workflows.

https://docs.github.com/en/actions/managing-workflow-runs/approving-workflow-runs-from-public-forks.

Someone with write access needs to approve these workflow runs everytime. I can do that when needed.

The last commit is only a comment fix, so not running this time though.

@JerryYouxin
Copy link
Contributor Author

JerryYouxin commented Jun 9, 2021

I have added the test similar to the gist, but I find that the test will fail with a segmentation fault in application code and I cannot figure out why. It seems that if the analysis phase of the test client (event_app_analysis) include the drreg_reserve_register/drreg_unreserve_register or drreg_reserve_aflags/drreg_unreserve_aflags, it will cause the segmentation fault; otherwise, if it only includes the checks of drreg_are_aflags_dead, the client will not result in test program's segmentation fault.

@JerryYouxin
Copy link
Contributor Author

@abhinav92003 I have implemented a minimal test for arithmetic flag liveness estimation in drreg-test.dll.c and drreg-test.c. I have tested that if this bug fix is not included, the test will fail with message aflag liveness estimation of drx and drreg should always consist. If my bug fix is included, the test will pass correctly.

Copy link
Contributor

@abhinav92003 abhinav92003 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Formatting is broken for newly added changes. Please see the recent clang-format CI run. You can install clang-format on your machine to automatically format your changes: https://dynamorio.org/page_code_style.html

suite/tests/client-interface/drreg-test.dll.c Outdated Show resolved Hide resolved
@abhinav92003
Copy link
Contributor

Also, please change the PR description (and final commit description when you merge this PR) according to the Commit Messages section here: https://dynamorio.org/page_code_reviews.html. Particularly, please add the Issue: #NNNNN and Fixes: #NNNNN` string. Also, I usually prefer the PR/commit title to be short (less than 60 chars) so that Github doesn't wrap it in some pages.

@JerryYouxin JerryYouxin changed the title i#4940 Fix the drreg forward analysis estimation failure of arithemetic flags for conditional instructions i#4940: Fix the wrong aflags estimation in drreg Jun 10, 2021
@JerryYouxin
Copy link
Contributor Author

The check has passed in x86 but failed in ci-windows with vs2017. It seems that the bug may still exist in the windows platform.

133: Test command: D:\a\dynamorio\dynamorio\build_debug-internal-64\bin64\drrun.exe "-s" "90" "-quiet" "-debug" "-use_dll" "D:/a/dynamorio/dynamorio/build_debug-internal-64/lib64/debug/dynamorio.dll" "-exit0" "-stderr_mask" "0xC" "-msgbox_mask" "0" "-dumpcore_mask" "0x7d" "-staged" "-code_api" "-c" "D:/a/dynamorio/dynamorio/build_debug-internal-64/suite/tests/bin/client.drreg-test.dll.dll" "--" "D:/a/dynamorio/dynamorio/build_debug-internal-64/suite/tests/bin/client.drreg-test.exe"
133: Test timeout computed to be: 600
133: drreg-test running
133: CHECK failed ..\suite\tests\client-interface\drreg-test.dll.c:159: aflags liveness estimation of drx and drreg should always be consistent
134/259 Test #133: code_api|client.drreg-test ...................................***Failed  Required regular expression not found. Regex=[^drreg-test running
?
drreg-test finished
?
$
]  6.10 sec
drreg-test running
CHECK failed ..\suite\tests\client-interface\drreg-test.dll.c:159: aflags liveness estimation of drx and drreg should always be consistent

@JerryYouxin
Copy link
Contributor Author

@abhinav92003 I'm sorry that I cannot reproduce the test failure on my local windows machine, and I cannot figure out why the test is failed. Can you provide more details about the windows test (e.g., environment setup) to reproduce the failure for further analysis of the bug fixes?

@abhinav92003
Copy link
Contributor

@abhinav92003 I'm sorry that I cannot reproduce the test failure on my local windows machine, and I cannot figure out why the test is failed. Can you provide more details about the windows test (e.g., environment setup) to reproduce the failure for further analysis of the bug fixes?

We have seen some issues before where we weren't able to reproduce test failures locally. Fortunately, there's a way to ssh into the Github Actions machine where the failure happened to get more info. See the section on "Debugging Tests on Github Actions Runner" at https://dynamorio.org/page_test_suite.html. Essentially, you need to change the Github Actions workflow config in your branch to invoke tmate if a failure happened: like 9580af8, but for dynamorio/.github/workflows/ci-windows.yml.

When you have sshed into the GA runner, then you can use -loglevel <n> -logdir <dir> to get some logs (https://dynamorio.org/page_logging.html) or gdb too (you may need to install it there). See https://dynamorio.org/page_debugging.html also for other tips.

@abhinav92003
Copy link
Contributor

@JerryYouxin We have granted the access required for you to run the CI workflows on your own. You should be able to run the workflows by pushing to this branch and also re-run a previously workflow from its page. This should help you try out the tmate debugging strategy that I mentioned in my previous comment.

@JerryYouxin
Copy link
Contributor Author

@JerryYouxin We have granted the access required for you to run the CI workflows on your own. You should be able to run the workflows by pushing to this branch and also re-run a previously workflow from its page. This should help you try out the tmate debugging strategy that I mentioned in my previous comment.

Unfortunately, I still cannot open the tmate session. And It seems that I still cannot run the workflow by myself or by pushing new commits.

@derekbruening
Copy link
Contributor

Unfortunately, I still cannot open the tmate session. And It seems that I still cannot run the workflow by myself or by pushing new commits.

You're listed as a Pending member of Committers: you didn't accept the invitation yet it looks like.

@JerryYouxin
Copy link
Contributor Author

JerryYouxin commented Jul 9, 2021

Given that the test passes everywhere else and only fails in this PR, chances are the failure is due to a local change here. I see this PR changes the .yml config and has lines requesting ml64, so I would think something there is the culprit.

Thanks! The previous environment error is from my changes specified for my local machine and the error is gone when I revert those commits. But the AArch64 Precommit check is still not triggered automatically and I don't know how to trigger this check manually.

@abhinav92003
Copy link
Contributor

run arm tests

@abhinav92003
Copy link
Contributor

For some reason my previous comment didn't invoke Jenkins. @AssadHashmi do we need to add @JerryYouxin to any ACLs?

@JerryYouxin
Copy link
Contributor Author

The arm test is still not running. What should I do to run arm tests manually?

@abhinav92003
Copy link
Contributor

For some reason my previous comment didn't invoke Jenkins. @AssadHashmi do we need to add @JerryYouxin to any ACLs?

To verify if access control is indeed the issue, I created a PR from my forked DR repo: #5004. It does run the AArch64 test suite as expected. Let me check if I can add you to the whitelist on Jenkins myself. I see that other committers are there.

@abhinav92003
Copy link
Contributor

run arm tests

@abhinav92003
Copy link
Contributor

@JerryYouxin I added you to the AArch64 Precommit whitelist. For now I triggered the build manually, but in future it should trigger automatically when you push to a branch.

In the test run code_api|tool.drcachesim.threads-with-config-file failed. I don't think that's related to this PR; it's a known flaky test: #4954. I'll review the PR.

Copy link
Contributor

@abhinav92003 abhinav92003 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, pending the unresolved comments.

ext/drx/drx.c Outdated Show resolved Hide resolved
ext/drx/drx.c Outdated Show resolved Hide resolved
ext/drx/drx.c Show resolved Hide resolved
Copy link
Contributor

@abhinav92003 abhinav92003 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks for debugging and fixing the issue! Please take a look at the section on Final Commit Message before merging https://dynamorio.org/page_code_reviews.html.

ext/drx/drx.c Outdated Show resolved Hide resolved
@abhinav92003
Copy link
Contributor

You can use "run arm tests" in a separate comment to re-run any flaky CI test suite. If you find that there's no issue associated with some flaky test, please file one.

@JerryYouxin
Copy link
Contributor Author

JerryYouxin commented Jul 13, 2021

It seems that the drreg-test fails in AArch64 CI tests and I cannot access an available local AArch64 machine for further debugging. @abhinav92003 Is there any way to debug this failing issue?

@abhinav92003
Copy link
Contributor

It seems that the drreg-test fails in AArch64 CI tests and I cannot access an available local AArch64 machine for further debugging. @abhinav92003 Is there any way to debug this failing issue?

We're hitting this check: CHECK failed /var/lib/jenkins/workspace/DynamoRIO-AArch64-Precommit/suite/tests/client-interface/drreg-test.dll.c:824: must use the freed up slot

This is because the total number of spill slots available in drreg are now different than before, so drreg is making different decisions about which spill slots to use.

Note that the new call to drx_init requests additional slots from drreg internally. I think we can simply skip the drx_init and drx_exit calls. Do add a comment in dr_init in drreg-test.dll.c that:

We skip drx_init/drx_exit because that will change the number of total drreg spill slots available for the drreg tests. Skipping drx_init is okay because we use only drx_are_aflags_dead which doesn't need any drx state to be initialised.

@JerryYouxin
Copy link
Contributor Author

I have fixed the drreg-test issue by skipping drx_init and drx_exit, but the AArch64 Precommit CI tests still fails in drcache.

====> FAILURE in release-external-64 <====
release-external-64: 47 tests passed, **** 4 tests failed, but ignoring 2 for i#2417: ****
	(ignore: i#2417) 	code_api|tool.drcachesim.phys 
	(ignore: i#2417) 	code_api|tool.drcacheoff.rseq 
	code_api|tool.drcachesim.threads-with-config-file 
	code_api|tool.drcachesim.coherence 

@abhinav92003
Copy link
Contributor

run arm tests

@@ -1154,7 +1154,6 @@ event_exit(void)
drreg_exit() != DRREG_SUCCESS)
CHECK(false, "exit failed");

drx_exit();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add the comment suggested on the PR's main thread, about why we need to skip drx_exit and drx_init calls.

@abhinav92003
Copy link
Contributor

Do add a comment in dr_init in drreg-test.dll.c that:

We skip drx_init/drx_exit because that will change the number of total drreg spill slots available for the drreg tests. Skipping drx_init is okay because we use only drx_are_aflags_dead which doesn't need any drx state to be initialised.

I think you forgot to add this comment.

I have fixed the drreg-test issue by skipping drx_init and drx_exit, but the AArch64 Precommit CI tests still fails in drcache.

Both seem timeouts; try re-running by commenting "run arm tests". When I did that, only the code_api|tool.drcachesim.threads-with-config-file failure was left over, which is a known issue #4954. If it persists even after you re-run, we can consider adding it to the ignore list.

@JerryYouxin
Copy link
Contributor Author

run arm tests

1 similar comment
@JerryYouxin
Copy link
Contributor Author

run arm tests

@abhinav92003
Copy link
Contributor

Oh great, it passed this time. I have added a comment to #4954 anyway, noting the consecutive failures.

This PR seems good to go. Please modify the final commit message before merging as described here: https://dynamorio.org/page_code_reviews.html.

@JerryYouxin JerryYouxin merged commit 9822a65 into DynamoRIO:master Jul 15, 2021
@derekbruening
Copy link
Contributor

@JerryYouxin -- please avoid duplicating the title for future merges (the docs https://dynamorio.org/page_code_reviews.html#autotoc_md117 talk about the separate edit box for the title line):

commit 9822a6515e48446de727af3b58108d35d46ae431

    i#4940: Fix the wrong aflags estimation in drreg (#4941)
    
    i#4940: Fix the wrong aflags estimation in drreg (#4941)
    
    Fix the wrong aflags estimation in drreg to make sure the functionality of drreg_reserve_aflags/drreg_unreserve_aflags and drreg_aflags_are_dead for conditional instructions like cmovcc and sbb. Make drx_aflags_are_dead as a wrapper of drreg_aflags_are_dead.
    
    Fixes: #4940

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants