Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix compute sanitizer in GitHub Actions CI #12530

Closed
wants to merge 2 commits into from

Conversation

bdice
Copy link
Contributor

@bdice bdice commented Jan 10, 2023

Description

@davidwendt noticed that the nightly builds were passing unexpectedly on GitHub Actions, after they had been regularly failing on a compute-sanitizer --tool memcheck step in gpuCI.

https://github.com/rapidsai/cudf/actions/runs/3880698576/jobs/6618987835#step:6:66543

After some investigation, I found the job was failing -- but for the wrong reason, and silently.

RAPIDS logger » [01/10/23 06:24:39]
┌──────────────────────────────────────────┐
|    Memcheck gtests with rmm_mode=cuda    |
└──────────────────────────────────────────┘

Running gtest ARROW_IO_SOURCE_TEST
ci/test_cpp.sh: line 95: compute-sanitizer: command not found
...

In this PR, I add the cuda-sanitizer-api package to our C++ test dependencies, which should fix the nightly tests (so they fail as expected). I also fixed a script bug that let the tests pass silently despite having failures.

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@bdice bdice requested a review from a team as a code owner January 10, 2023 23:39
@bdice bdice requested a review from davidwendt January 10, 2023 23:39
@bdice bdice self-assigned this Jan 10, 2023
@bdice bdice added bug Something isn't working libcudf Affects libcudf (C++/CUDA) code. non-breaking Non-breaking change labels Jan 10, 2023
Copy link
Contributor

@davidwendt davidwendt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we make this only run on one configuration? Say just one of the 11.8 ones perhaps?
conda-cpp-tests / tests (11.8.0, ubuntu20.04, arm64, 3.10, a100, 520)

@davidwendt
Copy link
Contributor

According to the Jenkins logs for nightly runs, the compute-sanitizer step takes over 3 hours to run so it would be good to only run it for a single configuration and not all of them.

@codecov
Copy link

codecov bot commented Jan 11, 2023

Codecov Report

❗ No coverage uploaded for pull request base (branch-23.04@8d8d0ee). Click here to learn what that means.
Patch has no changes to coverable lines.

Additional details and impacted files
@@               Coverage Diff               @@
##             branch-23.04   #12530   +/-   ##
===============================================
  Coverage                ?   85.65%           
===============================================
  Files                   ?      155           
  Lines                   ?    24810           
  Branches                ?        0           
===============================================
  Hits                    ?    21251           
  Misses                  ?     3559           
  Partials                ?        0           

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

@bdice
Copy link
Contributor Author

bdice commented Jan 11, 2023

Thanks for that info @davidwendt, I didn't realize it took so long. Good idea to split it up and limit the configurations.

When I get a chance (may not be until next week) I will split the compute-sanitizer portion of the C++ test script into a separate file, add it to a separate workflow job using a custom script path (rather than having it be part of the C++ matrix tests), and have this launch only on one runner configuration, which we label gpu-latest-1. We also run Java tests and notebooks tests on gpu-latest-1 using a custom script, because they only need one configuration.

@bdice bdice changed the base branch from branch-23.02 to branch-23.04 January 26, 2023 16:57
@wence-
Copy link
Contributor

wence- commented Feb 1, 2023

Just to note that I think we also want to, if we can, run with compute-sanitizer --tool initcheck (I think this is even slower than memcheck so running only on a single configuration is also necessary here)

@davidwendt
Copy link
Contributor

Just to note that I think we also want to, if we can, run with compute-sanitizer --tool initcheck (I think this is even slower than memcheck so running only on a single configuration is also necessary here)

I would not recommend adding initcheck since I think it cannot reliably verify usage of uninitialized memory.
#12667 (comment)

rapids-bot bot pushed a commit that referenced this pull request Feb 23, 2023
Adds github workflow action to the nightly tests for running `compute-sanitizer` on the libcudf gtests.

Reference: #12530

Authors:
  - David Wendt (https://github.com/davidwendt)
  - AJ Schmidt (https://github.com/ajschmidt8)

Approvers:
  - AJ Schmidt (https://github.com/ajschmidt8)
  - Vyas Ramasubramani (https://github.com/vyasr)
  - Bradley Dice (https://github.com/bdice)

URL: #12800
@bdice
Copy link
Contributor Author

bdice commented Mar 2, 2023

Closing in favor of #12800 (already merged). Thanks @davidwendt for picking this up.

@bdice bdice closed this Mar 2, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working libcudf Affects libcudf (C++/CUDA) code. non-breaking Non-breaking change
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants