Run compute-sanitizer in nightly build #9641

karthikeyann · 2021-11-09T22:01:17Z

Addresses part of #904

This PR runs `compute-sanitizer --tool memcheck` on the libcudf unit tests when the environment variable `COMPUTE_SANITIZER_ENABLE=true` is set.
`COMPUTE_SANITIZER_ENABLE` will be enabled only in nightly builds of cudf (to be enabled in PR https://github.com/rapidsai/gpuci-scripts/pull/675).
This PR also adds a script that parses the compute-sanitizer log into a JUnit XML file that Jenkins can process.
Only failures are reported. If there are no errors, no tests are reported under the memcheck results.

Note: Only memcheck is enabled for now. Other compute-sanitizer checks can be enabled later when required.
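
To illustrate the "reports only failures" behaviour, here is a minimal sketch that builds a JUnit-style XML tree from memcheck errors already grouped per test. It is not the script added in this PR: the function name `build_junit_xml`, the suite name `memcheck`, and the input shape (a dict of test name to error messages) are assumptions made for the example.

```python
import xml.etree.ElementTree as ET


def build_junit_xml(errors_by_test, suite_name="memcheck"):
    """Build a JUnit-style <testsuite> from {test name: [error messages]}.

    Hypothetical helper: only tests with at least one error are emitted,
    so a clean run produces a suite with zero test cases.
    """
    failing = {name: msgs for name, msgs in errors_by_test.items() if msgs}
    suite = ET.Element(
        "testsuite",
        name=suite_name,
        tests=str(len(failing)),
        failures=str(len(failing)),
    )
    for name, msgs in failing.items():
        case = ET.SubElement(suite, "testcase", classname=suite_name, name=name)
        failure = ET.SubElement(
            case, "failure", message=f"{len(msgs)} memcheck error(s)"
        )
        # Keep the raw sanitizer messages as the failure body.
        failure.text = "\n".join(msgs)
    return suite


if __name__ == "__main__":
    example = {
        "COLUMN_TEST.Foo": ["Invalid __global__ read of size 4 bytes"],
        "COLUMN_TEST.Bar": [],  # no errors, so it is not reported
    }
    print(ET.tostring(build_junit_xml(example), encoding="unicode"))
```

The resulting element can be written to disk with `ET.ElementTree(suite).write(path)` so that a JUnit report consumer can pick it up.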

davidwendt · 2021-11-09T23:08:17Z

Seems like this should not be going into 21.12 since we are well into burndown.

codecov · 2021-11-09T23:20:23Z

Codecov Report

Merging #9641 (2a7f20d) into branch-22.02 (967a333) will decrease coverage by 0.01%.
The diff coverage is 0.00%.

@@               Coverage Diff                @@
##           branch-22.02    #9641      +/-   ##
================================================
- Coverage         10.49%   10.47%   -0.02%     
================================================
  Files               119      119              
  Lines             20305    20343      +38     
================================================
  Hits               2130     2130              
- Misses            18175    18213      +38

| Impacted Files | Coverage Δ |
|:--- |:--- |
| python/cudf/cudf/core/column/column.py | `0.00% <0.00%> (ø)` |
| python/cudf/cudf/core/frame.py | `0.00% <0.00%> (ø)` |
| python/cudf/cudf/core/index.py | `0.00% <ø> (ø)` |
| python/cudf/cudf/core/indexed_frame.py | `0.00% <0.00%> (ø)` |
| python/cudf/cudf/core/multiindex.py | `0.00% <0.00%> (ø)` |
| python/cudf/cudf/utils/utils.py | `0.00% <0.00%> (ø)` |
| python/cudf/cudf/__init__.py | `0.00% <0.00%> (ø)` |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update d1811b5...2a7f20d.

karthikeyann · 2021-11-10T07:15:39Z

rerun tests

karthikeyann · 2021-11-11T09:30:09Z

Preferably merge to 21.12 so that memchecks can run in nightly builds for previous branches as well.
It is not enabled in the gpuci nightly build yet, so it will not be enabled in PR CI runs.

davidwendt · 2021-11-11T13:01:01Z

> Preferably merge to 21.12 so that memchecks can run in nightly builds for previous branches as well. It is not enabled in the gpuci nightly build yet, so it will not be enabled in PR CI runs.

The concern is even the faint possibility of breaking the build at a time when the build is critical (and the build team is overengaged) in the release cycle.

mythrocks

A couple of nitpicks.

Bigger picture: I was wondering why we're converting to the junit format.

ci/gpu/build.sh

cpp/scripts/compute-sanitizer-to-junit-xml.py

mythrocks · 2021-11-11T20:15:26Z

cpp/scripts/compute-sanitizer-to-junit-xml.py

+                    testcase_name = ""
+                else:
+                    pass
+                    # raise Exception('unexpected line in compute-sanitizer log: '+line)


Hmm. If none of the above, should the tool pass?
I think I can see why we'd leave this comment in. Might be useful to uncomment for debugging.

karthikeyann · 2021-11-15

Yes. The log may contain stdout from googletest or pytest, which the tool will ignore (pytest is not memchecked yet; it would also be too large to add to the nightly run, so it would probably run weekly instead).
The commented-out exception is there for debugging when analyzing a memcheck-only log (using `--log-file path`).
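
The ignore-by-default behaviour discussed here could look roughly like the sketch below. It is an illustration under stated assumptions, not the converter from this PR: it assumes sanitizer lines start with `=========` and that GoogleTest announces each case with `[ RUN      ] Suite.Name`; the function name `collect_sanitizer_lines` and the `strict` flag are hypothetical.

```python
import re

SANITIZER_PREFIX = "========="  # assumed prefix of compute-sanitizer lines
GTEST_RUN = re.compile(r"^\[ RUN\s+\] (\S+)")  # assumed GoogleTest marker


def collect_sanitizer_lines(lines, strict=False):
    """Group sanitizer lines under the GoogleTest case that was running.

    Lines that are neither sanitizer output nor a '[ RUN ]' marker are
    treated as test stdout: skipped by default, rejected when strict=True
    (useful when the input is a sanitizer-only log file).
    """
    by_test = {}
    current = ""  # sanitizer lines seen before any test case go under ""
    for raw in lines:
        line = raw.rstrip("\n")
        match = GTEST_RUN.match(line)
        if match:
            current = match.group(1)
        elif line.startswith(SANITIZER_PREFIX):
            text = line[len(SANITIZER_PREFIX):].strip()
            by_test.setdefault(current, []).append(text)
        elif strict and line.strip():
            raise ValueError("unexpected line in compute-sanitizer log: " + line)
    return by_test


if __name__ == "__main__":
    log = [
        "[ RUN      ] Suite.Case",
        "some stdout from the test",
        "========= Invalid __global__ read of size 4 bytes",
    ]
    print(collect_sanitizer_lines(log))
```

Keeping the strict path behind a flag preserves the debugging aid without making a mixed gtest log fail the conversion.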

karthikeyann · 2021-11-15T02:48:30Z

> Bigger picture: I was wondering why we're converting to the junit format.

So that Jenkins can display it the same way gtest results are shown. The Jenkins JUnit plugin expects the results in XML.
https://docs.rapids.ai/maintainers/gpuci/#cigpubuildsh

Besides, compute-sanitizer 11.6 supports XML output, which could be used directly for reporting memcheck results in the future (it would still need translation for the Jenkins JUnit plugin).

Whichever method is most convenient for reviewing memcheck results can be used. Is there a preferred way to report compute-sanitizer results? (It should not be limited to memcheck, but should also cover leakcheck, racecheck, and initcheck.)

karthikeyann · 2021-11-15T06:20:56Z

Thank you for the reviews @mythrocks and @davidwendt

karthikeyann · 2021-11-16T15:19:51Z

Tested these environment variables in a local Docker image after simplifying the PR https://github.com/rapidsai/gpuci-scripts/pull/675

karthikeyann · 2021-11-16T19:00:59Z

rerun tests

karthikeyann · 2021-11-18T16:40:54Z

rerun tests

karthikeyann · 2021-11-18T19:59:47Z

rerun tests

karthikeyann · 2021-11-19T10:14:12Z

rerun tests

karthikeyann · 2021-11-19T16:13:37Z

rerun tests

karthikeyann · 2021-11-19T20:54:53Z

Example of reported memcheck failures:
https://gpuci.gpuopenanalytics.com/job/rapidsai/job/gpuci/job/cudf/job/prb/job/cudf-gpu-test/CUDA=11.5,GPU_LABEL=cuda115,LINUX_VER=centos7,PYTHON=3.8/5082/

ajschmidt8

Approving ops-codeowner file changes

karthikeyann · 2021-11-30T11:21:39Z

rerun tests

karthikeyann · 2021-11-30T18:09:11Z

@gpucibot merge

While working on #9641 I noticed that building the iterator gtests takes a lot of time in CI. Here is a link to the individual build times for libcudf including the gtests: https://gpuci.gpuopenanalytics.com/job/rapidsai/job/gpuci/job/cudf/job/prb/job/cudf-gpu-test/CUDA=11.5,GPU_LABEL=driver-495,LINUX_VER=ubuntu20.04,PYTHON=3.8/5173/testReport/(root)/BuildTime/ (you can sort by Duration by clicking on the table column header).

Here is a table of the top 20 compile time offenders as recorded on my local machine. Note that, like the CI build output, 6 of the top 20 are just building the `ITERATOR_TEST`.

| rank | time (ms) | file |
| ---:| ---:|:--- |
| 1 | 814334 | /cudf.dir/src/search/search.cu.o |
| 2 | 755375 | /cudf.dir/src/sort/sort_column.cu.o |
| 3 | 686235 | /ITERATOR_TEST.dir/iterator/optional_iterator_test_numeric.cu.o |
| 4 | 670587 | /cudf.dir/src/groupby/sort/group_nunique.cu.o |
| 5 | 585524 | /cudf.dir/src/reductions/scan/scan_inclusive.cu.o |
| 6 | 582677 | /ITERATOR_TEST.dir/iterator/pair_iterator_test_numeric.cu.o |
| 7 | 568418 | /ITERATOR_TEST.dir/iterator/scalar_iterator_test.cu.o |
| 8 | 563196 | /cudf.dir/src/sort/sort.cu.o |
| 9 | 548816 | /ITERATOR_TEST.dir/iterator/value_iterator_test_numeric.cu.o |
| 10 | 535315 | /cudf.dir/src/groupby/sort/sort_helper.cu.o |
| 11 | 531384 | /cudf.dir/src/sort/is_sorted.cu.o |
| 12 | 530382 | /ITERATOR_TEST.dir/iterator/value_iterator_test_chrono.cu.o |
| 13 | 525187 | /cudf.dir/src/join/semi_join.cu.o |
| 14 | 523726 | /cudf.dir/src/rolling/rolling.cu.o |
| 15 | 517909 | /cudf.dir/src/reductions/product.cu.o |
| 16 | 513119 | /cudf.dir/src/stream_compaction/distinct_count.cu.o |
| 17 | 512569 | /ITERATOR_TEST.dir/iterator/optional_iterator_test_chrono.cu.o |
| 18 | 508978 | /cudf.dir/src/reductions/sum_of_squares.cu.o |
| 19 | 508460 | /cudf.dir/src/lists/drop_list_duplicates.cu.o |
| 20 | 505247 | /cudf.dir/src/reductions/sum.cu.o |

I made some simple changes to the iterator code logic to use different thrust functions along with a temporary device vector. This approach improved the compile time of the `ITERATOR_TEST` by about 3x. Here are the results of compiling the above 6 files with the changes in this PR.

| new rank | new time (ms) | file |
| ---:| ---:|:--- |
| 59 | 232691 (2.9x) | optional_iterator_test_numeric.cu.o |
| 26 | 416951 (1.4x) | pair_iterator_test_numeric.cu.o |
| 92 | 165947 (3.4x) | scalar_iterator_test.cu.o |
| 65 | 216364 (2.5x) | value_iterator_test_numeric.cu.o |
| 77 | 186583 (2.8x) | value_iterator_test_chrono.cu.o |
| 111 | 137789 (3.7x) | optional_iterator_test_chrono.cu.o |

Total overall build time improved locally by ~3m (10%) using `ninja -j48 install` on a Dell 5820. Here are the build time results of a CI build with these changes. https://gpuci.gpuopenanalytics.com/job/rapidsai/job/gpuci/job/cudf/job/prb/job/cudf-gpu-test/CUDA=11.5,GPU_LABEL=driver-495,LINUX_VER=ubuntu20.04,PYTHON=3.8/5190/testReport/(root)/BuildTime/

Authors:
- David Wendt (https://github.com/davidwendt)

Approvers:
- Nghia Truong (https://github.com/ttnghia)
- Devavret Makkar (https://github.com/devavret)

URL: #9788
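
The per-file speedup factors quoted for the six `ITERATOR_TEST` object files are the ratio of the old compile time to the new one. A small check of that arithmetic, using only the timings listed above (the names `TIMES_MS`, `speedups`, and `overall_speedup` are made up for this example):

```python
# (old ms, new ms) for the six ITERATOR_TEST object files listed above
TIMES_MS = {
    "optional_iterator_test_numeric.cu.o": (686235, 232691),
    "pair_iterator_test_numeric.cu.o": (582677, 416951),
    "scalar_iterator_test.cu.o": (568418, 165947),
    "value_iterator_test_numeric.cu.o": (548816, 216364),
    "value_iterator_test_chrono.cu.o": (530382, 186583),
    "optional_iterator_test_chrono.cu.o": (512569, 137789),
}


def speedups(times):
    """Per-file speedup, rounded to one decimal as in the table."""
    return {name: round(old / new, 1) for name, (old, new) in times.items()}


def overall_speedup(times):
    """Ratio of summed old times to summed new times across all files."""
    total_old = sum(old for old, _ in times.values())
    total_new = sum(new for _, new in times.values())
    return total_old / total_new


if __name__ == "__main__":
    for name, factor in speedups(TIMES_MS).items():
        print(f"{factor}x  {name}")
    print(f"combined: {overall_speedup(TIMES_MS):.2f}x")
```

Note that these are single-file compile times, so the combined ratio describes total compiler work for those six files rather than wall-clock time in a parallel build.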

davidwendt · 2021-12-01T16:02:13Z

ci/gpu/build.sh

+                  continue
+                fi
+                echo "Running GoogleTest $test_name"
+                ${COMPUTE_SANITIZER_CMD} ${gt} | tee "$WORKSPACE/test-results/${test_name}.cs.log"


Could this have been the following:

${COMPUTE_SANITIZER_CMD} ${gt} --rmm_mode=cuda | tee "$WORKSPACE/test-results/${test_name}.cs.log"

And then you would not need to set and unset the GTEST_CUDF_RMM_MODE environment variable?

karthikeyann · 2021-12-01

We could use either one for the Google tests. Initially we wanted to use ctest and cdash to run and report memcheck results, but it was not possible to add arguments only for memcheck in ctest, so we went with an environment variable. Since we want to report to Jenkins, we ended up not using ctest/cdash.
We hope to use the environment variable for both gtests and pytests (the variable name might change).

jrhemstad · 2021-12-01

We should use the cuda async mode for RMM. That will help the tests run faster by using the CUDA pool but still have memcheck support.
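
The two options discussed in this thread (the `GTEST_CUDF_RMM_MODE` environment variable versus a `--rmm_mode=` argument) differ only in how the mode reaches the test binary. A minimal sketch of both, assuming the memcheck command is `compute-sanitizer --tool memcheck`; the helper name `memcheck_invocation` and its parameters are hypothetical and not part of `ci/gpu/build.sh`:

```python
import os


def memcheck_invocation(test_binary, rmm_mode="cuda", use_env=True, base_env=None):
    """Return (argv, env) for running one gtest binary under memcheck.

    use_env=True sets GTEST_CUDF_RMM_MODE in the child environment;
    use_env=False appends --rmm_mode=<mode> to the command line instead.
    The mode string is passed through unchanged.
    """
    env = dict(os.environ if base_env is None else base_env)
    argv = ["compute-sanitizer", "--tool", "memcheck", test_binary]
    if use_env:
        env["GTEST_CUDF_RMM_MODE"] = rmm_mode
    else:
        argv.append(f"--rmm_mode={rmm_mode}")
    return argv, env


if __name__ == "__main__":
    # Only prints the command; nothing is executed here.
    argv, env = memcheck_invocation("./COLUMN_TEST", use_env=False, base_env={})
    print(" ".join(argv))
```

The returned pair can be handed to `subprocess.run(argv, env=env)`. The environment-variable form has the property mentioned above: it does not depend on the test runner being able to forward extra arguments.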

karthikeyann added 2 commits November 10, 2021 03:10

add compute-sanitizer-to-junit-xml.py converter

f5bc277

run compute-sanitizer in ci under env COMPUTE_SANITIZER_ENABLE

dbedae9

karthikeyann added the `feature request`, `3 - Ready for Review`, `gpuCI`, and `non-breaking` labels Nov 9, 2021

karthikeyann requested review from a team as code owners November 9, 2021 22:01

karthikeyann requested review from mythrocks and codereport November 9, 2021 22:01

github-actions bot added the `libcudf` label Nov 9, 2021

mythrocks approved these changes Nov 11, 2021

review comments

e987897

karthikeyann changed the base branch from branch-21.12 to branch-22.02 November 15, 2021 03:36

Merge branch 'branch-22.02' of github.com:rapidsai/cudf into fea-sanitizer-in-build

5012e7e

nightly check BUILD_MODE=branch, BUILD_TYPE=gpu

f1a31b4

karthikeyann added the `5 - DO NOT MERGE` label Nov 16, 2021

change expected value to true (as per jenkins COMPUTE_SANITIZER_ENABLE value - tested locally)

0e34fff

karthikeyann removed the `5 - DO NOT MERGE` label Nov 16, 2021

karthikeyann added the `5 - DO NOT MERGE` label Nov 16, 2021

test commit to see memcheck reports in jenkins

dbc2750

karthikeyann added 4 commits November 17, 2021 13:19

fix summing error count

7635953

skip ERROR_TEST, remove writing test xml output again.

5d6df02

skip failure of error mismatch (just to unblock)

c6c1ddc

fix skipping ERROR_TEST, update reporting hierarchy

ac0fc75

Merge branch 'branch-22.02' of github.com:rapidsai/cudf into fea-sanitizer-in-build

b30637a

karthikeyann added 4 commits November 20, 2021 02:25

remove debug comments, enable only in nightly build

18b0042

Merge branch 'branch-22.02' of github.com:rapidsai/cudf into fea-sanitizer-in-build

b8812bc

move to ci utils

ec8789c

cs.log report parsing moved to gpuci

2a7f20d

github-actions bot removed the `libcudf` label Nov 24, 2021

karthikeyann removed the `5 - DO NOT MERGE` label Nov 24, 2021

ajschmidt8 approved these changes Nov 29, 2021

davidwendt mentioned this pull request Nov 29, 2021

Improve build time of libcudf iterator tests #9788

Merged

karthikeyann marked this pull request as ready for review November 30, 2021 04:40

rapids-bot bot merged commit 1697f63 into rapidsai:branch-22.02 Nov 30, 2021

davidwendt reviewed Dec 1, 2021
