-
Notifications
You must be signed in to change notification settings - Fork 197
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
REL Fix 21.06
Release Changelog
#272
Merged
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
[REVIEW] Adding `set_subcomm` and `get_subcomm` to handle
[REVIEW] Clean-up CUDA utilites
[REVIEW] skip raft tests
[REVIEW] Gtest target names on cmake build
[REVIEW] Update CI scripts to remove references to master [skip ci]
The change log check was updated in `branch-0.15` but both `master` and `main` have an older version of the `ci/checks/changelog.sh`. Without the forced checkout this will break and not complete the check successfully.
FIX Update change log check
[gpuCI] Auto-merge branch-0.15 to branch-0.16 [skip ci]
…by cuda10 & cuda11.
[REVIEW] Adding csrgemm2 to cusparse_wrappers.h
[REVIEW] Update docs to remove references to master [skip ci]
[gpuCI] Auto-merge branch-0.15 to branch-0.16 [skip ci]
This push will fix NaNs that might appear on Hellinger. The result of the operation 0 * NaN was Nan, so this fix will correct it by having zero in the sqrt. Authors: - Micka (https://github.com/lowener) Approvers: - Corey J. Nolet (https://github.com/cjnolet) URL: #215
Co-authored-by: Raymond Douglass <[email protected]>
Merge `branch-0.19` into `branch-0.20`
Add a wrapper required by rapidsai/cuml#3827 Authors: - Artem M. Chirkin (https://github.com/achirkin) Approvers: - Corey J. Nolet (https://github.com/cjnolet) URL: #220
…d ignoring self-loops (#217) Authors: - Corey J. Nolet (https://github.com/cjnolet) - Divye Gala (https://github.com/divyegala) Approvers: - Dante Gama Dessavre (https://github.com/dantegd) URL: #217
Got a bunch of warnings compiling cugraph. This lead to the observation that error.hpp contains 4 headers that don't appear to be necessary. This PR removes those extraneous includes. Authors: - Chuck Hastings (https://github.com/ChuckHastings) Approvers: - Dante Gama Dessavre (https://github.com/dantegd) URL: #224
Update the release script to take a parameter with the new version instead of calculating the new version. Authors: - Ray Douglass (https://github.com/raydouglass) Approvers: - Dillon Cullinan (https://github.com/dillon-cullinan) - AJ Schmidt (https://github.com/ajschmidt8) URL: #226
We recently discovered a memory error in the `devArrMatch()` function: https://gpuci.gpuopenanalytics.com/job/rapidsai/job/gpuci-v0.20/job/cuml/job/prb/job/cuml-gpu-test/CUDA=11.0,GPU_LABEL=gpu,OS=ubuntu16.04,PYTHON=3.7/161/console ``` 13:09:34 [----------] 44 tests from FilTests/TreeliteDenseFilTest 13:09:34 [ RUN ] FilTests/TreeliteDenseFilTest.Import/0 13:09:34 *** Error in `./test/ml': free(): invalid pointer: 0x00007f632b691fe8 *** 13:09:34 ======= Backtrace: ========= 13:09:34 /lib/x86_64-linux-gnu/libc.so.6(+0x777f5)[0x7f632b3447f5] 13:09:34 /lib/x86_64-linux-gnu/libc.so.6(+0x8038a)[0x7f632b34d38a] 13:09:34 /lib/x86_64-linux-gnu/libc.so.6(cfree+0x4c)[0x7f632b35158c] 13:09:34 /workspace/ci/artifacts/cuml/cpu/conda_work/cpp/build/libcuml++.so(_ZN8treelite9ModelImplIffED0Ev+0xf5)[0x7f632c556405] ``` which was traced to the `devArrMatch()` function as follows: ``` $ valgrind ./cpp/build/test/ml --gtest_filter=FilTests/TreeliteDenseFilTest.Import/0 ==6398== Mismatched free() / delete / delete [] ==6398== at 0x483D1CF: operator delete(void*, unsigned long) (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) ==6398== by 0x209287: std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() (in /home/ubuntu/cuml/cpp/build/test/ml) ==6398== by 0x2AC253: testing::AssertionResult raft::devArrMatch<float, raft::CompareApprox<float> >(float const*, float const*, unsigned long, raft::CompareApprox<float>, CUstream_st*) (in /home/ubuntu/cuml/cpp/build/test/ml) ==6398== by 0x2AC51C: ML::BaseFilTest::compare() (in /home/ubuntu/cuml/cpp/build/test/ml) ==6398== by 0x4858098D: void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) (in /home/ubuntu/miniconda3/envs/cuml_dev/lib/libgtest.so) ==6398== by 0x48580BE0: testing::Test::Run() (in /home/ubuntu/miniconda3/envs/cuml_dev/lib/libgtest.so) ==6398== by 0x48580F0E: testing::TestInfo::Run() (in /home/ubuntu/miniconda3/envs/cuml_dev/lib/libgtest.so) ==6398== by 0x48581035: testing::TestSuite::Run() (in /home/ubuntu/miniconda3/envs/cuml_dev/lib/libgtest.so) ==6398== by 0x485815EB: testing::internal::UnitTestImpl::RunAllTests() (in /home/ubuntu/miniconda3/envs/cuml_dev/lib/libgtest.so) ==6398== by 0x48581858: testing::UnitTest::Run() (in /home/ubuntu/miniconda3/envs/cuml_dev/lib/libgtest.so) ==6398== by 0x4853007E: main (in /home/ubuntu/miniconda3/envs/cuml_dev/lib/libgtest_main.so) ==6398== Address 0x232bfa860 is 0 bytes inside a block of size 160,000 alloc'd ==6398== at 0x483C583: operator new[](unsigned long) (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) ==6398== by 0x2ABFF8: testing::AssertionResult raft::devArrMatch<float, raft::CompareApprox<float> >(float const*, float const*, unsigned long, raft::CompareApprox<float>, CUstream_st*) (in /home/ubuntu/cuml/cpp/build/test/ml) ==6398== by 0x2AC51C: ML::BaseFilTest::compare() (in /home/ubuntu/cuml/cpp/build/test/ml) ==6398== by 0x4858098D: void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) (in /home/ubuntu/miniconda3/envs/cuml_dev/lib/libgtest.so) ==6398== by 0x48580BE0: testing::Test::Run() (in /home/ubuntu/miniconda3/envs/cuml_dev/lib/libgtest.so) ==6398== by 0x48580F0E: testing::TestInfo::Run() (in /home/ubuntu/miniconda3/envs/cuml_dev/lib/libgtest.so) ==6398== by 0x48581035: testing::TestSuite::Run() (in /home/ubuntu/miniconda3/envs/cuml_dev/lib/libgtest.so) ==6398== by 0x485815EB: testing::internal::UnitTestImpl::RunAllTests() (in /home/ubuntu/miniconda3/envs/cuml_dev/lib/libgtest.so) ==6398== by 0x48581858: testing::UnitTest::Run() (in/home/ubuntu/miniconda3/envs/cuml_dev/lib/libgtest.so) ==6398== by 0x4853007E: main (in /home/ubuntu/miniconda3/envs/cuml_dev/lib/libgtest_main.so) ``` **Diagnosis**. The `devArrMatch` functions are allocating a temporary buffer using `new[]` and then assigning it to a `shared_ptr<T>`. This is not valid because the destructor of `shared_ptr<T>` will invoke `delete`, not `delete[]`. Calling `delete` with a buffer allocated by `new[]` leads to an undefined behavior. See https://docs.microsoft.com/en-us/cpp/code-quality/c6278?view=msvc-160. **Proposed fix**. Use `std:unique_ptr<T[]>` instead to store temporary buffers. Authors: - Philip Hyunsu Cho (https://github.com/hcho3) Approvers: - Corey J. Nolet (https://github.com/cjnolet) URL: #229
PR updates RAFT's CMake and uses CPM for dependency management. Still WIP, meant for early 0.20, but opening the PR to start debugging in CI. This is heavily based on cuDF's, RMM's and cumlprims' changes. PR does the following: - [x] Adopt CPM: - [x] RMM - [x] FAISS - [x] GTest - [ ] NCCL: Missing building from source, will be added if/when required - [ ] UCX: Missing building from source, will be added if/when required - [x] CUB for CUDA < 11.0 - [x] Update arch detection - [x] Use generators to allow clang compilation as well - [x] Updates to outdated cmake parts and remove old code - [x] Update code to aid transition to #83 Authors: - Dante Gama Dessavre (https://github.com/dantegd) - Robert Maynard (https://github.com/robertmaynard) Approvers: - Robert Maynard (https://github.com/robertmaynard) - Rick Ratzel (https://github.com/rlratzel) - Divye Gala (https://github.com/divyegala) URL: #187
Make sure to use CalVer when we clone `rapids-cmake` Authors: - Robert Maynard (https://github.com/robertmaynard) Approvers: - Dante Gama Dessavre (https://github.com/dantegd) - Divye Gala (https://github.com/divyegala) URL: #230
Initialize `rmm::exec_policy` with `rmm::cuda_stream_view` to fix Thrust 1.12 compile errors. Authors: - Paul Taylor (https://github.com/trxcllnt) Approvers: - Dante Gama Dessavre (https://github.com/dantegd) URL: #231
This PR updates the `0.20` references in `CHANGELOG.md` to be `21.06`. Authors: - AJ Schmidt (https://github.com/ajschmidt8) Approvers: - Dillon Cullinan (https://github.com/dillon-cullinan) URL: #233
Authors: - Divye Gala (https://github.com/divyegala) Approvers: - Alex Fender (https://github.com/afender) - Corey J. Nolet (https://github.com/cjnolet) URL: #223
Authors: - Corey J. Nolet (https://github.com/cjnolet) - Divye Gala (https://github.com/divyegala) Approvers: - Dante Gama Dessavre (https://github.com/dantegd) URL: #208
The barrier synchronous communication pattern of the RAFT comms allows the senders and receivers to know ahead of time when a message needs to be initiated. Because of this, we only need to place the rank of the sender in 32-bits of the tag for the receiver and we only use UCX endpoints for sending messages while the receiver uses the `ucx_worker` to receive. This is a little different than the fully asynchronous pattern of ucx-py, where an endpoint is created in the connection listener and its reference held in order to send messages asynchronously at a later time. Still, this endpoint's life cycle is also expected to be managed by the user. We are still not entirely sure why this additional endpoint causes issues under some circumstances and not others - for example, we might never encounter an issue with one configuration, while another configuration may fail every single time (such as a timeout, lockup, or explicit error). @pentschev and I tested this change on his configuration on UCX 1.11 w/ the latest dask/distributed and it appears to fix the hang. I have also tested that it runs on UCX 1.9 successfully. In my tests, I run an MNMG nearest neighbors on 50k rows. Below are the configuration options we used: UCX 1.9 ``` export DASK_UCX__CUDA_COPY=True export DASK_UCX__TCP=True export DASK_UCX__NVLINK=True export DASK_UCX__INFINIBAND=True export DASK_UCX__RDMACM=False export DASK_RMM__POOL_SIZE=0.5GB export DASK_DISTRIBUTED__COMM__TIMEOUTS__CONNECT="100s" export DASK_DISTRIBUTED__COMM__TIMEOUTS__TCP="600s" export DASK_DISTRIBUTED__COMM__RETRY__DELAY__MIN="1s" export DASK_DISTRIBUTED__COMM__RETRY__DELAY__MAX="60s" export DASK_DISTRIBUTED__WORKER__MEMORY__Terminate="False" export DASK_UCX__REUSE_ENDPOINTS=True export UCXPY_IFNAME="ib0" export UCX_NET_DEVICES=all export UCX_MAX_RNDV_RAILS=1 # <-- must be set in the client env too! export DASK_LOGGING__DISTRIBUTED="DEBUG" export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 SCHEDULER_FILE=${SHARED_DIR}/dask-scheduler.json SCHEDULER_ARGS="--protocol ucx --port 8792 --interface ib0 --scheduler-file $SCHEDULER_FILE" WORKER_ARGS="--enable-tcp-over-ucx --enable-nvlink --enable-infiniband --rmm-pool-size=1G --net-devices="ib0" --local-directory /tmp/$LOGNAME --scheduler-file $SCHEDULER_FILE" ``` UCX 1.11 ``` export DASK_UCX__CUDA_COPY=True export DASK_UCX__TCP=True export DASK_UCX__NVLINK=True export DASK_UCX__INFINIBAND=True export DASK_UCX__RDMACM=True export DASK_RMM__POOL_SIZE=0.5GB export DASK_DISTRIBUTED__COMM__TIMEOUTS__CONNECT="100s" export DASK_DISTRIBUTED__COMM__TIMEOUTS__TCP="600s" export DASK_DISTRIBUTED__COMM__RETRY__DELAY__MIN="1s" export DASK_DISTRIBUTED__COMM__RETRY__DELAY__MAX="60s" export DASK_DISTRIBUTED__WORKER__MEMORY__Terminate="False" export DASK_UCX__REUSE_ENDPOINTS=True export UCXPY_IFNAME="ib0" export UCX_MAX_RNDV_RAILS=1 # <-- must be set in the client env too! export DASK_LOGGING__DISTRIBUTED="DEBUG" export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 SCHEDULER_FILE=${SHARED_DIR}/dask-scheduler.json SCHEDULER_ARGS="--protocol ucx --port 8792 --interface ib0 --scheduler-file $SCHEDULER_FILE" WORKER_ARGS="--enable-tcp-over-ucx --enable-nvlink --enable-infiniband --enable-rdmacm --rmm-pool-size=1G --local-directory /tmp/$LOGNAME --scheduler-file $SCHEDULER_FILE" ``` And for the client: UCX 1.9 ``` initialize(enable_tcp_over_ucx=True, enable_nvlink=True, enable_infiniband=True, enable_rdmacm=False, ) ``` UCX 1.11 ``` initialize(enable_tcp_over_ucx=True, enable_nvlink=True, enable_infiniband=True, enable_rdmacm=True, ) ``` Also tagging @rlratzel Authors: - Corey J. Nolet (https://github.com/cjnolet) Approvers: - Divye Gala (https://github.com/divyegala) URL: #241
This PR addresses issues mentioned in #221 -- Adds grid stride based fusedL2NN kernel, this gives approx 1.85x speed up over previous version of this kernel. -- Adds support in pairwise dist base class to work for any input size by adding support for grid stride based work distribution. Authors: - Mahesh Doijade (https://github.com/mdoijade) Approvers: - Thejaswi. N. S (https://github.com/teju85) - Divye Gala (https://github.com/divyegala) - Alex Fender (https://github.com/afender) URL: #232
Update cuco git tag Authors: - Seunghwa Kang (https://github.com/seunghwak) Approvers: - Brad Rees (https://github.com/BradReesWork) - Dante Gama Dessavre (https://github.com/dantegd) URL: #243
After the merge of #232, a few different tests failed in rapidsai/cuml#3891, given the timing I think it'd be best to target 232 (again) to 21.08 after triaging the issues. Authors: - Dante Gama Dessavre (https://github.com/dantegd) Approvers: - Corey J. Nolet (https://github.com/cjnolet) - Divye Gala (https://github.com/divyegala) - Brad Rees (https://github.com/BradReesWork) URL: #246
The previous version unintentionally disallowed floating point value types and cuCollection fixed this in the most recent version. Update git tag accordingly. Authors: - Seunghwa Kang (https://github.com/seunghwak) Approvers: - Brad Rees (https://github.com/BradReesWork) URL: #248
Author: - Peter Andreas Entschev (https://github.com/pentschev) Approvers: - AJ Schmidt (https://github.com/ajschmidt8) - Dante Gama Dessavre (https://github.com/dantegd) URL: #254
dillon-cullinan
approved these changes
Jun 10, 2021
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
This PR merges the latest changelog entries from
branch-21.06
into themain
branch.It looks like
branch-21.06
was never merged intomain
in this repo, so this PR also includes all the latest changes frombranch-21.06
.