Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Fix memory error in devArrMatch (#229)
We recently discovered a memory error in the `devArrMatch()` function: https://gpuci.gpuopenanalytics.com/job/rapidsai/job/gpuci-v0.20/job/cuml/job/prb/job/cuml-gpu-test/CUDA=11.0,GPU_LABEL=gpu,OS=ubuntu16.04,PYTHON=3.7/161/console ``` 13:09:34 [----------] 44 tests from FilTests/TreeliteDenseFilTest 13:09:34 [ RUN ] FilTests/TreeliteDenseFilTest.Import/0 13:09:34 *** Error in `./test/ml': free(): invalid pointer: 0x00007f632b691fe8 *** 13:09:34 ======= Backtrace: ========= 13:09:34 /lib/x86_64-linux-gnu/libc.so.6(+0x777f5)[0x7f632b3447f5] 13:09:34 /lib/x86_64-linux-gnu/libc.so.6(+0x8038a)[0x7f632b34d38a] 13:09:34 /lib/x86_64-linux-gnu/libc.so.6(cfree+0x4c)[0x7f632b35158c] 13:09:34 /workspace/ci/artifacts/cuml/cpu/conda_work/cpp/build/libcuml++.so(_ZN8treelite9ModelImplIffED0Ev+0xf5)[0x7f632c556405] ``` which was traced to the `devArrMatch()` function as follows: ``` $ valgrind ./cpp/build/test/ml --gtest_filter=FilTests/TreeliteDenseFilTest.Import/0 ==6398== Mismatched free() / delete / delete [] ==6398== at 0x483D1CF: operator delete(void*, unsigned long) (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) ==6398== by 0x209287: std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() (in /home/ubuntu/cuml/cpp/build/test/ml) ==6398== by 0x2AC253: testing::AssertionResult raft::devArrMatch<float, raft::CompareApprox<float> >(float const*, float const*, unsigned long, raft::CompareApprox<float>, CUstream_st*) (in /home/ubuntu/cuml/cpp/build/test/ml) ==6398== by 0x2AC51C: ML::BaseFilTest::compare() (in /home/ubuntu/cuml/cpp/build/test/ml) ==6398== by 0x4858098D: void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) (in /home/ubuntu/miniconda3/envs/cuml_dev/lib/libgtest.so) ==6398== by 0x48580BE0: testing::Test::Run() (in /home/ubuntu/miniconda3/envs/cuml_dev/lib/libgtest.so) ==6398== by 0x48580F0E: testing::TestInfo::Run() (in /home/ubuntu/miniconda3/envs/cuml_dev/lib/libgtest.so) ==6398== by 0x48581035: testing::TestSuite::Run() (in /home/ubuntu/miniconda3/envs/cuml_dev/lib/libgtest.so) ==6398== by 0x485815EB: testing::internal::UnitTestImpl::RunAllTests() (in /home/ubuntu/miniconda3/envs/cuml_dev/lib/libgtest.so) ==6398== by 0x48581858: testing::UnitTest::Run() (in /home/ubuntu/miniconda3/envs/cuml_dev/lib/libgtest.so) ==6398== by 0x4853007E: main (in /home/ubuntu/miniconda3/envs/cuml_dev/lib/libgtest_main.so) ==6398== Address 0x232bfa860 is 0 bytes inside a block of size 160,000 alloc'd ==6398== at 0x483C583: operator new[](unsigned long) (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) ==6398== by 0x2ABFF8: testing::AssertionResult raft::devArrMatch<float, raft::CompareApprox<float> >(float const*, float const*, unsigned long, raft::CompareApprox<float>, CUstream_st*) (in /home/ubuntu/cuml/cpp/build/test/ml) ==6398== by 0x2AC51C: ML::BaseFilTest::compare() (in /home/ubuntu/cuml/cpp/build/test/ml) ==6398== by 0x4858098D: void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) (in /home/ubuntu/miniconda3/envs/cuml_dev/lib/libgtest.so) ==6398== by 0x48580BE0: testing::Test::Run() (in /home/ubuntu/miniconda3/envs/cuml_dev/lib/libgtest.so) ==6398== by 0x48580F0E: testing::TestInfo::Run() (in /home/ubuntu/miniconda3/envs/cuml_dev/lib/libgtest.so) ==6398== by 0x48581035: testing::TestSuite::Run() (in /home/ubuntu/miniconda3/envs/cuml_dev/lib/libgtest.so) ==6398== by 0x485815EB: testing::internal::UnitTestImpl::RunAllTests() (in /home/ubuntu/miniconda3/envs/cuml_dev/lib/libgtest.so) ==6398== by 0x48581858: testing::UnitTest::Run() (in/home/ubuntu/miniconda3/envs/cuml_dev/lib/libgtest.so) ==6398== by 0x4853007E: main (in /home/ubuntu/miniconda3/envs/cuml_dev/lib/libgtest_main.so) ``` **Diagnosis**. The `devArrMatch` functions are allocating a temporary buffer using `new[]` and then assigning it to a `shared_ptr<T>`. This is not valid because the destructor of `shared_ptr<T>` will invoke `delete`, not `delete[]`. Calling `delete` with a buffer allocated by `new[]` leads to an undefined behavior. See https://docs.microsoft.com/en-us/cpp/code-quality/c6278?view=msvc-160. **Proposed fix**. Use `std:unique_ptr<T[]>` instead to store temporary buffers. Authors: - Philip Hyunsu Cho (https://github.com/hcho3) Approvers: - Corey J. Nolet (https://github.com/cjnolet) URL: #229
- Loading branch information