Describe the bug
Recently, several errors have been reported by @harrism, @nvdbaranec, and @KyleFromNVIDIA where STATISTICS_TEST and TRACKING_TEST have failed with out-of-memory errors.
[ RUN ] StatisticsTest.AllFreed
unknown file: Failure
C++ exception with description "std::bad_alloc: out_of_memory: CUDA error at: /opt/conda/conda-bld/work/include/rmm/mr/device/cuda_memory_resource.hpp:60: cudaErrorMemoryAllocation out of memory" thrown in the test body.
[ FAILED ] StatisticsTest.AllFreed (2340 ms)
[ RUN ] StatisticsTest.PeakAllocations
unknown file: Failure
C++ exception with description "std::bad_alloc: out_of_memory: CUDA error at: /opt/conda/conda-bld/work/include/rmm/mr/device/cuda_memory_resource.hpp:60: cudaErrorMemoryAllocation out of memory" thrown in the test body.
[ FAILED ] StatisticsTest.PeakAllocations (16 ms)
[ RUN ] StatisticsTest.PeakAllocations
unknown file: Failure
C++ exception with description "std::bad_alloc: out_of_memory: CUDA error at: /opt/conda/conda-bld/work/include/rmm/mr/device/cuda_memory_resource.hpp:60: cudaErrorMemoryAllocation out of memory" thrown in the test body.
[ FAILED ] StatisticsTest.PeakAllocations (1115 ms)
[ RUN ] StatisticsTest.MultiTracking
unknown file: Failure
C++ exception with description "std::bad_alloc: out_of_memory: CUDA error at: /opt/conda/conda-bld/work/include/rmm/mr/device/cuda_memory_resource.hpp:60: cudaErrorMemoryAllocation out of memory" thrown in the test body.
[ FAILED ] StatisticsTest.MultiTracking (1 ms)
[ RUN ] TrackingTest.AllFreed
unknown file: Failure
C++ exception with description "std::bad_alloc: out_of_memory: CUDA error at: /opt/conda/conda-bld/work/include/rmm/mr/device/cuda_memory_resource.hpp:60: cudaErrorMemoryAllocation out of memory" thrown in the test body.
[ RUN ] TrackingTest.AllFreed
unknown file: Failure
C++ exception with description "std::bad_alloc: out_of_memory: CUDA error at: /opt/conda/conda-bld/work/include/rmm/mr/device/cuda_memory_resource.hpp:60: cudaErrorMemoryAllocation out of memory" thrown in the test body.
[ FAILED ] TrackingTest.AllFreed (4556 ms)
[ RUN ] TrackingTest.AllocationsLeftWithStacks
unknown file: Failure
C++ exception with description "std::bad_alloc: out_of_memory: CUDA error at: /opt/conda/conda-bld/work/include/rmm/mr/device/cuda_memory_resource.hpp:60: cudaErrorMemoryAllocation out of memory" thrown in the test body.
[ FAILED ] TrackingTest.AllocationsLeftWithStacks (1 ms)
The examples above were copied from the following logs:
https://github.com/rapidsai/rmm/actions/runs/8054518004/job/21999601232?pr=1479
https://github.com/rapidsai/rmm/actions/runs/8056918096/job/22007169303?pr=1469
Across the logs I saw, the list of failing tests includes:
Expected behavior
No OOM errors in the test suite.
Additional context
I will open a PR to serialize the execution of these tests.
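As a rough illustration of what serializing could look like, here is a minimal sketch using CTest's built-in RUN_SERIAL test property. It assumes the two executables are registered with CTest under the names STATISTICS_TEST and TRACKING_TEST. That registration is an assumption, and the eventual PR may take a different approach.

```cmake
# Sketch only: assumes tests named STATISTICS_TEST and TRACKING_TEST
# have already been registered with add_test() in this directory.
# RUN_SERIAL tells CTest not to run these tests in parallel with any
# other test, so they do not compete with other tests for GPU memory.
set_tests_properties(STATISTICS_TEST TRACKING_TEST
                     PROPERTIES RUN_SERIAL TRUE)
```

With this property set, an invocation such as `ctest -j8` would still parallelize the rest of the suite but run these two tests on their own.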