[BUG] nightly ai.rapids.cudf.ReductionTest failed in cuda12 ENV after enable sanitizer #9052

Closed
pxLi opened this issue Aug 16, 2023 · 2 comments
Labels
duplicate This issue or pull request already exists

Comments

@pxLi
Collaborator

pxLi commented Aug 16, 2023

Describe the bug
With the sanitizer enabled, the same tests pass on CUDA 11 but fail on CUDA 12.

pipeline: spark-rapids-jni_nightly-dev, build ID:512

attached sanitizer log: sanitizer_for_pid_20785.log

[2023-08-16T04:25:35.253Z] [INFO] -------------------------------------------------------
[2023-08-16T04:25:35.253Z] [INFO]  T E S T S
[2023-08-16T04:25:35.253Z] [INFO] -------------------------------------------------------
[2023-08-16T04:25:37.150Z] [INFO] Running ai.rapids.cudf.Aggregation128UtilsTest
[2023-08-16T04:25:49.342Z] [INFO] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 10.72 s - in ai.rapids.cudf.Aggregation128UtilsTest
[2023-08-16T04:25:49.342Z] [INFO] Running ai.rapids.cudf.ArrowColumnVectorTest
[2023-08-16T04:25:49.342Z] [INFO] Tests run: 12, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.245 s - in ai.rapids.cudf.ArrowColumnVectorTest
[2023-08-16T04:25:49.342Z] [INFO] Running ai.rapids.cudf.BinaryOpTest
[2023-08-16T04:25:54.601Z] [INFO] Tests run: 45, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 5.746 s - in ai.rapids.cudf.BinaryOpTest
[2023-08-16T04:25:54.601Z] [INFO] Running ai.rapids.cudf.ByteColumnVectorTest
[2023-08-16T04:25:54.860Z] [INFO] Tests run: 11, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.882 s - in ai.rapids.cudf.ByteColumnVectorTest
[2023-08-16T04:25:54.860Z] [INFO] Running ai.rapids.cudf.ColumnVectorTest
[2023-08-16T04:26:41.506Z] [WARNING] Tests run: 316, Failures: 0, Errors: 0, Skipped: 2, Time elapsed: 41.466 s - in ai.rapids.cudf.ColumnVectorTest
[2023-08-16T04:26:41.507Z] [INFO] Running ai.rapids.cudf.CudaTest
[2023-08-16T04:26:41.507Z] [INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0 s - in ai.rapids.cudf.CudaTest
[2023-08-16T04:26:41.507Z] [INFO] Running ai.rapids.cudf.Date32ColumnVectorTest
[2023-08-16T04:26:41.507Z] [INFO] Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.004 s - in ai.rapids.cudf.Date32ColumnVectorTest
[2023-08-16T04:26:41.507Z] [INFO] Running ai.rapids.cudf.Date64ColumnVectorTest
[2023-08-16T04:26:41.507Z] [INFO] Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.024 s - in ai.rapids.cudf.Date64ColumnVectorTest
[2023-08-16T04:26:41.507Z] [INFO] Running ai.rapids.cudf.DecimalColumnVectorTest
[2023-08-16T04:26:41.507Z] [INFO] Tests run: 14, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 2.127 s - in ai.rapids.cudf.DecimalColumnVectorTest
[2023-08-16T04:26:41.507Z] [INFO] Running ai.rapids.cudf.DoubleColumnVectorTest
[2023-08-16T04:26:41.507Z] [INFO] Tests run: 7, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.62 s - in ai.rapids.cudf.DoubleColumnVectorTest
[2023-08-16T04:26:41.507Z] [INFO] Running ai.rapids.cudf.FloatColumnVectorTest
[2023-08-16T04:26:41.507Z] [INFO] Tests run: 8, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.602 s - in ai.rapids.cudf.FloatColumnVectorTest
[2023-08-16T04:26:41.507Z] [INFO] Running ai.rapids.cudf.GatherMapTest
[2023-08-16T04:26:41.507Z] [INFO] Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.238 s - in ai.rapids.cudf.GatherMapTest
[2023-08-16T04:26:41.507Z] [INFO] Running ai.rapids.cudf.HashJoinTest
[2023-08-16T04:26:41.507Z] [INFO] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.037 s - in ai.rapids.cudf.HashJoinTest
[2023-08-16T04:26:41.507Z] [INFO] Running ai.rapids.cudf.HostMemoryBufferTest
[2023-08-16T04:26:41.507Z] [WARNING] Tests run: 14, Failures: 0, Errors: 0, Skipped: 1, Time elapsed: 0.144 s - in ai.rapids.cudf.HostMemoryBufferTest
[2023-08-16T04:26:41.507Z] [INFO] Running ai.rapids.cudf.IfElseTest
[2023-08-16T04:26:41.507Z] [INFO] Tests run: 110, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.336 s - in ai.rapids.cudf.IfElseTest
[2023-08-16T04:26:41.507Z] [INFO] Running ai.rapids.cudf.IntColumnVectorTest
[2023-08-16T04:26:42.440Z] [INFO] Tests run: 10, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.615 s - in ai.rapids.cudf.IntColumnVectorTest
[2023-08-16T04:26:42.440Z] [INFO] Running ai.rapids.cudf.LongColumnVectorTest
[2023-08-16T04:26:43.006Z] [INFO] Tests run: 9, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.613 s - in ai.rapids.cudf.LongColumnVectorTest
[2023-08-16T04:26:43.006Z] [INFO] Running ai.rapids.cudf.MemoryBufferTest
[2023-08-16T04:26:43.006Z] [INFO] Tests run: 13, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0 s - in ai.rapids.cudf.MemoryBufferTest
[2023-08-16T04:26:43.006Z] [INFO] Running ai.rapids.cudf.NvtxTest
[2023-08-16T04:26:43.006Z] [INFO] Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0 s - in ai.rapids.cudf.NvtxTest
[2023-08-16T04:26:43.006Z] [INFO] Running ai.rapids.cudf.PinnedMemoryPoolTest
[2023-08-16T04:26:43.265Z] [INFO] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.342 s - in ai.rapids.cudf.PinnedMemoryPoolTest
[2023-08-16T04:26:43.265Z] [INFO] Running ai.rapids.cudf.ReductionTest
[2023-08-16T04:26:51.378Z] [ERROR] Tests run: 130, Failures: 0, Errors: 90, Skipped: 0, Time elapsed: 8.092 s <<< FAILURE! - in ai.rapids.cudf.ReductionTest
[2023-08-16T04:26:51.378Z] [ERROR] testShort{ReductionAggregation, Short[], DataType, Object, Double}[4]  Time elapsed: 3.761 s  <<< ERROR!
[2023-08-16T04:26:51.378Z] ai.rapids.cudf.CudaFatalException: CUDA error at: /home/jenkins/agent/workspace/jenkins-spark-rapids-jni_nightly-dev-512-cuda12/thirdparty/cudf/cpp/build/_deps/rmm-src/include/rmm/device_uvector.hpp:316: cudaErrorLaunchFailure unspecified launch failure
[2023-08-16T04:26:51.378Z] ai.rapids.cudf.CudaFatalException: CUDA error at: /home/jenkins/agent/workspace/jenkins-spark-rapids-jni_nightly-dev-512-cuda12/thirdparty/cudf/cpp/build/_deps/rmm-src/include/rmm/device_uvector.hpp:316: cudaErrorLaunchFailure unspecified launch failure
[2023-08-16T04:26:51.378Z] 	at ai.rapids.cudf.Scalar.isScalarValid(Native Method)
[2023-08-16T04:26:51.378Z] 	at ai.rapids.cudf.Scalar.isValid(Scalar.java:568)
[2023-08-16T04:26:51.378Z] 	at ai.rapids.cudf.Scalar.equals(Scalar.java:707)
[2023-08-16T04:26:51.378Z] 	at org.junit.jupiter.api.AssertionUtils.objectsAreEqual(AssertionUtils.java:193)
[2023-08-16T04:26:51.378Z] 	at org.junit.jupiter.api.AssertEquals.assertEquals(AssertEquals.java:181)
[2023-08-16T04:26:51.378Z] 	at org.junit.jupiter.api.AssertEquals.assertEquals(AssertEquals.java:177)
[2023-08-16T04:26:51.378Z] 	at org.junit.jupiter.api.Assertions.assertEquals(Assertions.java:1141)
[2023-08-16T04:26:51.378Z] 	at ai.rapids.cudf.ReductionTest.assertEqualsDelta(ReductionTest.java:413)
[2023-08-16T04:26:51.378Z] 	at ai.rapids.cudf.ReductionTest.testShort(ReductionTest.java:462)
[2023-08-16T04:26:51.378Z] 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
[2023-08-16T04:26:51.378Z] 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
[2023-08-16T04:26:51.378Z] 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
[2023-08-16T04:26:51.378Z] 	at java.lang.reflect.Method.invoke(Method.java:498)
[2023-08-16T04:26:51.378Z] 	at org.junit.platform.commons.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:725)
[2023-08-16T04:26:51.378Z] 	at org.junit.jupiter.engine.execution.MethodInvocation.proceed(MethodInvocation.java:60)
[2023-08-16T04:26:51.378Z] 	at org.junit.jupiter.engine.execution.InvocationInterceptorChain$ValidatingInvocation.proceed(InvocationInterceptorChain.java:131)
[2023-08-16T04:26:51.378Z] 	at org.junit.jupiter.engine.extension.TimeoutExtension.intercept(TimeoutExtension.java:149)
[2023-08-16T04:26:51.378Z] 	at org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestableMethod(TimeoutExtension.java:140)
[2023-08-16T04:26:51.378Z] 	at org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestTemplateMethod(TimeoutExtension.java:92)
[2023-08-16T04:26:51.378Z] 	at org.junit.jupiter.engine.execution.ExecutableInvoker$ReflectiveInterceptorCall.lambda$ofVoidMethod$0(ExecutableInvoker.java:115)
[2023-08-16T04:26:51.378Z] 	at org.junit.jupiter.engine.execution.ExecutableInvoker.lambda$invoke$0(ExecutableInvoker.java:105)
[2023-08-16T04:26:51.378Z] 	at org.junit.jupiter.engine.execution.InvocationInterceptorChain$InterceptedInvocation.proceed(InvocationInterceptorChain.java:106)
[2023-08-16T04:26:51.378Z] 	at org.junit.jupiter.engine.execution.InvocationInterceptorChain.proceed(InvocationInterceptorChain.java:64)
[2023-08-16T04:26:51.378Z] 	at org.junit.jupiter.engine.execution.InvocationInterceptorChain.chainAndInvoke(InvocationInterceptorChain.java:45)
[2023-08-16T04:26:51.378Z] 	at org.junit.jupiter.engine.execution.InvocationInterceptorChain.invoke(InvocationInterceptorChain.java:37)
[2023-08-16T04:26:51.378Z] 	at org.junit.jupiter.engine.execution.ExecutableInvoker.invoke(ExecutableInvoker.java:104)
[2023-08-16T04:26:51.378Z] 	at org.junit.jupiter.engine.execution.ExecutableInvoker.invoke(ExecutableInvoker.java:98)
[2023-08-16T04:26:51.378Z] 	at org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.lambda$invokeTestMethod$7(TestMethodTestDescriptor.java:214)
[2023-08-16T04:26:51.378Z] 	at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
[2023-08-16T04:26:51.378Z] 	at org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.invokeTestMethod(TestMethodTestDescriptor.java:210)
[2023-08-16T04:26:51.378Z] 	at org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.execute(TestMethodTestDescriptor.java:135)
[2023-08-16T04:26:51.378Z] 	at org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.execute(TestMethodTestDescriptor.java:66)
[2023-08-16T04:26:51.378Z] 	at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$6(NodeTestTask.java:151)
[2023-08-16T04:26:51.378Z] 	at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
[2023-08-16T04:26:51.379Z] 	at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$8(NodeTestTask.java:141)
[2023-08-16T04:26:51.379Z] 	at org.junit.platform.engine.support.hierarchical.Node.around(Node.java:137)
[2023-08-16T04:26:51.379Z] 	at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$9(NodeTestTask.java:139)
[2023-08-16T04:26:51.379Z] 	at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
[2023-08-16T04:26:51.379Z] 	at org.junit.platform.engine.support.hierarchical.NodeTestTask.executeRecursively(NodeTestTask.java:138)
[2023-08-16T04:26:51.379Z] 	at org.junit.platform.engine.support.hierarchical.NodeTestTask.execute(NodeTestTask.java:95)
[2023-08-16T04:26:51.379Z] 	at org.junit.platform.engine.support.hierarchical.SameThreadHierarchicalTestExecutorService.submit(SameThreadHierarchicalTestExecutorService.java:35)
[2023-08-16T04:26:51.379Z] 	at org.junit.platform.engine.support.hierarchical.NodeTestTask$DefaultDynamicTestExecutor.execute(NodeTestTask.java:226)
[2023-08-16T04:26:51.379Z] 	at org.junit.platform.engine.support.hierarchical.NodeTestTask$DefaultDynamicTestExecutor.execute(NodeTestTask.java:204)
[2023-08-16T04:26:51.379Z] 	at org.junit.jupiter.engine.descriptor.TestTemplateTestDescriptor.execute(TestTemplateTestDescriptor.java:139)
[2023-08-16T04:26:51.379Z] 	at org.junit.jupiter.engine.descriptor.TestTemplateTestDescriptor.lambda$execute$2(TestTemplateTestDescriptor.java:107)
[2023-08-16T04:26:51.379Z] 	at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
[2023-08-16T04:26:51.379Z] 	at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
[2023-08-16T04:26:51.379Z] 	at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
[2023-08-16T04:26:51.379Z] 	at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
[2023-08-16T04:26:51.379Z] 	at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
[2023-08-16T04:26:51.379Z] 	at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
[2023-08-16T04:26:51.379Z] 	at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
[2023-08-16T04:26:51.379Z] 	at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
[2023-08-16T04:26:51.379Z] 	at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
[2023-08-16T04:26:51.379Z] 	at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
[2023-08-16T04:26:51.379Z] 	at java.util.Spliterators$ArraySpliterator.forEachRemaining(Spliterators.java:948)
[2023-08-16T04:26:51.379Z] 	at java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:647)
[2023-08-16T04:26:51.379Z] 	at java.util.stream.ReferencePipeline$7$1.accept(ReferencePipeline.java:272)
[2023-08-16T04:26:51.379Z] 	at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
[2023-08-16T04:26:51.379Z] 	at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
[2023-08-16T04:26:51.379Z] 	at java.util.Spliterators$ArraySpliterator.forEachRemaining(Spliterators.java:948)
[2023-08-16T04:26:51.379Z] 	at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
[2023-08-16T04:26:51.379Z] 	at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
[2023-08-16T04:26:51.379Z] 	at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150)
[2023-08-16T04:26:51.379Z] 	at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173)
[2023-08-16T04:26:51.379Z] 	at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
[2023-08-16T04:26:51.379Z] 	at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:485)
[2023-08-16T04:26:51.379Z] 	at java.util.stream.ReferencePipeline$7$1.accept(ReferencePipeline.java:272)
[2023-08-16T04:26:51.379Z] 	at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
[2023-08-16T04:26:51.379Z] 	at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
[2023-08-16T04:26:51.379Z] 	at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
[2023-08-16T04:26:51.379Z] 	at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1384)
[2023-08-16T04:26:51.379Z] 	at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
[2023-08-16T04:26:51.379Z] 	at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
[2023-08-16T04:26:51.379Z] 	at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150)
[2023-08-16T04:26:51.379Z] 	at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173)
[2023-08-16T04:26:51.379Z] 	at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
[2023-08-16T04:26:51.379Z] 	at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:485)
[2023-08-16T04:26:51.379Z] 	at java.util.stream.ReferencePipeline$7$1.accept(ReferencePipeline.java:272)
[2023-08-16T04:26:51.379Z] 	at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1384)
[2023-08-16T04:26:51.379Z] 	at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
[2023-08-16T04:26:51.379Z] 	at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
[2023-08-16T04:26:51.379Z] 	at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150)
[2023-08-16T04:26:51.379Z] 	at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173)
[2023-08-16T04:26:51.379Z] 	at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
[2023-08-16T04:26:51.379Z] 	at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:485)
[2023-08-16T04:26:51.379Z] 	at org.junit.jupiter.engine.descriptor.TestTemplateTestDescriptor.execute(TestTemplateTestDescriptor.java:107)
[2023-08-16T04:26:51.379Z] 	at org.junit.jupiter.engine.descriptor.TestTemplateTestDescriptor.execute(TestTemplateTestDescriptor.java:42)
[2023-08-16T04:26:51.379Z] 	at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$6(NodeTestTask.java:151)
[2023-08-16T04:26:51.379Z] 	at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
[2023-08-16T04:26:51.379Z] 	at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$8(NodeTestTask.java:141)
[2023-08-16T04:26:51.379Z] 	at org.junit.platform.engine.support.hierarchical.Node.around(Node.java:137)
[2023-08-16T04:26:51.379Z] 	at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$9(NodeTestTask.java:139)
[2023-08-16T04:26:51.379Z] 	at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
[2023-08-16T04:26:51.379Z] 	at org.junit.platform.engine.support.hierarchical.NodeTestTask.executeRecursively(NodeTestTask.java:138)
[2023-08-16T04:26:51.379Z] 	at org.junit.platform.engine.support.hierarchical.NodeTestTask.execute(NodeTestTask.java:95)
[2023-08-16T04:26:51.379Z] 	at java.util.ArrayList.forEach(ArrayList.java:1259)
[2023-08-16T04:26:51.379Z] 	at org.junit.platform.engine.support.hierarchical.SameThreadHierarchicalTestExecutorService.invokeAll(SameThreadHierarchicalTestExecutorService.java:41)
[2023-08-16T04:26:51.379Z] 	at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$6(NodeTestTask.java:155)
[2023-08-16T04:26:51.379Z] 	at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
[2023-08-16T04:26:51.379Z] 	at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$8(NodeTestTask.java:141)
[2023-08-16T04:26:51.379Z] 	at org.junit.platform.engine.support.hierarchical.Node.around(Node.java:137)
[2023-08-16T04:26:51.379Z] 	at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$9(NodeTestTask.java:139)
[2023-08-16T04:26:51.379Z] 	at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
[2023-08-16T04:26:51.379Z] 	at org.junit.platform.engine.support.hierarchical.NodeTestTask.executeRecursively(NodeTestTask.java:138)
[2023-08-16T04:26:51.379Z] 	at org.junit.platform.engine.support.hierarchical.NodeTestTask.execute(NodeTestTask.java:95)
[2023-08-16T04:26:51.379Z] 	at java.util.ArrayList.forEach(ArrayList.java:1259)
[2023-08-16T04:26:51.379Z] 	at org.junit.platform.engine.support.hierarchical.SameThreadHierarchicalTestExecutorService.invokeAll(SameThreadHierarchicalTestExecutorService.java:41)
[2023-08-16T04:26:51.379Z] 	at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$6(NodeTestTask.java:155)
[2023-08-16T04:26:51.380Z] 	at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
[2023-08-16T04:26:51.380Z] 	at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$8(NodeTestTask.java:141)
[2023-08-16T04:26:51.380Z] 	at org.junit.platform.engine.support.hierarchical.Node.around(Node.java:137)
[2023-08-16T04:26:51.380Z] 	at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$9(NodeTestTask.java:139)
[2023-08-16T04:26:51.380Z] 	at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
[2023-08-16T04:26:51.380Z] 	at org.junit.platform.engine.support.hierarchical.NodeTestTask.executeRecursively(NodeTestTask.java:138)
[2023-08-16T04:26:51.380Z] 	at org.junit.platform.engine.support.hierarchical.NodeTestTask.execute(NodeTestTask.java:95)
[2023-08-16T04:26:51.380Z] 	at org.junit.platform.engine.support.hierarchical.SameThreadHierarchicalTestExecutorService.submit(SameThreadHierarchicalTestExecutorService.java:35)
[2023-08-16T04:26:51.380Z] 	at org.junit.platform.engine.support.hierarchical.HierarchicalTestExecutor.execute(HierarchicalTestExecutor.java:57)
[2023-08-16T04:26:51.380Z] 	at org.junit.platform.engine.support.hierarchical.HierarchicalTestEngine.execute(HierarchicalTestEngine.java:54)
[2023-08-16T04:26:51.380Z] 	at org.junit.platform.launcher.core.DefaultLauncher.execute(DefaultLauncher.java:220)
[2023-08-16T04:26:51.380Z] 	at org.junit.platform.launcher.core.DefaultLauncher.lambda$execute$6(DefaultLauncher.java:188)
[2023-08-16T04:26:51.380Z] 	at org.junit.platform.launcher.core.DefaultLauncher.withInterceptedStreams(DefaultLauncher.java:202)
[2023-08-16T04:26:51.380Z] 	at org.junit.platform.launcher.core.DefaultLauncher.execute(DefaultLauncher.java:181)
[2023-08-16T04:26:51.380Z] 	at org.junit.platform.launcher.core.DefaultLauncher.execute(DefaultLauncher.java:128)
[2023-08-16T04:26:51.380Z] 	at org.junit.platform.surefire.provider.JUnitPlatformProvider.invokeAllTests(JUnitPlatformProvider.java:155)
[2023-08-16T04:26:51.380Z] 	at org.junit.platform.surefire.provider.JUnitPlatformProvider.invoke(JUnitPlatformProvider.java:134)
[2023-08-16T04:26:51.380Z] 	at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:383)
[2023-08-16T04:26:51.380Z] 	at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:344)
[2023-08-16T04:26:51.380Z] 	at org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:125)
[2023-08-16T04:26:51.380Z] 	at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:417)
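
For triage, the per-class Surefire result lines in a log like the one above can be scanned for classes that reported failures or errors. This is a minimal sketch rather than part of the CI pipeline; the function name `failing_classes` and the regular expression are assumptions based only on the line format visible in this log.

```python
import re

# Matches Surefire per-class result lines, e.g.
# "... Tests run: 130, Failures: 0, Errors: 90, Skipped: 0,
#  Time elapsed: 8.092 s <<< FAILURE! - in ai.rapids.cudf.ReductionTest"
# The pattern is inferred from the log in this issue, not from a Surefire spec.
RESULT_LINE = re.compile(
    r"Tests run: (?P<run>\d+), Failures: (?P<failures>\d+), "
    r"Errors: (?P<errors>\d+), Skipped: (?P<skipped>\d+), "
    r"Time elapsed: [\d.]+ s.* - in (?P<cls>[\w.$]+)"
)


def failing_classes(log_text):
    """Return {class_name: (failures, errors)} for classes with any failure or error."""
    failing = {}
    for line in log_text.splitlines():
        match = RESULT_LINE.search(line)
        if match is None:
            continue  # not a per-class result line
        failures = int(match["failures"])
        errors = int(match["errors"])
        if failures or errors:
            failing[match["cls"]] = (failures, errors)
    return failing


# Two result lines copied from the log in this issue.
sample = (
    "[2023-08-16T04:26:43.265Z] [INFO] Tests run: 4, Failures: 0, Errors: 0, "
    "Skipped: 0, Time elapsed: 0.342 s - in ai.rapids.cudf.PinnedMemoryPoolTest\n"
    "[2023-08-16T04:26:51.378Z] [ERROR] Tests run: 130, Failures: 0, Errors: 90, "
    "Skipped: 0, Time elapsed: 8.092 s <<< FAILURE! - in ai.rapids.cudf.ReductionTest\n"
)
print(failing_classes(sample))  # {'ai.rapids.cudf.ReductionTest': (0, 90)}
```

On the log above, this reports only ai.rapids.cudf.ReductionTest, matching the 90 errors in the run.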


@pxLi pxLi added bug Something isn't working ? - Needs Triage Need team to review and classify test Only impacts tests P0 Must have for release and removed ? - Needs Triage Need team to review and classify labels Aug 16, 2023
@pxLi pxLi changed the title [BUG] nightly test failed in cuda12 ENV after enable sanitizer [BUG] nightly ai.rapids.cudf.ReductionTest failed in cuda12 ENV after enable sanitizer Aug 16, 2023
@pxLi
Collaborator Author

pxLi commented Aug 16, 2023

Summary of the run:

[2023-08-16T04:26:52.487Z] [ERROR] Tests run: 729, Failures: 0, Errors: 90, Skipped: 3
[2023-08-16T04:26:52.487Z] [INFO] 
[2023-08-16T04:26:52.487Z] [INFO] ------------------------------------------------------------------------
[2023-08-16T04:26:52.487Z] [INFO] BUILD FAILURE
[2023-08-16T04:26:52.487Z] [INFO] ------------------------------------------------------------------------
[2023-08-16T04:26:52.487Z] [INFO] Total time: 54:55.118s
[2023-08-16T04:26:52.487Z] [INFO] Finished at: Wed Aug 16 04:26:52 UTC 2023
[2023-08-16T04:26:52.487Z] [INFO] Final Memory: 43M/1324M
[2023-08-16T04:26:52.487Z] [INFO] ------------------------------------------------------------------------
[2023-08-16T04:26:52.487Z] [ERROR] Failed to execute goal org.apache.maven.plugins:maven-surefire-plugin:2.22.0:test (default-test) on project spark-rapids-jni: There are test failures.
[2023-08-16T04:26:52.487Z] [ERROR] 
[2023-08-16T04:26:52.487Z] [ERROR] Please refer to /home/jenkins/agent/workspace/jenkins-spark-rapids-jni_nightly-dev-512-cuda12/target/surefire-reports for the individual test results.
[2023-08-16T04:26:52.487Z] [ERROR] Please refer to dump files (if any exist) [date]-jvmRun[N].dump, [date].dumpstream and [date]-jvmRun[N].dumpstream.
[2023-08-16T04:26:52.487Z] [ERROR] There was an error in the forked process
[2023-08-16T04:26:52.487Z] [ERROR] Error occurred in starting fork, check output in log
[2023-08-16T04:26:52.487Z] [ERROR] Could not allocate native memory: std::bad_alloc: out_of_memory: RMM failure at:/home/jenkins/agent/workspace/jenkins-spark-rapids-jni_nightly-dev-512-cuda12/thirdparty/cudf/cpp/build/_deps/rmm-src/include/rmm/mr/device/pool_memory_resource.hpp:196: Maximum pool size exceeded
[2023-08-16T04:26:52.487Z] [ERROR] org.apache.maven.surefire.booter.SurefireBooterForkException: There was an error in the forked process
[2023-08-16T04:26:52.487Z] [ERROR] Error occurred in starting fork, check output in log
[2023-08-16T04:26:52.487Z] [ERROR] Could not allocate native memory: std::bad_alloc: out_of_memory: RMM failure at:/home/jenkins/agent/workspace/jenkins-spark-rapids-jni_nightly-dev-512-cuda12/thirdparty/cudf/cpp/build/_deps/rmm-src/include/rmm/mr/device/pool_memory_resource.hpp:196: Maximum pool size exceeded

The error dump reports "Maximum pool size exceeded":

# Created at 2023-08-16T04:26:51.225
java.lang.OutOfMemoryError: Could not allocate native memory: std::bad_alloc: out_of_memory: RMM failure at:/home/jenkins/agent/workspace/jenkins-spark-rapids-jni_nightly-dev-512-cuda12/thirdparty/cudf/cpp/build/_deps/rmm-src/include/rmm/mr/device/pool_memory_resource.hpp:196: Maximum pool size exceeded
	at ai.rapids.cudf.Rmm.newPoolMemoryResource(Native Method)
	at ai.rapids.cudf.RmmPoolMemoryResource.<init>(RmmPoolMemoryResource.java:39)
	at ai.rapids.cudf.Rmm.initialize(Rmm.java:238)
	at ai.rapids.cudf.CudfTestBase.beforeEach(CudfTestBase.java:46)
	at sun.reflect.GeneratedMethodAccessor1.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.junit.platform.commons.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:725)
	at org.junit.jupiter.engine.execution.MethodInvocation.proceed(MethodInvocation.java:60)
	at org.junit.jupiter.engine.execution.InvocationInterceptorChain$ValidatingInvocation.proceed(InvocationInterceptorChain.java:131)
	at org.junit.jupiter.engine.extension.TimeoutExtension.intercept(TimeoutExtension.java:149)
	at org.junit.jupiter.engine.extension.TimeoutExtension.interceptLifecycleMethod(TimeoutExtension.java:126)

@pxLi
Collaborator Author

pxLi commented Aug 16, 2023

Closing this as a duplicate of NVIDIA/spark-rapids-jni#1349.

@pxLi pxLi closed this as completed Aug 16, 2023
@pxLi pxLi added duplicate This issue or pull request already exists and removed bug Something isn't working test Only impacts tests P0 Must have for release labels Aug 16, 2023