[BUG] NPE on array_max of transformed empty array #5140
Labels
bug (Something isn't working)
P0 (Must have for release)
reliability (Features to improve reliability or bugs that severely impact the reliability of the plugin)

Comments
jlowe added the bug (Something isn't working) and ? - Needs Triage (Need team to review and classify) labels on Apr 4, 2022
mattahrens added the P1 (Nice to have for release) label and removed the ? - Needs Triage label on Apr 5, 2022
It seems to boil down to incorrect handling of empty arrays in the array aggregation:

from pyspark.sql.functions import *
from pyspark.sql.types import *

schema = StructType(
    [
        StructField('c1', ArrayType(IntegerType(), containsNull=True))
    ]
)
df = spark.createDataFrame(
    [
        [[]]
    ],
    schema
)
df.select(array_max('c1')).collect()
22/04/09 05:45:54 WARN GpuOverrides:
*Exec <ProjectExec> will run on GPU
*Expression <Alias> array_max(c1#0) AS array_max(c1)#2 will run on GPU
*Expression <ArrayMax> array_max(c1#0) will run on GPU
! <RDDScanExec> cannot run on GPU because GPU does not currently support the operator class org.apache.spark.sql.execution.RDDScanExec
@Expression <AttributeReference> c1#0 could run on GPU
22/04/09 05:45:55 ERROR Executor: Exception in task 15.0 in stage 0.0 (TID 15)
Caused by: java.lang.AssertionError: index is out of range 0 <= 0 < 0
at ai.rapids.cudf.HostColumnVectorCore.isNull(HostColumnVectorCore.java:451)
at com.nvidia.spark.rapids.RapidsHostColumnVectorCore.isNullAt(RapidsHostColumnVectorCore.java:89)
at org.apache.spark.sql.vectorized.ColumnarBatchRow.isNullAt(ColumnarBatch.java:190)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown Source)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown Source)
at scala.collection.Iterator$$anon$10.next(Iterator.scala:461)
at org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:350)
revans2 added the P0 (Must have for release) and reliability labels and removed the P1 label on Apr 12, 2022
This issue can be (re)solved in cuDF via rapidsai/cudf#10779
rapids-bot pushed a commit to rapidsai/cudf that referenced this issue on May 6, 2022:

…#10779) This PR suggests a 3VL way of interpreting `isNull` for a `rowId` out of bounds. Such a value is unknown and therefore isNull should be `true`.

NVIDIA/spark-rapids#5140 shows that `SpecificUnsafeProjection` may probe child columns for NULL even though the parent column row is also NULL. However, there are no rows in the child CV when the parent row is NULL, leading to an assert violation if asserts are enabled or an NPE if disabled.

Signed-off-by: Gera Shegalov <[email protected]>
Authors:
- Gera Shegalov (https://github.com/gerashegalov)
Approvers:
- Robert (Bobby) Evans (https://github.com/revans2)
URL: #10779
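For illustration, a minimal sketch of the three-valued-logic behavior the commit message describes; this is not the actual cuDF implementation, and the class and field names below are made up:

// Illustrative sketch only; not the real ai.rapids.cudf.HostColumnVectorCore.
// Models the 3VL reading of isNull: an out-of-range rowId refers to a row that
// does not exist, so its value is unknown and it is reported as null instead of
// failing the bounds assertion seen in the stacktrace above.
class SketchHostColumnView {
    private final boolean[] valid;  // valid[i] == true means row i holds a non-null value

    SketchHostColumnView(boolean[] valid) {
        this.valid = valid;
    }

    boolean isNull(long rowId) {
        // Old behavior (simplified): assert 0 <= rowId < valid.length, which fails
        // for an empty child column when the parent row is itself null.
        // New behavior: any out-of-range rowId is treated as null.
        if (rowId < 0 || rowId >= valid.length) {
            return true;
        }
        return !valid[(int) rowId];
    }
}

With an empty column (new SketchHostColumnView(new boolean[0])), isNull(0) returns true instead of failing, which matches the empty-child-column case in the stacktrace above.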
Closed by rapidsai/cudf#10779 and #5438
Describe the bug
The query shown in the reproduction above results in an NPE stacktrace when the RAPIDS Accelerator is enabled.

Steps/Code to reproduce bug
Execute the query above with the RAPIDS Accelerator enabled; it fails with the stacktrace shown above.

Expected behavior
The query should not crash and should produce the same result as on the CPU, e.g.:
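On the CPU, array_max over an empty array evaluates to NULL, so the expected result for the reproduction above would presumably be:

[Row(array_max(c1)=None)]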
Environment details (please complete the following information)
Spark 3.2.1