[BUG] Cannot lexicographic compare a table with a LIST of STRUCT column at ai.rapids.cudf.Table.sortOrder #7799

Closed
tgravescs opened this issue Feb 22, 2023 · 3 comments · Fixed by #7812
Labels: bug (Something isn't working)

Comments

@tgravescs
Collaborator

Describe the bug
A customer job was upgraded to a 23.04 snapshot build (it had been using version 23.02.20221221) and failed with the following error:

    ai.rapids.cudf.CudfException: CUDF failure at: /home/jenkins/agent/workspace/jenkins-spark-rapids-jni_nightly-dev-346-cuda11/thirdparty/cudf/cpp/src/table/row_operators.cu:288: Cannot lexicographic compare a table with a LIST of STRUCT column
        at ai.rapids.cudf.Table.sortOrder(Native Method)
        at ai.rapids.cudf.Table.sortOrder(Table.java:1956)
        at com.nvidia.spark.rapids.GpuSorter.$anonfun$computeSortOrder$2(SortUtils.scala:327)
        at com.nvidia.spark.rapids.Arm.withResource(Arm.scala:28)
        at com.nvidia.spark.rapids.Arm.withResource$(Arm.scala:26)
        at com.nvidia.spark.rapids.GpuSorter.withResource(SortUtils.scala:65)
        at com.nvidia.spark.rapids.GpuSorter.$anonfun$computeSortOrder$1(SortUtils.scala:326)
        at com.nvidia.spark.rapids.Arm.withResource(Arm.scala:28)
        at com.nvidia.spark.rapids.Arm.withResource$(Arm.scala:26)
        at com.nvidia.spark.rapids.GpuSorter.withResource(SortUtils.scala:65)
        at com.nvidia.spark.rapids.GpuSorter.computeSortOrder(SortUtils.scala:325)
        at com.nvidia.spark.rapids.GpuSorter.$anonfun$fullySortBatch$1(SortUtils.scala:377)
        at com.nvidia.spark.rapids.Arm.withResource(Arm.scala:28)
        at com.nvidia.spark.rapids.Arm.withResource$(Arm.scala:26)
        at com.nvidia.spark.rapids.GpuSorter.withResource(SortUtils.scala:65)
        at com.nvidia.spark.rapids.GpuSorter.fullySortBatch(SortUtils.scala:372)
        at com.nvidia.spark.rapids.GpuSortEachBatchIterator.$anonfun$next$2(GpuSortExec.scala:173)
        at com.nvidia.spark.rapids.Arm.withResource(Arm.scala:28)
        at com.nvidia.spark.rapids.Arm.withResource$(Arm.scala:26)
        at com.nvidia.spark.rapids.GpuSortEachBatchIterator.withResource(GpuSortExec.scala:159)
        at com.nvidia.spark.rapids.GpuSortEachBatchIterator.$anonfun$next$1(GpuSortExec.scala:172)
        at com.nvidia.spark.rapids.Arm.withResource(Arm.scala:28)
        at com.nvidia.spark.rapids.Arm.withResource$(Arm.scala:26)
        at com.nvidia.spark.rapids.GpuSortEachBatchIterator.withResource(GpuSortExec.scala:159)
        at com.nvidia.spark.rapids.GpuSortEachBatchIterator.next(GpuSortExec.scala:171)
        at com.nvidia.spark.rapids.GpuSortEachBatchIterator.next(GpuSortExec.scala:159)
        at org.apache.spark.sql.rapids.execution.GpuShuffleExchangeExecBase$$anon$1.partNextBatch(GpuShuffleExchangeExecBase.scala:318)
        at org.apache.spark.sql.rapids.execution.GpuShuffleExchangeExecBase$$anon$1.hasNext(GpuShuffleExchangeExecBase.scala:340)
        at org.apache.spark.sql.rapids.RapidsShuffleThreadedWriterBase.$anonfun$write$2(RapidsShuffleInternalManagerBase.scala:281)
        at org.apache.spark.sql.rapids.RapidsShuffleThreadedWriterBase.$anonfun$write$2$adapted(RapidsShuffleInternalManagerBase.scala:274)
        at com.nvidia.spark.rapids.Arm.withResource(Arm.scala:28)
        at com.nvidia.spark.rapids.Arm.withResource$(Arm.scala:26)
        at org.apache.spark.sql.rapids.RapidsShuffleThreadedWriterBase.withResource(RapidsShuffleInternalManagerBase.scala:234)
        at org.apache.spark.sql.rapids.RapidsShuffleThreadedWriterBase.$anonfun$write$1(RapidsShuffleInternalManagerBase.scala:274)
        at org.apache.spark.sql.rapids.RapidsShuffleThreadedWriterBase.$anonfun$write$1$adapted(RapidsShuffleInternalManagerBase.scala:273)

@tgravescs added the "bug" (Something isn't working) and "? - Needs Triage" (Need team to review and classify) labels on Feb 22, 2023
@ttnghia
Collaborator

ttnghia commented Feb 22, 2023

Lexicographic comparison of LIST of STRUCT and STRUCT of LIST columns is not yet supported in cudf. The corresponding cudf issue is rapidsai/cudf#11222, and there are no plans to implement it soon.

Related issue with the same cudf dependency: #5109
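To make the unsupported case concrete, here is a minimal CPU-side Java sketch of what lexicographic ordering of a LIST of STRUCT value means: structs compare field by field, and lists compare element by element, with a list that is a proper prefix of another sorting first. This is not cudf code and does not reflect how cudf implements its comparators; the `ListOfStructOrder` class and the `StructAB` record with fields `a` and `b` are hypothetical, and null handling (which cudf controls through null ordering) is omitted.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical CPU-side illustration of LIST<STRUCT<a: INT, b: STRING>> ordering.
// Not cudf code; nulls are not handled.
public class ListOfStructOrder {
    // Hypothetical struct type with two fields.
    record StructAB(int a, String b) {}

    // STRUCT order: compare field `a` first, then field `b`.
    static final Comparator<StructAB> STRUCT_ORDER =
        Comparator.comparingInt(StructAB::a).thenComparing(StructAB::b);

    // LIST order: compare element by element using the STRUCT order;
    // if one list is a prefix of the other, the shorter list sorts first.
    static int compareLists(List<StructAB> x, List<StructAB> y) {
        int n = Math.min(x.size(), y.size());
        for (int i = 0; i < n; i++) {
            int c = STRUCT_ORDER.compare(x.get(i), y.get(i));
            if (c != 0) {
                return c;
            }
        }
        return Integer.compare(x.size(), y.size());
    }

    public static void main(String[] args) {
        // Three rows of a single LIST<STRUCT> column.
        List<List<StructAB>> rows = new ArrayList<>(List.of(
            List.of(new StructAB(1, "b")),
            List.of(new StructAB(1, "a"), new StructAB(2, "z")),
            List.of(new StructAB(1, "a"))));
        rows.sort(ListOfStructOrder::compareLists);
        System.out.println(rows);
    }
}
```

Sorting a table on such a column requires this nested, variable-length comparison for every pair of rows, which is the capability tracked by the cudf issue above.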

@ttnghia
Collaborator

ttnghia commented Feb 22, 2023

This failure should not be caused by the upgrade to 23.04; the limitation should exist in all releases (in fact, support for sorting nested LIST columns was added only recently).

@tgravescs
Collaborator Author

This particular query has a Delta write that started running on the GPU in 23.04 (GpuRapidsDeltaWrite). The sort appears to come from the round-robin partitioning.
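For context on why a round-robin repartition involves a sort at all: Apache Spark sorts rows before a round-robin repartition (controlled by `spark.sql.execution.sortBeforeRepartition`) so that a retried task assigns the same rows to the same partitions regardless of input order. The GPU shuffle in the stack trace appears to do the same per batch, which would explain why a row comparison, and therefore the unsupported LIST of STRUCT comparison, shows up in a query that never requested a sort. The plain Java sketch below illustrates only the idea; the `SortedRoundRobin` class and `partition` method are hypothetical, and it always starts at partition 0 rather than at a per-task starting position.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch of sort-then-round-robin partitioning.
public class SortedRoundRobin {
    // Sorts the rows first, then deals them out to partitions in turn.
    // Because of the sort, the result depends only on which rows are present,
    // not on the order in which they arrived.
    static <T> List<List<T>> partition(List<T> rows, Comparator<T> order, int numPartitions) {
        List<T> sorted = new ArrayList<>(rows);
        // This step needs a comparator for every column type in the row.
        sorted.sort(order);
        List<List<T>> parts = new ArrayList<>();
        for (int p = 0; p < numPartitions; p++) {
            parts.add(new ArrayList<>());
        }
        for (int i = 0; i < sorted.size(); i++) {
            parts.get(i % numPartitions).add(sorted.get(i));
        }
        return parts;
    }

    public static void main(String[] args) {
        List<Integer> firstAttempt = List.of(5, 3, 1, 4, 2);
        List<Integer> retriedAttempt = List.of(2, 4, 1, 3, 5); // same rows, different order
        System.out.println(partition(firstAttempt, Comparator.naturalOrder(), 2));
        System.out.println(partition(retriedAttempt, Comparator.naturalOrder(), 2));
    }
}
```

Both calls produce the same partitions; without the sort, the retried attempt could send different rows to each partition.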

4 participants