Describe the bug
Writing a map(map) column (a map whose keys are themselves maps) fails with the following error:

key column can not be nullable

SchemaUtils.writerOptionsFromField cannot handle a map(map) column. SchemaUtils uses structBuilder to simulate mapBuilder, but it does not mark the outer map's key as non-nullable. The inner key is handled correctly.
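As a minimal sketch of the bug pattern (the names `WriterField` and `map_option` are hypothetical and do not reflect the plugin's actual API): Parquet requires map keys to be non-nullable at every nesting level, so a builder that simulates a map must clear the nullable flag on whatever key child it receives, including when that key is itself a map built by the same recursion. The reported bug corresponds to skipping that step for the outer key.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class WriterField:
    """Toy stand-in for a writer-options node (hypothetical, for illustration)."""
    name: str
    nullable: bool = True
    children: List["WriterField"] = field(default_factory=list)

def map_option(name: str, key: WriterField, value: WriterField) -> WriterField:
    # Parquet map keys must be non-nullable at EVERY level, so force the
    # flag here on the key we were given -- even when that key is itself
    # a map node produced by an earlier call to this same function.
    key.nullable = False
    return WriterField(name, nullable=True, children=[key, value])

# Model the repro's "c1" column: map<map<int,int>, int>.
inner = map_option("key", WriterField("key"), WriterField("value"))
outer = map_option("c1", inner, WriterField("value"))

# Both the outer key (the nested map) and the inner key end up non-nullable.
assert outer.children[0].nullable is False
assert outer.children[0].children[0].nullable is False
```

The failing code path effectively forced the flag only for the innermost key, leaving the outer key nullable and tripping the `key column can not be nullable` check in cudf's `ColumnWriterOptions.mapColumn`.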
Steps/Code to reproduce bug
```scala
val data = Seq(
  (Map(Map(111 -> 111, 112 -> 112) -> 1, Map(121 -> 121, 122 -> 122) -> 2), 1),
  (Map(Map(211 -> 111, 212 -> 112) -> 1, Map(221 -> 121, 222 -> 122) -> 2), 2)
)
val df = spark.createDataFrame(data).toDF("c1", "c2")
df.write.parquet("/tmp/a.parquet")
```
Error:

```
Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0) (chongg-pc executor driver): java.lang.IllegalArgumentException: key column can not be nullable
	at ai.rapids.cudf.ColumnWriterOptions.mapColumn(ColumnWriterOptions.java:530)
	at com.nvidia.spark.rapids.SchemaUtils$.writerOptionsFromField(SchemaUtils.scala:298)
	at com.nvidia.spark.rapids.SchemaUtils$.$anonfun$writerOptionsFromSchema$1(SchemaUtils.scala:329)
	at scala.collection.Iterator.foreach(Iterator.scala:943)
	at scala.collection.Iterator.foreach$(Iterator.scala:943)
	at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
	at scala.collection.IterableLike.foreach(IterableLike.scala:74)
	at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
	at org.apache.spark.sql.types.StructType.foreach(StructType.scala:102)
	at com.nvidia.spark.rapids.SchemaUtils$.writerOptionsFromSchema(SchemaUtils.scala:327)
	at com.nvidia.spark.rapids.GpuParquetWriter.<init>(GpuParquetFileFormat.scala:374)
	at com.nvidia.spark.rapids.GpuParquetFileFormat$$anon$1.newInstance(GpuParquetFileFormat.scala:287)
	at org.apache.spark.sql.rapids.GpuSingleDirectoryDataWriter.newOutputWriter(GpuFileFormatDataWriter.scala:235)
	at org.apache.spark.sql.rapids.GpuSingleDirectoryDataWriter.<init>(GpuFileFormatDataWriter.scala:217)
	at org.apache.spark.sql.rapids.GpuFileFormatWriter$.executeTask(GpuFileFormatWriter.scala:326)
	at org.apache.spark.sql.rapids.GpuFileFormatWriter$.$anonfun$write$15(GpuFileFormatWriter.scala:266)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
	at org.apache.spark.scheduler.Task.run(Task.scala:136)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:548)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1504)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:551)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:750)
```
Environment details (please complete the following information)
Branch 23.10