Describe the bug
Writing a map(map) column (a map whose keys are themselves maps) fails with the following error:

key column can not be nullable

SchemaUtils.writerOptionsFromField cannot handle a map(map) column. SchemaUtils uses structBuilder to simulate mapBuilder, but it does not mark the outer map's key as non-nullable. The inner key is handled correctly.
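As a minimal sketch of the bug pattern (the names `WriterField` and `map_option` are hypothetical and do not reflect the plugin's actual API): Parquet requires map keys to be non-nullable at every nesting level, so a builder that simulates a map must clear the nullable flag on whatever key child it receives, including when that key is itself a map built by the same recursion. The reported bug corresponds to skipping that step for the outer key.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class WriterField:
    """Toy stand-in for a writer-options node (hypothetical, for illustration)."""
    name: str
    nullable: bool = True
    children: List["WriterField"] = field(default_factory=list)

def map_option(name: str, key: WriterField, value: WriterField) -> WriterField:
    # Parquet map keys must be non-nullable at EVERY level, so force the
    # flag here on the key we were given -- even when that key is itself
    # a map node produced by an earlier call to this same function.
    key.nullable = False
    return WriterField(name, nullable=True, children=[key, value])

# Model the repro's "c1" column: map<map<int,int>, int>.
inner = map_option("key", WriterField("key"), WriterField("value"))
outer = map_option("c1", inner, WriterField("value"))

# Both the outer key (the nested map) and the inner key end up non-nullable.
assert outer.children[0].nullable is False
assert outer.children[0].children[0].nullable is False
```

The failing code path effectively forced the flag only for the innermost key, leaving the outer key nullable and tripping the `key column can not be nullable` check in cudf's `ColumnWriterOptions.mapColumn`.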
Steps/Code to reproduce bug
```scala
val data = Seq(
  (Map(Map(111 -> 111, 112 -> 112) -> 1, Map(121 -> 121, 122 -> 122) -> 2), 1),
  (Map(Map(211 -> 111, 212 -> 112) -> 1, Map(221 -> 121, 222 -> 122) -> 2), 2)
)
val df = spark.createDataFrame(data).toDF("c1", "c2")
df.write.parquet("/tmp/a.parquet")
```
Error:

```
Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0) (chongg-pc executor driver): java.lang.IllegalArgumentException: key column can not be nullable
	at ai.rapids.cudf.ColumnWriterOptions.mapColumn(ColumnWriterOptions.java:530)
	at com.nvidia.spark.rapids.SchemaUtils$.writerOptionsFromField(SchemaUtils.scala:298)
	at com.nvidia.spark.rapids.SchemaUtils$.$anonfun$writerOptionsFromSchema$1(SchemaUtils.scala:329)
	at scala.collection.Iterator.foreach(Iterator.scala:943)
	at scala.collection.Iterator.foreach$(Iterator.scala:943)
	at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
	at scala.collection.IterableLike.foreach(IterableLike.scala:74)
	at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
	at org.apache.spark.sql.types.StructType.foreach(StructType.scala:102)
	at com.nvidia.spark.rapids.SchemaUtils$.writerOptionsFromSchema(SchemaUtils.scala:327)
	at com.nvidia.spark.rapids.GpuParquetWriter.<init>(GpuParquetFileFormat.scala:374)
	at com.nvidia.spark.rapids.GpuParquetFileFormat$$anon$1.newInstance(GpuParquetFileFormat.scala:287)
	at org.apache.spark.sql.rapids.GpuSingleDirectoryDataWriter.newOutputWriter(GpuFileFormatDataWriter.scala:235)
	at org.apache.spark.sql.rapids.GpuSingleDirectoryDataWriter.<init>(GpuFileFormatDataWriter.scala:217)
	at org.apache.spark.sql.rapids.GpuFileFormatWriter$.executeTask(GpuFileFormatWriter.scala:326)
	at org.apache.spark.sql.rapids.GpuFileFormatWriter$.$anonfun$write$15(GpuFileFormatWriter.scala:266)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
	at org.apache.spark.scheduler.Task.run(Task.scala:136)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:548)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1504)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:551)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:750)
```
Environment details (please complete the following information)
Branch 23.10