[NSE-359] [NSE-273] Introduce shim layer to fix compatibility issues for gazelle on spark 3.1 & 3.2 #742
Conversation
…vant to spark 3.1/3.2 compatibility issues
…o AQEShuffleReadExec
Thanks for opening a pull request! Could you open an issue for this pull request on GitHub Issues? https://github.com/oap-project/native-sql-engine/issues Then could you also rename the commit message and pull request title in the following format?
See also:
pom.xml
Outdated
  </properties>
</profile>
<profile>
  <id>spark-3.2.0</id>
question: rename these profiles to spark-3.2 and spark-3.1, as the changes between the patch releases of those minor versions are not so big.
Yes, that makes sense. The other patch releases under the same minor version may work well too. I will make the change.
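For illustration, a minimal sketch of what the renamed profiles could look like, assuming each profile simply pins a spark.version property; the property name and version values here are illustrative and not copied from this patch:

<profile>
  <id>spark-3.1</id>
  <properties>
    <!-- illustrative value; the actual profile contents may differ -->
    <spark.version>3.1.1</spark.version>
  </properties>
</profile>
<profile>
  <id>spark-3.2</id>
  <properties>
    <!-- illustrative value; the actual profile contents may differ -->
    <spark.version>3.2.1</spark.version>
  </properties>
</profile>

Either profile can then be selected at build time, e.g. mvn clean package -Pspark-3.2.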
I tried to compile with Spark 3.2.1; it looks like there are indeed some changes:
[INFO] /home/sparkuser/git/gazelle_plugin/arrow-data-source/parquet/src/main/scala:-1: info: compiling
[INFO] Compiling 4 source files to /home/sparkuser/git/gazelle_plugin/arrow-data-source/parquet/target/classes at 1646359743584
[ERROR] /home/sparkuser/git/gazelle_plugin/arrow-data-source/parquet/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala:344: error: overloaded method constructor VectorizedParquetRecordReader with alternatives:
[INFO]   (x$1: Boolean,x$2: Int)org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader <and>
[INFO]   (x$1: java.time.ZoneId,x$2: String,x$3: String,x$4: String,x$5: String,x$6: Boolean,x$7: Int)org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader
[INFO]  cannot be applied to (java.time.ZoneId, String, String, Boolean, Int)
[INFO]     val vectorizedReader = new VectorizedParquetRecordReader(
[INFO]                                ^
[ERROR] /home/sparkuser/git/gazelle_plugin/arrow-data-source/parquet/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala:366: error: type mismatch;
[INFO]  found   : org.apache.spark.sql.internal.SQLConf.LegacyBehaviorPolicy.Value
[INFO]  required: org.apache.spark.sql.catalyst.util.RebaseDateTime.RebaseSpec
[INFO]       convertTz, enableVectorizedReader = false, datetimeRebaseMode, SQLConf.LegacyBehaviorPolicy.LEGACY)
[INFO]                                                  ^
[ERROR] /home/sparkuser/git/gazelle_plugin/arrow-data-source/parquet/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala:366: error: type mismatch;
[INFO]  found   : org.apache.spark.sql.internal.SQLConf.LegacyBehaviorPolicy.Value
[INFO]  required: org.apache.spark.sql.catalyst.util.RebaseDateTime.RebaseSpec
[INFO]       convertTz, enableVectorizedReader = false, datetimeRebaseMode, SQLConf.LegacyBehaviorPolicy.LEGACY)
[INFO]                                                  ^
[ERROR] three errors found
Yes, there are some incompatibilities in the parquet reader after bumping the Spark version from 3.2.0 to 3.2.1. I have just fixed all the issues found during compilation. With this patch applied, the project now builds successfully against both Spark 3.1.1 and 3.2.1.
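To make the approach concrete, here is a minimal, hypothetical sketch of the shim-layer idea named in the PR title, prompted by constructor changes like the VectorizedParquetRecordReader mismatch in the log above. The names SparkShims and ShimLoader and the version strings are illustrative; they are not the actual classes introduced by this patch.

// Common interface compiled once; each supported Spark release gets its own
// implementation module, so version-specific calls (e.g. constructors whose
// signatures changed between releases) are isolated behind it.
trait SparkShims {
  // Spark version this shim implementation targets, e.g. "3.1.1" or "3.2.1".
  def sparkVersion: String
  // Version-sensitive operations would be declared here and implemented
  // differently in the spark-3.1 and spark-3.2 shim modules.
}

object ShimLoader {
  // Selects a shim implementation for the running Spark version. A real
  // implementation might use ServiceLoader or reflection instead of this
  // hard-coded mapping.
  // e.g. val shims = ShimLoader.load(org.apache.spark.SPARK_VERSION)
  def load(runningVersion: String): SparkShims =
    if (runningVersion.startsWith("3.2")) new SparkShims { val sparkVersion = "3.2.1" }
    else new SparkShims { val sparkVersion = "3.1.1" }
}

Keeping all version-sensitive calls behind such a trait lets the core module compile once against the common interface, while only the small per-version shim modules need to track Spark's internal API changes.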
Two build profiles, spark-3.1 and spark-3.2, are provided.
Note: this patch changes the testing configuration; the common and per-version shims jars need to be added to the class path.
New configuration with this patch:
spark.executor.extraClassPath /home/sparkuser/nativesql_jars/spark-columnar-core-1.4.0-SNAPSHOT-jar-with-dependencies.jar:/home/sparkuser/nativesql_jars/spark-arrow-datasource-standard-1.4.0-SNAPSHOT-jar-with-dependencies.jar:/home/sparkuser/nativesql_jars/spark-sql-columnar-shims-common-1.4.0-SNAPSHOT.jar:/home/sparkuser/nativesql_jars/spark-sql-columnar-shims-spark321-1.4.0-SNAPSHOT.jar
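For completeness, a sketch of the matching driver-side setting, under the assumption that the driver also needs the plugin and the two shim jars on its class path; the line below simply mirrors the executor entry above and is not taken from this patch:

spark.driver.extraClassPath /home/sparkuser/nativesql_jars/spark-columnar-core-1.4.0-SNAPSHOT-jar-with-dependencies.jar:/home/sparkuser/nativesql_jars/spark-arrow-datasource-standard-1.4.0-SNAPSHOT-jar-with-dependencies.jar:/home/sparkuser/nativesql_jars/spark-sql-columnar-shims-common-1.4.0-SNAPSHOT.jar:/home/sparkuser/nativesql_jars/spark-sql-columnar-shims-spark321-1.4.0-SNAPSHOT.jar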