
[NSE-359] [NSE-273] Introduce shim layer to fix compatibility issues for gazelle on spark 3.1 & 3.2 #742

Merged
39 commits merged into oap-project:master on Mar 9, 2022

Conversation

@PHILO-HE (Collaborator) commented Feb 25, 2022

New configuration with this patch (the common and spark321 shim jars are now required on the class path):

spark.executor.extraClassPath /home/sparkuser/nativesql_jars/spark-columnar-core-1.4.0-SNAPSHOT-jar-with-dependencies.jar:/home/sparkuser/nativesql_jars/spark-arrow-datasource-standard-1.4.0-SNAPSHOT-jar-with-dependencies.jar:/home/sparkuser/nativesql_jars/spark-sql-columnar-shims-common-1.4.0-SNAPSHOT.jar:/home/sparkuser/nativesql_jars/spark-sql-columnar-shims-spark321-1.4.0-SNAPSHOT.jar
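
For reference, a minimal spark-defaults.conf sketch with these jars on the class path is shown below. The paths are placeholders, and setting the driver class path as well is an assumption here, not something stated in this PR.

# Sketch only: paths are placeholders; the driver-side entry is an assumption.
spark.executor.extraClassPath  /path/to/spark-columnar-core-1.4.0-SNAPSHOT-jar-with-dependencies.jar:/path/to/spark-arrow-datasource-standard-1.4.0-SNAPSHOT-jar-with-dependencies.jar:/path/to/spark-sql-columnar-shims-common-1.4.0-SNAPSHOT.jar:/path/to/spark-sql-columnar-shims-spark321-1.4.0-SNAPSHOT.jar
spark.driver.extraClassPath    /path/to/spark-columnar-core-1.4.0-SNAPSHOT-jar-with-dependencies.jar:/path/to/spark-arrow-datasource-standard-1.4.0-SNAPSHOT-jar-with-dependencies.jar:/path/to/spark-sql-columnar-shims-common-1.4.0-SNAPSHOT.jar:/path/to/spark-sql-columnar-shims-spark321-1.4.0-SNAPSHOT.jar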

@github-actions

Thanks for opening a pull request!

Could you open an issue for this pull request on Github Issues?

https://github.com/oap-project/native-sql-engine/issues

Then could you also rename commit message and pull request title in the following format?

[NSE-${ISSUES_ID}] ${detailed message}


@PHILO-HE PHILO-HE marked this pull request as draft February 25, 2022 09:34
@zhouyuan zhouyuan mentioned this pull request Mar 1, 2022
pom.xml Outdated
</properties>
</profile>
<profile>
<id>spark-3.2.0</id>

Collaborator:

Question: rename these profiles to spark-3.2 and spark-3.1, as the changes across patch releases within those minor versions are not so big.
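
For illustration only, the suggested rename might look like the pom.xml profile sketch below; the spark.version property name and value are assumptions, not something shown in this PR's diff.

<!-- Illustrative sketch of a renamed profile; the property name is assumed. -->
<profile>
  <id>spark-3.2</id>
  <properties>
    <spark.version>3.2.1</spark.version>
  </properties>
</profile>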

Collaborator Author:

Yes, that makes sense. Other patch releases within the same minor version should also work well. I will make the change.

Collaborator:

I tried to compile with Spark 3.2.1; it looks like there are indeed some changes.
[INFO] /home/sparkuser/git/gazelle_plugin/arrow-data-source/parquet/src/main/scala:-1: info: compiling
[INFO] Compiling 4 source files to /home/sparkuser/git/gazelle_plugin/arrow-data-source/parquet/target/classes at 1646359743584
[ERROR] /home/sparkuser/git/gazelle_plugin/arrow-data-source/parquet/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala:344: error: overloaded method constructor VectorizedParquetRecordReader with alternatives:
[INFO]   (x$1: Boolean,x$2: Int)org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader <and>
[INFO]   (x$1: java.time.ZoneId,x$2: String,x$3: String,x$4: String,x$5: String,x$6: Boolean,x$7: Int)org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader
[INFO]  cannot be applied to (java.time.ZoneId, String, String, Boolean, Int)
[INFO]     val vectorizedReader = new VectorizedParquetRecordReader(
[INFO]                            ^
[ERROR] /home/sparkuser/git/gazelle_plugin/arrow-data-source/parquet/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala:366: error: type mismatch;
[INFO]  found   : org.apache.spark.sql.internal.SQLConf.LegacyBehaviorPolicy.Value
[INFO]  required: org.apache.spark.sql.catalyst.util.RebaseDateTime.RebaseSpec
[INFO]     convertTz, enableVectorizedReader = false, datetimeRebaseMode, SQLConf.LegacyBehaviorPolicy.LEGACY)
[INFO]                                                ^
[ERROR] /home/sparkuser/git/gazelle_plugin/arrow-data-source/parquet/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala:366: error: type mismatch;
[INFO]  found   : org.apache.spark.sql.internal.SQLConf.LegacyBehaviorPolicy.Value
[INFO]  required: org.apache.spark.sql.catalyst.util.RebaseDateTime.RebaseSpec
[INFO]     convertTz, enableVectorizedReader = false, datetimeRebaseMode, SQLConf.LegacyBehaviorPolicy.LEGACY)
[INFO]                                                ^
[ERROR] three errors found

Collaborator Author:

Yes, there are some incompatibilities in the parquet reader after bumping the Spark version from 3.2.0 to 3.2.1. I have just fixed all the issues found during compilation. With this patch applied, the project now builds successfully on both Spark 3.1.1 and 3.2.1.
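
To illustrate the shim-layer approach this PR introduces, a version-agnostic trait can live in the shims-common module while each shims-spark* module implements it against its own Spark release. The names below are illustrative only, not the actual Gazelle classes.

// Minimal sketch of the shim-layer pattern; all names are illustrative.

// shims-common module: a version-agnostic interface that hides the
// constructor differences flagged by the compile errors above.
trait ReaderShim {
  def newVectorizedReader(enableOffHeap: Boolean, capacity: Int): AnyRef
}

// shims-spark321 module: implements the interface against the Spark 3.2.1
// API, which expects additional constructor arguments.
class Spark321ReaderShim extends ReaderShim {
  override def newVectorizedReader(enableOffHeap: Boolean, capacity: Int): AnyRef = {
    // The real module would construct the 3.2.1 VectorizedParquetRecordReader
    // here with the arguments that release expects.
    ???
  }
}

// Caller side: pick the shim matching the running Spark version.
object ShimLoader {
  def readerShimFor(sparkVersion: String): ReaderShim = sparkVersion match {
    case v if v.startsWith("3.2") => new Spark321ReaderShim
    case v => throw new UnsupportedOperationException(s"No shim for Spark $v")
  }
}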

@PHILO-HE (Collaborator Author) commented Mar 4, 2022

Two build profiles, spark-3.1 & spark-3.2, are supported for selecting Spark 3.1.1 and Spark 3.2.1 dependencies, respectively.
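
For example, builds against each supported Spark line would typically be invoked as below; the exact Maven flags are assumptions based on standard usage rather than commands taken from this PR.

mvn clean package -Pspark-3.1 -DskipTests   # build against Spark 3.1.1 dependencies
mvn clean package -Pspark-3.2 -DskipTests   # build against Spark 3.2.1 dependencies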

@PHILO-HE PHILO-HE marked this pull request as ready for review March 8, 2022 06:21
@zhouyuan zhouyuan requested a review from weiting-chen March 8, 2022 06:29
@zhouyuan zhouyuan changed the title Introduce shim layer to fix compatibility issues for gazelle on spark 3.1 & 3.2 [NSE-359] [NSE-273] Introduce shim layer to fix compatibility issues for gazelle on spark 3.1 & 3.2 Mar 9, 2022
@github-actions bot commented Mar 9, 2022

#359

@zhouyuan (Collaborator) commented Mar 9, 2022

Note: this patch changes the testing configurations; the common and shims jars need to be added to the class path.

@zhouyuan zhouyuan merged commit 308cb58 into oap-project:master Mar 9, 2022
@weiting-chen weiting-chen added the bug Something isn't working label Apr 8, 2022
Labels: bug (Something isn't working)
Projects: None yet
3 participants