[DO-NOT-MERGE][SPARK-23710] Upgrade built-in Hive to 2.3.4(Without hive-thriftserver) #23552

wangyum · 2019-01-15T16:30:38Z

What changes were proposed in this pull request?

This is the first PR, split out to make the changes outside the hive-thriftserver module easier to review.
For the complete changes, see:

How was this patch tested?

Unit tests and manual tests.

This commit does not contain the hive-thriftserver module, as that module has many changes.

SparkQA · 2019-01-15T16:46:05Z

Test build #101272 has finished for PR 23552 at commit a4dc989.

This patch fails to build.
This patch merges cleanly.
This patch adds no public classes.

srowen

Is it still necessary to have hive-contrib-2.3.4.jar etc. in the test resources to make the tests work?

Could you summarize the compatibility changes in the "Docs text" field of the JIRA and add the release-notes tag? That would help everyone understand the implications of the change.

But yeah this seems like something we have to do.

core/pom.xml

srowen · 2019-01-15T17:16:10Z

sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/SaveAsHiveFile.scala

+  private def isSubDir(p1: Path, p2: Path, fs: FileSystem): Boolean = {
+    val path1 = fs.makeQualified(p1).toString
+    val path2 = fs.makeQualified(p2).toString
+    if (path1.startsWith(path2)) true else false


No need for if (...) true else false. Just return the value of the predicate.
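Following that suggestion, the helper can return the startsWith result directly. The sketch below is a hypothetical, self-contained variant: it takes path strings and uses java.net.URI normalization as a stand-in for Hadoop's fs.makeQualified, so it runs without Hadoop on the classpath. The real method would keep its (Path, Path, FileSystem) signature and only change the last line.

```scala
object SubDirCheck {
  // Hypothetical stand-in for fs.makeQualified(p).toString: normalizes the URI
  // (e.g. drops "." segments) so the sketch runs without Hadoop.
  private def qualify(p: String): String =
    new java.net.URI(p).normalize().toString

  // Return the predicate itself; no `if (...) true else false` wrapper needed.
  def isSubDir(p1: String, p2: String): Boolean =
    qualify(p1).startsWith(qualify(p2))
}
```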

srowen · 2019-01-15T17:16:28Z

sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcFileFormat.scala

@@ -197,6 +201,13 @@ class OrcFileFormat extends FileFormat with DataSourceRegister with Serializable

    case _ => false
  }
+
+  def toKryo(sarg: SearchArgument): String = {


srowen · 2019-01-15T17:17:01Z

sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcFilters.scala

+   */
+  private def castLiteralValue(value: Any, dataType: DataType): Any = dataType match {
+    case ByteType | ShortType | IntegerType | LongType =>
+      value.asInstanceOf[Number].longValue


Trivial, but I'd either use () or not, consistently, between here and line 110.
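As an illustration of the consistency point, here is a minimal sketch in which every java.lang.Number accessor is called the same way, with (). The DataType objects are hypothetical stand-ins for Spark's types so the block runs on its own, and the floating-point branch is an assumption about what the rest of the method does, not a copy of the PR's code.

```scala
object LiteralCast {
  // Hypothetical stand-ins for Spark's DataType objects.
  sealed trait DataType
  case object ByteType extends DataType
  case object ShortType extends DataType
  case object IntegerType extends DataType
  case object LongType extends DataType
  case object FloatType extends DataType
  case object DoubleType extends DataType
  case object StringType extends DataType

  // Widen integral literals to Long and fractional ones to Double, calling the
  // Number accessors with () in every branch.
  def castLiteralValue(value: Any, dataType: DataType): Any = dataType match {
    case ByteType | ShortType | IntegerType | LongType =>
      value.asInstanceOf[Number].longValue()
    case FloatType | DoubleType =>
      value.asInstanceOf[Number].doubleValue()
    case _ => value
  }
}
```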

gatorsmile · 2019-01-15T18:43:46Z

sql/hive/src/main/scala/org/apache/spark/sql/hive/client/IsolatedClientLoader.scala

@@ -192,6 +192,7 @@ private[hive] class IsolatedClientLoader(
    (name.startsWith("com.google") && !name.startsWith("com.google.cloud")) ||
    name.startsWith("java.lang.") ||
    name.startsWith("java.net") ||
+    name.startsWith("org.apache.derby.") ||


This breaks Apache Livy, as @HyukjinKwon pointed out. #20944 (comment)
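For context, the diff adds org.apache.derby. to the name prefixes that IsolatedClientLoader treats as shared; roughly speaking, shared classes are loaded through Spark's own class loader rather than the isolated Hive client loader. The sketch below is a simplified, hypothetical model of such a prefix check (only a few prefixes are shown, and the object name is invented). It illustrates which class names the new prefix would match, not why Livy breaks.

```scala
object SharedClassCheck {
  // Hypothetical subset of shared prefixes; names matching these would be
  // delegated to the parent (Spark) class loader.
  private val sharedPrefixes = Seq(
    "java.lang.",
    "java.net",
    "org.apache.derby." // the prefix added in this diff
  )

  def isSharedClass(name: String): Boolean =
    (name.startsWith("com.google") && !name.startsWith("com.google.cloud")) ||
      sharedPrefixes.exists(p => name.startsWith(p))
}
```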

gatorsmile · 2019-01-15T18:44:52Z

sql/core/pom.xml

@@ -86,15 +86,17 @@
      <scope>test</scope>
    </dependency>

+    <dependency>
+      <groupId>org.apache.hive</groupId>


We should not let sql/core depend on Hive.

gatorsmile · 2019-01-15T18:46:09Z

sql/core/pom.xml

    <dependency>
      <groupId>org.apache.orc</groupId>
      <artifactId>orc-core</artifactId>
-      <classifier>${orc.classifier}</classifier>


Our native ORC upgrade should be independent of the Hive ORC reader, as @dongjoon-hyun pointed out.

gatorsmile

I really doubt the value of upgrading the Hive execution JARs, given the risk it brings.

srowen · 2019-03-14T14:00:30Z

@gatorsmile @wangyum I agree with the risks here; one important consideration too is that I don't think Hive 1.x will work at all with Java 9+. Still investigating but it looks like Hive 2.x might. That doesn't mean we have to merge this but may come up again as a tough call to make for 3.0.

wangyum · 2019-03-14T16:21:49Z

@srowen We plan to upgrade the built-in Hive to 2.3.4 for hadoop-3.1; hadoop-2.7 still uses 1.2.1. More details: https://issues.apache.org/jira/browse/SPARK-23710

srowen · 2019-03-14T19:16:38Z

That could make sense: only publish and test with Java 11 for Hadoop 3.x and Hive 2.x+.

Upgrade built-in Hive to 2.3.4.

a4dc989

srowen reviewed Jan 15, 2019

gatorsmile reviewed Jan 15, 2019

gatorsmile requested changes Jan 15, 2019

wangyum closed this Mar 14, 2019

wangyum deleted the SPARK-23710-without-thriftserver branch February 15, 2020 01:45
