
Upgrade Delta to use Apache Spark 3.3.0 #1257

Closed

Conversation

Collaborator

@vkorukanti vkorukanti commented Jul 7, 2022

Description

Upgrade the Spark dependency version to 3.3.0. Following are the major changes:

  • Test fixes to change the expected error message
  • VacuumCommand: Update the parallel delete to first check if there are entries before trying to reduce.
  • Update the LogicalPlan used to represent the create or replace command in DeltaTableBuilder
  • Spark version upgrade in build and test setup scripts

Fixes #1217

/**
 * This API is used just for parsing SELECT queries. The Delta parser doesn't
 * override the Spark parser for queries, so this can be delegated directly to it.
 */
override def parseQuery(sqlText: String): LogicalPlan = delegate.parseQuery(sqlText)
Contributor:

I didn't quite get what this means. What has changed in the control flow of SELECT queries between Spark 3.2 and 3.3?

Contributor:

Is this the issue that was causing a lot of failures?

Collaborator Author:

It is a new API added to the Spark SQL parser interface. We don't need to change anything here; we just delegate it to the Spark parser.

No, this wasn't causing test failures. It just failed the build.
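
For readers following along, a minimal sketch of what such delegation looks like. DelegatingParser is a hypothetical name, not the actual DeltaSqlParser (which layers Delta's own statements on top of some of these methods), but the method signatures follow Spark 3.3's ParserInterface:

import org.apache.spark.sql.catalyst.{FunctionIdentifier, TableIdentifier}
import org.apache.spark.sql.catalyst.expressions.Expression
import org.apache.spark.sql.catalyst.parser.ParserInterface
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
import org.apache.spark.sql.types.{DataType, StructType}

// Sketch: a parser that would extend Spark SQL with extra syntax but
// forwards everything else, including the parseQuery method that
// Spark 3.3 added to ParserInterface.
class DelegatingParser(delegate: ParserInterface) extends ParserInterface {
  // New in Spark 3.3. This parser adds no SELECT syntax of its own,
  // so plain delegation is enough to satisfy the interface.
  override def parseQuery(sqlText: String): LogicalPlan =
    delegate.parseQuery(sqlText)

  override def parsePlan(sqlText: String): LogicalPlan =
    delegate.parsePlan(sqlText)
  override def parseExpression(sqlText: String): Expression =
    delegate.parseExpression(sqlText)
  override def parseTableIdentifier(sqlText: String): TableIdentifier =
    delegate.parseTableIdentifier(sqlText)
  override def parseFunctionIdentifier(sqlText: String): FunctionIdentifier =
    delegate.parseFunctionIdentifier(sqlText)
  override def parseMultipartIdentifier(sqlText: String): Seq[String] =
    delegate.parseMultipartIdentifier(sqlText)
  override def parseTableSchema(sqlText: String): StructType =
    delegate.parseTableSchema(sqlText)
  override def parseDataType(sqlText: String): DataType =
    delegate.parseDataType(sqlText)
}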

-      CreateTableStatement(
-        table,
+      CreateTable(
+        UnresolvedDBObjectName(table, false),
Contributor:

What is the false for? Use the paramName = false format.

Collaborator Author:

Added the parameter name isNamespace. As far as I know this is not applicable to the Delta catalog, so it is marked as false.
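
With the name spelled out, the call site reads as:

// `table` is the multipart table name; a Delta table is never a namespace.
UnresolvedDBObjectName(table, isNamespace = false)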

-      ReplaceTableStatement(
-        table,
+      ReplaceTable(
+        UnresolvedDBObjectName(table, false),
Contributor:

Same as above.

Collaborator Author:

See the response to the comment above.

@@ -406,15 +406,15 @@ trait DeltaAlterTableTests extends DeltaAlterTableTestBase {
|ALTER TABLE $tableName ADD COLUMNS (m.key.mkv3 long)
""".stripMargin)
}
-      checkErrMsg(ex.getMessage, Seq("m", "key", "mkv3"))
+      checkErrMsg(ex.getMessage, Seq("`m`", "`key`", "`mkv3`"))
Contributor:

Why is this change needed? Isn't checkErrMsg doing a substring match, where searching for "m" should work even if the error message has "`m`"?

Collaborator Author:

It searches for `m`.`key`.`mkv3`; earlier it used to check for m.key.mkv3 (Spark 3.3 quotes column names with backticks in error messages). I reworked this so that only checkErrMsg changes.
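
A hypothetical sketch of a substring-matching checkErrMsg, for context only (the suite's real helper may differ):

// Join the parts into a column path and require it in the message.
// Spark 3.3 renders the path as `m`.`key`.`mkv3`, so the test now
// passes the backtick-quoted parts.
def checkErrMsg(msg: String, parts: Seq[String]): Unit = {
  val path = parts.mkString(".") // e.g. "`m`.`key`.`mkv3`"
  assert(msg.contains(path), s"expected '$path' in: $msg")
}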

@@ -117,12 +117,12 @@ class DeltaDropColumnSuite extends QueryTest
val err1 = intercept[AnalysisException] {
spark.table("t1").where("a = 'str1'").collect()
}.getMessage
-    assert(err1.contains("cannot be resolved") || err1.contains("cannot resolve"))
+    assert(err1.contains("Column 'a' does not exist") || err1.contains("cannot resolve"))
Contributor:

I think it would be good to keep the old message as an alternative with ||.

@@ -221,7 +221,7 @@ abstract class DeltaInsertIntoTestsWithTempViews(
case e: AnalysisException =>
assert(e.getMessage.contains("Inserting into a view is not allowed") ||
e.getMessage.contains("Inserting into an RDD-based table is not allowed") ||
e.getMessage.contains("Table default.v not found"))
e.getMessage.contains("Table or view 'v' not found in database 'default'"))
Contributor:

Can you make these alternatives?
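
That is, accept both the Spark 3.2 and 3.3 messages, roughly:

assert(e.getMessage.contains("Inserting into a view is not allowed") ||
  e.getMessage.contains("Inserting into an RDD-based table is not allowed") ||
  e.getMessage.contains("Table default.v not found") ||
  e.getMessage.contains("Table or view 'v' not found in database 'default'"))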

@tdas
Contributor

tdas commented Jul 22, 2022

Please update the title and description.

-    case v10x_and_above if Try {
-      v10x_and_above.split('.')(0).toInt
-    }.toOption.exists(_ >= 1) =>
+    case v21x_and_above if scala.util.Try {
Contributor:

Can you add a plain-English explanation for each case? It's getting complicated.

I also think this can be simplified further by pulling the version splitting out into a separate function (there are multiple copies now). Also, the function could simply convert the string x.y.z into a decimal number x.y (z will never decide anything), making the comparison far easier.
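
A sketch of the suggested helper. The names are hypothetical and the Delta-to-Spark mapping shown is illustrative only, not the script's actual table:

import scala.util.Try

// Reduce "x.y.z" to the decimal x.y; the patch component never decides
// which Spark version to use. Caveat: this breaks down if the minor
// version ever reaches two digits (2.10 < 2.9 as a double).
def majorMinor(version: String): Option[Double] =
  version.split('.').toList match {
    case major :: minor :: _ => Try(s"$major.$minor".toDouble).toOption
    case _                   => None
  }

// Illustrative usage: pick a Spark version for a given Delta version.
def sparkVersionFor(deltaVersion: String): String =
  majorMinor(deltaVersion) match {
    case Some(v) if v >= 2.1 => "3.3.0"
    case Some(v) if v >= 1.0 => "3.2.0"
    case _                   => "3.1.2"
  }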

Contributor:

Also, do we really need to support all previous versions? We could support only up to the last major version. In fact, just choosing the Spark version based on the Delta version will not work seamlessly, as the Scala version will have to change as well if we go too far back.

Either way, I am okay with this change in this PR (add docs if you can, not a blocker), but it would be good to create an issue to simplify this.

Collaborator Author:

The Spark version lookup was added as part of 1118a72. The reason could be to allow running the examples on any Delta version, which can be set through the DELTA_VERSION environment variable.

I will simplify this.

@@ -318,6 +318,8 @@ trait VacuumCommandImpl extends DeltaCommand {
     import spark.implicits._

     if (parallel) {
+      // If there are no entries, do not call reduce, as it results in an empty-collection error
+      if (diff.count() == 0) return 0
Contributor:

Use take(1).isEmpty instead; it's much cheaper.
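
In other words, a sketch (diff stands for the Dataset being vacuumed, as in the hunk above):

// Before: count() runs a full job over every partition just to test emptiness.
if (diff.count() == 0) return 0

// After: take(1) stops as soon as a single row is found.
if (diff.take(1).isEmpty) return 0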

@vkorukanti vkorukanti changed the title from "[WIP] Update to Spark 3.3.0" to "Upgrade Delta to use Apache Spark 3.3.0" on Jul 23, 2022
Contributor
@jaceklaskowski jaceklaskowski left a comment

LGTM except this tiny nit (sorry, couldn't resist)

@@ -320,37 +322,30 @@ class DeltaTableBuilder private[tables](
       colNames.map(name => DeltaTableUtils.parseColToTransform(name))
     }.getOrElse(Seq.empty[Transform])

+    val tableSpec = org.apache.spark.sql.catalyst.plans.logical.TableSpec(
+      properties = this.properties,
Contributor:

Nit: remove `this.` (unless I'm mistaken, the prefix is not used throughout the codebase).

Member
@zsxwing zsxwing left a comment

LGTM

ganeshchand pushed a commit to ganeshchand/delta that referenced this pull request Aug 10, 2022
## Description
Upgrade the Spark dependency version to 3.3.0. Following are the major changes:
* Test fixes to change the expected error message
* `VacuumCommand`: Update the parallel delete to first check if there are entries before trying to `reduce`.
* Update the `LogicalPlan` used to represent the create or replace command in `DeltaTableBuilder`
* Spark version upgrade in build and test setup scripts
* Spark 3.3 upgraded the log4j from 1.x to 2.x which has a different log4j properties format

Fixes delta-io#1217

Closes delta-io#1257

Signed-off-by: Venki Korukanti <[email protected]>
GitOrigin-RevId: 3e930d3c2cef5fca5f2cd8dd94a8617dbe2f747b
@cometta

cometta commented Aug 26, 2022

Will this be merged anytime soon?

@tdas
Contributor

tdas commented Aug 26, 2022

It has been merged, and there is a Delta 2.1 Preview on Spark 3.3 currently undergoing community testing - https://github.com/delta-io/delta/releases/tag/v2.1.0rc1

We are hoping to make the final release of 2.1 early next week.

@allisonport-db allisonport-db added this to the 2.1.0 milestone Aug 28, 2022
@vkorukanti vkorukanti deleted the spark33 branch October 2, 2023 05:19
Development

Successfully merging this pull request may close these issues.

[Feature Request] Spark 3.3 support
6 participants