
[BUG] DatabricksShimVersion must carry runtime version info #3532

Closed
gerashegalov opened this issue Sep 17, 2021 · 4 comments · Fixed by #3767
Labels
bug Something isn't working

Comments

@gerashegalov
Collaborator

gerashegalov commented Sep 17, 2021

Describe the bug
The shim layer in the plugin currently identifies Databricks runtime versions using the exposed Spark version only. However, the mapping between Spark versions and Databricks runtime releases is not 1:1: https://docs.databricks.com/release-notes/runtime/releases.html.

Reviewing the Spark WebUI's "Environment" tab in the runtime, one can see a config key that we can use to retrieve the runtime version info from Scala without creating a binary dependency:

spark.conf.get("spark.databricks.clusterUsageTags.sparkVersion")
spark.databricks.clusterUsageTags.sparkVersion | 8.4.x-gpu-ml-scala2.12
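
A minimal sketch of how the plugin could read this key defensively; the helper name is hypothetical, and getOption is used because the key may be absent outside a notebook, as noted in the comments below:

import org.apache.spark.sql.SparkSession

// Sketch: return the Databricks runtime version, e.g. "8.4.x-gpu-ml-scala2.12",
// if the cluster exposes it in the Spark conf; None otherwise.
def databricksRuntimeVersion(spark: SparkSession): Option[String] =
  spark.conf.getOption("spark.databricks.clusterUsageTags.sparkVersion")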

Steps/Code to reproduce bug
https://nvidia.github.io/spark-rapids/docs/get-started/getting-started-databricks.html

Expected behavior
The shim layer should correctly identify the Databricks runtime version so it can handle Spark behavior differences between runtime versions.

Environment details (please complete the following information)

  • Databricks
@gerashegalov gerashegalov added bug Something isn't working ? - Needs Triage Need team to review and classify labels Sep 17, 2021
@tgravescs
Collaborator

Note that this may work when running an actual notebook, but we may need to figure out something for testing:

scala> spark.conf.get("spark.databricks.clusterUsageTags.sparkVersion")
java.util.NoSuchElementException: spark.databricks.clusterUsageTags.sparkVersion
at org.apache.spark.sql.internal.SQLConf.$anonfun$getConfString$3(SQLConf.scala:3429)
at scala.Option.getOrElse(Option.scala:189)
at org.apache.spark.sql.internal.SQLConf.getConfString(SQLConf.scala:3429)
at org.apache.spark.sql.RuntimeConfig.get(RuntimeConfig.scala:77)
... 47 elided

@gerashegalov gerashegalov added this to the Sep 13 - Sep 24 milestone Sep 19, 2021
@gerashegalov
Collaborator Author

We can grab it at build time and store it as db-version.properties under the db shim dir:

$ grep spark.databricks.clusterUsageTags.sparkVersion /databricks/common/conf/deploy.conf
  spark.databricks.clusterUsageTags.sparkVersion = "7.3.x-gpu-ml-scala2.12"

For shims to work correctly, we have to have shims per runtime instead of per Spark version, along the lines of:
spark312db84, spark312db90

At run time, spark3XXdbYY.SparkShimServiceProvider can then load 'spark3XXdbYY/db-version.properties' for version matching.
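
A rough sketch of that matching step, assuming a hypothetical db-version.properties resource containing a databricks.runtime.version key written at build time (the resource path and property key are illustrative, not the plugin's actual layout):

import java.util.Properties

object DatabricksShimMatch {
  // Sketch: load the runtime version that was recorded into the shim's
  // db-version.properties at build time, if the resource is present.
  private def buildTimeVersion(resource: String): Option[String] = {
    val in = getClass.getResourceAsStream(resource)
    if (in == null) None
    else try {
      val props = new Properties()
      props.load(in)
      Option(props.getProperty("databricks.runtime.version"))
    } finally in.close()
  }

  // True if this shim's build-time version matches what the cluster reports.
  def matches(resource: String, runtimeVersion: String): Boolean =
    buildTimeVersion(resource).contains(runtimeVersion)
}

Each spark3XXdbYY shim would ship its own copy of the properties file, so only the provider whose build-time version equals the runtime-reported value declares a match.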

@gerashegalov gerashegalov removed this from the Sep 13 - Sep 24 milestone Sep 20, 2021
@jlowe
Contributor

jlowe commented Sep 20, 2021

How does grabbing the deploy.conf setting and storing it in the shim at build time help Tom's example? We need to detect at runtime, not build time, what Databricks version is there. Tom's example shows that we can end up starting a Spark shell that doesn't have this property set, so how does having a number of them defined in Databricks shims help detect which one is correct? It seems like we're going to have to check the contents of /databricks/common/conf/deploy.conf at runtime (or at least fall back to that behavior if the config is not set).
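
A sketch of what that runtime fallback could look like (illustrative only; the deploy.conf parsing below is a simplification, not a full HOCON parser, and the function name is made up for this example):

import scala.io.Source
import scala.util.Try
import org.apache.spark.sql.SparkSession

// Sketch: prefer the Spark conf when the key is set (notebook case), otherwise
// fall back to scanning /databricks/common/conf/deploy.conf (spark-shell case).
def runtimeVersionWithFallback(spark: SparkSession): Option[String] = {
  val key = "spark.databricks.clusterUsageTags.sparkVersion"
  spark.conf.getOption(key).orElse {
    Try {
      val src = Source.fromFile("/databricks/common/conf/deploy.conf")
      try {
        src.getLines()
          .find(_.trim.startsWith(key))
          .map(_.split("=", 2)(1).trim.stripPrefix("\"").stripSuffix("\""))
      } finally src.close()
    }.toOption.flatten
  }
}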

I'm also not a fan of putting the Spark version in the Databricks shim version. They could change it at a whim (and have in the past, from 3.1.0 to 3.1.1). All that really matters is the Databricks runtime version, and that's what the Databricks user is more familiar with.

@gerashegalov
Collaborator Author

gerashegalov commented Sep 20, 2021

I can be more explicit about the algorithm but hopefully it was clear where I am going with it:

  1. we have the right version at build time in /databricks/common/conf/deploy.conf
  2. we need to change getSparkVersion to read from deploy.conf. To match the behavior in the notebook and to simplify the code, we should just extract the property in the test launcher and add it as --conf, as opposed to reading deploy.conf from getSparkVersion directly
  3. now each spark3XXdbYY.SparkShimServiceProvider can safely compare its build-time version to the runtime version

> I'm also not a fan of putting the Spark version in the Databricks shim version. They could change it at a whim (and have in the past, from 3.1.0 to 3.1.1). All that really matters is the Databricks runtime version, and that's what the Databricks user is more familiar with.

This is the point of this bug: the Databricks ShimServiceProviders should compare the build-time and run-time Databricks runtime versions, such as 8.4.x-gpu-ml-scala2.12. To make the solution robust, we should also look at the Spark version and any other available properties, such as the build timestamp, to disambiguate the x component. Note that the issue does not suggest adding the Spark version; it's already there. I advocate for collecting as much build data as possible to detect when we need to introduce a new shim.

@Salonijain27 Salonijain27 removed the ? - Needs Triage Need team to review and classify label Sep 28, 2021
@jlowe jlowe linked a pull request Oct 8, 2021 that will close this issue