
[BUG] DatabricksShimVersion must carry runtime version info #3532

Closed
gerashegalov opened this issue Sep 17, 2021 · 4 comments · Fixed by #3767
Labels
bug Something isn't working

Comments

@gerashegalov
Collaborator

gerashegalov commented Sep 17, 2021

Describe the bug
The shim layer in the plugin currently identifies Databricks runtime versions using the exposed Spark version only. However, the mapping between Spark versions and Databricks runtime releases is not 1:1: https://docs.databricks.com/release-notes/runtime/releases.html.

Reviewing the Spark WebUI's "Environment" tab in the runtime, one can see a config key that we can use to retrieve the runtime version info from Scala without creating a binary dependency:

spark.conf.get("spark.databricks.clusterUsageTags.sparkVersion")
spark.databricks.clusterUsageTags.sparkVersion | 8.4.x-gpu-ml-scala2.12
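
A minimal sketch of how the plugin could read this key defensively; the helper name is hypothetical, and getOption is used because the key may be absent outside a notebook, as noted in the comments below:

import org.apache.spark.sql.SparkSession

// Sketch: return the Databricks runtime version, e.g. "8.4.x-gpu-ml-scala2.12",
// if the cluster exposes it in the Spark conf; None otherwise.
def databricksRuntimeVersion(spark: SparkSession): Option[String] =
  spark.conf.getOption("spark.databricks.clusterUsageTags.sparkVersion")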

Steps/Code to reproduce bug
https://nvidia.github.io/spark-rapids/docs/get-started/getting-started-databricks.html

Expected behavior
The shim layer should correctly identify the Databricks runtime version so it can handle Spark behavior differences between runtime versions.

Environment details (please complete the following information)

  • Databricks
@gerashegalov gerashegalov added bug Something isn't working ? - Needs Triage Need team to review and classify labels Sep 17, 2021
@tgravescs
Collaborator

Note that this may work when running an actual notebook, but we may need to figure out something for testing:

scala> spark.conf.get("spark.databricks.clusterUsageTags.sparkVersion")
java.util.NoSuchElementException: spark.databricks.clusterUsageTags.sparkVersion
at org.apache.spark.sql.internal.SQLConf.$anonfun$getConfString$3(SQLConf.scala:3429)
at scala.Option.getOrElse(Option.scala:189)
at org.apache.spark.sql.internal.SQLConf.getConfString(SQLConf.scala:3429)
at org.apache.spark.sql.RuntimeConfig.get(RuntimeConfig.scala:77)
... 47 elided

@gerashegalov gerashegalov added this to the Sep 13 - Sep 24 milestone Sep 19, 2021
@gerashegalov
Collaborator Author

We can grab it at build time and store it as db-version.properties under the db shim dir:

$ grep spark.databricks.clusterUsageTags.sparkVersion /databricks/common/conf/deploy.conf
  spark.databricks.clusterUsageTags.sparkVersion = "7.3.x-gpu-ml-scala2.12"

For shims to work correctly, we have to have shims per runtime instead of per Spark version, along the lines of:
spark312db84, spark312db90

At run time, spark3XXdbYY.SparkShimServiceProvider can then load 'spark3XXdbYY/db-version.properties' for version matching.
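
A rough sketch of that matching step, assuming a hypothetical db-version.properties resource containing a databricks.runtime.version key written at build time (the resource path and property key are illustrative, not the plugin's actual layout):

import java.util.Properties

object DatabricksShimMatch {
  // Sketch: load the runtime version that was recorded into the shim's
  // db-version.properties at build time, if the resource is present.
  private def buildTimeVersion(resource: String): Option[String] = {
    val in = getClass.getResourceAsStream(resource)
    if (in == null) None
    else try {
      val props = new Properties()
      props.load(in)
      Option(props.getProperty("databricks.runtime.version"))
    } finally in.close()
  }

  // True if this shim's build-time version matches what the cluster reports.
  def matches(resource: String, runtimeVersion: String): Boolean =
    buildTimeVersion(resource).contains(runtimeVersion)
}

Each spark3XXdbYY shim would ship its own copy of the properties file, so only the provider whose build-time version equals the runtime-reported value declares a match.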

@gerashegalov gerashegalov removed this from the Sep 13 - Sep 24 milestone Sep 20, 2021
@jlowe
Contributor

jlowe commented Sep 20, 2021

How does grabbing the deploy.conf setting and storing it in the shim at build time help Tom's example? We need to detect at runtime, not build time, what Databricks version is there. Tom's example shows that we can end up starting a Spark shell that doesn't have this property set, so how does having a number of them defined in Databricks shims help detect which one is correct? It seems like we're going to have to check the contents of /databricks/common/conf/deploy.conf at runtime (or at least fall back to that behavior if the config is not set).
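
A sketch of what that runtime fallback could look like (illustrative only; the deploy.conf parsing below is a simplification, not a full HOCON parser, and the function name is made up for this example):

import scala.io.Source
import scala.util.Try
import org.apache.spark.sql.SparkSession

// Sketch: prefer the Spark conf when the key is set (notebook case), otherwise
// fall back to scanning /databricks/common/conf/deploy.conf (spark-shell case).
def runtimeVersionWithFallback(spark: SparkSession): Option[String] = {
  val key = "spark.databricks.clusterUsageTags.sparkVersion"
  spark.conf.getOption(key).orElse {
    Try {
      val src = Source.fromFile("/databricks/common/conf/deploy.conf")
      try {
        src.getLines()
          .find(_.trim.startsWith(key))
          .map(_.split("=", 2)(1).trim.stripPrefix("\"").stripSuffix("\""))
      } finally src.close()
    }.toOption.flatten
  }
}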

I'm also not a fan of putting the Spark version in the Databricks shim version. They could change it at a whim (and have in the past, from 3.1.0 to 3.1.1). All that really matters is the Databricks runtime version, and that's what the Databricks user is more familiar with.

@gerashegalov
Collaborator Author

gerashegalov commented Sep 20, 2021

I can be more explicit about the algorithm but hopefully it was clear where I am going with it:

  1. we have the right version at build time in /databricks/common/conf/deploy.conf
  2. we need to change getSparkVersion to read from deploy.conf. To match the behavior in the notebook and to simplify the code, we should just extract the property in the test launcher and add it as --conf, as opposed to reading deploy.conf from getSparkVersion directly
  3. now each spark3XXdbYY.SparkShimServiceProvider can safely compare its build-time version to the runtime version

> I'm also not a fan of putting the Spark version in the Databricks shim version. They could change it at a whim (and have in the past, from 3.1.0 to 3.1.1). All that really matters is the Databricks runtime version, and that's what the Databricks user is more familiar with.

This is the point of this bug: the Databricks ShimServiceProviders should compare the build-time and run-time Databricks runtime versions, such as 8.4.x-gpu-ml-scala2.12. To make the solution robust, we should also look at the Spark version and any other available properties, such as the build timestamp, to disambiguate the x component. Note that the issue does not suggest adding the Spark version; it's already there. I advocate for collecting as much build data as possible to detect when we need to introduce a new shim.

@Salonijain27 Salonijain27 removed the ? - Needs Triage Need team to review and classify label Sep 28, 2021
@jlowe jlowe linked a pull request Oct 8, 2021 that will close this issue