feat: don't force db connect if using serverless #3781

Merged

Contributor

@eakmanrq eakmanrq commented Feb 3, 2025

Prior to this PR, if a user configured Serverless for databricks-connect, SQLMesh forced the use of databricks-connect, so the Python SQL connector could not be used. In addition, the documentation said that the SQL connector did not support Databricks Serverless Compute, which was misleading: although it doesn't support the workspace-side Serverless typically used by Notebooks and Jobs, it does support SQL Warehouse Serverless compute.

A user may therefore want to use serverless across their stack: Serverless compute for jobs that require PySpark DataFrames and SQL Warehouse Serverless for their SQL queries. This PR enables that pattern.

One key limitation it works around is temporary objects. Since serverless doesn't support global temporary objects and instead requires session temporary objects, mixing databricks-connect and the Python SQL connector across the serverless products was a problem because they couldn't share this state. This PR resolves this by recording in the session connection metadata whether a temporary object was created in a databricks-connect session; if so, it forces the use of databricks-connect for the remainder of the session.
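
To illustrate the routing rule described above, here is a minimal Python sketch. It is not the code in this PR: `Engine`, `SessionRouter`, and the `temp_object_in_connect_session` metadata key are hypothetical names chosen for the example, and the real implementation may track this state differently.

```python
from dataclasses import dataclass, field
from enum import Enum


class Engine(Enum):
    SQL_CONNECTOR = "sql_connector"
    DATABRICKS_CONNECT = "databricks_connect"


@dataclass
class SessionRouter:
    """Hypothetical per-session router; not SQLMesh's actual classes."""

    # Per-session connection metadata (assumed to be a plain dict here).
    metadata: dict = field(default_factory=dict)

    def record_temp_object(self, engine: Engine) -> None:
        # Session temporary objects are only visible to the session that
        # created them, so remember when one was made via databricks-connect.
        if engine is Engine.DATABRICKS_CONNECT:
            self.metadata["temp_object_in_connect_session"] = True

    def choose_engine(self, needs_dataframe: bool) -> Engine:
        # Once a temp object exists in the databricks-connect session,
        # keep using databricks-connect for the remainder of the session.
        if self.metadata.get("temp_object_in_connect_session"):
            return Engine.DATABRICKS_CONNECT
        # Otherwise route DataFrame work to databricks-connect and
        # plain SQL to the Python SQL connector.
        if needs_dataframe:
            return Engine.DATABRICKS_CONNECT
        return Engine.SQL_CONNECTOR
```

In this sketch a plain SQL query goes to the SQL connector until `record_temp_object(Engine.DATABRICKS_CONNECT)` has been called, after which every call to `choose_engine` returns `Engine.DATABRICKS_CONNECT`.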

This PR also improves the documentation, removes excess log output in the console, and improves the error message shown when the user has different default catalogs across their SQL and databricks-connect sessions.

Initial PR that added serverless support for context: #3001

@eakmanrq eakmanrq requested review from treysp and izeigerman February 3, 2025 22:47
SQLMesh's Databricks Connect implementation supports Databricks Runtime 13.0 or higher. If SQLMesh detects that you have Databricks Connect installed, then it will use it for all Python models (both Pandas and PySpark DataFrames).
If SQLMesh detects that you have Databricks Connect installed, then it will automatically configure the connection and use it for all Python models that return a Pandas or PySpark DataFrame.

To have databricks-connect installed but ignored by SQLMesh, set `disable_databricks_connect` to `true` in the connection configuration.
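
As a rough illustration of the detect-then-override behavior this doc text describes, the following Python sketch shows one way such a check could be written. Only the `disable_databricks_connect` option name comes from the docs; the function names, the dict-shaped config, and the use of `importlib` to detect the package are assumptions made for the example and may not match how SQLMesh does it.

```python
import importlib.util
from typing import Optional


def databricks_connect_installed() -> bool:
    """Return True if the databricks.connect module can be located."""
    try:
        return importlib.util.find_spec("databricks.connect") is not None
    except (ImportError, ValueError):
        # Raised when the parent "databricks" package is missing or malformed.
        return False


def should_use_databricks_connect(
    connection_config: dict, installed: Optional[bool] = None
) -> bool:
    """Hypothetical helper: decide whether Python models use databricks-connect."""
    # An explicit opt-out wins even when the package is installed.
    if connection_config.get("disable_databricks_connect", False):
        return False
    # Otherwise fall back to detection, unless the caller supplies the result.
    if installed is None:
        installed = databricks_connect_installed()
    return installed
```

With this sketch, `should_use_databricks_connect({"disable_databricks_connect": True}, installed=True)` returns `False`, which matches the case of a shared environment where the package is present for other tools.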
Contributor

When/why would someone want to have it installed but ignored?

Just needing it installed in the env for a non-sqlmesh reason, or would people switch back and forth between using it in sqlmesh and not?

Contributor Author

> Just needing it installed in the env for a non-sqlmesh reason

Yeah, this is what I am thinking of. They use a single Python environment for their work and therefore don't want to uninstall it just so SQLMesh behaves as they expect.

@eakmanrq eakmanrq force-pushed the eakmanrq/improvements_databricks_serverless_handling branch from a8ccac5 to 0619fd3 Compare February 4, 2025 17:58
@eakmanrq eakmanrq force-pushed the eakmanrq/improvements_databricks_serverless_handling branch from 0619fd3 to 9977f31 Compare February 4, 2025 17:59
@eakmanrq eakmanrq enabled auto-merge (squash) February 4, 2025 18:00
@eakmanrq eakmanrq merged commit 591645c into main Feb 4, 2025
21 checks passed
@eakmanrq eakmanrq deleted the eakmanrq/improvements_databricks_serverless_handling branch February 4, 2025 18:08
izeigerman added a commit that referenced this pull request Feb 4, 2025
@eakmanrq eakmanrq restored the eakmanrq/improvements_databricks_serverless_handling branch February 4, 2025 20:57
@eakmanrq eakmanrq deleted the eakmanrq/improvements_databricks_serverless_handling branch February 4, 2025 20:58