feature/databricks-sql-warehouse-compatibility #121
Conversation
{% macro is_databricks_sql_warehouse(target) %}
    {% if target.type in ('databricks', 'spark') %}
        {% set re = modules.re %}
        {% set path_match = target.http_path %}
        {% set regex_pattern = "/sql/.+/warehouses/" %}
        {% set match_result = re.search(regex_pattern, path_match) %}
        {% if match_result %}
            {{ return(True) }}
        {% else %}
            {{ return(False) }}
        {% endif %}
    {% else %}
        {{ return(False) }}
    {% endif %}
{% endmacro %}
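The detection hinges on the regex matching the `http_path` of a SQL Warehouse connection but not that of an All Purpose Cluster. The same check can be sketched in plain Python (the sample paths below are illustrative, not taken from this PR):

```python
import re

# Same pattern the macro uses to detect a SQL Warehouse http_path
WAREHOUSE_PATTERN = r"/sql/.+/warehouses/"

def is_databricks_sql_warehouse(http_path: str) -> bool:
    """Return True when the http_path points at a SQL Warehouse endpoint."""
    return re.search(WAREHOUSE_PATTERN, http_path) is not None

# Illustrative endpoints: a warehouse path matches, a cluster-style path does not
print(is_databricks_sql_warehouse("/sql/1.0/warehouses/abc123"))        # True
print(is_databricks_sql_warehouse("/sql/protocolv1/o/123/my-cluster"))  # False
```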
I also have an open question on dbt Slack which asks if there is a better way to do this natively using the dbt-databricks adapter.
It seems per dbt Slack that this is the preferred route.
if [ "$db" = "databricks-sql" ]; then
    dbt seed --vars '{fivetran_platform_schema: sqlw_tests}' --target "$db" --full-refresh
    dbt compile --vars '{fivetran_platform_schema: sqlw_tests}' --target "$db"
    dbt run --vars '{fivetran_platform_schema: sqlw_tests}' --target "$db" --full-refresh
    dbt run --vars '{fivetran_platform_schema: sqlw_tests}' --target "$db"
    dbt test --vars '{fivetran_platform_schema: sqlw_tests}' --target "$db"
    dbt run --vars '{fivetran_platform_schema: sqlw_tests, fivetran_platform__usage_pricing: true}' --target "$db" --full-refresh
    dbt run --vars '{fivetran_platform_schema: sqlw_tests, fivetran_platform__usage_pricing: true}' --target "$db"
    dbt test --vars '{fivetran_platform_schema: sqlw_tests}' --target "$db"
    dbt run --vars '{fivetran_platform_schema: sqlw_tests, fivetran_platform__credits_pricing: false, fivetran_platform__usage_pricing: true}' --target "$db" --full-refresh
    dbt run --vars '{fivetran_platform_schema: sqlw_tests, fivetran_platform__credits_pricing: false, fivetran_platform__usage_pricing: true}' --target "$db"
    dbt test --vars '{fivetran_platform_schema: sqlw_tests}' --target "$db"
    dbt run --vars '{fivetran_platform_schema: sqlw_tests, fivetran_platform__usage_pricing: false, fivetran_platform_using_destination_membership: false, fivetran_platform_using_user: false}' --target "$db" --full-refresh
    dbt run --vars '{fivetran_platform_schema: sqlw_tests, fivetran_platform__usage_pricing: false, fivetran_platform_using_destination_membership: false, fivetran_platform_using_user: false}' --target "$db"
    dbt test --vars '{fivetran_platform_schema: sqlw_tests}' --target "$db"
else
This is all necessary because the SQL Warehouse still uses the same hive_metastore and conflicts with the All Purpose Cluster when both run in parallel against the same schema. Therefore, we need to explicitly use a different schema for each.
@@ -24,7 +24,7 @@ vars:

 models:
   fivetran_log:
-    +schema: fivetran_platform
+    +schema: "{{ 'sqlw_tests' if target.name == 'databricks-sql' else 'fivetran_platform' }}"
This is an artifact of needing to use two different schemas for the Databricks jobs
I added the changes to the readme and changelog, and also had one small suggestion for the audit table config, but otherwise this is looking great!
I confirm I was able to reproduce the issue locally on Databricks SQL Warehouse, and it was resolved by this fix, with the audit model materialized as a table.
For good measure, I also ran this locally on SQL server since it was mentioned in your review notes, and all looks good there!
Co-authored-by: fivetran-catfritz <[email protected]>
Thanks @fivetran-catfritz! I agree with your suggestion and committed it to the branch. I also really appreciate your README updates 🙏. I made one small wording change to be consistent with the terminology of the runtime name (Databricks All Purpose Cluster). Lastly, I regenerated the docs so the incremental file format change would be picked up in the docs. Let me know if there are any other comments needed before approving.
Looks great--approved!
Left some mostly doc-related comments!
@@ -0,0 +1,15 @@
+{% macro is_databricks_sql_warehouse(target) %}
We may want to add this to fivetran_utils in the future, since Databricks SQL Warehouse users of other packages with incremental models will hit the same issue.
I agree. This is something we will migrate there in the future.
Jamie review notes Co-authored-by: Jamie Rodriguez <[email protected]>
no other comments!
PR Overview
This PR will address the following Issue/Feature: Issue #120
This PR will result in the following new package version:
v1.7.1
This will not impact existing users who are not on Databricks SQL Warehouse runtimes. Since the package never ran successfully on a SQL Warehouse runtime, this fix is not breaking; it simply enables those users to run the package successfully.
Please provide the finalized CHANGELOG entry which details the relevant changes included in this PR:
PR Checklist
Basic Validation
Please acknowledge that you have successfully performed the following commands locally:
Before marking this PR as "ready for review" the following have been applied:
Detailed Validation
Please share any and all of your validation steps:
For validating these changes, I wanted to verify the following in relation to the fivetran_platform__audit_table model. See the validations below:
SQL Warehouse doesn't have any incremental strategy associated with it. You can see that no incremental strategy is used, as the compiled code from the incremental run does not include the where statement that would indicate an incremental strategy is in effect.

I was then able to verify for all other warehouses that the incremental strategy was working as expected. You can see the compiled code for all the warehouses returns the expected results.
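For reference, the where statement in question follows the standard dbt incremental pattern, which only compiles inside an incremental run. A minimal sketch (the model and column names here are illustrative, not the package's actual code):

```sql
select *
from {{ ref('stg_fivetran_platform__log') }}

{% if is_incremental() %}
-- this block only renders on incremental runs, so its absence from the
-- compiled SQL shows that no incremental strategy is being applied
where created_at >= (select max(created_at) from {{ this }})
{% endif %}
```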
If you had to summarize this PR in an emoji, which would it be?
🧱