
[Feature] force correct 'use' when deploying #1051

Open
3 tasks done
AartBluestoke opened this issue May 17, 2024 · 5 comments
Labels
enhancement New feature or request

Comments

@AartBluestoke

Is this your first time submitting a feature request?

  • I have read the expectations for open source contributors
  • I have searched the existing issues, and I could not find an existing issue for this feature
  • I am requesting a straightforward extension of existing dbt-snowflake functionality, rather than a Big Idea better suited to a discussion

Describe the feature

When deploying models that target other schemas or databases, dbt-snowflake doesn't prefix its commands with 'USE', so commands can be executed from arbitrary contexts.

eg #1047 would not be an issue if the rename statement was issued from the context of using the database and schema of the target model.
#1017 also shows an error: 'Cannot perform CREATE TABLE. This session does not have a current schema. Call 'USE SCHEMA', or use a qualified name.'

Describe alternatives you've considered

i have partially mitigated this with
+pre_hook: "USE {{target.name}}_{{ target.database }};"
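
A fuller, hypothetical version of this workaround (the project name here is a placeholder) could be applied project-wide in dbt_project.yml, and also pin the schema via USE SCHEMA:

  models:
    my_project:
      +pre_hook: "USE SCHEMA {{ target.name }}_{{ target.database }}.{{ this.schema }};"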

Who will this benefit?

anyone who writes a view, or has code executed, and expects that code to run from the context of the model
(create view statements with hard-coded targets can have funny outcomes if executed from the wrong context)

Are you interested in contributing this feature?

No response

Anything else?

this should take into account model-specific custom database and schema so that when a model is deployed to an unexpected location there are no surprises.

@AartBluestoke AartBluestoke added enhancement New feature or request triage labels May 17, 2024
@jtcohen6
Contributor

jtcohen6 commented May 17, 2024

@AartBluestoke Is the idea that dbt would, before a model's materialization, always call use {{ database }}.{{ schema }} for the model's configured database + schema?
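
For illustration only, a hypothetical hook macro (not part of dbt-snowflake; assuming `this.database` / `this.schema` resolve to the model's configured location, as they do in hooks today) might render:

  {% macro use_model_context() %}
    use schema {{ this.database }}.{{ this.schema }};
  {% endmacro %}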

This could happen as a pre_model_hook / post_model_hook, which is where we:

But I believe the real solution is for dbt-snowflake to always use fully qualified names. I haven't seen this come up as an issue at all, outside the specific context of dynamic tables, as reflected in this bug (of which #1017 and #1047 are both duplicates):

I'd rather just fix that specific bug. But please let me know if there's something I'm missing!

@AartBluestoke
Author

@jtcohen6 i'm deploying (from the one dbt project) to multiple databases and schemas, and i've had to use the pre-hook to allow some queries to work.

IMO any code that works without a database or schema override config should continue to work when one is set - this is currently not true, but a use statement to run the queries from the context of the requested database and schema would reduce the friction here.

I'm targeting databases dev_landing and dev_datamart - the 'dev' prefix is parameterised with the current target - with some models in one, some in the other.

I don't want to use fully qualified names in my views, etc ... (specifically, i don't want the database component, due to wanting code reusability between dev, test, and prod).

@jtcohen6
Contributor

@AartBluestoke Thanks for the reply - I have a few more questions:

i've had to use the pre-hook to allow some queries to work.

Which queries have these been? As I said above, my impression is that this is only an issue for dynamic tables right now (due to #1031), because we should be using fully qualified names everywhere else.

I don't want to use fully qualified names in my views, etc ... (specifically, i don't want the database component, due to wanting code reusability between dev, test, and prod).

Is there a reason why dbt's environment handling for {{ source() }} + {{ ref() }} is insufficient here? Are you putting raw identifier names (schema.identifier) directly into your code? Or is it a matter of needing to clone / promote (alter-swap) the same already-created view objects across environments?

@AartBluestoke
Author

AartBluestoke commented May 21, 2024

re "are you putting raw identifier names (schema.identifier) directly into your code?" -- no
BUT i do have a custom ref generator that says 'if the source and target database in this ref are the same, don't output the database'
This way i create a database that can be schema-cloned from prod to test, and not have views pointing from dev to prod

  {% macro ref(model_name) %}
    {% set m_reff = builtins.ref(model_name) %}
    {% if m_reff.database == model.database %}
      {% do return(m_reff.include(database=false)) %}
    {% else %}
      {% do return(m_reff) %}
    {% endif %}
  {% endmacro %}
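
For illustration (these renderings are hypothetical, assuming a model deployed to dev_datamart.data with an upstream model in the same database):

  -- ref to a model in the same database: the database component is dropped
  select * from {{ ref('journal_items') }}   -- renders as: select * from data.journal_items

  -- ref to a model in another database would render fully qualified,
  -- e.g. dev_landing.<schema>.<model>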

Which works until you split some things between landing and datamart: everything you put in datamart fails to release, because the 'use' context is still landing and it doesn't find the datamart objects to connect to.

To work around this i have several places where i set 'use' in my codebase.

@AartBluestoke
Author

AartBluestoke commented May 22, 2024

I also have a custom snowflake materialiser that turns

{{ config(materialized="table_stream", target_lag='10 minutes') }}

select s.* 
from {{ stream_source(ref('journal_items')) }} s
{{ if_not_create() }}
// where we haven't loaded those journal_ids yet
where not exists (select 1 from {{ this.schema }}.{{ this.table }} j where  s.journal_date=j.journal_date and s.journal_id=j.journal_id and s.src_event_portal_uid=j.src_event_portal_uid)
{{ endif_not_create() }}

into

USE dev_DATAMART.data ;


/* first setup the stream */

  create or replace stream /*IF NOT EXISTS*/ dev_DATAMART.data.dbt_data_journal_items_to_journal_items_wide 
  on view  
    dev_DATAMART.data.journal_items;
	

    create or replace  table dev_DATAMART.data.journal_items_wide
     as
    (select s.* 
from data.journal_items s
    );

create or replace task dev_DATAMART.data.journal_items_wide_refresh
    warehouse = WH_AIRFLOW
    SCHEDULE = '10 minutes'
    ALLOW_OVERLAPPING_EXECUTION  = false
    WHEN 

SYSTEM$STREAM_HAS_DATA('data.dbt_data_journal_items_to_journal_items_wide') OR 
as 
      insert into data.journal_items_wide
select s.* 
from data.dbt_data_journal_items_to_journal_items_wide s
where not exists (select 1 from data.journal_items_wide j where  s.journal_date=j.journal_date and s.journal_id=j.journal_id and s.src_event_portal_uid=j.src_event_portal_uid)

;

ALTER TASK dev_DATAMART.data.journal_items_wide_refresh RESUME ; 

note the difference between the queries at the top and bottom: we create the table selecting from the ref, while the task reads from the stream we created on the ref, and also includes the extra where clause to guard against duplicates.
