
[Feature] force correct 'use' when deploying #1051

Open
3 tasks done
AartBluestoke opened this issue May 17, 2024 · 5 comments
Labels
enhancement New feature or request

Comments

@AartBluestoke

Is this your first time submitting a feature request?

  • I have read the expectations for open source contributors
  • I have searched the existing issues, and I could not find an existing issue for this feature
  • I am requesting a straightforward extension of existing dbt-snowflake functionality, rather than a Big Idea better suited to a discussion

Describe the feature

When deploying models that target other schemas or databases, dbt-snowflake doesn't prefix its commands with 'USE', so commands can be executed from arbitrary contexts.

eg #1047 would not be an issue if the rename statement was issued from the context of using the database and schema of the target model.
#1017 also shows an error: 'Cannot perform CREATE TABLE. This session does not have a current schema. Call 'USE SCHEMA', or use a qualified name.'

Describe alternatives you've considered

i have partially mitigated this with
+pre_hook: "USE {{target.name}}_{{ target.database }};"
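
A fuller, hypothetical version of this workaround (the project name here is a placeholder) could be applied project-wide in dbt_project.yml, and also pin the schema via USE SCHEMA:

  models:
    my_project:
      +pre_hook: "USE SCHEMA {{ target.name }}_{{ target.database }}.{{ this.schema }};"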

Who will this benefit?

anyone who writes a view, or has code executed, and expects that code to run from the context of the model
(create view statements with hard-coded targets can have funny outcomes if executed from the wrong context)

Are you interested in contributing this feature?

No response

Anything else?

this should take into account model-specific custom database and schema so that when a model is deployed to an unexpected location there are no surprises.

@AartBluestoke AartBluestoke added enhancement New feature or request triage labels May 17, 2024
@jtcohen6
Contributor

jtcohen6 commented May 17, 2024

@AartBluestoke Is the idea that dbt would, before a model's materialization, always call use {{ database }}.{{ schema }} for the model's configured database + schema?
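
For illustration only, a hypothetical hook macro (not part of dbt-snowflake; assuming `this.database` / `this.schema` resolve to the model's configured location, as they do in hooks today) might render:

  {% macro use_model_context() %}
    use schema {{ this.database }}.{{ this.schema }};
  {% endmacro %}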

This could happen as a pre_model_hook / post_model_hook, which is where we:

But I believe the real solution is for dbt-snowflake to always use fully qualified names. I haven't seen this come up as an issue at all, outside the specific context of dynamic tables, as reflected in this bug (of which #1017 and #1047 are both duplicates):

I'd rather just fix that specific bug. But please let me know if there's something I'm missing!

@AartBluestoke
Author

@jtcohen6 i'm deploying (from the one dbt project) to multiple databases and schemas, and i've had to use the pre-hook to allow some queries to work.

IMO any code that works without a database or schema override config should continue to work when one is set - this is currently not true, but a use statement to run the queries from the context of the requested database and schema would reduce the friction here.

I'm targeting databases dev_landing and dev_datamart - the 'dev' prefix is parameterised with the current target - with some models in one, some in the other.

I don't want to use fully qualified names in my views, etc ... (specifically, i don't want the database component, due to wanting code reusability between dev, test, and prod).

@jtcohen6
Contributor

@AartBluestoke Thanks for the reply - I have a few more questions:

i've had to use the pre-hook to allow some queries to work.

Which queries have these been? As I said above, my impression is that this is only an issue for dynamic tables right now (due to #1031), because we should be using fully qualified names everywhere else.

I don't want to use fully qualified names in my views, etc ... (specifically, i don't want the database component, due to wanting code reusability between dev, test, and prod).

Is there a reason why dbt's environment handling for {{ source() }} + {{ ref() }} is insufficient here? Are you putting raw identifier names (schema.identifier) directly into your code? Or is it a matter of needing to clone / promote (alter-swap) the same already-created view objects across environments?

@AartBluestoke
Author

AartBluestoke commented May 21, 2024

re "are you putting raw identifier names (schema.identifier) directly into your code?" -- no
BUT i do have a custom ref generator that says 'if the source and target database in this ref are the same, don't output the database'
This way i create a database that can be schema-cloned from prod to test, and not have views pointing from dev to prod

  {% macro ref(model_name) %}
    {% set m_reff = builtins.ref(model_name) %}
    {% if m_reff.database == model.database %}
      {% do return(m_reff.include(database=false)) %}
    {% else %}
      {% do return(m_reff) %}
    {% endif %}
  {% endmacro %}
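
For illustration (these renderings are hypothetical, assuming a model deployed to dev_datamart.data with an upstream model in the same database):

  -- ref to a model in the same database: the database component is dropped
  select * from {{ ref('journal_items') }}   -- renders as: select * from data.journal_items

  -- ref to a model in another database would render fully qualified,
  -- e.g. dev_landing.<schema>.<model>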

Which works until you split some things between landing and datamart: everything you put in datamart fails to release, because the 'use' context is still landing and it doesn't find the datamart objects to connect to.

To work around this i have several places where i set 'use' in my codebase.

@AartBluestoke
Author

AartBluestoke commented May 22, 2024

I also have a custom snowflake materialiser that turns

{{ config(materialized="table_stream", target_lag='10 minutes') }}

select s.* 
from {{ stream_source(ref('journal_items')) }} s
{{ if_not_create() }}
// where we haven't loaded those journal_ids yet
where not exists (select 1 from {{ this.schema }}.{{ this.table }} j where  s.journal_date=j.journal_date and s.journal_id=j.journal_id and s.src_event_portal_uid=j.src_event_portal_uid)
{{ endif_not_create() }}

into

USE dev_DATAMART.data ;


/* first setup the stream */

  create or replace stream /*IF NOT EXISTS*/ dev_DATAMART.data.dbt_data_journal_items_to_journal_items_wide 
  on view  
    dev_DATAMART.data.journal_items;
	

    create or replace  table dev_DATAMART.data.journal_items_wide
     as
    (select s.* 
from data.journal_items s
    );

create or replace task dev_DATAMART.data.journal_items_wide_refresh
    warehouse = WH_AIRFLOW
    SCHEDULE = '10 minutes'
    ALLOW_OVERLAPPING_EXECUTION  = false
    WHEN 

SYSTEM$STREAM_HAS_DATA('data.dbt_data_journal_items_to_journal_items_wide') OR 
as 
      insert into data.journal_items_wide
select s.* 
from data.dbt_data_journal_items_to_journal_items_wide s
where not exists (select 1 from data.journal_items_wide j where  s.journal_date=j.journal_date and s.journal_id=j.journal_id and s.src_event_portal_uid=j.src_event_portal_uid)

;

ALTER TASK dev_DATAMART.data.journal_items_wide_refresh RESUME ; 

note the difference between the queries at the top and bottom: we create the table selecting from the ref, while the task reads from the stream we created on the ref, and also includes the extra where clause to guard against duplicates.
