Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CT-1813] [CT-1378] [Bug] Python models not picking up custom schema #393

Closed
2 tasks done
kellyldougan opened this issue Oct 20, 2022 · 32 comments · Fixed by #476, #506, #507, #508 or #509
Closed
2 tasks done

[CT-1813] [CT-1378] [Bug] Python models not picking up custom schema #393

kellyldougan opened this issue Oct 20, 2022 · 32 comments · Fixed by #476, #506, #507, #508 or #509
Labels
type:bug Something isn't working

Comments

@kellyldougan
Copy link

Is this a new bug in dbt-core?

  • I believe this is a new bug in dbt-core
  • I have searched the existing issues, and I could not find an existing issue for this bug

Current Behavior

When I try to create a python model using v1.3, I get an error that a schema was not provided. I have set a schema in my dbt_project.yml file for the directory that the model is in, and have the generate_schema_name set to change the schema based on the environment.

Expected Behavior

The python model will run and create a model in the same schema as sql models in the same directory

Steps To Reproduce

  1. Set the dbt_project.yml to a use a given schema, e.g.
models:
  dbt:
    datamarts:
      dim:
        schema: dim
  1. add a python file dbt/models/datamarts/dim/test_python.py in that directory, e.g:
def model(dbt, session):
    dbt.config(materialized="table")
    df = dbt.ref("my_other_model").to_pandas()
    return df
  1. create the macros generate_schema_name_for_env and generate_schema_name:
{% macro generate_schema_name_for_env(custom_schema_name, node) -%}
    {%- set default_schema = target.schema -%}
    {%- if target.name = 'prod' and custom_schema_name is not none -%}
        {{ custom_schema_name | trim }}
    {%- else -%}
        {{ default_schema }}_{{custom_schema_name | trim}}
    {%- endif -%}
{%- endmacro %}


{% macro generate_schema_name(custom_schema_name, node) -%}
    {{ generate_schema_name_for_env(custom_schema_name, node) }}
{%- endmacro %}
  1. Run the python model

Relevant log output

"/usr/lib/python_udf/431102875043084cdb1968114cbff94386b72d5e184e97acf96b1780c382f509/lib/python3.8/site-packages/snowflake/connector/pandas_tools.py", line 122, in write_pandas
      raise ProgrammingError(
  snowflake.connector.errors.ProgrammingError: Schema has to be provided to write_pandas when a database is provided
   in function TEST_PYTHON__DBT_SP with handler main
  compiled Code at target/run/dbt/models/datamarts/dim/test_python.py

Environment

- OS: dbt Cloud IDE
- Python: python in dbt cloud IDE
- dbt: 1.3

Which database adapter are you using with dbt?

snowflake

Additional Context

No response

@kellyldougan kellyldougan added type:bug Something isn't working triage:product labels Oct 20, 2022
@github-actions github-actions bot changed the title [Bug] Python models not picking up custom schema [CT-1378] [Bug] Python models not picking up custom schema Oct 20, 2022
@lostmygithubaccount lostmygithubaccount self-assigned this Oct 23, 2022
@ChenyuLInx
Copy link
Contributor

Found this after a quick search. Might be a bug on the connector side(although I am not too sure what is the connector version being used here in stored procedure). Will dig more when get to it.

@jtcohen6
Copy link
Contributor

Good find, Chenyu!

It sounds like this could be related to a bug in snowflake-connector-python, which will be fixed in a release coming in Jan 2023. We can try again then, and see if the issue persists.

@leahwicz
Copy link
Contributor

Will close out once we confirm that Snowflake has fixed this on their end
snowflakedb/snowflake-connector-python#1034 (comment)

@jtcohen6 jtcohen6 removed their assignment Nov 18, 2022
@lostmygithubaccount
Copy link

bumping this from another community report: https://getdbt.slack.com/archives/C03QUA7DWCW/p1673461860389249

@jtcohen6 looks like the underlying Snowflake issues is fixed, could we fix here as well?

@kellyldougan
Copy link
Author

+1 when/in what version can we expect this fix to get merged in?

@jtcohen6
Copy link
Contributor

It looks like the linked issue/fix on Snowflake's side (snowflakedb/snowflake-connector-python#1274) will be included in a new version of dbt-snowflake, either v2.9 released in December, or a new release coming later this month.

We still need to test compatibility with that new version in dbt-snowflake. I'm going to transfer this issue over to the dbt-snowflake repo

@jtcohen6 jtcohen6 transferred this issue from dbt-labs/dbt-core Jan 13, 2023
@github-actions github-actions bot changed the title [CT-1378] [Bug] Python models not picking up custom schema [CT-1813] [CT-1378] [Bug] Python models not picking up custom schema Jan 13, 2023
@niclas-roos-quinyx
Copy link

I'm getting the same error when running my project on dbt cloud. It works perfectly when building from the IDE but I get this error when I try running it in production.

The SPs from the build are left in my production schema and I tried calling them manually from the snowflake UI and that works, if I've first selected the correct database and schema, maybe that it also a hint about what's going wrong?

@dweaver33
Copy link

dweaver33 commented Jan 31, 2023

Seeing this error as well since moving to custom schemas. Any workarounds?

Using dbt-snowflake: 1.3.0
Core: 1.3.2

Edit:
Same error with the following versions:
dbt-snowflake: 1.4.0
Core: 1.4.1

@bama-troyel
Copy link

Issue still persists with dbt-snowflake 1.4.1. Any update this issue? It was said here that this was to be fixed in 1.4.1.

@jtcohen6
Copy link
Contributor

jtcohen6 commented Feb 9, 2023

The (hopeful) fix for this was included in snowflake-connector-python==3.0.0 just a few weeks ago (after the dbt Core v1.4 release): https://community.snowflake.com/s/article/Snowflake-Connector-for-Python-Release-Notes

Since it's a major version bump, there's some additional due diligence warranted before including the upgrade in a v1.4.x patch release.

@Willchancot
Copy link

Seems that the problem still exists in dbt-snowflake v1.4.1.

@Fleid
Copy link
Contributor

Fleid commented Feb 13, 2023

Hey all, we're lining this up for our next sprint. So hopefully we should have something out in the next 2 weeks.

@mikealfare
Copy link
Contributor

I believe #476 resolves this issue. However, that just pushes into main. We'll need to backport this to released versions (e.g. 1.3.latest) before it shows up in dbt Cloud. Just calling that out here in case merging #476 closes this issue via automation.

@mikealfare
Copy link
Contributor

Re-opening; this was closed due to automation. Once we've confirmed the changes in the backport we can close this.

@ernestoongaro
Copy link
Contributor

ernestoongaro commented Mar 23, 2023 via email

@Fleid
Copy link
Contributor

Fleid commented Mar 24, 2023

I feel like there's multiple things going on here.

First, @Tonayya, @bama-troyel, the schema config should go in the config block:

- name: dim_trial_data
    description: dim_trial_data
    config:
        materialized: table
        schema: dw
        tags: ['python']

Just covering the basis. An alternative is to put that in the Python model file itself:

def model(dbt, session):
    dbt.config(
        materialized="table",
        schema ="dw"
    )

Then @ernestoongaro, what happens if you have access to the base schema as well as the overridden one, and you don't drop anything during the run? ;)
Just checking that the base scenario - everything exists and you have access to it - does work, before going into the weeds.

@satpai1215
Copy link

satpai1215 commented Apr 19, 2023

Is there any update on this issue or any workarounds? I am experiencing the same error as described in the opening and the last few most recent messages when running dbt run --models <my_python_model> on Python models

snowflake.connector.errors.ProgrammingError: Schema has to be provided to write_pandas when a database is provided

Should I expect this to be fixed? I'm using the following library versions:

dbt-core 1.4.5
dbt-snowflake 1.4.2
snowflake-connector-python 3.0.0

Please let me know if further information would be helpful

@eaphilli
Copy link

eaphilli commented Apr 20, 2023

I am also on:

dbt-core 1.4.5
dbt-snowflake 1.4.2
snowflake-connector-python 3.0.0

My custom schema is DEV_MODEL_python_model_test
The default schema is ADMIN

My python model:

def model(dbt, session):
    """
    Entry point for dbt
    """
    dbt.config(materialized = "table")
    df = dbt.ref("users")
    pandas_df = df.limit(100).to_pandas()
    return pandas_df

The error:

02:10:48  Database Error in model python_model (models/model/python_model_test/python_model.py)
02:10:48    100357 (P0000): Python Interpreter Error:
02:10:48    Traceback (most recent call last):
02:10:48      File "_udf_code.py", line 94, in main
02:10:48      File "_udf_code.py", line 88, in materialize
02:10:48      File "/usr/lib/python_udf/694df3cf538f091a277e3df26e69af371a3630fe8656f8ad7d386069b297bf34/lib/python3.8/site-packages/snowflake/snowpark/session.py", line 1453, in create_dataframe
02:10:48        t = self.write_pandas(
02:10:48      File "/usr/lib/python_udf/694df3cf538f091a277e3df26e69af371a3630fe8656f8ad7d386069b297bf34/lib/python3.8/site-packages/snowflake/snowpark/session.py", line 1367, in write_pandas
02:10:48        raise pe
02:10:48      File "/usr/lib/python_udf/694df3cf538f091a277e3df26e69af371a3630fe8656f8ad7d386069b297bf34/lib/python3.8/site-packages/snowflake/snowpark/session.py", line 1346, in write_pandas
02:10:48        success, nchunks, nrows, ci_output = write_pandas(
02:10:48      File "/usr/lib/python_udf/694df3cf538f091a277e3df26e69af371a3630fe8656f8ad7d386069b297bf34/lib/python3.8/site-packages/snowflake/connector/pandas_tools.py", line 223, in write_pandas
02:10:48        cursor.execute(file_format_sql, _is_internal=True)
02:10:48      File "/usr/lib/python_udf/694df3cf538f091a277e3df26e69af371a3630fe8656f8ad7d386069b297bf34/lib/python3.8/site-packages/snowflake/connector/cursor.py", line 829, in execute
02:10:48        Error.errorhandler_wrapper(
02:10:48      File "/usr/lib/python_udf/694df3cf538f091a277e3df26e69af371a3630fe8656f8ad7d386069b297bf34/lib/python3.8/site-packages/snowflake/connector/errors.py", line 232, in errorhandler_wrapper
02:10:48        handed_over = Error.hand_to_other_handler(
02:10:48      File "/usr/lib/python_udf/694df3cf538f091a277e3df26e69af371a3630fe8656f8ad7d386069b297bf34/lib/python3.8/site-packages/snowflake/connector/errors.py", line 287, in hand_to_other_handler
02:10:48        cursor.errorhandler(connection, cursor, error_class, error_value)
02:10:48      File "/usr/lib/python_udf/694df3cf538f091a277e3df26e69af371a3630fe8656f8ad7d386069b297bf34/lib/python3.8/site-packages/snowflake/connector/errors.py", line 165, in default_errorhandler
02:10:48        raise error_class(
02:10:48    snowflake.connector.errors.ProgrammingError: 003001 (42501): SQL access control error:
02:10:48    Insufficient privileges to operate on schema 'ADMIN'
02:10:48     in function PYTHON_MODEL__DBT_SP with handler main
02:10:48    compiled Code at target/run/consumer_dbt/models/model/python_model_test/python_model.py
02:10:48  
02:10:48  Finished running 1 table model in 0 hours 0 minutes and 24.04 seconds (24.04s).
02:10:48  Encountered an error:
FailFast Error in model python_model (models/model/python_model_test/python_model.py)
  Failing early due to test failure or runtime error
Error: Process completed with exit code 2.

I've tried to write the pandas_df to a snowpark df by running write_pandas e.g.:

def model(dbt, session):
    """
    Entry point for dbt
    """
    dbt.config(materialized = "table")
    df = dbt.ref("participant_characteristics_wide")
    pandas_df = df.limit(100).to_pandas()
    new_df = session.write_pandas(
        df = pandas_df,
        table_name = f'{dbt.this.identifier}_temp',
        database = dbt.this.database,
        schema = dbt.this.schema,
        auto_create_table = True,
        table_type = 'temp',
        overwrite = True
    )
    return new_df

But then I get a new error which seems to have a syntax issue in the schema name (extra quotation marks):

02:51:40  Database Error in model python_model (models/model/python_model_test/python_model.py)
02:51:40    100357 (P0000): Python Interpreter Error:
02:51:40    Traceback (most recent call last):
02:51:40      File "_udf_code.py", line 102, in main
02:51:40      File "_udf_code.py", line 16, in model
02:51:40      File "/usr/lib/python_udf/694df3cf538f091a277e3df26e69af371a3630fe8656f8ad7d386069b297bf34/lib/python3.8/site-packages/snowflake/snowpark/session.py", line 1367, in write_pandas
02:51:40        raise pe
02:51:40      File "/usr/lib/python_udf/694df3cf538f091a277e3df26e69af371a3630fe8656f8ad7d386069b297bf34/lib/python3.8/site-packages/snowflake/snowpark/session.py", line 1346, in write_pandas
02:51:40        success, nchunks, nrows, ci_output = write_pandas(
02:51:40      File "/usr/lib/python_udf/694df3cf538f091a277e3df26e69af371a3630fe8656f8ad7d386069b297bf34/lib/python3.8/site-packages/snowflake/connector/pandas_tools.py", line 202, in write_pandas
02:51:40        cursor.execute(drop_table_sql, _is_internal=True)
02:51:40      File "/usr/lib/python_udf/694df3cf538f091a277e3df26e69af371a3630fe8656f8ad7d386069b297bf34/lib/python3.8/site-packages/snowflake/connector/cursor.py", line 829, in execute
02:51:40        Error.errorhandler_wrapper(
02:51:40      File "/usr/lib/python_udf/694df3cf538f091a277e3df26e69af371a3630fe8656f8ad7d386069b297bf34/lib/python3.8/site-packages/snowflake/connector/errors.py", line 232, in errorhandler_wrapper
02:51:40        handed_over = Error.hand_to_other_handler(
02:51:40      File "/usr/lib/python_udf/694df3cf538f091a277e3df26e69af371a3630fe8656f8ad7d386069b297bf34/lib/python3.8/site-packages/snowflake/connector/errors.py", line 287, in hand_to_other_handler
02:51:40        cursor.errorhandler(connection, cursor, error_class, error_value)
02:51:40      File "/usr/lib/python_udf/694df3cf538f091a277e3df26e69af371a3630fe8656f8ad7d386069b297bf34/lib/python3.8/site-packages/snowflake/connector/errors.py", line 165, in default_errorhandler
02:51:40        raise error_class(
02:51:40    snowflake.connector.errors.ProgrammingError: 002003 (02000): SQL compilation error:
02:51:40    Schema 'STAGE."DEV_MODEL_python_model_test"' does not exist or not authorized.
02:51:40     in function PYTHON_MODEL__DBT_SP with handler main
02:51:40    compiled Code at target/run/consumer_dbt/models/model/python_model_test/python_model.py
02:51:40  
02:51:40  Finished running 1 table model in 0 hours 0 minutes and 23.87 seconds (23.87s).
02:51:40  Encountered an error:
FailFast Error in model python_model (models/model/python_model_test/python_model.py)
  Failing early due to test failure or runtime error
Error: Process completed with exit code 2.

Oddly, when I try setting quote_identifiers = False in the write_pandas function, I get the first error (the default schema) again.

@Fleid
Copy link
Contributor

Fleid commented Apr 20, 2023

Hey @satpai1215 and @eaphilli,

To cover the basics:

  • Custom schema values are concatenated to the base target schema, so @eaphilli your resulting schema will be ADMIN_ DEV_MODEL_python_model_test, not DEV_MODEL_python_model_test. I'm wondering if in your case the resulting schema doesn't exist and you don't have the privilege to create the custom schema, resulting in a weird behavior
  • I can't reproduce the issue in the versions you are both using with the simplest setup detailed below. Could you please both try to see if that works for you too?

Starting from a new project (dbt init), configure a schema in your profiles.

Then add the following files in your project:
In /models, my_python_model.py:

def model(dbt, session):
    dbt.config(
        materialized="table",
        schema="customSchemaSuffix"
    )

    df = dbt.ref("my_first_dbt_model")
    
    return df

In /models, my_python_model.yml:

version: 2

models:
  - name: my_python_model
    description: my_python_model
    config:
        materialized: table
        schema: customSchemaSuffix
        tags: ['python']

You can define the custom schema property in either the property file or the model file. Both should work.

Do make sure that all of the following schemas exist and you have access to them (or you have enough privilege to have dbt create them on the fly):

  • baseSchema
  • baseSchema_customSchemaSuffix

@satpai1215
Copy link

satpai1215 commented Apr 20, 2023

Thank you for your response here!

I can confirm that your base case scenario is working on my end. I've isolated the issue to problems occurring when returning a newly created dataframe from the model function (as opposed to one derived from dbt.ref()).

Here is my test case below, that is returning the error that follows. Is there something I should be doing differently here? (apologies if this is not the correct forum for these types inquiries):

def model(dbt, session):
    dbt.config(
        materialized="table",
        schema="MY_CUSTOM_SCHEMA",
        alias="_TEST_PYTHON_MODEL",
    )

    df = pd.DataFrame({
        'City': ['Buenos Aires', 'Brasilia', 'Santiago', 'Bogota', 'Caracas'],
        'Country': ['Argentina', 'Brazil', 'Chile', 'Colombia', 'Venezuela'],
        'Latitude': [-34.58, -15.78, -33.45, 4.60, 10.48],
        'Longitude': [-58.66, -47.91, -70.66, -74.08, -66.86]
    })
    
    return df

Error Trace

15:53:50  Database Error in model TEST_PYTHON_MODEL (models/python/TEST_PYTHON_MODEL.py)
15:53:50    100357 (P0000): Python Interpreter Error:
15:53:50    Traceback (most recent call last):
15:53:50      File "_udf_code.py", line 108, in main
15:53:50      File "_udf_code.py", line 102, in materialize
15:53:50      File "/usr/lib/python_udf/694df3cf538f091a277e3df26e69af371a3630fe8656f8ad7d386069b297bf34/lib/python3.8/site-packages/snowflake/snowpark/session.py", line 1453, in create_dataframe
15:53:50        t = self.write_pandas(
15:53:50      File "/usr/lib/python_udf/694df3cf538f091a277e3df26e69af371a3630fe8656f8ad7d386069b297bf34/lib/python3.8/site-packages/snowflake/snowpark/session.py", line 1367, in write_pandas
15:53:50        raise pe
15:53:50      File "/usr/lib/python_udf/694df3cf538f091a277e3df26e69af371a3630fe8656f8ad7d386069b297bf34/lib/python3.8/site-packages/snowflake/snowpark/session.py", line 1346, in write_pandas
15:53:50        success, nchunks, nrows, ci_output = write_pandas(
15:53:50      File "/usr/lib/python_udf/694df3cf538f091a277e3df26e69af371a3630fe8656f8ad7d386069b297bf34/lib/python3.8/site-packages/snowflake/connector/pandas_tools.py", line 124, in write_pandas
15:53:50        raise ProgrammingError(
15:53:50    snowflake.connector.errors.ProgrammingError: Schema has to be provided to write_pandas when a database is provided
15:53:50     in function _TEST_PYTHON_MODEL__DBT_SP with handler main
15:53:50    compiled Code at /models/python/TEST_PYTHON_MODEL.py
15:53:50  
15:53:50  Done. PASS=0 WARN=0 ERROR=1 SKIP=0 TOTAL=1

@jtcohen6
Copy link
Contributor

@patkearns10 and I had a chance to live-debug this with a user who's been running into the issue, and I think we got to the bottom of it! I'm still not sure why this is cropping up for only some users, and not others (me!) — and I'm pretty sure the root-cause fix here was (or should have been) in snowflakedb/snowflake-connector-python#1274 — but this managed to reliably solve the problem.

The quickest fix was to add two lines within the Python model code, right before the return of the Pandas dataframe:

def model(dbt, session):
    dbt.config(schema="custom_schema")
    pandas_df = dbt.ref("my_model").to_pandas()
    
    # add these lines
    session.use_database(dbt.this.database)
    session.use_schema(dbt.this.schema)
    
    return pandas_df

I believe we could just fix this for everyone by updating these lines to include those session.use_* calls as well:

if isinstance(df, pandas.core.frame.DataFrame):
# session.write_pandas does not have overwrite function
df = session.createDataFrame(df)

Namely, by changing that to:

        if isinstance(df, pandas.core.frame.DataFrame):
          session.use_database(target_relation.database)
          session.use_schema(target_relation.schema)
          # session.write_pandas does not have overwrite function
          df = session.createDataFrame(df)

@Fleid
Copy link
Contributor

Fleid commented May 9, 2023

Tracking this for our next sprint

@mikealfare mikealfare removed their assignment May 10, 2023
@aranke
Copy link
Member

aranke commented May 12, 2023

This issue also came up in community Slack: https://getdbt.slack.com/archives/CBSQTAPLG/p1683920034037219

@VersusFacit
Copy link
Contributor

I'm soon going to merge the fix Jeremy provided but admittedly on our system, as alluded to above, it's hard to recreate this. Even after this issue closes and the fix has been backported, I heartily encourage users that continue to find custom schema problems to speak up!

@waitsfornone
Copy link

I have just ran into this issue this week with switching our Python models to custom schemas. Is there an ETA or release version I can be looking for the fix to be released on? Thank you.

@Fleid
Copy link
Contributor

Fleid commented May 31, 2023

Hey @waitsfornone, this should be out right now in 1.3.2, 1.4.3 and 1.5.1.
Could you check it's fixed on your end and let me know?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment