
[Bug] spark_catalog requires a single-part namespace in dbt python incremental model #553

Open
2 tasks done
carlos-veris opened this issue Jul 23, 2024 · 0 comments
Labels
feature:python-models Issues related to python models pkg:dbt-bigquery Issue affects dbt-bigquery type:bug Something isn't working as documented

Comments


carlos-veris commented Jul 23, 2024

Is this a new bug in dbt-bigquery?

  • I believe this is a new bug in dbt-bigquery
  • I have searched the existing issues, and I could not find an existing issue for this bug

Current Behavior

When running a dbt Python model with an incremental strategy and using the dbt.this property to reference the current model's relation, the model fails with an AnalysisException.

Here's the faulty code:

# Process new rows only
if dbt.is_incremental:
    # only new rows compared to max in current table
    max_from_this = f"select max(created_at) from {dbt.this}"
    df = df.filter(df.created_at >= session.sql(max_from_this).collect()[0][0])

Here's the error output:

df = df.filter(df.created_at >= session.sql(max_from_this).collect()[0][0])
File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/sql/session.py", line 1034, in sql
File "/usr/lib/spark/python/lib/py4j-0.10.9.5-src.zip/py4j/java_gateway.py", line 1321, in __call__
File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 196, in deco
pyspark.sql.utils.AnalysisException: spark_catalog requires a single-part namespace, but got [x, y]

Here x is the project name and y is the dataset name.
The model uses the dbt-bigquery adapter (v1.7.2) and is submitted to Dataproc.
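The failure can be illustrated without a Spark session: the built-in spark_catalog resolves at most a two-part name (database.table), while str(dbt.this) on BigQuery yields three parts (project.dataset.table). A minimal sketch of the mismatch (helper names are illustrative, not dbt or Spark API):

```python
def namespace_parts(relation: str) -> list:
    """Split a (possibly backtick-quoted) relation into its name parts."""
    return [p.strip("`") for p in relation.strip("`").split(".")]

def spark_catalog_accepts(relation: str) -> bool:
    """True if the built-in catalog could parse this reference,
    i.e. there is at most one namespace part before the table name."""
    return len(namespace_parts(relation)) <= 2

# "project.dataset.table" leaves a two-part namespace [project, dataset]
# in front of the table name, which spark_catalog rejects.
print(spark_catalog_accepts("my-project.my_dataset.my_table"))  # False
print(spark_catalog_accepts("my_dataset.my_table"))             # True
```

This matches the reported error, where [x, y] is exactly the rejected two-part namespace.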

Expected Behavior

It is expected that dbt.this resolves to a table reference that session.sql can parse, so that incremental models can query the existing table.

Steps To Reproduce

  1. Create a Python model using the dbt-bigquery adapter.
  2. Inside the model() function, set the materialized property of dbt.config to incremental:
def model(dbt, session):
    dbt.config(
        materialized="incremental",
        dataproc_region=<DATAPROC_REGION>,
        submission_method=<SUBMISSION_METHOD>,
    )
  3. Try to use the dbt.this property:
max_from_this = f"select max(created_at) from {dbt.this}"
df = df.filter(df.created_at >= session.sql(max_from_this).collect()[0][0])
  4. Run the model using dbt run.
  5. Check the logs of the Dataproc batch in the Google Cloud Console.
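One possible workaround sketch, assuming the spark-bigquery connector is available on the Dataproc batch (the read path below is an assumption, not a confirmed fix): read the existing table through the connector instead of session.sql, since the connector accepts the full three-part identifier. The bigquery_read_options helper is hypothetical.

```python
def bigquery_read_options(relation: str) -> dict:
    """Build options for session.read.format('bigquery'), sidestepping
    session.sql() and spark_catalog's two-part name limit. Backticks
    that may appear in str(dbt.this) are stripped."""
    return {"table": relation.replace("`", "")}

# Hypothetical use inside model() -- not executed here:
# existing = session.read.format("bigquery").options(
#     **bigquery_read_options(str(dbt.this))).load()
# max_ts = existing.agg({"created_at": "max"}).collect()[0][0]
# df = df.filter(df.created_at >= max_ts)
```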

Relevant log output

Using the default container image
Waiting for container log creation
PYSPARK_PYTHON=/opt/dataproc/conda/bin/python
JAVA_HOME=/usr/lib/jvm/temurin-11-jdk-amd64
SPARK_EXTRA_CLASSPATH=
:: loading settings :: file = /etc/spark/conf/ivysettings.xml
/usr/lib/spark/python/lib/pyspark.zip/pyspark/pandas/__init__.py:49: UserWarning: 'PYARROW_IGNORE_TIMEZONE' environment variable was not set. It is required to set this environment variable to '1' in both driver and executor sides if you use pyarrow>=2.0.0. pandas-on-Spark will set it for you but it does not work if there is a Spark context already launched.
Traceback (most recent call last):
  File "/var/dataproc/tmp/srvls-batch-0c5d7153-2f67-4614-86b2-1ed2f1264837/<PYTHON-MODEL.py>", line 264, in <module>
    df = model(dbt, spark)
  File "/var/dataproc/tmp/srvls-batch-0c5d7153-2f67-4614-86b2-1ed2f1264837/<PYTHON-MODEL.py>", line 165, in model
    df = df.filter(df.created_at >= session.sql(max_from_this).collect()[0][0])
  File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/sql/session.py", line 1034, in sql
  File "/usr/lib/spark/python/lib/py4j-0.10.9.5-src.zip/py4j/java_gateway.py", line 1321, in __call__
  File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 196, in deco
pyspark.sql.utils.AnalysisException: spark_catalog requires a single-part namespace, but got [x, y]

Environment

dbt-core: 1.7.2
dbt-bigquery: 1.7.2

Additional Context

References:

@carlos-veris carlos-veris added type:bug Something isn't working as documented triage:product In Product's queue labels Jul 23, 2024
@amychen1776 amychen1776 added feature:python-models Issues related to python models and removed triage:product In Product's queue labels Aug 28, 2024
@mikealfare mikealfare added the pkg:dbt-bigquery Issue affects dbt-bigquery label Jan 14, 2025
@mikealfare mikealfare transferred this issue from dbt-labs/dbt-bigquery Jan 14, 2025
colin-rogers-dbt pushed a commit that referenced this issue Feb 3, 2025
