
[Bug] spark_catalog requires a single-part namespace in dbt python incremental model #553

Open
2 tasks done
carlos-veris opened this issue Jul 23, 2024 · 0 comments
Labels
feature:python-models Issues related to python models pkg:dbt-bigquery Issue affects dbt-bigquery type:bug Something isn't working as documented

Comments


carlos-veris commented Jul 23, 2024

Is this a new bug in dbt-bigquery?

  • I believe this is a new bug in dbt-bigquery
  • I have searched the existing issues, and I could not find an existing issue for this bug

Current Behavior

When running a dbt Python model with an incremental strategy and using the dbt.this property to reference the current model's relation, the model fails with an AnalysisException.

Here's the faulty code:

# Process new rows only
if dbt.is_incremental:
    # only new rows compared to max in current table
    max_from_this = f"select max(created_at) from {dbt.this}"
    df = df.filter(df.created_at >= session.sql(max_from_this).collect()[0][0])

Here's the error output:

df = df.filter(df.created_at >= session.sql(max_from_this).collect()[0][0])
File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/sql/session.py", line 1034, in sql
File "/usr/lib/spark/python/lib/py4j-0.10.9.5-src.zip/py4j/java_gateway.py", line 1321, in __call__
File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 196, in deco
pyspark.sql.utils.AnalysisException: spark_catalog requires a single-part namespace, but got [x, y]

Here x is the project name and y is the dataset name.
The model uses the dbt-bigquery adapter (v1.7.2) and is submitted to Dataproc.
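The failure can be illustrated without a Spark session: the built-in spark_catalog resolves at most a two-part name (database.table), while str(dbt.this) on BigQuery yields three parts (project.dataset.table). A minimal sketch of the mismatch (helper names are illustrative, not dbt or Spark API):

```python
def namespace_parts(relation: str) -> list:
    """Split a (possibly backtick-quoted) relation into its name parts."""
    return [p.strip("`") for p in relation.strip("`").split(".")]

def spark_catalog_accepts(relation: str) -> bool:
    """True if the built-in catalog could parse this reference,
    i.e. there is at most one namespace part before the table name."""
    return len(namespace_parts(relation)) <= 2

# "project.dataset.table" leaves a two-part namespace [project, dataset]
# in front of the table name, which spark_catalog rejects.
print(spark_catalog_accepts("my-project.my_dataset.my_table"))  # False
print(spark_catalog_accepts("my_dataset.my_table"))             # True
```

This matches the reported error, where [x, y] is exactly the rejected two-part namespace.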

Expected Behavior

It is expected that dbt.this resolves to a table reference that session.sql can parse, so that incremental models can query the existing table.

Steps To Reproduce

  1. Create a Python model using the dbt-bigquery adapter.
  2. Inside the model() function, set the materialized property of dbt.config to incremental:
def model(dbt, session):
    dbt.config(
        materialized="incremental",
        dataproc_region=<DATAPROC_REGION>,
        submission_method=<SUBMISSION_METHOD>,
    )
  3. Try to use the dbt.this property:
max_from_this = f"select max(created_at) from {dbt.this}"
df = df.filter(df.created_at >= session.sql(max_from_this).collect()[0][0])
  4. Run the model using dbt run.
  5. Check the logs of the Dataproc batch in the Google Cloud Console.
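One possible workaround sketch, assuming the spark-bigquery connector is available on the Dataproc batch (the read path below is an assumption, not a confirmed fix): read the existing table through the connector instead of session.sql, since the connector accepts the full three-part identifier. The bigquery_read_options helper is hypothetical.

```python
def bigquery_read_options(relation: str) -> dict:
    """Build options for session.read.format('bigquery'), sidestepping
    session.sql() and spark_catalog's two-part name limit. Backticks
    that may appear in str(dbt.this) are stripped."""
    return {"table": relation.replace("`", "")}

# Hypothetical use inside model() -- not executed here:
# existing = session.read.format("bigquery").options(
#     **bigquery_read_options(str(dbt.this))).load()
# max_ts = existing.agg({"created_at": "max"}).collect()[0][0]
# df = df.filter(df.created_at >= max_ts)
```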

Relevant log output

Using the default container image
Waiting for container log creation
PYSPARK_PYTHON=/opt/dataproc/conda/bin/python
JAVA_HOME=/usr/lib/jvm/temurin-11-jdk-amd64
SPARK_EXTRA_CLASSPATH=
:: loading settings :: file = /etc/spark/conf/ivysettings.xml
/usr/lib/spark/python/lib/pyspark.zip/pyspark/pandas/__init__.py:49: UserWarning: 'PYARROW_IGNORE_TIMEZONE' environment variable was not set. It is required to set this environment variable to '1' in both driver and executor sides if you use pyarrow>=2.0.0. pandas-on-Spark will set it for you but it does not work if there is a Spark context already launched.
Traceback (most recent call last):
  File "/var/dataproc/tmp/srvls-batch-0c5d7153-2f67-4614-86b2-1ed2f1264837/<PYTHON-MODEL.py>", line 264, in <module>
    df = model(dbt, spark)
  File "/var/dataproc/tmp/srvls-batch-0c5d7153-2f67-4614-86b2-1ed2f1264837/<PYTHON-MODEL.py>", line 165, in model
    df = df.filter(df.created_at >= session.sql(max_from_this).collect()[0][0])
  File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/sql/session.py", line 1034, in sql
  File "/usr/lib/spark/python/lib/py4j-0.10.9.5-src.zip/py4j/java_gateway.py", line 1321, in __call__
  File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 196, in deco
pyspark.sql.utils.AnalysisException: spark_catalog requires a single-part namespace, but got [x, y]

Environment

dbt-core: 1.7.2
dbt-bigquery: 1.7.2

Additional Context

References:

@carlos-veris carlos-veris added type:bug Something isn't working as documented triage:product In Product's queue labels Jul 23, 2024
@amychen1776 amychen1776 added feature:python-models Issues related to python models and removed triage:product In Product's queue labels Aug 28, 2024
@mikealfare mikealfare added the pkg:dbt-bigquery Issue affects dbt-bigquery label Jan 14, 2025
@mikealfare mikealfare transferred this issue from dbt-labs/dbt-bigquery Jan 14, 2025
colin-rogers-dbt pushed a commit that referenced this issue Feb 3, 2025
