[ADAP-945] [Bug] submission_method from dbt profile not being applied to dbt Python models #588

Open
2 tasks done
gbmarc1 opened this issue Oct 12, 2023 · 3 comments
Labels
help_wanted Extra attention is needed pkg:dbt-bigquery Issue affects dbt-bigquery type:bug Something isn't working as documented

Comments


gbmarc1 commented Oct 12, 2023

Is this a new bug in dbt-bigquery?

  • I believe this is a new bug in dbt-bigquery
  • I have searched the existing issues, and I could not find an existing issue for this bug

Current Behavior

I have the following profile. I want the job to be created on the cluster named in the profile, but it always ends up as a serverless batch.

ml:
  target: dev
  outputs:
    dev: &dev_config
      type: bigquery
      dataset: "{{ env_var('USER') }}"
      project: shopify-ml-adhoc
      priority: interactive
      method: oauth
      location: US
      job_execution_timeout_seconds: 600
      job_retries: 1
      threads: 2
      submission_method: cluster
      dataproc_region: us-central1
      gcs_bucket: ml-adhoc-dataproc-jobs
      dataproc_cluster_name: ml-adhoc-dataproc-us-central1
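
Before suspecting the adapter, it can help to confirm that the target carries everything a cluster submission would need. The sketch below is a hypothetical sanity check, not part of dbt: the helper `missing_cluster_keys` and the assumption that `dataproc_region`, `gcs_bucket`, and `dataproc_cluster_name` are the keys a cluster run depends on are taken from the profile above for illustration only.

```python
# Hypothetical sanity check on the profile target above; not part of dbt.
# Assumption: these are the keys a cluster submission depends on.
REQUIRED_FOR_CLUSTER = ("dataproc_region", "gcs_bucket", "dataproc_cluster_name")


def missing_cluster_keys(target: dict) -> list:
    """Return the cluster-related keys that are absent or empty in a target."""
    if target.get("submission_method") != "cluster":
        return []  # nothing to check for other submission methods
    return [key for key in REQUIRED_FOR_CLUSTER if not target.get(key)]


# The cluster-related part of the `dev` target from the profile above.
dev_target = {
    "submission_method": "cluster",
    "dataproc_region": "us-central1",
    "gcs_bucket": "ml-adhoc-dataproc-jobs",
    "dataproc_cluster_name": "ml-adhoc-dataproc-us-central1",
}

print(missing_cluster_keys(dev_target))  # []
```

Here the list is empty, so on that assumption the profile itself looks complete and the question is whether dbt reads it.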

This is the model. If I uncomment the dbt.config call, it works properly, but I want this config in the profile, not in the model itself.

def model(dbt, session):
    # dbt.config(
    #     submission_method="cluster",
    #     dataproc_cluster_name="ml-adhoc-dataproc-us-central1",
    # )
    my_sql_model_df = dbt.source("safe_content_moderation", "safe_content_moderation")

    final_df = my_sql_model_df

    return final_df

Expected Behavior

The profile config is respected and the job is executed in the cluster.

Steps To Reproduce

dbt run

Relevant log output

dbt run --models nsfw
15:50:47  Running with dbt=1.6.6
15:50:47  Registered adapter: bigquery=1.6.7
15:50:47  Unable to do partial parsing because profile has changed
15:50:48  Found 5 models, 12 tests, 7 sources, 0 exposures, 0 metrics, 661 macros, 0 groups, 0 semantic models
15:50:48  
15:50:50  Concurrency: 2 threads (target='dev')
15:50:50  
15:50:50  1 of 2 START sql table model mab_nsfw.multi_label_v1 ........................... [RUN]
15:50:50  2 of 2 START python table model mab_nsfw.multi_label_v2 ........................ [RUN]
15:50:54  1 of 2 OK created sql table model mab_nsfw.multi_label_v1 ...................... [CREATE TABLE (84.1k rows, 10.5 MiB processed) in 4.40s]

Environment

- OS: macos
- Python: 3.11.1
- dbt-core: 1.6.6
- dbt-bigquery: 1.6.7

Additional Context

No response

@gbmarc1 gbmarc1 added type:bug Something isn't working as documented triage:product In Product's queue labels Oct 12, 2023
@github-actions github-actions bot changed the title submission_method ignored in profile (dbt-bigquery) [ADAP-945] submission_method ignored in profile (dbt-bigquery) Oct 12, 2023
@dbeatty10
Contributor

Thanks for reporting this @gbmarc1

It sounds like this didn't work for you:

def model(dbt, session):
    my_sql_model_df = dbt.source("safe_content_moderation", "safe_content_moderation")

    final_df = my_sql_model_df

    return final_df

But this did work:

def model(dbt, session):
    dbt.config(
        submission_method="cluster",
        dataproc_cluster_name="ml-adhoc-dataproc-us-central1",
    )
    my_sql_model_df = dbt.source("safe_content_moderation", "safe_content_moderation")

    final_df = my_sql_model_df

    return final_df

To help troubleshoot

Did you happen to try either of these as well? This could help nail down where the missing piece(s) might be.

Configuring submission_method only:

def model(dbt, session):
    dbt.config(
        submission_method="cluster",
    )
    my_sql_model_df = dbt.source("safe_content_moderation", "safe_content_moderation")

    final_df = my_sql_model_df

    return final_df

Or configuring dataproc_cluster_name only:

def model(dbt, session):
    dbt.config(
        dataproc_cluster_name="ml-adhoc-dataproc-us-central1",
    )
    my_sql_model_df = dbt.source("safe_content_moderation", "safe_content_moderation")

    final_df = my_sql_model_df

    return final_df

@dbeatty10 dbeatty10 added triage:awaiting-response Awaiting a response from the reporter and removed triage:product In Product's queue labels Oct 12, 2023
@gbmarc1
Author

gbmarc1 commented Oct 12, 2023

Hello,
Thanks for looking at this! :)

It seems the profile's submission_method gets ignored.

  • Configuring submission_method only 👍
  • Configuring dataproc_cluster_name only 👎
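
Taken together, these two results suggest that the submission method is read only from the model's own config, with a fallback to serverless, while the cluster name does fall back to the profile. A minimal sketch of that observed behavior follows. The function `observed_submission` and the dict shapes are hypothetical illustrations of what the experiments imply, not dbt-bigquery's actual code.

```python
# Hypothetical model of the behavior observed in this thread; not dbt-bigquery source.
PROFILE = {
    "submission_method": "cluster",
    "dataproc_cluster_name": "ml-adhoc-dataproc-us-central1",
}


def observed_submission(model_config: dict, profile: dict) -> dict:
    """Return the submission settings the two experiments suggest are in effect."""
    # Observed: the profile's submission_method is never consulted.
    method = model_config.get("submission_method", "serverless")
    # Inferred: the cluster name comes from the profile when the model omits it.
    cluster = model_config.get(
        "dataproc_cluster_name", profile.get("dataproc_cluster_name")
    )
    return {"submission_method": method, "dataproc_cluster_name": cluster}


only_method = observed_submission({"submission_method": "cluster"}, PROFILE)
only_cluster = observed_submission(
    {"dataproc_cluster_name": "ml-adhoc-dataproc-us-central1"}, PROFILE
)
print(only_method["submission_method"])   # cluster
print(only_cluster["submission_method"])  # serverless
```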

@github-actions github-actions bot added triage:product In Product's queue and removed triage:awaiting-response Awaiting a response from the reporter labels Oct 12, 2023
@dbeatty10 dbeatty10 changed the title [ADAP-945] submission_method ignored in profile (dbt-bigquery) [ADAP-945] [Bug] submission_method from dbt profile not being applied to dbt Python models Oct 12, 2023
@dbeatty10
Contributor

Thanks @gbmarc1 -- that gives us the info we need 👍

Acceptance criteria

As noted in the original issue, dbt should use the cluster submission method (rather than serverless) when using the following project files:

profiles.yml

ml:
  target: dev
  outputs:
    dev: &dev_config
      type: bigquery
      dataset: "{{ env_var('USER') }}"
      project: shopify-ml-adhoc
      priority: interactive
      method: oauth
      location: US
      job_execution_timeout_seconds: 600
      job_retries: 1
      threads: 2
      submission_method: cluster
      dataproc_region: us-central1
      gcs_bucket: ml-adhoc-dataproc-jobs
      dataproc_cluster_name: ml-adhoc-dataproc-us-central1

models/my_model

def model(dbt, session):
    dbt.config(
        dataproc_cluster_name="ml-adhoc-dataproc-us-central1",
    )
    my_sql_model_df = dbt.source("safe_content_moderation", "safe_content_moderation")

    final_df = my_sql_model_df

    return final_df
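
One way to state these criteria as an executable check: the model's own config wins when it sets a submission method, otherwise the profile's value applies, and serverless is used only when neither sets one. The sketch below assumes that precedence order. `resolve_submission_method` is a hypothetical helper written for illustration, not an existing dbt function.

```python
from typing import Mapping, Optional

# Assumption: serverless is the fallback, as the reported behavior implies.
DEFAULT_SUBMISSION_METHOD = "serverless"


def resolve_submission_method(
    model_config: Mapping[str, str],
    profile: Mapping[str, str],
    default: str = DEFAULT_SUBMISSION_METHOD,
) -> str:
    """Pick the submission method: model config first, then profile, then default."""
    for source in (model_config, profile):
        value: Optional[str] = source.get("submission_method")
        if value:
            return value
    return default


profile = {
    "submission_method": "cluster",
    "dataproc_cluster_name": "ml-adhoc-dataproc-us-central1",
}
model_config = {"dataproc_cluster_name": "ml-adhoc-dataproc-us-central1"}

# The acceptance case: the model sets no submission method, so the profile applies.
assert resolve_submission_method(model_config, profile) == "cluster"
# A model-level setting still overrides the profile.
assert resolve_submission_method({"submission_method": "serverless"}, profile) == "serverless"
# With nothing configured anywhere, the default is used.
assert resolve_submission_method({}, {}) == "serverless"
```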

Relevant code

@dbeatty10 dbeatty10 removed the triage:product In Product's queue label Oct 12, 2023
@martynydbt martynydbt added the help_wanted Extra attention is needed label Feb 8, 2024
@mikealfare mikealfare added the pkg:dbt-bigquery Issue affects dbt-bigquery label Jan 14, 2025
@mikealfare mikealfare transferred this issue from dbt-labs/dbt-bigquery Jan 14, 2025
mikealfare pushed a commit that referenced this issue Jan 20, 2025
mikealfare added a commit that referenced this issue Jan 24, 2025