Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug] Nested record field is not quoted and causes build error when contract enforced enabled. #554

Open
2 tasks done
philBatardiere opened this issue Jul 17, 2024 · 2 comments
Labels
feature:nested-columns Issues related to nested columns, like BigQuery's struct pkg:dbt-bigquery Issue affects dbt-bigquery type:bug Something isn't working as documented

Comments

@philBatardiere
Copy link

Is this a new bug in dbt-bigquery?

  • I believe this is a new bug in dbt-bigquery
  • I have searched the existing issues, and I could not find an existing issue for this bug

Current Behavior

Hello,

I have the following model that use two reserved keywords groups and full as example:

{{
    config(
        materialized='table'
    )
}}

WITH data AS (
  SELECT 1 AS id, [STRUCT(11 AS id, 'Peter' AS name, ['a', 'b', 'c'] AS `full`)] AS `groups`
  UNION ALL
  SELECT 2 AS id, [STRUCT(22 AS id, 'Bob' AS name, ['a', 'b', 'c'] AS `full`)] AS `groups`
)

SELECT 
  id,
  `groups`
FROM data

and the following model schema:

version: 2

models:
  - name: model_bug
    config:
      contract:
        enforced: true
    columns:
      - name: id
        data_type: INTEGER
        constraints:
          - type: not_null
        description: "ID Description."
      - name: groups
        quote: true
        data_type: ARRAY
        description: "Groups description."
      - name: groups.id
        quote: true
        data_type: INTEGER
        constraints:
          - type: not_null
        description: "Group ID description."
      - name: groups.name
        quote: true
        data_type: STRING
        constraints:
          - type: not_null
        description: "Group name description."
      - name: groups.full
        quote: true
        data_type: ARRAY<STRING>
        description: "Group full description."

The model cannot be build because the nested column full is not quoted during contract validation step. However, the groups column is correctly quoted. Here is the generated query that causes the issue:

select * from (
        select
    
      
    cast(null as INT64)
 as id, 
      
    cast(null as array<struct<id INT64, name STRING, full ARRAY<STRING>>>)
 as `groups`
    ) as __dbt_sbq
    where false and current_timestamp() = current_timestamp()
    limit 0

Expected Behavior

The dbt-bigquery adapter should quote any reference of the full column during table schema validation and next steps.

Steps To Reproduce

  1. Open your DBT project

  2. Create the following model_bug.sql:

{{
    config(
        materialized='table'
    )
}}

WITH data AS (
  SELECT 1 AS id, [STRUCT(11 AS id, 'Peter' AS name, ['a', 'b', 'c'] AS `full`)] AS `groups`
  UNION ALL
  SELECT 2 AS id, [STRUCT(22 AS id, 'Bob' AS name, ['a', 'b', 'c'] AS `full`)] AS `groups`
)

SELECT 
  id,
  `groups`
FROM data
  1. Create the following schema.yml:
version: 2

models:
  - name: model_bug
    config:
      contract:
        enforced: true
    columns:
      - name: id
        data_type: INTEGER
        constraints:
          - type: not_null
        description: "ID Description."
      - name: groups
        quote: true
        data_type: ARRAY
        description: "Groups description."
      - name: groups.id
        quote: true
        data_type: INTEGER
        constraints:
          - type: not_null
        description: "Group ID description."
      - name: groups.name
        quote: true
        data_type: STRING
        constraints:
          - type: not_null
        description: "Group name description."
      - name: groups.full
        quote: true
        data_type: ARRAY<STRING>
        description: "Group full description."
  1. Run dbt build

Relevant log output

No response

Environment

- dbt-cloud: 2024.7.209

Additional Context

No response

@philBatardiere philBatardiere added type:bug Something isn't working as documented triage:product In Product's queue labels Jul 17, 2024
@philBatardiere
Copy link
Author

After reviewing the code, it seems the issue is related to the methods get_nested_column_data_types and _update_nested_column_data_types, where the method is called from the adapter bigquery__get_empty_schema_sql method.

Problem: The current _update_nested_column_data_types method may not handle nested records where quoting is necessary (as indicated by the YAML configuration) to correctly represent data types.

Solution: Introducing a quote boolean parameter allows explicit control over whether values within the nested record should be quoted. This aligns the method's behavior with the YAML configuration.

@miki-lwy
Copy link

If you remove the below, the error should be gone

  • name: groups
    quote: true
    data_type: ARRAY
    description: "Groups description."

@mikealfare mikealfare added the pkg:dbt-bigquery Issue affects dbt-bigquery label Jan 14, 2025
@mikealfare mikealfare transferred this issue from dbt-labs/dbt-bigquery Jan 14, 2025
colin-rogers-dbt pushed a commit that referenced this issue Feb 3, 2025
Co-authored-by: allcontributors[bot] <46447321+allcontributors[bot]@users.noreply.github.com>
Co-authored-by: Serhii Dimchenko <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature:nested-columns Issues related to nested columns, like BigQuery's struct pkg:dbt-bigquery Issue affects dbt-bigquery type:bug Something isn't working as documented
Projects
None yet
Development

No branches or pull requests

4 participants