Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for BigQuery hourly partitioned tables #2476

Closed
jtcohen6 opened this issue May 20, 2020 · 4 comments · Fixed by #2903
Closed

Add support for BigQuery hourly partitioned tables #2476

jtcohen6 opened this issue May 20, 2020 · 4 comments · Fixed by #2903
Labels
bigquery enhancement New feature or request good_first_issue Straightforward + self-contained changes, good for new contributors!

Comments

@jtcohen6
Copy link
Contributor

Describe the feature

BigQuery hourly partitioned tables are now in beta.

 CREATE TABLE
   mydataset.newtable (transaction_id INT64,
     transaction_ts TIMESTAMP)
 PARTITION BY
   TIMESTAMP_TRUNC(transaction_ts, HOUR)

Benefits:

  • Greater control over partition pruning when querying timeseries data
  • The insert_overwrite incremental strategy could replace partitions at a finer level of precision

Required change

We need to:

  • Find a decent way to reflect hourly partitioning in the partition_by spec
  • Update partition_by.render() to avoid wrapping the partition column in date() if the user has configured hourly partitioning

Since 0.16.0, the partition_by spec has looked like:

{{ config(
      partition_by = {
        'field': 'my_date_or_ts_column',
        'data_type': 'date' or 'timestamp'
      }
}}

The bq API will reflect hourly partitioning like:

"timePartitioning": {
        "field": "ts_column",
        "type": "HOUR"
      },

Quick implementation thoughts:

  • I'm not a huge fan of adding a type attribute separate from data_type
  • If dbt accepted 'data_type': 'hour', would it assume that the field has already been truncated, or that field is a timestamp in need of truncation?

Who will this benefit?

  • BigQuery users
@jtcohen6 jtcohen6 added enhancement New feature or request bigquery good_first_issue Straightforward + self-contained changes, good for new contributors! labels May 20, 2020
@jackiexsun
Copy link

+1 This support would be great to have!

@ebragas
Copy link

ebragas commented Oct 6, 2020

+1 This would be great, especially if it extended to support month and year partitioning as well.

@rpedela
Copy link

rpedela commented Oct 10, 2020

This is GA now, and I would love support for hour, day, month, year as well. What about adding granularity option and then default is day just like BQ API? BQ also supports datetime now.

@jtcohen6
Copy link
Contributor Author

What about adding granularity option and then default is day just like BQ API?

I like it! This could look like:

{{ config(
    materialized = 'table',
    partition_by = {
      'field': 'my_column',
      'data_type': 'datetime',
      'granularity': 'hour'
    }
}}

Then dbt would know to return create table ... partition by datetime_trunc(my_column, hour) as ....

Is anyone interested in contributing this change? I think it could be a fairly straightforward update to this code:

https://github.com/fishtown-analytics/dbt/blob/72038258ed4b5cb7830734ffee3f43376075d9f3/plugins/bigquery/dbt/adapters/bigquery/impl.py#L49-L63

Plus adding an integration test here :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bigquery enhancement New feature or request good_first_issue Straightforward + self-contained changes, good for new contributors!
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants