
Support TTL for BigQuery tables #2711

Merged
merged 18 commits on Aug 19, 2020
3 changes: 2 additions & 1 deletion CHANGELOG.md
@@ -9,6 +9,7 @@
- Upgraded snowflake-connector-python dependency to 2.2.10 and enabled the SSO token cache ([#2613](https://github.com/fishtown-analytics/dbt/issues/2613), [#2689](https://github.com/fishtown-analytics/dbt/issues/2689), [#2698](https://github.com/fishtown-analytics/dbt/pull/2698))

### Features
- Support TTL for BigQuery tables ([#2711](https://github.com/fishtown-analytics/dbt/pull/2711))
- Add better retry support when using the BigQuery adapter ([#2694](https://github.com/fishtown-analytics/dbt/pull/2694), follow-up to [#1963](https://github.com/fishtown-analytics/dbt/pull/1963))
- Added a `dispatch` method to the context adapter and deprecated `adapter_macro`. ([#2302](https://github.com/fishtown-analytics/dbt/issues/2302), [#2679](https://github.com/fishtown-analytics/dbt/pull/2679))
- The built-in schema tests now use `adapter.dispatch`, so they can be overridden for adapter plugins ([#2415](https://github.com/fishtown-analytics/dbt/issues/2415), [#2684](https://github.com/fishtown-analytics/dbt/pull/2684))
@@ -18,7 +19,7 @@

Contributors:
- [@bbhoss](https://github.com/bbhoss) ([#2677](https://github.com/fishtown-analytics/dbt/pull/2677))
- [@kconvey](https://github.com/kconvey) ([#2694](https://github.com/fishtown-analytics/dbt/pull/2694))
- [@kconvey](https://github.com/kconvey) ([#2694](https://github.com/fishtown-analytics/dbt/pull/2694), [#2711](https://github.com/fishtown-analytics/dbt/pull/2711))

## dbt 0.18.0b2 (July 30, 2020)

7 changes: 7 additions & 0 deletions plugins/bigquery/dbt/adapters/bigquery/impl.py
@@ -104,6 +104,7 @@ class BigqueryConfig(AdapterConfig):
labels: Optional[Dict[str, str]] = None
partitions: Optional[List[str]] = None
grant_access_to: Optional[List[Dict[str, str]]] = None
time_to_expiration: Optional[int] = None


class BigQueryAdapter(BaseAdapter):
@@ -745,6 +746,12 @@ def get_table_options(
expiration = 'TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 12 hour)'
opts['expiration_timestamp'] = expiration

if (config.get('time_to_expiration') is not None) and (not temporary):
expiration = (
'TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL '
'{} hour)').format(config.get('time_to_expiration'))
opts['expiration_timestamp'] = expiration

if config.persist_relation_docs() and 'description' in node:
description = sql_escape(node['description'])
opts['description'] = '"""{}"""'.format(description)
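The two branches handling `expiration_timestamp` above can be sketched in isolation like this (a plain dict stands in for dbt's config object here; that is an assumption for illustration, not the adapter's actual signature):

```python
# Minimal sketch of the option-building logic in this hunk. `config` is a
# plain dict standing in for dbt's RuntimeConfigObject (hypothetical stand-in).
def get_table_options(config, temporary=False):
    opts = {}
    if temporary:
        # Temporary tables always get a fixed 12-hour expiration.
        opts['expiration_timestamp'] = (
            'TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 12 hour)')
    elif config.get('time_to_expiration') is not None:
        # User-configured TTL, expressed in hours.
        opts['expiration_timestamp'] = (
            'TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL '
            '{} hour)'.format(config['time_to_expiration']))
    return opts

# A 4-hour TTL on a non-temporary table:
get_table_options({'time_to_expiration': 4})
```

Note that the `temporary` branch takes precedence, so a user-supplied `time_to_expiration` never shortens or extends the fixed 12-hour lifetime of temporary tables.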
@@ -0,0 +1 @@
select 1 as id
@@ -0,0 +1,37 @@
"""Test adapter-specific config options."""
from test.integration.base import DBTIntegrationTest, use_profile
import textwrap
import yaml


class TestBigqueryAdapterSpecific(DBTIntegrationTest):

@property
def schema(self):
return "bigquery_test_022"

@property
def models(self):
return "adapter-specific-models"

@property
def profile_config(self):
return self.bigquery_profile()

@property
def project_config(self):
return yaml.safe_load(textwrap.dedent('''\
config-version: 2
models:
test:
materialized: table
expiring_table:
time_to_expiration: 4
'''))

@use_profile('bigquery')
def test_bigquery_time_to_expiration(self):
_, stdout = self.run_dbt_and_capture()
self.assertIn(
'expiration_timestamp: TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL '
'4 hour)', stdout)
Contributor Author:
I think the example I copied here was one that expected failure on the model, so the query would be dumped in stdout.

I probably want to inspect results from self.run_dbt(), but could use a pointer to the compiled SQL within the results to do this assertIn (it's a little hard to decipher the schema sometimes). Let me know if that makes sense

Contributor:

I think you want results[index].node.injected_sql. You can look for results by node name using results[index].node.name.

Also, don't feel at all obligated to do this, but because we use pytest for tests now you are free to use the (much more ergonomic, at least to me) `assert whatever in stdout` syntax.

Contributor Author:

@beckjake The error I got makes me think injected_sql isn't what I'm looking for in results.

E AssertionError: 'expiration_timestamp: TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 4 hour)' not found in 'select 1 as id'

This config adds the `expiration_timestamp` as part of the DDL, and if this were the DDL, it should say something like `create or replace table ... as ...`. I can't remember if this is present in debug output (which I believe just dumps the query), or where else the full DDL might be in the results. Any ideas?

Contributor:

Ah, now that I look at this more carefully, I think this will be in the output if you run with --debug, but not the injected_sql. injected_sql contains the value that will end up as the sql value in the materialization. But this change happens ultimately in the create_table_as macro that's called from the materialization.

It is, I suppose, always possible that we don't log all our queries on bigquery? That would be pretty bad behavior on our part.

Contributor Author:

I ran this on a real project locally and it doesn't look like the DDL is anywhere in `run_results.json`, but it definitely is in the output with `--debug`.

31 changes: 30 additions & 1 deletion test/unit/test_bigquery_adapter.py
@@ -4,7 +4,7 @@
import unittest
from contextlib import contextmanager
from requests.exceptions import ConnectionError
from unittest.mock import patch, MagicMock, Mock
from unittest.mock import patch, MagicMock, Mock, create_autospec

import hologram

@@ -571,6 +571,35 @@ def test_parse_partition_by(self):
}
)

def test_time_to_expiration(self):
adapter = self.get_adapter('oauth')
mock_config = create_autospec(
dbt.context.providers.RuntimeConfigObject)
config = {'time_to_expiration': 4}
mock_config.get.side_effect = lambda name: config.get(name)

expected = {
'expiration_timestamp': 'TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 4 hour)',
}
actual = adapter.get_table_options(mock_config, node={}, temporary=False)
self.assertEqual(expected, actual)


def test_time_to_expiration_temporary(self):
adapter = self.get_adapter('oauth')
mock_config = create_autospec(
dbt.context.providers.RuntimeConfigObject)
        config = {'time_to_expiration': 4}
mock_config.get.side_effect = lambda name: config.get(name)

expected = {
'expiration_timestamp': (
'TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 12 hour)'),
}
actual = adapter.get_table_options(mock_config, node={}, temporary=True)
self.assertEqual(expected, actual)



class TestBigQueryFilterCatalog(unittest.TestCase):
def test__catalog_filter_table(self):