Support TTL for BigQuery tables #2711
@use_profile('bigquery')
def test_bigquery_location_invalid(self):
    _, stdout = self.run_dbt_and_capture()
    self.assertIn(
        'expiration_timestamp: TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL '
        '4 hour)', stdout)
If this is working the way I'm hoping it is (asserts the ddl contains the option and succeeds), I'll probably add a couple more of these tests for other adapter specific configs like kms_key_name, etc.
@@ -745,6 +746,11 @@ def get_table_options(
            expiration = 'TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 12 hour)'
            opts['expiration_timestamp'] = expiration

        if (config.get('time_to_expiration') is not None) and (not temporary):
            expiration = ('TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL '
                          '{} hour)').format(config.get('time_to_expiration'))
The name 'time_to_expiration' doesn't provide any hints about the unit. Maybe this could be 'hours_to_expiration'?
I'm totally in favor of hours_to_expiration
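For reference, a minimal standalone sketch of the option-building logic in the diff above. This is not the adapter's actual get_table_options: build_table_options is a hypothetical name, config is assumed to be a plain dict, and the key is still time_to_expiration as in the diff at this point in the review.

```python
from typing import Any, Dict


def build_table_options(config: Dict[str, Any], temporary: bool) -> Dict[str, str]:
    """Hypothetical stand-in for the expiration part of get_table_options."""
    opts: Dict[str, str] = {}

    # Temporary tables always get a fixed 12 hour TTL.
    if temporary:
        opts['expiration_timestamp'] = (
            'TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 12 hour)')

    # The user-supplied TTL (in hours) only applies to non-temporary tables.
    hours = config.get('time_to_expiration')
    if hours is not None and not temporary:
        opts['expiration_timestamp'] = (
            'TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL '
            '{} hour)').format(hours)

    return opts


print(build_table_options({'time_to_expiration': 4}, temporary=False))
```

Because the value is interpolated as a number of hours, a name such as hours_to_expiration would make the unit visible at the call site.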
    _, stdout = self.run_dbt_and_capture()
    self.assertIn(
        'expiration_timestamp: TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL '
        '4 hour)', stdout)
I think the example I copied here was one that expected failure on the model, so the query would be dumped in stdout.
I probably want to inspect results from self.run_dbt(), but could use a pointer to the compiled SQL within the results to do this assertIn (it's a little hard to decipher the schema sometimes). Let me know if that makes sense.
I think you want results[index].node.injected_sql. You can look for results by node name using results[index].node.name.
Also, don't feel at all obligated to do this, but because we use pytest for tests now you are free to use the (much more ergonomic, at least to me) assert whatever in stdout syntax.
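A rough sketch of that lookup, assuming only what the comment states: each result exposes node.name and node.injected_sql. The find_result helper, the model name expiring_table, and the SimpleNamespace stand-ins for the real results of self.run_dbt() are all hypothetical and exist only so the snippet runs on its own.

```python
from types import SimpleNamespace


def find_result(results, node_name):
    """Return the first result whose node has the given name."""
    for result in results:
        if result.node.name == node_name:
            return result
    raise KeyError(node_name)


# Stand-in for the list returned by self.run_dbt() (assumed shape).
results = [
    SimpleNamespace(node=SimpleNamespace(
        name='expiring_table', injected_sql='select 1 as id')),
]

result = find_result(results, 'expiring_table')

# pytest-style bare assert instead of self.assertIn(...)
assert 'select 1 as id' in result.node.injected_sql
```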
@beckjake The error I got makes me think injected_sql isn't what I'm looking for in results.
E AssertionError: 'expiration_timestamp: TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 4 hour)' not found in 'select 1 as id'
This config adds the expiration_timestamp as part of the ddl, and if this was ddl, it should say something like create or replace table as .... I can't remember if this is present in debug (which I believe just dumps the query), or where else the full ddl might be in the results. Any ideas?
Ah, now that I look at this more carefully, I think this will be in the output if you run with --debug, but not the injected_sql. injected_sql contains the value that will end up as the sql value in the materialization. But this change happens ultimately in the create_table_as macro that's called from the materialization.
It is, I suppose, always possible that we don't log all our queries on bigquery? That would be pretty bad behavior on our part.
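To illustrate why the assertion against injected_sql fails, here is a toy sketch of the wrapping step. render_create_table_as is a hypothetical Python stand-in, not dbt's create_table_as macro, and the relation name and exact DDL layout are assumptions; it only shows that the options are attached when the select is wrapped in DDL, after injected_sql has been produced.

```python
def render_create_table_as(relation, sql, opts):
    """Hypothetical stand-in: wrap a model's select in BigQuery-style DDL."""
    options = ', '.join('{}={}'.format(k, v) for k, v in opts.items())
    options_clause = ' OPTIONS({})'.format(options) if options else ''
    return 'create or replace table {}{} as ({})'.format(
        relation, options_clause, sql)


injected_sql = 'select 1 as id'  # what results[index].node.injected_sql held
ddl = render_create_table_as(
    '`my_dataset`.`my_model`',  # assumed relation name
    injected_sql,
    {'expiration_timestamp':
        'TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 4 hour)'},
)

# The option only exists in the wrapped DDL, never in the model's own SQL.
assert 'expiration_timestamp' not in injected_sql
assert 'expiration_timestamp' in ddl
print(ddl)
```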
I ran this on a real project locally and it doesn't look like the ddl is anywhere in run_results.json, but it definitely is in the output with --debug.
This is passing all tests at this point @beckjake. Thanks for the help & patience with this! @jtcohen6 bumping the suggestion to call this Still think it would be nice if the BigQuery ddl existed somewhere other than debug, but I don't believe it does. Something to think about.
@kconvey I'm in favor of changing this to
Looks great!
resolves #2697
Description
Add time_to_expiration as a BigQuery specific adapter configuration option. This will only be applied if the adapter is not already creating a temporary table (which uses the same method to set a ttl of 12 hours). This option should be an integer, and is in hours.
Checklist
I have updated the CHANGELOG.md and added information about my change to the "dbt next" section.