-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Array macros #5823
Array macros #5823
Conversation
This is ready for review. After it is approved and merged, I will revert the changes to |
models__array_append_expected_sql = """ | ||
select 1 as id, {{ array_construct([1,2,3,4]) }} as array_col union all | ||
select 2 as id, {{ array_construct([4]) }} as array_col | ||
""".lstrip() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what is the .lstrip()
for?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good question! In this particular case, it isn't needed, and I can remove it :)
More detail:
.lstrip()
strips off the leading whitespace so that there isn't an empty newline at the beginning of this "file" (string that ends up being materialized as a file for pytest'ing purposes).
It's useful for seed CSV "files" in particular because it allows the text to be more readable. You can see examples with git grep "lstrip" ./
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for the explanation!
# check types equal | ||
expected_cols = get_relation_columns(project.adapter, "expected") | ||
actual_cols = get_relation_columns(project.adapter, "actual") | ||
print(f"Expected: {expected_cols}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we need to print these?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not strictly necessary. Copied from here.
Having this printing in here is particularly handy if there is a test failure in the future. Definitely came in handy for me during development too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general leaving print statements in can result in debug logs being full of extraneous statements that you then have to sift through so I'd recommend removing it (or just commenting it out so if someone wants to use it to debug they can just remove the comments)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think print()
statements in pytest
tests only go to go to standard output (stdout) in two cases:
- when there is a test failure
- if you add the
--show-capture
command-line option (likepytest ... -s
)
Otherwise, pytest
"captures" all output so that you don't see anything other than the test summary.
{# all inputs must be the same data type to match postgres functionality #} | ||
{% macro default__array_construct(inputs, data_type) -%} | ||
{% if inputs|length > 0 %} | ||
array[ {{ inputs|join(' , ') }} ] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
style nit: do we generally indent within an if block?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know if we have a style guide for internal Jinja or not. This particular implementation was copied-pasted from dbt-utils as-is.
{% if inputs|length > 0 %} | ||
array[ {{ inputs|join(' , ') }} ] | ||
{% else %} | ||
array[]::{{data_type}}[] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks like the adapter specific PRs don't have the functionality to return an empty array?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This dbt-core PR includes multiple instances of array_construct([])
, namely within these tests:
tests/adapter/dbt/tests/adapter/utils/fixture_array_construct.py
tests/adapter/dbt/tests/adapter/utils/fixture_array_concat.py
tests/adapter/dbt/tests/adapter/utils/test_array_append.py
I was hoping that these are sufficient to establish that the adapters handle the base case of an empty array. But if you take a look at those and it doesn't seem sufficient, then let's discuss it further to update the tests accordingly!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as long as we're validating via tests that we don't need to implement the if condition for each adapter that's good to me
{%- endmacro %} | ||
|
||
{% macro default__array_concat(array_1, array_2) -%} | ||
array_cat({{ array_1 }}, {{ array_2 }}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if array_concat
is more common should we use that as the default instead of postgres syntax?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Between Postgres, Snowflake, BigQuery, and Redshift, it's six of one, half a dozen of the other.
array_cat
- Postgresarray_cat
- Snowflakearray_concat
- BigQueryarray_concat
- Redshift
Spark/Databricks refuses to break the tie by using a unique function name for array concatenation:
concat
- Spark
I don't think this function name is covered in the SQL standard either.
Original implementations in dbt-utils
In an alternate universe:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
such is life (and sql standards)!
@@ -0,0 +1,12 @@ | |||
# array_construct | |||
|
|||
models__array_construct_expected_sql = """ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure how other folks feel but the python standard for constants is to use upper case, see. As I understand the previous convention was based on emulating how we match macros in jinja but since we're not doing that here it's not necessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The lowercase capitalization you see here is consistent with the tests I'm familiar with. You can see a sample of those existing tests via git grep '_sql = """' ./
.
In terms of broader compliance with the PEP 8 style guide, we do use flake8
checks in CI and in pre-commit
checks. I haven't read the nitty gritty details of what flake8
does/doesn't cover though. You can see our flake8 config here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
style/testing nits but overall LGTM
related to #5520
Description
All the related pull requests:
Checklist
changie new
to create a changelog entry