BigQuery `deduplicate` macro doesn't support downstream partition pruning #928
Describe the feature
The BigQuery `deduplicate` macro uses `array_agg` to deduplicate. As currently written, any query using the macro cannot partition prune downstream of the macro, because of the way `array_agg` interacts with partition pruning: the partition column is not explicitly selected separately from the `array_agg`. For a table, this can be avoided by doing the partition filtering in a CTE before using the `deduplicate` macro, but it can't be avoided in a view.
Describe alternatives you've considered
For models materialized as tables, we can add the partition filtering above the deduplication. For views, the alternative is to write the deduplication step ourselves and manually select the partition column outside of the `array_agg`.
Basically it is written like this:
and instead it needs to look like this:
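A minimal sketch of the two shapes, assuming a hypothetical table `my_dataset.events` partitioned on `event_date` and deduplicated on `event_id`, keeping the row with the latest `loaded_at`. These names are illustrative only, and the first query approximates the pattern the macro produces rather than quoting it. The second query also assumes every copy of a given `event_id` carries the same `event_date`, since adding the partition column to the `group by` would otherwise change which rows count as duplicates.

```sql
-- Shape 1 (approximation of the current pattern): the partition column
-- event_date only exists as a field inside the aggregated struct.
select deduped.*
from (
    select
        array_agg(
            original
            order by original.loaded_at desc
            limit 1
        )[offset(0)] as deduped
    from my_dataset.events as original
    group by original.event_id
);

-- Shape 2 (sketch of the proposed pattern): event_date is grouped on and
-- selected as a plain column outside the array_agg, then dropped from the
-- expanded struct so it is not returned twice.
select
    deduped.* except (event_date),
    event_date
from (
    select
        original.event_date,
        array_agg(
            original
            order by original.loaded_at desc
            limit 1
        )[offset(0)] as deduped
    from my_dataset.events as original
    group by original.event_date, original.event_id
);
```

If the second shape is wrapped in a view, a downstream filter such as `where event_date = '2024-01-01'` refers to a plain grouped column rather than a struct field, which is the condition this issue describes as necessary for pruning to work.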
Additional context
This is specific to the BigQuery `deduplicate` macro.
Who will this benefit?
BQ users who want to deduplicate in a view.
Are you interested in contributing this feature?
I can create a PR that implements the fix we used.