Advanced node selection syntax #2172
Comments
@drewbanin @jtcohen6 - I'm very invested in this feature. I think it could meaningfully improve the incremental run times of our production DAG, especially the ability to skip any [...]. It looks like this depends on #2203, so I'm assuming there's nothing I can do to help right now, but I'm very keen to help out if I can, even if that's just constructing a bank of potential test cases. Please let me know if I can help. 😁
@beckjake to review and advise. Sounds like PowerShell and jq have good syntaxes for arbitrary selection over a list -- what do those look like, and can we be inspired by them?
I'd like to propose a possible implementation for the "diff-only" (
Importantly, this can be performed using static code analysis and is sensitive to upstream model changes. The use cases supported here are:
Would this type of "smart rebuild" be feasible, and is it perhaps similar to what is already being planned?
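To make the "smart rebuild" question concrete, here is a minimal sketch of one way such a selection might work: compare a per-model checksum between two builds, then select every changed model plus everything downstream of it. The function name `changed_and_downstream`, the checksum dictionaries, and the `children` adjacency map are all hypothetical and are not part of dbt; how dbt would actually detect changes is not specified in this thread.

```python
from collections import deque


def changed_and_downstream(old_checksums, new_checksums, children):
    """Return models that are new or changed, plus all of their descendants.

    old_checksums / new_checksums: dict of model name -> checksum string
    children: dict of model name -> list of direct child model names
    (All three inputs are hypothetical structures for illustration.)
    """
    # A model counts as changed if it is new or its checksum differs.
    changed = {
        name
        for name, checksum in new_checksums.items()
        if old_checksums.get(name) != checksum
    }

    # Breadth-first walk downstream from every changed model.
    selected = set(changed)
    queue = deque(changed)
    while queue:
        node = queue.popleft()
        for child in children.get(node, ()):
            if child not in selected:
                selected.add(child)
                queue.append(child)
    return selected


# Hypothetical example DAG: stg_orders -> orders -> revenue, stg_customers -> customers
children = {
    "stg_orders": ["orders"],
    "stg_customers": ["customers"],
    "orders": ["revenue"],
}
old = {"stg_orders": "a1", "stg_customers": "b1", "orders": "c1",
       "customers": "d1", "revenue": "e1"}
new = dict(old, stg_orders="a2")  # only stg_orders was edited

print(sorted(changed_and_downstream(old, new, children)))
```

In this example only `stg_orders`, `orders`, and `revenue` would be rebuilt; `stg_customers` and `customers` are skipped because nothing upstream of them changed.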
This could also improve the data lineage usability in dbt docs. I don't think this is covered above. When working with massive DAGs, I don't want all children/parents recursively; I want to traverse the tree a level at a time, or specify the depth I want to traverse. Much like the nix command
dbt model_name^1 # only immediate children
dbt model_name^2 # immediate children and grandchildren
dbt 1^model_name # immediate parents
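A depth-limited traversal like the one suggested above can be sketched in a few lines. This is only an illustration of the idea, assuming the DAG is available as a plain adjacency map; `within_depth` and `invert` are hypothetical helper names, not dbt functions.

```python
def within_depth(graph, start, depth):
    """Return nodes reachable from `start` in at most `depth` hops (excluding `start`).

    graph: dict of node -> list of directly connected nodes (children or parents).
    """
    seen = {start}
    frontier = {start}
    for _ in range(depth):
        # Expand one level: neighbours of the current frontier not yet visited.
        frontier = {
            neighbour
            for node in frontier
            for neighbour in graph.get(node, ())
            if neighbour not in seen
        }
        if not frontier:
            break
        seen |= frontier
    return seen - {start}


def invert(graph):
    """Turn a child map into a parent map so the same walk can go upstream."""
    parents = {}
    for parent, kids in graph.items():
        for kid in kids:
            parents.setdefault(kid, []).append(parent)
    return parents


# Hypothetical linear DAG: a -> b -> c -> d
children = {"a": ["b"], "b": ["c"], "c": ["d"]}

print(within_depth(children, "b", 1))          # like model_name^1
print(within_depth(children, "b", 2))          # like model_name^2
print(within_depth(invert(children), "b", 1))  # like 1^model_name
```

Walking upstream reuses the same function on the inverted map, so parents and children share one depth-limiting rule.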
We want to enable a mechanism of node selection that is:
We think that this is best implemented as YML. It should be similar to the CLI --models and --select syntaxes, but it will also allow us to move beyond what's possible with CLI flags + arguments.
Selectors
+my_model
my_model+
@my_model
my_macro+
Set logic
--exclude
union(A,B) --exclude intersect(A,B)
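The set-logic expression above maps directly onto ordinary set operations once each selector has been resolved to a set of node names. The following is a minimal sketch under that assumption; `union`, `intersect`, and `select` are hypothetical helpers that mirror the names in the proposal, not an existing dbt API.

```python
def union(*selections):
    """Nodes matched by any of the selections."""
    return set().union(*selections)


def intersect(first, *rest):
    """Nodes matched by every one of the selections."""
    return set(first).intersection(*rest)


def select(include, exclude=()):
    """Apply an exclusion to an already-resolved selection."""
    return set(include) - set(exclude)


# Hypothetical resolved selections A and B
A = {"orders", "customers", "revenue"}
B = {"revenue", "customers", "sessions"}

# Nodes in A or B, but not in both.
result = select(union(A, B), exclude=intersect(A, B))
print(sorted(result))
```

Here the result contains `orders` and `sessions`, the nodes that appear in exactly one of the two selections.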
Well defined "pseudo-selectors"
We can encode a dynamic selector that returns resources based on a set of conditions, which dbt uses to pick specific nodes at build time. I'm including a couple possibilities of varying complexity, mainly to spur the imagination:
this_package_only
build_if_missing
target database + schema
build_if_changed
manifest.json from a different dbt build, and dbt can compare to infer changed resources
build_if_updated
manifest.json from a different dbt build, and the result of a more recent dbt source snapshot-freshness. dbt can determine whether
(Very) hypothetical spec
Prior art
This carries on the legacy of several past issues (going back to #550, if not earlier). It's something we've been thinking about for some time.
Looking ahead, I believe that a good approach here will form the basis for features we're very interested in supporting: