Jinja context variable for selected resources #3471

Closed
jtcohen6 opened this issue Jun 17, 2021 · 10 comments · Fixed by #5001
Labels
enhancement New feature or request help_wanted Trickier changes, with a clear starting point, good for previous/experienced contributors node selection Functionality and syntax for selecting DAG nodes

Comments

@jtcohen6
Contributor

jtcohen6 commented Jun 17, 2021

Describe the feature

A Jinja context variable that includes the list of resources (by unique_id) selected in the current invocation:

{% if 'model.package_name.my_model' in selected_resources %}
  ...
{% endif %}

This would have caveats similar to graph and execute, because it wouldn't be populated at parse time, only at compile time and runtime.

Bonus points if it includes DAG information, i.e. the order in which resources will be queued for execution.
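The "bonus points" idea can be illustrated without dbt. The sketch below assumes a plain dict that maps each unique_id to the unique_ids it depends on (a stand-in for the real graph, not dbt's GraphQueue). It uses Python's standard graphlib to produce one valid dependency order and then filters that order down to the selected resources. Sorting the whole graph before filtering keeps two selected nodes in the right relative order even when they are linked only through an unselected node. dbt's actual queue order may differ among nodes that don't depend on each other.

```python
from graphlib import TopologicalSorter  # Python 3.9+
from typing import Dict, List, Set


def selected_in_queue_order(parents: Dict[str, Set[str]], selected: Set[str]) -> List[str]:
    """Return the selected unique_ids in one valid dependency order.

    `parents` maps each unique_id to the unique_ids it depends on. The whole
    graph is sorted first, so ordering constraints that pass through
    unselected nodes are still respected after filtering.
    """
    order = TopologicalSorter(parents).static_order()
    return [unique_id for unique_id in order if unique_id in selected]


parents = {
    "model.pkg.stg": set(),
    "model.pkg.int": {"model.pkg.stg"},
    "model.pkg.fct": {"model.pkg.int"},
}
# int is not selected, but fct still has to come after stg.
print(selected_in_queue_order(parents, {"model.pkg.stg", "model.pkg.fct"}))
# ['model.pkg.stg', 'model.pkg.fct']
```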

Use cases:

Describe alternatives you've considered

  • Adding a property to manifest nodes: selected: true|false. Then users would access the same graph context variable and use that property. I understand that some users already do something similar, e.g. inspect the tags on each node, to emulate selection criteria. We could also consider selection_criteria, detailing why a given node has been selected. This could be included in the list output and a helpful tool for visualizing selection groupings.

  • Make the selection criteria directly available via flags.args.models and flags.args.exclude (see the comment on #2253: Add a "full run" indicator to Jinja context).
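A rough sketch of the first alternative, assuming manifest nodes are plain dicts keyed by unique_id. The selected and selection_criteria keys are the properties proposed in that bullet, not fields dbt provides, and annotate_selection is a hypothetical helper. The final filter mirrors what a template would do by reading the flag off each node in graph instead of emulating selection through tags.

```python
from typing import Dict, Optional, Set


def annotate_selection(
    nodes: Dict[str, dict],
    selected: Set[str],
    criteria: Optional[Dict[str, str]] = None,
) -> Dict[str, dict]:
    """Return copies of the nodes with a hypothetical `selected` flag and,
    where known, a hypothetical `selection_criteria` string."""
    criteria = criteria or {}
    annotated = {}
    for unique_id, node in nodes.items():
        copy = dict(node)  # shallow copy; the input nodes are left untouched
        copy["selected"] = unique_id in selected
        if unique_id in criteria:
            copy["selection_criteria"] = criteria[unique_id]
        annotated[unique_id] = copy
    return annotated


nodes = {
    "model.pkg.orders": {"name": "orders", "tags": ["finance"]},
    "model.pkg.users": {"name": "users", "tags": []},
}
graph_nodes = annotate_selection(
    nodes, {"model.pkg.orders"}, {"model.pkg.orders": "tag:finance"}
)

# Equivalent of filtering graph.nodes on the proposed property in a template.
selected_names = [n["name"] for n in graph_nodes.values() if n["selected"]]
print(selected_names)  # ['orders']
```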

Additional context

Further down the road:

Who will this benefit?

This comes up in a surprising number of more-complex use cases.

@jtcohen6 jtcohen6 added the enhancement New feature or request label Jun 17, 2021
@jtcohen6 jtcohen6 added the node selection Functionality and syntax for selecting DAG nodes label Aug 2, 2021
@vergenzt

Is there any sort of workaround available for something like this right now? It's not a blocker at the moment, but it would make the deployment logic in my use case much terser and more intuitive.

My use case

I'm using a custom generate_schema_name macro that adds a __dbt_tmp_<timestamp> suffix to the schema, so I can do blue/green-style promotion at the schema level instead of just the table level.

However, I'm trying to make nodes outside the selection for rebuilding get ref'd from their un-suffixed schema rather than from the temp-suffixed schema (where they don't exist).

Currently my main project is split into three (root project → source project 1 & source project 2), and the only selections I make in my build process are at the package level (source project 1, source project 2, or the root project). So for now I just check whether node.package_name == project_name in my generate_schema_name macro. However, I'd love to replace that check with node.unique_id in selected_nodes!
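The decision described above reduces to a one-line branch. The plain-Python sketch below mirrors it outside dbt to compare the current package-level check with the desired selection-based one. The function names, the "analytics" schema, and the suffix value are hypothetical, and selected stands in for the proposed selected-resources variable. As later comments in this thread point out, generate_schema_name runs at parse time, so this shows the intended logic rather than a working macro.

```python
from typing import Set


def schema_by_package(package_name: str, project_name: str, base_schema: str, suffix: str) -> str:
    # Current stand-in: suffix every node in the package being built.
    return base_schema + suffix if package_name == project_name else base_schema


def schema_by_selection(unique_id: str, selected: Set[str], base_schema: str, suffix: str) -> str:
    # Desired behaviour: suffix only nodes actually selected for rebuilding;
    # unselected nodes are read from their un-suffixed (already promoted) schema.
    return base_schema + suffix if unique_id in selected else base_schema


suffix = "__dbt_tmp_20220514"  # hypothetical <timestamp> value
selected = {"model.source_project_1.orders"}

print(schema_by_selection("model.source_project_1.orders", selected, "analytics", suffix))
# analytics__dbt_tmp_20220514
print(schema_by_selection("model.root_project.report", selected, "analytics", suffix))
# analytics
```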

@b-per
Contributor

b-per commented Mar 28, 2022

Hi! Would it be OK for me to contribute to this issue? There have been a few occasions where I wished I had this information in the Jinja context.

From what I can see, we would need to add a @contextproperty function in context/providers.py and potentially retrieve the resources from get_graph_queue(). Does that seem right?

I haven't thought about tests yet, so feel free to point out the best place to add tests for this feature.
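To illustrate the @contextproperty idea in isolation, the sketch below imitates the pattern with a property subclass used as a marker and a to_dict method that exports only the marked attributes. It borrows the decorator's name from the comment above but is not dbt's implementation; BaseContextSketch, ProviderContextSketch, and the constructor argument are hypothetical.

```python
from typing import Any, Dict, List


class contextproperty(property):
    """Marker: a property subclass whose values are exported to the template context."""


class BaseContextSketch:
    def to_dict(self) -> Dict[str, Any]:
        ctx: Dict[str, Any] = {}
        # Walk the MRO from most- to least-derived so subclass overrides win.
        for klass in type(self).__mro__:
            for name, attr in vars(klass).items():
                if isinstance(attr, contextproperty) and name not in ctx:
                    ctx[name] = getattr(self, name)
        return ctx


class ProviderContextSketch(BaseContextSketch):
    def __init__(self, selected_resources: List[str]) -> None:
        self._selected_resources = selected_resources

    @contextproperty
    def selected_resources(self) -> List[str]:
        # Return a copy so templates cannot mutate the stored list.
        return list(self._selected_resources)

    def helper(self) -> str:  # not decorated, so not exported
        return "internal"


ctx = ProviderContextSketch(["model.pkg.my_model"]).to_dict()
print(ctx)  # {'selected_resources': ['model.pkg.my_model']}
```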

@jtcohen6
Contributor Author

@b-per I'm all for you taking a crack at it. The broad overview you've got sounds about right; my instinct would be to retrieve the resources (a list of unique_ids) from selector.get_selected rather than from GraphQueue.

We'll want to very clearly document that, just as with flags, users should not use selected_resources as the input to any parse-time logic (dependencies or configs). I imagine selected_resources should be an empty list at parse time.
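The parse-time caveat can be made concrete with a small simulation. It assumes a hypothetical selected_resources_for helper and phase names ("parse", "execute") invented for illustration. The point is that any config computed while parsing sees the empty list and silently takes the "not selected" branch.

```python
from typing import FrozenSet, List


def selected_resources_for(phase: str, selection: FrozenSet[str]) -> List[str]:
    """Hypothetical rule: nothing is selected yet while the project is parsed."""
    if phase == "parse":
        return []
    return sorted(selection)


selection = frozenset({"model.pkg.my_model"})

for phase in ("parse", "execute"):
    selected = selected_resources_for(phase, selection)
    # A config that branches on the selection gets a different answer
    # depending on when it is evaluated.
    materialized = "table" if "model.pkg.my_model" in selected else "view"
    print(phase, selected, materialized)
# parse [] view
# execute ['model.pkg.my_model'] table
```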

@jtcohen6 jtcohen6 added help_wanted Trickier changes, with a clear starting point, good for previous/experienced contributors Team:Execution labels Mar 28, 2022
@b-per
Copy link
Contributor

b-per commented Mar 28, 2022

After taking a closer look, it isn't as straightforward as I thought it would be 😄.

get_selected is a method of NodeSelector, which requires a Graph (not available in the current context) and a Manifest (available in context/providers.py), and it takes a spec object as input (also not available in the current context).

class ProviderContext(ManifestContext):
    # subclasses are MacroContext, ModelContext, TestContext
    def __init__(
        self,
        model,
        config: RuntimeConfig,
        manifest: Manifest,
        provider: Provider,
        context_config: Optional[ContextConfig],
    ) -> None:
        ...  # constructor body omitted from this excerpt

Should the approach be to import the graph and spec objects into context/providers.py and call get_selected within selected_resources, or to calculate it outside of the context and import it already calculated?

@jtcohen6
Contributor Author

jtcohen6 commented Mar 28, 2022

Not surprised it's a bit tricky :)

calculate it outside of the context and import it already calculated?

This one feels like the better approach. I'm not sure which object it should live on, or how much rewiring it would require in our provider. Someone from the Language or Execution team might be able to provide clearer guidance (since this functionality really spans the two).
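A sketch of the "calculate it outside of the context" option, under loose assumptions: select is a toy stand-in for selector.get_selected that understands only a bare model name and the "+name" form (the model plus all of its ancestors, as in dbt's graph operator syntax), and each context is a plain dict rather than a ProviderContext. The selection is computed once and the same frozen set is handed to every context, so no context needs the graph or the spec itself.

```python
from typing import Dict, FrozenSet, List, Set


def select(parents: Dict[str, Set[str]], spec: str) -> FrozenSet[str]:
    """Toy selector: 'name' picks nodes whose unique_id ends in that name;
    '+name' also picks every ancestor of those nodes."""
    include_ancestors = spec.startswith("+")
    target = spec.lstrip("+")
    result = {uid for uid in parents if uid.split(".")[-1] == target}
    stack = list(result) if include_ancestors else []
    while stack:  # depth-first walk up the dependency graph
        for parent in parents.get(stack.pop(), set()):
            if parent not in result:
                result.add(parent)
                stack.append(parent)
    return frozenset(result)


def build_contexts(selection: FrozenSet[str]) -> List[dict]:
    # Every context receives the same precomputed selection.
    return [
        {"model": uid, "selected_resources": sorted(selection)}
        for uid in sorted(selection)
    ]


parents = {
    "model.pkg.stg": set(),
    "model.pkg.int": {"model.pkg.stg"},
    "model.pkg.fct": {"model.pkg.int"},
}
selection = select(parents, "+fct")  # computed once, outside any context
contexts = build_contexts(selection)
```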

@jtcohen6
Copy link
Contributor Author

Very cool! @b-per Any chance I could ask you to open an issue in https://github.com/dbt-labs/docs.getdbt.com to get this documented for v1.1?

@b-per
Copy link
Contributor

b-per commented Apr 12, 2022

No worries, I'll create the issue and draft a PR.

@tarrafil

tarrafil commented May 13, 2022

@b-per, sorry to bother you, but I need some help. I'm trying to implement the case @vergenzt described, but all I get from selected_resources is an empty list. I'm using dbt 1.1.0. When I reference this variable inside a model like

-- {{ selected_resources }}
select * from bla

it compiles to

-- ['model.my_project.my_model']
select * from bla

but when I use it inside a macro like

{% macro generate_schema_name(custom_schema_name, node) -%}

    {%- do log(selected_resources, info=true) -%}

    {%- set default_schema = target.schema -%}
    {%- if custom_schema_name is none -%}
        {{ default_schema }}
    {%- else -%}
        {{ default_schema }}_{{ custom_schema_name | trim }}
    {%- endif -%}

{%- endmacro %}

all I get is an empty list in the logs. Is there something I need to initialize first?

@vergenzt

vergenzt commented May 14, 2022

Aha, unfortunately I think generate_schema_name is called at parse time, which means it won't have access to this value, if I understand correctly. 🙁

(See the comment on #5001.)

@tarrafil

tarrafil commented May 14, 2022

That is a bummer. @vergenzt, did you find any workaround? Right now I have ~50 teams in analytics, and having every team build all 300 tables costs time, storage, and compute. It would be very helpful to be able to pass two schemas: one for the models being run, and one for the rest. The first schema would be dev or qa, and the second the prd one. Each team would build only its own models and use the prd version of the other teams' models.


4 participants