Jinja context variable for selected resources #3471

jtcohen6 · 2021-06-17T23:38:50Z

Describe the feature

A Jinja context variable that includes the list of resources (by unique_id) selected in the current invocation:

{% if 'model.package_name.my_model' in selected_resources %}
  ...
{% endif %}

This would have similar caveats to graph and execute, because it wouldn't be populated at parse time, only at compile/execute/runtime.

Bonus points if it includes DAG information, i.e. the order in which resources will be queued for execution.

Use cases:

- An on-run-start hook that wants to know all resources about to execute. Comparable information is already available in the on-run-end context, in the form of the Results object.
- Use standard node selection syntax as the input to a macro. E.g. run audit_helper.compare_relations() for each resource selected/built by a CI job, to ensure no regressions, by looping over selected_resources in a run-operation (see "Automating Non Regression Test : How to get a ref() to the deferred version of the selected model ?" #2740).
  - Could we do a half-decent version of this with Results today in an on-run-end hook?
  - Is this better as a post-hook on each model?
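
To make the second use case a bit more concrete, here is a rough Python sketch of the filtering step such a macro would need when looping over selected_resources. It assumes only what the example above shows, namely that a unique_id starts with its resource type followed by a dot (as in model.package_name.my_model). The function name models_in_selection and the sample IDs are made up for illustration; a real version would be written in Jinja inside a macro.

```python
from typing import Iterable, List


def models_in_selection(selected_resources: Iterable[str]) -> List[str]:
    """Return only the model unique_ids, preserving the input order.

    Hypothetical helper: assumes each unique_id begins with its
    resource type, e.g. "model.package_name.my_model".
    """
    models = []
    for unique_id in selected_resources:
        # Everything before the first dot is treated as the resource type.
        resource_type, _, _rest = unique_id.partition(".")
        if resource_type == "model":
            models.append(unique_id)
    return models


# Made-up selection mixing models and a test.
selected = [
    "model.package_name.my_model",
    "test.package_name.some_test",
    "model.package_name.other_model",
]
print(models_in_selection(selected))
```

Each ID that survives the filter is what the macro would then pass to something like audit_helper.compare_relations().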

Describe alternatives you've considered

- Adding a property to manifest nodes: selected: true|false. Then users would access the same graph context variable and use that property. I understand that some users already do something similar, e.g. inspect the tags on each node, to emulate selection criteria. We could also consider selection_criteria, detailing why a given node has been selected. This could be included in the list output and be a helpful tool for visualizing selection groupings.
- Make the selection criteria directly available, via flags.args.models and flags.args.exclude (see "Add a "full run" indicator to Jinja context" #2253 (comment)).

Additional context

Further down the road:

- The run-operation task should accept --select and --selector
- Custom tasks (tasks that define their own arguments and types #2381) that can run nodes in parallel using threads, rather than in sequence

Who will this benefit?

This comes up in a surprising number of more-complex use cases.

vergenzt · 2022-01-19T18:10:28Z

Is there any sort of workaround available for something like this right now? Not a blocker at the moment, but would make deployment logic in my use case much terser and more intuitive.

My use case

I'm using a custom generate_schema_name macro that includes a __dbt_tmp_<timestamp> suffix on the schema to do blue/green style promotion at the schema level instead of just the table level.

However I'm trying to make it so that nodes that are outside the selection for rebuilding are ref'd from their un-suffixed schema name rather than the schema with temp suffix (in which they don't exist).

My main project is currently split into three (root project → source project 1 & source project 2), and the only selections I do in my build process are at the package level (i.e. either source project 1, source project 2, or the root project). So for now I just check whether node.package_name == project_name in my generate_schema_name macro. However, I'd love to replace that check with node.unique_id in selected_nodes!
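
The desired check can be sketched in plain Python as below. This is only an illustration of the logic, not dbt code: schema_for_node, its parameters, and the sample values are hypothetical, and the real thing would live in a Jinja generate_schema_name macro and depend on a selection variable being available there.

```python
from typing import Collection


def schema_for_node(
    unique_id: str,
    base_schema: str,
    tmp_suffix: str,
    selected_resources: Collection[str],
) -> str:
    """Pick the schema a node should be built in or referenced from.

    Hypothetical helper: nodes inside the current selection get the
    temporary blue/green suffix, everything else keeps the plain schema.
    """
    if unique_id in selected_resources:
        return f"{base_schema}{tmp_suffix}"
    return base_schema


# Made-up values for illustration only.
selection = {"model.root_project.orders"}
suffix = "__dbt_tmp_20220119"

print(schema_for_node("model.root_project.orders", "analytics", suffix, selection))
print(schema_for_node("model.source_project_1.raw_orders", "analytics", suffix, selection))
```

The first call returns the suffixed schema because the node is being rebuilt; the second returns the un-suffixed schema, which is where the unselected node already exists.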

b-per · 2022-03-28T07:19:41Z

Hi! Would it be ok for me to contribute to this issue? There are a few occasions where I wished I had this information in the Jinja context.

From what I see we would need to add a @contextproperty function in context/providers.py and potentially retrieve the resources from get_graph_queue(). Does that seem right?

I have not thought about tests so far, but feel free to point out the best place to add tests for the feature.

jtcohen6 · 2022-03-28T08:14:58Z

@b-per I'm all for you taking a crack at it. The broad overview you've got sounds about right; my instinct would be to retrieve the resources (list of unique_id) from selector.get_selected, rather than GraphQueue.

We'll want to very clearly document that, just as with flags, users should not use selected_resources as the input to any parse-time logic (dependencies or configs). I imagine selected_resources should be an empty list at parse time.

b-per · 2022-03-28T13:17:52Z

After taking a closer look, this isn't as straightforward as I thought it would be 😄 .

get_selected is executed on a NodeSelector that requires a Graph (not available in the current context) and a Manifest (available in context/providers.py) and takes as input a spec object (not available in the current context).

class ProviderContext(ManifestContext):
    # subclasses are MacroContext, ModelContext, TestContext
    def __init__(
        self,
        model,
        config: RuntimeConfig,
        manifest: Manifest,
        provider: Provider,
        context_config: Optional[ContextConfig],
    ) -> None:
        ...  # constructor body omitted in this excerpt

Should the approach be to import the graph and spec objects in context/providers.py and calculate get_selected in selected_resources or to calculate it outside of the context and import it already calculated?

jtcohen6 · 2022-03-28T13:22:01Z

Not surprised it's a bit tricky :)

> calculate it outside of the context and import it already calculated?

This one feels like the better approach. I'm not sure which object it should live on, or the extent to which it would require rewiring in our provider. Someone from the Language or Execution team might be able to provide clearer guidance (since this functionality really spans the two).
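
As a very rough, self-contained sketch of what "calculate it outside of the context and import it already calculated" could look like: the selection is computed elsewhere, stored once in a module-level holder, and the context only reads it. Every name here (_SELECTED_RESOURCES, set_selected_resources, SketchContext) is a stand-in invented for illustration and not a claim about where this should actually live in dbt.

```python
from typing import Iterable, List

# Module-level holder. Empty by default, which also gives the
# "empty list at parse time" behaviour discussed earlier in the thread.
_SELECTED_RESOURCES: List[str] = []


def set_selected_resources(unique_ids: Iterable[str]) -> None:
    """Called once by whatever computes the selection (hypothetical)."""
    # Sort for a stable order; replace the contents in place.
    _SELECTED_RESOURCES[:] = sorted(unique_ids)


class SketchContext:
    """Stand-in for a provider context; it never touches the graph or spec."""

    @property
    def selected_resources(self) -> List[str]:
        # Return a copy so templates cannot mutate the shared state.
        return list(_SELECTED_RESOURCES)


ctx = SketchContext()
print(ctx.selected_resources)  # nothing has been selected yet

# Later, after node selection has run somewhere outside the context:
set_selected_resources({"model.my_project.my_model"})
print(ctx.selected_resources)
```

The point of the design is that the context needs neither the Graph nor the spec object; it only imports a value that was already calculated.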

jtcohen6 · 2022-04-12T16:39:38Z

Very cool! @b-per Any chance I could ask you to open an issue in https://github.com/dbt-labs/docs.getdbt.com to get this documented for v1.1?

b-per · 2022-04-12T16:52:05Z

No worries, I'll create the issue and draft a PR

tarrafil · 2022-05-13T16:09:15Z

@b-per sorry to bother, but I need some help. I am trying to implement the case @vergenzt talked about, but all I get from selected_resources is an empty list. I am using dbt 1.1.0. When I reference this variable inside a model like

-- {{ selected_resources }}
select * from bla

it compiles to

-- ['model.my_project.my_model']
select * from bla

but when I use it inside a macro like

{% macro generate_schema_name(custom_schema_name, node) -%}

    {%- do log(selected_resources, info=true) -%}

    {%- set default_schema = target.schema -%}
    {%- if custom_schema_name is none -%}
        {{ default_schema }}
    {%- else -%}
        {{ default_schema }}_{{ custom_schema_name | trim }}
    {%- endif -%}

{%- endmacro %}

all I get is an empty list in the logs. Is there something I need to initialize first?

vergenzt · 2022-05-14T00:29:13Z

Aha - unfortunately I think generate_schema_name is called at parse time -- which means it won't have access to this value iiuc. 🙁

#5001 (comment)

tarrafil · 2022-05-14T03:13:32Z

That is a bummer. @vergenzt, did you find any workaround? Right now I have ~50 teams in analytics, and having every team build all 300 tables costs a lot of time, storage and compute. The ability to pass 2 schemas would be very helpful: one for the models being run, and one for the rest. The first schema would be dev or qa, and the second the prd one. Each team would build only its own models, and take the prd version of the other teams' models.

jtcohen6 added the enhancement New feature or request label Jun 17, 2021

jtcohen6 added the node selection Functionality and syntax for selecting DAG nodes label Aug 2, 2021

jtcohen6 added help_wanted Trickier changes, with a clear starting point, good for previous/experienced contributors Team:Execution labels Mar 28, 2022

This was referenced Apr 6, 2022

Add selected_resources to the Jinja context #5001

Merged

[CT-466] [Feature] Make run-operation accept selectors to be able to use the selected_resources Jinja variable #5005

Open

ChenyuLInx closed this as completed in #5001 Apr 12, 2022

b-per mentioned this issue Apr 12, 2022

Add documentation about selected_resources (releasing with 1.1) dbt-labs/docs.getdbt.com#1330

Closed

yu-iskw mentioned this issue Jun 23, 2022

[FEATURE] Add the possibility to compute metrics for a subset of all monitored tables re-data/re-data#294

Open
