Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CT-539] [Feature] Add zip and modules.itertools to BaseContext #5130

Closed
1 task done
bd3dowling opened this issue Apr 21, 2022 · 1 comment · Fixed by #5140
Closed
1 task done

[CT-539] [Feature] Add zip and modules.itertools to BaseContext #5130

bd3dowling opened this issue Apr 21, 2022 · 1 comment · Fixed by #5140
Labels
enhancement New feature or request help_wanted Trickier changes, with a clear starting point, good for previous/experienced contributors

Comments

@bd3dowling
Copy link
Contributor

Is there an existing feature request for this?

  • I have searched the existing issues

Describe the Feature

Add python's built-in zip function to the Jinja context for use like a global macro and select functions from itertools as accessible from the modules variable.

Describe alternatives you've considered

Attempt to manually replicate the functions as Jinja macros:

  • Exceptionally cumbersome Jinja and often impossible to fully replicate functionality.
    E.g. implementation of itertools.product:
    {% macro product() %}
    
    {% set pools = varargs|map("list")|list %}
    
    {% set ns = namespace(result = [[]]) %}
    
    {% for pool in pools %}
        {% set tmp = [] %}
        {% for x in ns.result %}
            {% for y in pool %}
                {{ tmp.append(x+[y]) }}
            {% endfor %}
        {% endfor %}
        {% set ns.result = tmp %}
    {% endfor %}
    
    {{ return(ns.result) }}
    
    {% endmacro %}
    

Who will this benefit?

This will benefit end-users, allowing more concise models and removing need to manually recreate such functionality (or import a package doing so).

For example, suppose we have 3 tables, and a list of their dimension fields:

{% set dim_map = {
    "table_1": ["t11", "t12", "t12"],
    "table_2": ["t21", "t22"],
    "table_3": ["t31", "t32", "t33", "t34"]
} %}

Suppose I wish to generate a surrogate key based on all of these keys in the same model in which I'm joining them all together. Presently this can be accomplished with something like:

{% set ns = namespace(surrogate_key_args = []) %}
{% for table, dims in dim_map.items() %}
    {% do ns.surrogate_key_args.extend(product([table], dims)|map("join", ".")|list) %}
{% endfor %}

select
    {{ dbt_utils.surrogate_key(*ns.surrogate_key_args) }},
    ...
from table_1
left join table_2 on ...
left join table_3 on ...

where product is a jinja macro with definition given above.

With this proposed feature added to dbt-core, the same could be accomplished without having to add a product macro and instead being able to do:

{% set ns = namespace(surrogate_key_args = []) %}
{% for table, dims in dim_map.items() %}
    {% do ns.surrogate_key_args.extend(modules.itertools.product([table], dims)|map("join", ".")|list) %}
{% endfor %}

or even enable one to go really crazy:

{% set it = modules.itertools %}
{% set surrogate_key_args = it.chain.from_iterable(
    it.starmap(
        it.product,
        zip(
            it.starmap(it.repeat, zip(dim_map.keys(), it.repeat(1))),
            dim_map.values()
        )
    )
)|map("join", ".")|list %}

...

or if the itertools functions were implemented as filters into the context (obviously only logical for a subset):

{% set it = modules.itertools %}
{% set thing = zip(
    zip(dim_map.keys(), it.repeat(1))|starmap('it.repeat'),
    dim_map.values()
)|starmap('it.product')|chain_from_iterable
%}

Point being, it would give a huge amount of flexibility and expressibility when writing Jinja.

Are you interested in contributing this feature?

Can do

Anything else?

I believe should be minimal code with just core/dbt/context/base.py needing to be edited:

def get_itertools_module_context() -> Dict[str, Any]:
    context_exports = [
        "count", "cycle", "repeat",
        "accumulate", "chain", "islice", "starmap", "zip_longest",
        "product", "permutations", "combinations", "combinations_with_replacement"
    ]

    return {name: getattr(itertools, name) for name in context_exports} 


def get_context_modules() -> Dict[str, Dict[str, Any]]:
    return {
        "pytz": get_pytz_module_context(),
        "datetime": get_datetime_module_context(),
        "re": get_re_module_context(),
        "itertools": get_itertools_module_context(),
    }

for adding itertools to the modules var and

    @contextmember("zip")
    @staticmethod
    def _zip(*iterables) -> Iterable[Any]:
        """...docstring..."""
        return zip(*iterables)

for adding zip.

@bd3dowling bd3dowling added enhancement New feature or request triage labels Apr 21, 2022
@github-actions github-actions bot changed the title [Feature] Add zip and modules.itertools to BaseContext [CT-539] [Feature] Add zip and modules.itertools to BaseContext Apr 21, 2022
@emmyoop
Copy link
Member

emmyoop commented Apr 22, 2022

@bd3dowling this is a great suggestion and a wonderfully detailed write up! Thank you!

@jeremyyeo is already in the process of adding zip over in #5107. It's still in draft but is bring worked on! We would also welcome a PR to add itertools as described above. Your use cases are attractive and make total sense.

In addition to the code changes, I'll ask you (or whoever decides to work on this) to also open an issue to update documentation for the modules docs at modules.md.

@emmyoop emmyoop added help_wanted Trickier changes, with a clear starting point, good for previous/experienced contributors and removed triage labels Apr 22, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request help_wanted Trickier changes, with a clear starting point, good for previous/experienced contributors
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants