Feature: Add set/zip function to contexts #5107

jeremyyeo · 2022-04-19T11:19:46Z

resolves #2345

Description

Adding the set and zip functions as prescribed in the issue linked above.

Checklist

I have signed the CLA
I have run this code in development and it appears to resolve the stated issue
This PR includes tests, or tests are not required/relevant for this PR
I have added information about my change to be included in the CHANGELOG.

github-actions · 2022-04-19T11:20:05Z

Thank you for your pull request! We could not find a changelog entry for this change. For details on how to document a change, see the contributing guide.

jeremyyeo · 2022-04-20T10:27:54Z

@jtcohen6 any thoughts on adding any of the others mentioned in the issue (i.e. any(), all())?

~~Also had some flake8 fails on:~~

flake8...................................................................Failed
- hook id: flake8
- exit code: 1

core/dbt/context/base.py:8:1: F401 'dbt.clients.yaml_helper.SafeLoader' imported but unused
core/dbt/context/base.py:8:1: F401 'dbt.clients.yaml_helper.Loader' imported but unused
core/dbt/context/base.py:8:1: F401 'dbt.clients.yaml_helper.Dumper' imported but unused

~~in my pre-commit, wasn't sure if it was okay to remove or not so I just committed with --no-verify.~~

Looks like my black formatter ignored the #noqa bit and formatted that particular one-liner into 4 separate lines (I have format on save turned on in VS Code)... changing those imports back to a single line results in a passing pre-commit.

core/dbt/context/base.py

emmyoop

Looks good so far. Sorry it took a bit to review! We've been having some conversations around the best pattern for functions that we add.

gshank · 2022-05-02T14:03:10Z

Did you want to try creating the "try_set" methods that @emmyoop asked for?

jeremyyeo · 2022-05-03T06:22:52Z

Did you want to try creating the "try_set" methods that @emmyoop asked for?

Hey @gshank yep keen to do that but kind of waiting to see if we will be adding a new special error type to raise (#5167). You reckon I should just proceed with a generic TypeError for now?

def try_set(value):
    try:
        return set(value)
    except TypeError:
        raise TypeError

gshank · 2022-05-03T13:18:01Z

It looks like we mainly raise CompilationExceptions in the contexts, partly because Jinja catches the exceptions and we don't have code to distinguish between parse-time and run-time for those errors. For now, just make it a CompilationException? We plan to do some work on organizing exceptions at some point and we can clean it up then.

jeremyyeo · 2022-05-04T09:07:55Z

@gshank / @emmyoop what do you think of the try_ methods "skipping" the raise if the user was the one that set up a default return value? Would make it behave similar to other methods, e.g:

var('foo')  
> error if variable 'foo' not defined.

var('foo', 'bar') 
> var 'foo' set to 'bar' if var 'foo' not defined.

emmyoop · 2022-05-04T15:08:39Z

@jeremyyeo the try methods shouldn't allow a default. The intention of their use is for when the value really should always be the expected type. ie. if it's not iterable, etc it should be a hard stop. If the project wants to allow a default then the non-try version of the method should be used. We plan to have both versions of these methods where appropriate so the most appropriate one can be used and we don't break any projects going forward.

gshank · 2022-05-05T18:11:34Z

Looks like this is getting really close :-). If you can just add a couple of simple tests we can get this wrapped up. Probably a test in tests/functional/context_methods would be good; just something simple like a model that uses the functions.

emmyoop

Just some test changes but otherwise is looking great!

emmyoop · 2022-05-11T18:40:18Z

tests/functional/context_methods/test_builtin_functions.py

+{% set set_result = set([1, 2, 2, 3, "foo", False]) %}
+{% set try_set_result = try_set([1, 2, 2, 3, "foo", False]) %}
+
+{% set simple_set = (set_result == try_set_result == set((1, 2, 3, "foo", False))) %}


This check is a bit off since you're using the new method to check the new method.

Instead of using tests I would use macros that log the values of set_result andtry_set_result. Something like:

{% macro validate_set() %} {% set set_result = set([1, 2, 2, 3, 'foo', False]) %} {{ log("set_result: " ~ set_result) }} {% set try_set_result = try_set([1, 2, 2, 3, 'foo', False]) %} {{ log("try_set_result: " ~ try_set_result) }} {% endmacro %}

Then in the test you can capture the logs from run-operation with the --debug flag for the macro and assert the logs show up:

def test_builtin_set_function(self, project): _, log_output = run_dbt_and_capture(["--debug", "run-operation", "validate_set"]) expected_set = {False, 1, 2, 3, 'foo'} assert f"set_result: {expected_set}" in log_output assert f"try_set_result: {expected_set}" in log_output

Also do the same for zip for consistency.

emmyoop

This looks great @jeremyyeo! Thanks for working with us on the few iterations as we move towards a better way to implement these methods.

I'll get it merged in!

emmyoop · 2022-05-13T15:26:30Z

@jtcohen6 should this be back ported to 1.1.latest?

leahwicz · 2022-05-13T16:21:29Z

@emmyoop I would wait for this to go out in v1.2 since it's net new functionality. @jtcohen6 let me know if you feel differently

* add set function to contexts * add zip function to contexts * add changelog * add try_ equivalents * remove defaults * add tests * update tests

skwaugh · 2022-06-17T22:39:48Z

Any chance of this making it into 1.2?

jtcohen6 · 2022-07-14T00:03:15Z

@skwaugh This will be in v1.2!

@jeremyyeo @emmyoop I'm working on docs for this right now (dbt-labs/docs.getdbt.com#1635), and it looks like some of the docstrings need updating. (E.g. all methods include :param default: A default value to return if *args is not iterable, including the ones that do not accept a default argument.)

One other note: To my mind, the implementation of set and try_set here feels exactly backwards. A big inspiration for the try_ nomenclature is Snowflake's try_cast and BigQuery's safe_cast, as opposed to just cast. The idea there is that cast raises an error if the provided column expression / literal value cannot be converted or coerced to the desired type, whereas try_cast/safe_cast return a default value (null) instead.

When using CAST, a query can fail if BigQuery is unable to perform the cast. If you want to protect your queries from these types of errors, you can use SAFE_CAST.

As a dbt user, and as someone familiar with those SQL methods, that's what I was expecting those methods to do, too.

I'm also struggling to come up with a working example where zip fails (raising a TypeError), leading it to return the default value instead. I've tried setting one of the inputs to a non-iterable value:

{% set my_list_a = 1234 %}
{% set my_list_b = ['alice', 'bob', 'another'] %}
{% set my_zip = zip(my_list_a, my_list_b) | list %}

But this just raises:

00:11:08  Encountered an error:
'NoneType' object is not iterable

jeremyyeo · 2022-07-14T07:57:48Z

and it looks like some of the docstrings need updating.

Ah yes - thanks for catching that :)

To my mind, the implementation of set and try_set here feels exactly backwards.

Yeah I think so too.

However, at the same time, I'm pretty sure we were trying to align these new functions (in terms of whether try_ "returns default upon error" or "error upon error") with the behaviour of existing context methods like toyaml, tojson:

dbt-core/core/dbt/context/base.py

Lines 463 to 465 in 3594ced

    
           def toyaml( 
        
               value: Any, default: Optional[str] = None, sort_keys: bool = False 
        
           ) -> Optional[str]:

Which has no try_ equivalent at the moment. I suppose we could:

Swap try_zip/try_set with _zip/_set but then this will deviate a little bit with the other existing methods toyaml, etc.
Swap try_zip/try_set with _zip/_set AND add try_ equivalents of the existing methods too (try_toyaml)?

I'm also struggling to come up with a working example where zip fails (raising a TypeError)

Non-iters seem to raise TypeError in Python...

I think it's because:

{% set my_list_a = 1234 %}
{% set my_list_b = ['alice', 'bob', 'another'] %}
{% set my_zip = zip(my_list_a, my_list_b) %}  -- Here the TypeError is raised and value returned is None
{% set into_list = my_zip | list %} -- Effectively we're doing `list(None)`

And list(None) does result in 'NoneType' object is not iterable.

jtcohen6 · 2022-07-14T08:39:15Z

@jeremyyeo Thanks for the quick response! And good call on the | list filter being the culprit here. I was able to get this example working:

{% set my_list_a = 1234 %}
{% set my_list_b = ['alice', 'bob', 'another'] %}
{% set my_zip = zip(my_list_a, my_list_b, default = []) | list %}
{% do log(my_zip, info = true) %} {# [] #}

I'm catching myself back up on these threads from a few months ago, and just dropped a few thoughts in #5167 (comment). I recall agreeing at the time that:

This would necessarily feel inconsistent with toyaml/fromyaml/tojson/fromjson (we don't love how those were implemented, too much in one method)
At the risk of some interim inconsistency, it's worth doing the right thing going forward. Later on, we can scope and schedule work to bring the others into line.

My strong preference is to get this right going forward, by giving the right functionality to the right name. Two options for us right now (while we're in the v1.2 RC period):

Switch the behavior of set + try_set, zip + try_zip
Rename try_* methods to indicate the opposite behavior, e.g. strict_

@emmyoop @nathaniel-may open to your thoughts as well!

jtcohen6 · 2022-07-14T14:15:21Z

@nathaniel-may and I had a chance to discuss live. We're definitely lacking a larger "language design" discussion and spec which would make it easier for community members to contribute, and for us to review, these types of changes with confidence. If he's interested, I think @dbeatty10 might be able to help with this :)

We don't have the bandwidth to do that in the immediate term, but it's definitely worth doing. In the meantime, let's move forward with one of the short-term fixes identified above, to include in v1.2.0-rc2, just to avoid shipping functionality that does the exact opposite of what (I suspect) many users will expect.

emmyoop · 2022-07-14T14:18:13Z

@jtcohen6 I'd lean toward option 2 to avoid breaking users until we've figured out the standard we want to live by. strict is better than try 💯 . And while try has meaning in Snowflake, does it for other adapters? I don't love using try if it only has meaning in Snowflake. I'd rather move away from it entirely.

jtcohen6 · 2022-07-14T15:27:32Z

@emmyoop Okay, I buy it! If we choose to adopt strict_* as a consistent pattern, it will also be much easier to add in strict_toyaml et al, since it requires no change to the current behavior of toyaml.

@jeremyyeo I can pick this up, unless you have the motivation and capacity to see it through. We'd just need to get the change merged + backported by next Tuesday.

nathaniel-may · 2022-07-14T15:32:11Z

minor note: would you consider toyaml_strict so in an alphabetized docs page they show up next to each other?

Clarification question: If I see a function with strict in its name, what exactly do I expect from its behavior?

jtcohen6 · 2022-07-14T15:37:39Z

minor note: would you consider toyaml_strict so in an alphabetized docs page they show up next to each other?

👍 good call! all about it

Clarification question: If I see a function with strict in its name, what exactly do I expect from its behavior?

As I understand it, "strict" methods:

should raise a standard exception if the value cannot be converted to a set / yaml / json, zipped, etc
should not not allow for a default value to gracefully fail over to

Whereas the "non-strict" versions of those methods return None if the conversion/serialization fails (or a configurable default instead of None).

IIRC, we had previously discussed strict: bool = False as an additional optional argument of these methods, but decided it would be better to implement separate methods / variants.

(One thought: Should it be to_set() and to_set_strict()? Not a hill we must fall down on right now.)

jeremyyeo · 2022-07-15T03:28:58Z

@jtcohen6 Took a crack at that tags/v1.2.0rc1...rename-strict-mode

Should we do toyaml and tojson at this time too?

jtcohen6 · 2022-07-15T09:43:49Z

@jeremyyeo That branch looks great! Want to open a PR for it?

I just opened a dedicated issue for this switch: #5475

The timeline to do fromjson/tojson/fromyaml/toyaml is not as pressing. The reason to rename try_set and try_zip right now is that they've already been included in an RC, and not yet in a final release, so we only have one week to correct it before we've made a commitment to backwards compatibility. That said, we should definitely make sure the work is tracked (in either #5475 or #5167), clearly defined, and (probably) open for community contribution.

add set function to contexts

4d571f2

cla-bot bot added the cla:yes label Apr 19, 2022

jtcohen6 added the Team:Language label Apr 19, 2022

add zip function to contexts

e856850

emmyoop mentioned this pull request Apr 22, 2022

[CT-539] [Feature] Add zip and modules.itertools to BaseContext #5130

Closed

1 task

add changelog

29eebe6

jeremyyeo marked this pull request as ready for review April 23, 2022 20:31

jeremyyeo requested review from a team as code owners April 23, 2022 20:31

jeremyyeo requested review from gshank and VersusFacit April 23, 2022 20:31

emmyoop requested review from emmyoop and removed request for a team and VersusFacit April 25, 2022 13:34

emmyoop reviewed Apr 26, 2022

View reviewed changes

core/dbt/context/base.py Show resolved Hide resolved

emmyoop requested changes Apr 26, 2022

View reviewed changes

jeremyyeo changed the title ~~Feature: Add set function to contexts~~ Feature: Add set/zip function to contexts Apr 27, 2022

gshank requested a review from emmyoop April 28, 2022 13:29

jeremyyeo added 2 commits May 4, 2022 20:09

Merge branch 'main' into feature/2345-more-python-functions

ea3dc24

add try_ equivalents

ce98e6f

remove defaults

2faab0e

add tests

73c67c2

emmyoop requested changes May 11, 2022

View reviewed changes

update tests

3594ced

jeremyyeo requested a review from emmyoop May 13, 2022 10:24

emmyoop approved these changes May 13, 2022

View reviewed changes

emmyoop merged commit 7e43f36 into main May 13, 2022

emmyoop deleted the feature/2345-more-python-functions branch May 13, 2022 15:25

jtcohen6 mentioned this pull request May 18, 2022

[CT-661] test_builtin_set_function is flaky #5266

Closed

emmyoop added this to the v1.2 milestone Jun 21, 2022

jtcohen6 mentioned this pull request Jun 29, 2022

Add set/zip to dbt-Jinja reference dbt-labs/docs.getdbt.com#1635

Closed

1 task

jtcohen6 mentioned this pull request Jul 14, 2022

v1.2 docs omnibus dbt-labs/docs.getdbt.com#1698

Merged

2 tasks

jtcohen6 mentioned this pull request Jul 14, 2022

[CT-562] Allow for opt in to context method failures #5167

Closed

jtcohen6 mentioned this pull request Jul 15, 2022

[CT-845] Rename try_* -> strict_* #5475

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Feature: Add set/zip function to contexts #5107

Feature: Add set/zip function to contexts #5107

jeremyyeo commented Apr 19, 2022 •

edited

Loading

github-actions bot commented Apr 19, 2022

jeremyyeo commented Apr 20, 2022 •

edited

Loading

emmyoop left a comment

gshank commented May 2, 2022

jeremyyeo commented May 3, 2022

gshank commented May 3, 2022

jeremyyeo commented May 4, 2022 •

edited

Loading

emmyoop commented May 4, 2022

gshank commented May 5, 2022

emmyoop left a comment

emmyoop May 11, 2022

emmyoop left a comment

emmyoop commented May 13, 2022

leahwicz commented May 13, 2022

skwaugh commented Jun 17, 2022

jtcohen6 commented Jul 14, 2022 •

edited

Loading

jeremyyeo commented Jul 14, 2022

jtcohen6 commented Jul 14, 2022 •

edited

Loading

jtcohen6 commented Jul 14, 2022

emmyoop commented Jul 14, 2022

jtcohen6 commented Jul 14, 2022 •

edited

Loading

nathaniel-may commented Jul 14, 2022 •

edited

Loading

jtcohen6 commented Jul 14, 2022 •

edited

Loading

jeremyyeo commented Jul 15, 2022

jtcohen6 commented Jul 15, 2022

Feature: Add set/zip function to contexts #5107

Feature: Add set/zip function to contexts #5107

Conversation

jeremyyeo commented Apr 19, 2022 • edited Loading

Description

Checklist

github-actions bot commented Apr 19, 2022

jeremyyeo commented Apr 20, 2022 • edited Loading

emmyoop left a comment

Choose a reason for hiding this comment

gshank commented May 2, 2022

jeremyyeo commented May 3, 2022

gshank commented May 3, 2022

jeremyyeo commented May 4, 2022 • edited Loading

emmyoop commented May 4, 2022

gshank commented May 5, 2022

emmyoop left a comment

Choose a reason for hiding this comment

emmyoop May 11, 2022

Choose a reason for hiding this comment

emmyoop left a comment

Choose a reason for hiding this comment

emmyoop commented May 13, 2022

leahwicz commented May 13, 2022

skwaugh commented Jun 17, 2022

jtcohen6 commented Jul 14, 2022 • edited Loading

jeremyyeo commented Jul 14, 2022

jtcohen6 commented Jul 14, 2022 • edited Loading

jtcohen6 commented Jul 14, 2022

emmyoop commented Jul 14, 2022

jtcohen6 commented Jul 14, 2022 • edited Loading

nathaniel-may commented Jul 14, 2022 • edited Loading

jtcohen6 commented Jul 14, 2022 • edited Loading

jeremyyeo commented Jul 15, 2022

jtcohen6 commented Jul 15, 2022

jeremyyeo commented Apr 19, 2022 •

edited

Loading

jeremyyeo commented Apr 20, 2022 •

edited

Loading

jeremyyeo commented May 4, 2022 •

edited

Loading

jtcohen6 commented Jul 14, 2022 •

edited

Loading

jtcohen6 commented Jul 14, 2022 •

edited

Loading

jtcohen6 commented Jul 14, 2022 •

edited

Loading

nathaniel-may commented Jul 14, 2022 •

edited

Loading

jtcohen6 commented Jul 14, 2022 •

edited

Loading