
restrict python submission #5822

Merged 11 commits from enhancement/block_submit_python into main on Sep 20, 2022
Conversation

@ChenyuLInx (Contributor) commented Sep 13, 2022:

resolves #5596

Description

Only allow submit_python_job to be used in the statement macro within materialization logic.
TODO

  • make adapter.submit_python_job in dbt-snowflake unavailable in Jinja


@ChenyuLInx ChenyuLInx requested a review from gshank September 13, 2022 02:35
@cla-bot cla-bot bot added the cla:yes label Sep 13, 2022
@github-actions (bot):
Thank you for your pull request! We could not find a changelog entry for this change. For details on how to document a change, see the contributing guide.

@@ -285,6 +292,13 @@ def __init__(
        self.node = node
        self.stack = stack

        if self.context and "adapter" in self.context:
            if "materialization" in self.macro.unique_id:
                self.context["special_functions"] = self.context["adapter"].submit_python_job
@ChenyuLInx (author):
@gshank This way the user can actually still call submit_python_job via {{ special_functions(model, python_code) }} in a macro. Is there a way to keep this in the context but not allow Jinja to access it?

I have also tried saving it somewhere else individually, but haven't found a good place: the MacroGenerator objects for the materialization macro and the statement macro are actually two different objects, with only the context being passed from one to the other (node and stack are both None for the materialization macro).

@ChenyuLInx (author):
Also tagging @jtcohen6 to see if you have any thoughts.

@ChenyuLInx ChenyuLInx self-assigned this Sep 13, 2022
@ChenyuLInx ChenyuLInx added the Skip Changelog Skips GHA to check for changelog file label Sep 13, 2022
@ChenyuLInx ChenyuLInx marked this pull request as ready for review September 13, 2022 22:43
@ChenyuLInx ChenyuLInx requested a review from a team as a code owner September 13, 2022 22:43
@@ -301,6 +319,15 @@ def exception_handler(self) -> Iterator[None]:
            e.stack.append(self.macro)
            raise e

    def _is_statement_under_materailization(self) -> bool:
        return bool(
            self.macro.unique_id == "macro.dbt.statement"
@ChenyuLInx (author):
@jtcohen6 with this, the user can't actually access submit_python_job in a custom statement macro. Any concern with that?

Contributor:
@ChenyuLInx I'm aligned. I'm not familiar with any reason or legitimate use case for an end user to override the statement macro — this should really be treated as internal to dbt-core.

If we want to loosen this restriction in the future, we can. (E.g. by changing this conditional to check for any macro named statement, rather than the unique_id containing project name dbt.)
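
For illustration, that looser check might look something like this (a hedged sketch, not code from this PR; the helper name is hypothetical):

    def is_statement_macro(unique_id: str) -> bool:
        # unique_id has the form "macro.<package_name>.<macro_name>";
        # match on the macro name alone instead of requiring package "dbt"
        return unique_id.split(".")[-1] == "statement"

    # any package's statement macro would qualify under the looser rule
    assert is_statement_macro("macro.my_project.statement")
    assert not is_statement_macro("macro.dbt.run_query")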

@ChenyuLInx (author):

I don't have a test since we can't run Python models in core right now, but I manually checked locally and made sure that materialization -> statement is the only place where you can properly access the Python submit-job function. You can also access it at the materialization root level with some strange and clearly bad-looking syntax, but not in any other place.

@ChenyuLInx (author):

Also skipping the changelog, since I don't think we want to tell everyone about this.

@@ -301,6 +319,15 @@ def exception_handler(self) -> Iterator[None]:
            e.stack.append(self.macro)
            raise e

    def _is_statement_under_materailization(self) -> bool:
@jtcohen6 commented Sep 14, 2022:
just a typo

Suggested change:
-    def _is_statement_under_materailization(self) -> bool:
+    def _is_statement_under_materialization(self) -> bool:

and where it's called below

@ChenyuLInx (author):
Fixed

@jtcohen6 left a comment:
Just to make sure I understand, the idea here is: submit_python_job can only be called from within a statement, within a materialization (macro containing the name materialization). Otherwise, the user should see an exception like:

submit_python_job is not intended to be called here.

I haven't been able to see the desired behavior when running locally. I tried checking out the branch, and then defining a model like:

-- models/my_model.sql
{% set compiled_code %}

def main(session):
    pass

{% endset %}

{% set result = adapter.submit_python_job(model, compiled_code) %}

-- actual view definition
select '{{ result }}' as result

This seems to actually work; it doesn't get blocked.


@@ -272,6 +275,13 @@ def pop(self, name):
            raise InternalException(f"popped {got}, expected {name}")


def raise_error_func(func_name: str) -> Callable:
    def raise_error(*args, **kwargs):
        raise InternalException(f"{func_name} is not intended to be called here.")
Contributor:
Will this include the name of the macro / node calling the unsupported function?

@ChenyuLInx (author):

Not really, but I can add it!

@ChenyuLInx (author):

Added it for usage in models. For hooks, operations, and tests we can't really tell (they use MacroGenerator), and we don't get info about which function actually calls it unless we do some really aggressive tracking, so I just added a generic message indicating it is in one of those places. Let me know if we want to be very specific.
I also included the full name adapter.submit_python_job in the message, so it shouldn't be hard to do a search.

Contributor:

Really solid UX improvement - thank you!

@ChenyuLInx ChenyuLInx requested review from a team and iknox-fa September 14, 2022 15:53
@ChenyuLInx ChenyuLInx requested a review from jtcohen6 September 14, 2022 21:01
@ChenyuLInx (author):

@jtcohen6 fixed the issue you mentioned by also removing that function at compile time.

@gshank left a comment:

I'm kind of concerned with this approach. I think doing this in the MacroGenerator is almost certainly the wrong place. That code would execute for every single macro that's built. The adapter that's linked in the contexts comes from the global adapter? Why doesn't the last created MacroGenerator determine which monkeypatched method is active?

I think the MacroNamespaceBuilder is a more likely place to put this -- you can modify the context before it's used to create the MacroGenerator object. But I would prefer finding some way to just pass along information on the 'submit_python_job' call that would allow us to determine the validity in the python code. What is it that we need to know? The call stack?

@ChenyuLInx (author):

@gshank the MacroNamespaceBuilder sounds promising. One question: does the materialization macro also go through MacroNamespaceBuilder somehow? I feel like we need to somehow pass in the call stack of the macros if we want to check it in that function; not sure what that's going to look like. Scheduled a meeting tomorrow to talk about this live.

@jtcohen6:

Really appreciate you taking a look here @gshank! If we're going to do this, it ought to be a surgical change in the right spot.

The high-level goal here is to restrict adapter.submit_python_job to only be callable within a statement of a materialization. We want to prevent users from firing up Python jobs willy-nilly (in post-hooks, in tests, wherever). We may want to support those as patterns in the future. So we're trying to put guardrails in place for now, with the possibility of removing them later, without adding a ton of cruft to the codebase in the process.

@gshank commented Sep 16, 2022:

I wonder what the call stack looks like in adapter.submit_python_job. Could we recognize that it comes from a materialization?

Or if that doesn't work, maybe we could make a special context with a 'submit_python_job' method in two flavors, the error-raising function and the 'call adapter.submit_python_job' version, and limit the context with the working 'submit_python_job' to materialization macros. Then we could check the call stack in adapter.submit_python_job and make sure it's only coming from that method.
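
As a rough sketch of that call-stack idea (the frame strings below mirror the unique_id convention seen earlier; the helper itself is hypothetical):

    def called_from_materialization_statement(call_stack: list) -> bool:
        # legal call site: the dbt-internal statement macro executing
        # underneath a materialization frame
        return "macro.dbt.statement" in call_stack and any(
            "materialization" in frame for frame in call_stack
        )

    # e.g. a stack recorded while a table materialization runs its main statement:
    assert called_from_materialization_statement(
        ["macro.dbt.materialization_table_default", "macro.dbt.statement"]
    )
    assert not called_from_materialization_statement(["macro.dbt.run_query"])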

@gshank commented Sep 16, 2022:

So maybe that second suggestion wouldn't work. It looks like the materialization macro is the wrapping thing that's called to execute models, so it's passed the model_context used by everything else. It wouldn't be very clean to have two contexts at that point...

@ChenyuLInx ChenyuLInx requested review from a team as code owners September 19, 2022 23:42
-            # only mark depth=0 as a dependency
-            if depth == 0:
+            # only mark depth=0 as a dependency, when creating this dependency we don't pass in stack
+            if depth == 0 and self.node:
@ChenyuLInx (author):

@gshank the depth is being kept as 0 for the reason given in the comment I updated.

@gshank left a comment:

This looks so much cleaner!

@ChenyuLInx ChenyuLInx merged commit 646a0c7 into main Sep 20, 2022
@ChenyuLInx ChenyuLInx deleted the enhancement/block_submit_python branch September 20, 2022 16:39
@jtcohen6 left a comment:

I agree - looks much cleaner!

            raise RuntimeException(
                f"submit_python_job is not intended to be called here, at model {parsed_model['alias']}, with macro call_stack {self.context_macro_stack.call_stack}."
            )
        return self.adapter.submit_python_job(parsed_model, compiled_code)
Contributor:

QQ: Will adapter.submit_python_job still be callable (as a classmethod) from within the Jinja context? Would we need to decorate it as "unavailable"?

@ChenyuLInx (author):

It is no longer available from within the Jinja context. We make functions available using the @available decorator, and I removed that decorator from the submit_python_job function. I also have a snowflake PR opened for this (given that snowflake actually submits things slightly differently). I'm also going to sync with the dbt-databricks maintainer to make sure they don't add that @available decorator anymore.
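
For context, a minimal sketch of the @available pattern being described (the decorator import path reflects dbt-core at the time; the class is a toy, not the real BaseAdapter interface):

    from dbt.adapters.base.meta import available

    class ToyAdapter:
        @available
        def quote(self, identifier: str) -> str:
            # decorated: exposed to Jinja as adapter.quote(...)
            return f'"{identifier}"'

        def submit_python_job(self, parsed_model, compiled_code: str):
            # no @available decorator: not reachable from user Jinja as
            # adapter.submit_python_job(...)
            raise NotImplementedError("submitted via materialization internals only")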

Successfully merging this pull request may close: [CT-972] [Feature] Block usage of submit_python_job outside of materialization logic