restrict python submission #5822
Conversation
Thank you for your pull request! We could not find a changelog entry for this change. For details on how to document a change, see the contributing guide.
core/dbt/clients/jinja.py (Outdated)

```diff
@@ -285,6 +292,13 @@ def __init__(
         self.node = node
         self.stack = stack

+        if self.context and "adapter" in self.context:
+            if "materialization" in self.macro.unique_id:
+                self.context["special_functions"] = self.context["adapter"].submit_python_job
```
@gshank This way the user can actually still call this submit_python_job function via {{ special_functions(model, python_code) }} in a macro. Is there a way to have this in the context but not allow Jinja to access it?

The other thing I have been trying is to save it somewhere else individually, but I haven't found a good place: the MacroGenerator object for the materialization macro and the one for the statement macro are actually two different objects, with only the context being passed from one to the other (node and stack are both None for the materialization macro).
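A minimal sketch of the distinction being discussed, under assumed names (MacroGeneratorSketch and run_python are hypothetical, not dbt-core's real API): Jinja can only resolve names that appear in the context mapping, so keeping the callable as a plain Python attribute instead of a context key like "special_functions" leaves it callable from dbt's own code while invisible to templates.

```python
class MacroGeneratorSketch:
    """Hypothetical sketch, not dbt-core's real MacroGenerator."""

    def __init__(self, context, submit_python_job):
        # Keys in self.context are what Jinja rendering can resolve;
        # plain attributes on the object are invisible to templates.
        self.context = context
        self._submit_python_job = submit_python_job

    def run_python(self, model, compiled_code):
        # dbt-core's own Python code can still reach the callable directly.
        return self._submit_python_job(model, compiled_code)


gen = MacroGeneratorSketch({"adapter": object()}, lambda m, c: f"submitted {m}")
assert "special_functions" not in gen.context  # nothing for Jinja to find
assert gen.run_python("my_model", "def main(session): ...") == "submitted my_model"
```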
Also tagging @jtcohen6 to see if you have any thoughts.
core/dbt/clients/jinja.py (Outdated)

```diff
@@ -301,6 +319,15 @@ def exception_handler(self) -> Iterator[None]:
             e.stack.append(self.macro)
             raise e

+    def _is_statement_under_materailization(self) -> bool:
+        return bool(
+            self.macro.unique_id == "macro.dbt.statement"
```
@jtcohen6 with this the user can't actually access submit_python_job in a custom statement macro. Any concern with that?
@ChenyuLInx I'm aligned. I'm not familiar with any reason or legitimate use case for an end user to override the statement macro — this should really be treated as internal to dbt-core.

If we want to loosen this restriction in the future, we can (e.g. by changing this conditional to check for any macro named statement, rather than the unique_id containing the project name dbt).
I don't have a test since we can't run Python models in core right now, but I manually checked locally to make sure.

Also skipping the changelog since I don't think we want to tell everyone about it.
core/dbt/clients/jinja.py (Outdated)

```python
    def _is_statement_under_materailization(self) -> bool:
```
just a typo:

```diff
-    def _is_statement_under_materailization(self) -> bool:
+    def _is_statement_under_materialization(self) -> bool:
```

and where it's called below
Fixed
Just to make sure I understand, the idea here is: submit_python_job can only be called from within a statement, within a materialization (a macro whose name contains materialization). Otherwise, the user should see an exception like:

submit_python_job is not intended to be called here.
I haven't been able to see the desired behavior when running locally. I tried checking out the branch, and then defining a model like:
```sql
-- models/my_model.sql
{% set compiled_code %}
def main(session):
    pass
{% endset %}
{% set result = adapter.submit_python_job(model, compiled_code) %}

-- actual view definition
select '{{ result }}' as result
```
This seems to actually work — it doesn't get blocked.
core/dbt/clients/jinja.py (Outdated)

```diff
@@ -272,6 +275,13 @@ def pop(self, name):
             raise InternalException(f"popped {got}, expected {name}")


+def raise_error_func(func_name: str) -> Callable:
+    def raise_error(*args, **kwargs):
+        raise InternalException(f"{func_name} is not intended to be called here.")
```
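The diff above cuts off mid-function; here is a self-contained sketch of the same factory, with a stand-in exception class (the real InternalException lives in dbt.exceptions) and a hypothetical usage showing the swap-out.

```python
from typing import Callable


class InternalException(Exception):
    """Stand-in for dbt.exceptions.InternalException."""


def raise_error_func(func_name: str) -> Callable:
    # Build a callable that refuses to run, naming the blocked function.
    def raise_error(*args, **kwargs):
        raise InternalException(f"{func_name} is not intended to be called here.")

    return raise_error


# Hypothetical usage: swap the real function out of a context before Jinja sees it.
context = {"submit_python_job": raise_error_func("submit_python_job")}
raised = False
try:
    context["submit_python_job"]("model", "code")
except InternalException as exc:
    raised = "not intended to be called here" in str(exc)
assert raised
```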
Will this include the name of the macro / node calling the unsupported function?
Not really, but I can add it!
Added for usage in models. For hooks, operations, and tests we can't really tell (they use MacroGenerator), and we don't get info about which function actually calls it unless we do some really aggressive tracking, so I just added a generic message indicating it is in one of those places. Let me know if we want to be very specific.

I also included the full name adapter.submit_python_job in the message, so it shouldn't be hard to do a search.
Really solid UX improvement - thank you!
@jtcohen6 fixed the issue you mentioned by also removing that function at compilation time.
I'm kind of concerned with this approach. I think doing this in the MacroGenerator is almost certainly the wrong place. That code would execute for every single macro that's built. The adapter that's linked in the context comes from the global adapter? Why doesn't the last created MacroGenerator determine which monkeypatched method is active?
I think the MacroNamespaceBuilder is a more likely place to put this -- you can modify the context before it's used to create the MacroGenerator object. But I would prefer finding some way to just pass along information on the 'submit_python_job' call that would allow us to determine the validity in the python code. What is it that we need to know? The call stack?
@gshank the …
Really appreciate you taking a look here @gshank! If we're going to do this, it ought to be a surgical change in the right spot. The high-level goal here is to restrict …
I wonder what the call stack looks like in adapter.submit_python_job. Could we recognize that it comes from a materialization? Or if that doesn't work, maybe we could make a special context with a 'submit_python_job' method, with two flavors, the error-raising function and the "call adapter.submit_python_job" version, and limit the context with the working 'submit_python_job' to materialization macros. Then we could check the call stack in adapter.submit_python_job and make sure it's only coming from that method.
So maybe that second suggestion wouldn't work. It looks like the materialization macro is the wrapping thing that's called to execute models, so it's passed the model_context used by everything else. It wouldn't be very clean to have two contexts at that point...
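One way to read the call-stack suggestion above, as a sketch with assumed names (MacroStack and may_submit_python_job are minimal stand-ins, not dbt's real classes): keep a stack of macro frames, and treat the call as valid only when a materialization frame is present.

```python
class MacroStack:
    """Minimal stand-in for dbt's macro call stack."""

    def __init__(self):
        self.call_stack = []

    def push(self, name):
        self.call_stack.append(name)


def may_submit_python_job(stack: MacroStack) -> bool:
    # Valid only if a materialization frame is somewhere on the stack.
    return any("materialization" in frame for frame in stack.call_stack)


stack = MacroStack()
stack.push("macro.dbt.materialization_table_default")
stack.push("macro.dbt.statement")
assert may_submit_python_job(stack)

# A bare model, hook, or operation has no materialization frame:
assert not may_submit_python_job(MacroStack())
```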
```diff
-            # only mark depth=0 as a dependency
-            if depth == 0:
+            # only mark depth=0 as a dependency, when creating this dependency we don't pass in stack
+            if depth == 0 and self.node:
```
@gshank the depth is being kept as 0 because of the comment there. I updated it.
This looks so much cleaner!
I agree - looks much cleaner!
```python
            raise RuntimeException(
                f"submit_python_job is not intended to be called here, at model {parsed_model['alias']}, with macro call_stack {self.context_macro_stack.call_stack}."
            )
        return self.adapter.submit_python_job(parsed_model, compiled_code)
```
QQ: Will adapter.submit_python_job still be callable (as a classmethod) from within the Jinja context? Would we need to decorate it as "unavailable"?
It is no longer available from within the Jinja context. We make functions available using the @available decorator, and I removed that decorator from the submit_python_job function. I also have a Snowflake PR opened for it (given that Snowflake actually submits things slightly differently). I'm also going to sync with the dbt-databricks maintainer to make sure they don't add that @available decorator back.
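A simplified sketch of the marking-decorator pattern being described. dbt's real @available lives in dbt.adapters.base.meta and is collected via a metaclass; this stand-in (available, FakeAdapter, jinja_namespace are all hypothetical) only illustrates why removing the decorator hides the method from Jinja.

```python
def available(func):
    # Simplified sketch: mark an adapter method as exposed to Jinja.
    func._is_available_ = True
    return func


class FakeAdapter:
    @available
    def get_relation(self, name):
        return f"relation:{name}"

    # No decorator: stays Python-only, invisible to templates.
    def submit_python_job(self, parsed_model, compiled_code):
        return "submitted"

    def jinja_namespace(self):
        # Only decorated methods are handed to the template context.
        return {
            name: getattr(self, name)
            for name in dir(self)
            if getattr(getattr(self, name), "_is_available_", False)
        }


ns = FakeAdapter().jinja_namespace()
assert "get_relation" in ns
assert "submit_python_job" not in ns
```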
Resolves #182. Description: Follows enhancement/refactor python submission in dbt-labs/dbt-spark#452 and dbt-labs/dbt-core#5822.
resolves #5596

Description

Only allow submit_python_job to be used in the statement macro within materialization logic.

TODO

Checklist

- changie new to create a changelog entry