Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CT-972] [Feature] Block usage of submit_python_job outside of materialization logic #5596

Closed
3 tasks done
ChenyuLInx opened this issue Aug 1, 2022 · 5 comments · Fixed by #5822
Closed
3 tasks done
Labels
enhancement New feature or request python_models

Comments

@ChenyuLInx
Copy link
Contributor

ChenyuLInx commented Aug 1, 2022

Is this your first time submitting a feature request?

  • I have read the expectations for open source contributors
  • I have searched the existing issues, and I could not find an existing issue for this feature
  • I am requesting a straightforward extension of existing dbt functionality, rather than a Big Idea better suited to a discussion

Describe the feature

Throw a clear error during parsing(or later time in dbt run) if adapter.submit_python_job is being used in places other than the materialization logic.

Relative information:

adapter.submit_python_job is currently being called in macro statement.

Ideas:

  • when we parse macro we can look up the macro tree and check that for all macro nodes that isn't materialization -> statement, we run regex on it to make sure adapter.submit_python_job is not there. This could be very costy
  • somehow update how we provide context to macros and only provide adapter.submit_python_job in desired situation. This means we will have to remove it from the context sometime during jinjia compilation, this could be complex

related links

@ChenyuLInx ChenyuLInx added enhancement New feature or request triage python_models and removed triage labels Aug 1, 2022
@github-actions github-actions bot changed the title [Feature] Block usage of submit_python_job outside of materialization logic [CT-972] [Feature] Block usage of submit_python_job outside of materialization logic Aug 1, 2022
@ChenyuLInx
Copy link
Contributor Author

@jtcohen6 either way feels not very easy, any idea?

@lostmygithubaccount
Copy link
Contributor

@ChenyuLInx to spike

@ChenyuLInx
Copy link
Contributor Author

Talked about this live, the manipulating context idea might be the better approach

@ChenyuLInx
Copy link
Contributor Author

ChenyuLInx commented Aug 17, 2022

@jtcohen6 @lostmygithubaccount I think I found a path to modify the context so that we only allow the code(not saying I like it)

@contextmanager
def track_call(self):
# This is only called from __call__
if self.stack is None or self.node is None:
yield
else:
unique_id = self.macro.unique_id
depth = self.stack.depth
# only mark depth=0 as a dependency
if depth == 0:
self.node.depends_on.add_macro(unique_id)
self.stack.push(unique_id)
try:
yield
finally:
self.stack.pop(unique_id)
# this makes MacroGenerator objects callable like functions
def __call__(self, *args, **kwargs):
with self.track_call():
return self.call_macro(*args, **kwargs)

The trace_call function here is called for each macro call and all the sub macro calls. So we can do something like: when a macro being called, we figure out whether it is materialization macro(from name), if yes, preserve that submit_python_job function somewhere, update it with a non_op/raise error, then if we run into the statement , we put it back. This way we only allow using the submit_python_job being used in materialization -> statement. still not a total block, but better.
We can't really only allow it in for only statement with name main since we call it in create tmp table sometime. otherwise would that would be the way to only allow 1 python job submission per model. We can do something even more funky to add language to results of statement macro and do some check afterwards.

So the situation now is: we can limit this, with some less than ideal method, and the more restriction we want to have, the less ideal the implementation is going to be. How far do we want to go?

@jtcohen6
Copy link
Contributor

@ChenyuLInx Thanks for the investigation!

This way we only allow using the submit_python_job being used in materialization -> statement. Still not a total block, but better.

I'd be happy to proceed with this approach — not a complete block, but a helpful guardrail — so long as it doesn't add too much cruft to this tightly wound part of the codebase. If it seems like a ton of work, we'd need to weigh it against other priorities during the beta period.

Tracking calls between macros feels like a thread we might want to pull on more in the future, e.g. to understand which macros are "dirty" / volatile and depend upon introspective queries for their results.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request python_models
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants