[CT-972] [Feature] Block usage of `submit_python_job` outside of materialization logic #5596

ChenyuLInx · 2022-08-01T23:43:02Z

Is this your first time submitting a feature request?

I have read the expectations for open source contributors
I have searched the existing issues, and I could not find an existing issue for this feature
I am requesting a straightforward extension of existing dbt functionality, rather than a Big Idea better suited to a discussion

Describe the feature

Throw a clear error during parsing (or later, during `dbt run`) if `adapter.submit_python_job` is used anywhere other than the materialization logic.

Related information:

`adapter.submit_python_job` is currently called in the `statement` macro.

Ideas:

When we parse macros, we can walk the macro tree and, for every macro node that isn't on the materialization -> statement path, run a regex over it to make sure adapter.submit_python_job is not there. This could be very costly.
Somehow change how we provide context to macros and only provide adapter.submit_python_job in the desired situations. This means we would have to remove it from the context at some point during Jinja compilation, which could be complex.
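
A minimal sketch of the regex idea, assuming macro sources are available as a plain name-to-source mapping. The `find_offending_macros` helper, the `allowed` tuple, and the sample macros are hypothetical and are not dbt-core APIs. It only scans each macro body once and does not walk the materialization -> statement call tree, which is the part expected to be costly.

```python
import re

# Assumed pattern: matches `adapter.submit_python_job`, tolerating whitespace around the dot.
SUBMIT_CALL = re.compile(r"adapter\s*\.\s*submit_python_job")


def find_offending_macros(macro_sources, allowed=("statement",)):
    """Return sorted names of macros that reference submit_python_job without being allowed to."""
    return sorted(
        name
        for name, source in macro_sources.items()
        if name not in allowed and SUBMIT_CALL.search(source)
    )


# Hypothetical macro bodies, for illustration only.
macros = {
    "statement": "{% set res = adapter.submit_python_job(model, compiled_code) %}",
    "my_helper": "{{ adapter.submit_python_job(model, 'print(1)') }}",
    "safe_macro": "select 1",
}

offenders = find_offending_macros(macros)
print(offenders)
```

A parse-time check along these lines could raise a clear error listing `offenders` instead of printing them.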

related links


ChenyuLInx · 2022-08-01T23:43:35Z

@jtcohen6 neither approach feels very easy, any ideas?

lostmygithubaccount · 2022-08-02T19:24:29Z

@ChenyuLInx to spike

ChenyuLInx · 2022-08-02T22:19:30Z

We talked about this live; manipulating the context might be the better approach.

ChenyuLInx · 2022-08-17T21:55:54Z

@jtcohen6 @lostmygithubaccount I think I found a path to modify the context so that submit_python_job is only allowed where we want it (not saying I like it).

dbt-core/core/dbt/clients/jinja.py

Lines 301 to 321 in a1ee348

    
    @contextmanager
    def track_call(self):
        # This is only called from __call__
        if self.stack is None or self.node is None:
            yield
        else:
            unique_id = self.macro.unique_id
            depth = self.stack.depth
            # only mark depth=0 as a dependency
            if depth == 0:
                self.node.depends_on.add_macro(unique_id)
            self.stack.push(unique_id)
            try:
                yield
            finally:
                self.stack.pop(unique_id)

    # this makes MacroGenerator objects callable like functions
    def __call__(self, *args, **kwargs):
        with self.track_call():
            return self.call_macro(*args, **kwargs)

The track_call function here is called for each macro call and all of its sub-macro calls. So we can do something like this: when a macro is called, we figure out whether it is a materialization macro (from its name); if so, we stash the submit_python_job function somewhere and replace it with a no-op or a function that raises an error; then, if we run into statement, we put it back. This way we only allow submit_python_job to be used in materialization -> statement. Still not a total block, but better.
We can't restrict it to only the statement named main, since we sometimes call it when creating a tmp table. Otherwise that would be the way to allow only one Python job submission per model. We could do something even more funky: add the language to the results of the statement macro and do some check afterwards.
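
A rough sketch of the swap-and-restore idea, written as a standalone class rather than against `MacroGenerator`. Everything here is an assumption for illustration: the `GuardedContext` class, the plain-dict context, the list used as a call stack, and detecting materialization macros by a `materialization_` name prefix. As noted above, it is not a total block, since calls made outside any materialization are left untouched.

```python
from contextlib import contextmanager


def blocked_submit(*args, **kwargs):
    raise RuntimeError(
        "submit_python_job may only be called from materialization -> statement"
    )


class GuardedContext:
    """Hypothetical stand-in for the macro context plus call stack."""

    def __init__(self, submit_python_job):
        self._real_submit = submit_python_job
        self.context = {"submit_python_job": submit_python_job}
        self.stack = []

    @contextmanager
    def track_call(self, macro_name):
        previous = self.context["submit_python_job"]
        in_materialization = any(
            name.startswith("materialization_") for name in self.stack
        )
        if macro_name.startswith("materialization_"):
            # Entering a materialization: hide the real function.
            self.context["submit_python_job"] = blocked_submit
        elif macro_name == "statement" and in_materialization:
            # statement called under a materialization: put it back.
            self.context["submit_python_job"] = self._real_submit
        self.stack.append(macro_name)
        try:
            yield
        finally:
            self.stack.pop()
            self.context["submit_python_job"] = previous


submitted = []
guard = GuardedContext(lambda code: submitted.append(code) or "ok")
was_blocked = False

with guard.track_call("materialization_table_default"):
    try:
        guard.context["submit_python_job"]("direct call")
    except RuntimeError:
        was_blocked = True
    with guard.track_call("statement"):
        result = guard.context["submit_python_job"]("from statement")
```

Restoring `previous` in the `finally` branch keeps the swap balanced even when a macro raises, mirroring how `track_call` pops the stack.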

So the situation now is: we can limit this with a less-than-ideal method, and the more restrictions we want, the less ideal the implementation is going to be. How far do we want to go?

jtcohen6 · 2022-08-18T13:00:59Z

@ChenyuLInx Thanks for the investigation!

> This way we only allow using the submit_python_job being used in materialization -> statement. Still not a total block, but better.

I'd be happy to proceed with this approach — not a complete block, but a helpful guardrail — so long as it doesn't add too much cruft to this tightly wound part of the codebase. If it seems like a ton of work, we'd need to weigh it against other priorities during the beta period.

Tracking calls between macros feels like a thread we might want to pull on more in the future, e.g. to understand which macros are "dirty" / volatile and depend upon introspective queries for their results.

ChenyuLInx added enhancement New feature or request triage python_models and removed triage labels Aug 1, 2022

github-actions bot changed the title ~~[Feature] Block usage of submit_python_job outside of materialization logic~~ [CT-972] [Feature] Block usage of submit_python_job outside of materialization logic Aug 1, 2022

ChenyuLInx mentioned this issue Aug 30, 2022

refactor submission method and add command API as defualt dbt-labs/dbt-spark#442

Merged

4 tasks

ChenyuLInx mentioned this issue Sep 13, 2022

restrict python submission #5822

Merged

7 tasks

ChenyuLInx closed this as completed in #5822 Sep 20, 2022
