Decouple "job runner" from BaseJob ORM model #30255
Conversation
cc: @vincbeck
This seems to be old and unused/unusable test class.
Hello everyone, I've been working on this on and off for quite a while as I attempted to make things work for AIP-44. I tried it in a few different ways and did not like the previous attempts. This one, I think, hits the sweet spot between resolving the (slightly) intertwined BaseJob dependencies and the need to decouple task running from the ORM models.

A little context: LocalTaskJob as currently implemented inherits from BaseJob (same as the other jobs). It is in fact a polymorphic dependency - all of the jobs are stored in the same "BaseJob" table. This is (and always has been) a little problematic, because the "Job" objects inherit from the ORM object and there is an assumption that they are DB-related, while on the other hand they also have the "running" logic implemented in them.

I attempted to do it in various ways, but with these goals:

a) the resulting architecture decouples the ORM object from the logic, so that we could use the serialized Pydantic objects introduced in #29776 instead (basically we should be able to pass and use both BaseJob and BaseJobPydantic around)
b) it should touch as little logic as possible (shuffling objects around and calling different objects should be most of the change), so that it is easy to review and reason about
c) the resulting architecture makes sense

Result

I think I finally achieved all three goals. Summarizing what has been done here:
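To make goal (a) concrete, here is a minimal, self-contained sketch (all names are illustrative; this is not the actual Airflow code) of the target shape: the job stays a plain state-holding record, while the running logic lives in a runner that only keeps a reference to it.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Job:
    """Plain state-holding record, standing in for the ORM-mapped BaseJob row."""

    job_type: str
    state: str = "running"
    latest_heartbeat: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


class BaseJobRunner:
    """Owns the execution logic and only keeps a reference to the job record."""

    job_type = "BaseJob"

    def __init__(self, job: Job) -> None:
        self.job = job

    def _execute(self) -> None:
        raise NotImplementedError


class LocalTaskJobRunner(BaseJobRunner):
    job_type = "LocalTaskJob"

    def _execute(self) -> None:
        # The task-running logic lives here; it no longer has to be an ORM object,
        # so a serialized (Pydantic) job representation could be passed in instead.
        print("running the task instance")


def run_job(runner: BaseJobRunner) -> None:
    """Drive the job lifecycle around the runner's execute logic."""
    runner.job.state = "running"
    try:
        runner._execute()
        runner.job.state = "success"
    except Exception:
        runner.job.state = "failed"
        raise


job = Job(job_type=LocalTaskJobRunner.job_type)
run_job(LocalTaskJobRunner(job))
print(job.state)  # -> success
```

With this split, the only thing that needs to live in (or travel to) the database is the Job record, while each runner can be instantiated wherever the work actually happens.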
Current state

It looks like a huge change, but if you look closely, most of the changes are in tests adapting to the new object hierarchy, so I hope the review will not be that difficult. I still have a few (heartbeat) tests not passing and I am working on those (likely something missing in heartbeat processing) - but other than those, I think everything else is in place.

Future

Now, this is NOT yet an AIP-44-compatible change. This refactor is just a basic decoupling. We will need to implement several other follow-up changes after this one is merged:
Follow-ups

I am not sure of this, but these changes also make something else possible: they allow us to limit how long connections stay open from running tasks. Previously we kept them open the whole time the task was running, which was kind of strange, since in most circumstances we only needed them for some initial setup, heartbeats, and saving the job state on completion. Once the other changes are completed, we could optimize that away and (mimicking what the Internal API will be doing) only establish sessions/connections for short periods from the running task (a rough sketch of this idea follows below). I hope we can get there.

Looking forward to comments and feedback.

BTW: I think it is impossible to split this PR into smaller ones :(

BTW2: DON'T be scared by the size of the change. It's not as big as it seems - it just needed a lot of changes in tests. The "code" changes:

So it is not that big.
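The connection-lifetime idea from the follow-ups above could look roughly like this (a sketch only; the session helper and the in-memory "database" are stand-ins, not Airflow APIs): the task opens a session briefly for setup, runs with no connection held, then opens one again for heartbeat/completion.

```python
from contextlib import contextmanager
from datetime import datetime, timezone

FAKE_DB = {}  # stand-in for the metadata database


@contextmanager
def short_session():
    """Hypothetical session factory; in reality this would yield a SQLAlchemy session."""
    yield FAKE_DB


def run_task(job_id, work):
    # 1. Initial setup: a connection is open only long enough to register the job.
    with short_session() as db:
        db[job_id] = {"state": "running", "latest_heartbeat": None}

    # 2. The actual work runs without holding any DB connection.
    work()

    # 3. Heartbeat / completion: a connection is opened again only for the writes.
    with short_session() as db:
        db[job_id]["latest_heartbeat"] = datetime.now(timezone.utc)
        db[job_id]["state"] = "success"


run_task("local_task_1", work=lambda: print("task body runs with no open session"))
print(FAKE_DB["local_task_1"]["state"])  # -> success
```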
Nice decoupling! I really like that. It would make things easier for AIP-44. I only have 2 nits; the decoupling + the code make sense to me.
PS: +1 on the "it is easier to review than it seems"
Separate and split run job method into prepare/execute/complete steps (#30308)

As a follow-up to decoupling the job logic from the BaseJob ORM object (#30255), the `run` method of BaseJob should also be decoupled from it (allowing BaseJobPydantic to be passed) and split into three steps, in order to allow db-less mode. The "prepare" and "complete" steps of the `run` method modify the BaseJob ORM-mapped object, so they should be called over the internal API from LocalTask, DagFileProcessor and Triggerer running in db-less mode. The "execute" step, however, does not need the database and should run locally.

This is not yet the full AIP-44 conversion; it is a prerequisite for it - the AIP-44 conversion will be done as a follow-up after this one. However, we added a mermaid diagram showing the job lifecycle with and without the Internal API to make it easier to reason about.

Closes: #30295

Co-authored-by: Jed Cunningham <[email protected]>
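A rough sketch of what the prepare/execute/complete split enables (names and the RPC plumbing are illustrative, not the real module layout): the DB-touching steps can be routed through the Internal API when running in db-less mode, while the execute step always runs locally.

```python
class Job:
    """Stand-in for the ORM-backed job row (or its Pydantic representation)."""

    def __init__(self):
        self.state = None


def prepare_job_for_execution(job):
    job.state = "running"  # in reality: an UPDATE of the job row in the metadata DB


def execute_job(job, execute_callable):
    execute_callable()  # never needs the database


def complete_job_execution(job, success=True):
    job.state = "success" if success else "failed"  # another UPDATE of the job row


def internal_api_call(fn, *args, **kwargs):
    """Hypothetical stand-in for sending a call over the Internal API in db-less mode."""
    return fn(*args, **kwargs)


def run(job, execute_callable, use_internal_api=False):
    call = internal_api_call if use_internal_api else (lambda fn, *a, **kw: fn(*a, **kw))
    call(prepare_job_for_execution, job)
    try:
        execute_job(job, execute_callable)  # always local, no DB needed
        call(complete_job_execution, job, success=True)
    except Exception:
        call(complete_job_execution, job, success=False)
        raise


job = Job()
run(job, execute_callable=lambda: print("executing"), use_internal_api=True)
print(job.state)  # -> success
```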
This is the final step of decoupling the job runner from the ORM-based BaseJob. After this change, we finally reach the state where BaseJob is just the state of the job being run, while all the logic is kept in a separate "JobRunner" entity which just keeps a reference to the job. It also makes sure that the job in each runner is defined appropriately for each job type:

* SchedulerJobRunner and BackfillJobRunner can only use BaseJob
* DagProcessorJobRunner, TriggererJobRunner and especially LocalTaskJobRunner can keep both BaseJob and its Pydantic BaseJobPydantic representation - for AIP-44 usage

The highlights of this change:

* Job does not have a job_runner reference any more
* Job is a mandatory parameter when creating each JobRunner
* the run_job method takes as parameters the job (i.e. where the state of the job is kept) and executor_callable - i.e. the method to run when the job gets executed
* the heartbeat callback is also passed as a generic callable, in order to execute the post-heartbeat operation of each job type
* there is no more need to specify job_type when you create BaseJob; the job gets its type simply by creating a runner with the job

This is the final stage of refactoring that was split into reviewable stages: #30255 -> #30302 -> #30308 -> this PR.

Closes: #30325
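The shape described in the highlights above could be sketched roughly like this (illustrative only; the exact Airflow signatures may differ): the runner receives the job, the job's type is derived from the runner, run_job gets the job plus the callable to execute, and the post-heartbeat hook is a generic callable.

```python
import time


class Job:
    """The single, non-polymorphic job record; its type comes from the runner."""

    def __init__(self, heartrate=5.0):
        self.job_type = None
        self.heartrate = heartrate
        self.state = None
        self.latest_heartbeat = None

    def heartbeat(self, heartbeat_callback):
        self.latest_heartbeat = time.monotonic()  # in reality: written to the DB
        heartbeat_callback()  # job-type-specific post-heartbeat work


class TriggererJobRunner:
    job_type = "TriggererJob"

    def __init__(self, job):
        self.job = job
        job.job_type = self.job_type  # no need to pass job_type when creating the Job

    def _execute(self):
        print("trigger loop would run here")

    def heartbeat_callback(self):
        print("triggerer-specific post-heartbeat work")


def run_job(job, execute_callable):
    """Drive the lifecycle of any job, given its state record and an execute callable."""
    job.state = "running"
    try:
        execute_callable()
        job.state = "success"
    except Exception:
        job.state = "failed"
        raise


job = Job()
runner = TriggererJobRunner(job)
run_job(job, execute_callable=runner._execute)
job.heartbeat(heartbeat_callback=runner.heartbeat_callback)
print(job.job_type, job.state)  # -> TriggererJob success
```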
Since apache#30255, the scheduler heartrate has not been properly calculated. We missed the check for the SchedulerJob type and setting the heartrate value from `scheduler_health_check_threshold`. This PR fixes it. Fix: apache#37971
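A minimal sketch of the type-dependent check the fix restores (the config source follows the commit text above; the exact keys and values are illustrative):

```python
# Illustrative configuration; real code would read these values from the Airflow config.
CONF = {
    "scheduler_health_check_threshold": 30.0,  # used for SchedulerJob, per the fix above
    "job_heartbeat_sec": 5.0,                  # default heartrate for other job types
}


def heartrate_for(job_type):
    """Pick the heartrate based on the job type."""
    if job_type == "SchedulerJob":
        return CONF["scheduler_health_check_threshold"]
    return CONF["job_heartbeat_sec"]


print(heartrate_for("SchedulerJob"))  # -> 30.0
print(heartrate_for("LocalTaskJob"))  # -> 5.0
```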
Originally, the BaseJob ORM model was extended and polymorphism was used to tie different execution logic to different job types. This has proven difficult to handle during the AIP-44 (Internal API) implementation, because LocalTaskJob, DagProcessorJob and TriggererJob are all not going to use the ORM BaseJob model; they should use BaseJobPydantic instead. In order to make this possible, we introduce a new type of object, BaseJobRunner, and make BaseJob use the runners instead.
This way, the BaseJobRunners hold the logic of each job, while a single, non-polymorphic BaseJob keeps the records in the database. As a follow-up, this will allow us to completely decouple the job database operations and move them to the internal_api component when db-less mode is enabled.
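To illustrate the db-less direction, here is a hedged sketch (field names are illustrative, and pydantic v2 is assumed to be available) of the serializable twin that could be passed to components running without database access:

```python
from datetime import datetime
from typing import Optional

from pydantic import BaseModel


class BaseJobPydantic(BaseModel):
    """Serializable twin of the ORM job row, usable without a DB connection."""

    id: Optional[int] = None
    job_type: str
    state: Optional[str] = None
    latest_heartbeat: Optional[datetime] = None


job = BaseJobPydantic(job_type="LocalTaskJob", state="running")

# A runner in db-less mode can work with this representation directly; the JSON
# below is the kind of payload that would travel over the Internal API (pydantic v2).
print(job.model_dump_json())
```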
Closes: #30294