
Better typing for Job and JobRunners #31240

Merged: 1 commit merged into apache:main from add-generic-typing-to-job on May 15, 2023


potiuk
Member

@potiuk potiuk commented May 12, 2023

By leveraging Generics, the typing for Runners, Job and JobPydantic is now more complete and accurate.

  • The Scheduler and Backfill Runners limit their code to Job and can use everything the ORM Job allows them to do.

  • Other runners are limited to the union of Job and JobPydantic, so that they can be run on the client side of the internal API without having all the Job features.

This is a follow-up to #31182, which fixed the missing job_type for the DagProcessor Job and nicely extracted the job to BaseRunner, but broke the MyPy/typing guards implemented in the runners that are meant to aid the AIP-44 implementation.
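To make the idea concrete, here is a minimal sketch of a runner base class that is generic in its job type. `Job` and `JobPydantic` below are simplified, hypothetical stand-ins (not Airflow's real classes), `orm_only_operation()` is an invented placeholder for something only the ORM model can do, and the bounded `TypeVar` is one assumed way of expressing the constraint, not necessarily the code merged in this PR.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Generic, Optional, TypeVar, Union


@dataclass
class Job:
    """Hypothetical, simplified stand-in for the ORM Job model."""

    job_type: Optional[str] = None

    def orm_only_operation(self) -> str:
        # Invented placeholder for a capability the Pydantic variant lacks.
        return "done via ORM"


@dataclass
class JobPydantic:
    """Hypothetical, simplified stand-in for the Pydantic view of a Job."""

    job_type: Optional[str] = None


# Assumed bound: a runner may be parameterized with Job, JobPydantic or their union.
J = TypeVar("J", bound=Union[Job, JobPydantic])


class BaseJobRunner(Generic[J]):
    job_type = "undefined"

    def __init__(self, job: J) -> None:
        # Stamp the runner's type onto the job and keep it with its precise type.
        job.job_type = self.job_type
        self.job: J = job


class SchedulerJobRunner(BaseJobRunner[Job]):
    job_type = "SchedulerJob"

    def run(self) -> str:
        # self.job is typed as Job here, so ORM-only calls type-check.
        return self.job.orm_only_operation()


class LocalTaskJobRunner(BaseJobRunner[Union[Job, JobPydantic]]):
    job_type = "LocalTaskJob"

    def describe(self) -> str:
        # Only attributes shared by both variants are safe to use here.
        return f"running as {self.job.job_type}"
```

With this shape, mypy would reject a call to `self.job.orm_only_operation()` inside `LocalTaskJobRunner`, because that method is not available on every member of the union.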



@potiuk potiuk requested review from kaxil, ashb and XD-DENG as code owners May 12, 2023 06:54
@boring-cyborg boring-cyborg bot added the area:Scheduler including HA (high availability) scheduler label May 12, 2023
@potiuk potiuk requested review from mhenc and uranusjr May 12, 2023 06:54
@potiuk
Member Author

potiuk commented May 12, 2023

cc: @vincbeck @mhenc: an example of where slightly more complex typing can help us deal with the ORM/Pydantic distinction.

@potiuk
Member Author

potiuk commented May 12, 2023

cc: @uranusjr: I'd appreciate your keen eye on this. I am not 100% sure it is the right way of doing it; Python's typing system is rather convoluted when it comes to Generics, and I am still learning it. The way I implemented it should, however, explain the intent I had, but there may be better ways of implementing that intent.

@potiuk potiuk force-pushed the add-generic-typing-to-job branch from ad2cb93 to b690b8b Compare May 12, 2023 07:06
@potiuk
Member Author

potiuk commented May 12, 2023

Thanks @uranusjr - yep, this is much nicer now :)

@potiuk potiuk force-pushed the add-generic-typing-to-job branch from b690b8b to e345c9b Compare May 12, 2023 07:44
@potiuk potiuk force-pushed the add-generic-typing-to-job branch from e345c9b to f5aac1e Compare May 12, 2023 08:22
@potiuk
Member Author

potiuk commented May 12, 2023

Actually - I am at a loss now :).

I had to move it all out of TYPE_CHECKING (a Generic cannot be parameterized with types that are only defined under TYPE_CHECKING). And when I did, I got:

airflow/jobs/local_task_job_runner.py:90: error: Argument 1 to "__init__" of "BaseJobRunner" has incompatible type "JobPydantic"; expected "JobOrJobPydantic"  [arg-type]
            super().__init__(job)

I even tried to cast it, and got:

airflow/jobs/local_task_job_runner.py:90: error: Redundant cast to "JobPydantic"  [redundant-cast]
            super().__init__(cast(JobOrJobPydantic, job))

Typing is hard :)

@potiuk
Member Author

potiuk commented May 12, 2023

Somehow mypy thinks JobOrJobPydantic is actually JobPydantic :)

@potiuk potiuk marked this pull request as draft May 12, 2023 09:18
@potiuk
Member Author

potiuk commented May 12, 2023

Actually, it turned out to be quite a bit more: moving the imports out of TYPE_CHECKING (of course) causes circular imports, so things get messy. I have to rethink it.
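For context on the TYPE_CHECKING problem: the expressions in a class's base list are evaluated when the `class` statement runs, even under `from __future__ import annotations`, so a name imported only under `if TYPE_CHECKING:` does not exist at that point. The sketch below demonstrates this with `Decimal` as a placeholder for an import that would otherwise be circular, and `Base`/`Eager`/`Deferred` as hypothetical classes. Quoting the argument turns it into a `ForwardRef` that is not resolved at runtime; this only illustrates the runtime behaviour, and whether mypy accepts such a forward reference in Airflow's runners is not verified here.

```python
from __future__ import annotations

from typing import TYPE_CHECKING, ForwardRef, Generic, TypeVar

if TYPE_CHECKING:
    # Seen only by the type checker; never executed, so it cannot cause a cycle.
    # Decimal is a placeholder for an import that would be circular at runtime.
    from decimal import Decimal

T = TypeVar("T")


class Base(Generic[T]):
    pass


def subclass_with_runtime_name() -> type:
    # Base-class expressions are evaluated eagerly, so looking up Decimal
    # here raises NameError: the import above never ran.
    class Eager(Base[Decimal]):
        pass

    return Eager


# A quoted argument is stored as an unresolved ForwardRef instead.
class Deferred(Base["Decimal"]):
    pass
```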

@ashb
Member

ashb commented May 12, 2023

@potiuk Might using a protocol here be a fix? Rather than trying to say JobOrJobPydantic, define a Protocol that has the type we need, which I think might be just this:

from typing import Protocol

class JobProtocol(Protocol):
    job_type: str
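A minimal sketch of how such a protocol could be used, assuming the base runner only needs to set `job_type`. Because `Protocol` matching is structural, neither stand-in class has to inherit from `JobProtocol`. `Job`, `JobPydantic` and `DagProcessorJobRunner` here are simplified, hypothetical stand-ins, not Airflow's classes.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Protocol


class JobProtocol(Protocol):
    """All the base runner needs from a job: a settable str job_type."""

    job_type: str


@dataclass
class Job:
    """Hypothetical stand-in for the ORM Job; it does not subclass JobProtocol."""

    job_type: str = "undefined"


@dataclass
class JobPydantic:
    """Hypothetical stand-in for the Pydantic model of a Job."""

    job_type: str = "undefined"


class BaseJobRunner:
    job_type = "undefined"

    def __init__(self, job: JobProtocol) -> None:
        # Any object with a str job_type attribute is accepted (structural typing).
        job.job_type = self.job_type


class DagProcessorJobRunner(BaseJobRunner):
    job_type = "DagProcessorJob"
```

Both stand-ins type-check as arguments to `BaseJobRunner`, with no `JobOrJobPydantic` alias and no runtime import of either class required by the base runner.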

@potiuk
Member Author

potiuk commented May 12, 2023

Yeah. That was my next idea to try :)

By avoiding setting the job in the BaseJobRunner, the typing for Runners,
Job and JobPydantic is now more complete and accurate.

Scheduler and Backfill Runners limit their code to Job and can use all
the things that the ORM Job allows them to do.

Other runners are limited to the union of Job and JobPydantic so
that they can be run on the client side of the internal API without
having all the Job features.

This is a follow-up to apache#31182, which fixed the missing job_type for
the DagProcessor Job and nicely extracted the job to BaseRunner, but broke
the MyPy/typing guards implemented in the runners that should aid the AIP-44
implementation.
@potiuk potiuk force-pushed the add-generic-typing-to-job branch from f5aac1e to a577260 Compare May 14, 2023 23:11
@potiuk potiuk marked this pull request as ready for review May 14, 2023 23:12
@potiuk
Member Author

potiuk commented May 14, 2023

Actually, it was easier than I thought. All that was needed was to move setting self.job into the concrete runners (while still passing the Job/JobPydantic to the BaseJobRunner constructor, so that it can set the right job_type).

Counterintuitively (we do not set job as a field in BaseJobRunner), this makes perfect sense:

  • we do not set the job in the base runner (it is neither needed nor used there)
  • we set the job in the concrete runners (where it has the right type)
  • we won't forget to set the "job_type" when a runner is assigned to a Job, because it is set in the base runner
  • no extra Protocol class is needed
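The approach described above can be sketched as follows, again with hypothetical, simplified `Job`/`JobPydantic` stand-ins and an invented `orm_only_operation()` method; it is an illustration of the idea, not the merged Airflow code. The base constructor only stamps `job_type`, and each concrete runner stores `self.job` with the narrowest type it accepts, so the type checker sees `Job` in the scheduler runner and the union elsewhere.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class Job:
    """Hypothetical, simplified stand-in for the ORM Job model."""

    job_type: Optional[str] = None

    def orm_only_operation(self) -> str:
        # Invented placeholder for something only the ORM model can do.
        return "done via ORM"


@dataclass
class JobPydantic:
    """Hypothetical, simplified stand-in for the Pydantic view of a Job."""

    job_type: Optional[str] = None


class BaseJobRunner:
    job_type = "undefined"

    def __init__(self, job: Union[Job, JobPydantic]) -> None:
        # Only stamp the job type; the job itself is deliberately not stored,
        # because its precise type is unknown at this level.
        job.job_type = self.job_type


class SchedulerJobRunner(BaseJobRunner):
    job_type = "SchedulerJob"

    def __init__(self, job: Job) -> None:
        super().__init__(job)
        self.job = job  # inferred as Job, so ORM-only calls type-check

    def run(self) -> str:
        return self.job.orm_only_operation()


class LocalTaskJobRunner(BaseJobRunner):
    job_type = "LocalTaskJob"

    def __init__(self, job: Union[Job, JobPydantic]) -> None:
        super().__init__(job)
        self.job = job  # inferred as the union: shared attributes only

    def describe(self) -> str:
        return f"running as {self.job.job_type}"
```

The trade-off is that every concrete runner must repeat the `self.job = job` assignment.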

@potiuk potiuk requested review from uranusjr and eladkal May 14, 2023 23:17
Member

@uranusjr uranusjr left a comment


I’ll experiment with some other approaches later as well; this assignment still fits better in the base class.

@potiuk potiuk merged commit 9432a3f into apache:main May 15, 2023
@potiuk potiuk deleted the add-generic-typing-to-job branch May 15, 2023 08:39
@ephraimbuddy ephraimbuddy added this to the Airflow 2.7.0 milestone Jul 6, 2023
@ephraimbuddy ephraimbuddy added the type:improvement Changelog: Improvements label Jul 6, 2023
Labels
area:Scheduler including HA (high availability) scheduler type:improvement Changelog: Improvements
Projects
No open projects
Status: Done

6 participants