
Kubernetes Executor Load Time Optimizations #30727

Conversation

o-nikolas
Contributor

@o-nikolas o-nikolas commented Apr 19, 2023

Overview

This PR aims to improve the time it takes to load/import the kubernetes executor module.

Motivation

Executors are imported in more places now that various compatibility checks are in core Airflow code (re: AIP-51). Also, decreasing import times is very important for the work of executors vending CLI commands (see #29055), since the CLI code in Airflow is particularly sensitive to slow imports (because all code is loaded fresh each time you run an individual Airflow CLI command).

The changes

  1. Move some expensive typing-related imports under TYPE_CHECKING (see the sketch below).
  2. Refactor expensive classes (other than the KubernetesExecutor) out of the kubernetes_executor.py module into a utils module so they are only loaded at runtime. Classes moved include KubernetesJobWatcher, AirflowKubernetesScheduler, and ResourceVersion.
  3. Also move some imports closer to their usage.
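
To make change (1) concrete, here is a minimal sketch of the TYPE_CHECKING pattern; the describe_pod helper is hypothetical and only illustrates the shape of the change, not the actual diff:

```python
from __future__ import annotations

from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Only evaluated by static type checkers; skipped at runtime, so the
    # relatively expensive kubernetes client import no longer runs when the
    # executor module is loaded.
    from kubernetes.client import models as k8s


def describe_pod(pod: k8s.V1Pod) -> str:
    # With `from __future__ import annotations`, the hint above stays a plain
    # string at runtime, so no kubernetes import is needed to call this.
    return pod.metadata.name
```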

Testing

I benchmarked these changes by writing a script to import the executor module in a fresh python runtime and timing how long that takes (you can test this yourself quickly from a bash shell by doing something like time python -c 'from airflow.executors.local_executor import KubernetesExecutor'). Then doing that in a loop for several samples (with some randomness in the order for fairness) both on main and on my development branch.
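
For reference, a minimal sketch of that kind of timing harness (the sample count, stats, and structure here are assumptions, not the exact script used):

```python
import statistics
import subprocess
import sys
import time

MODULE = "airflow.executors.kubernetes_executor"
SAMPLES = 10  # assumption: the PR does not state how many samples were taken


def time_fresh_import(module: str) -> float:
    """Import `module` in a brand-new interpreter and return wall-clock seconds."""
    start = time.perf_counter()
    subprocess.run([sys.executable, "-c", f"import {module}"], check=True)
    return time.perf_counter() - start


timings = [time_fresh_import(MODULE) for _ in range(SAMPLES)]
print(f"mean={statistics.mean(timings):.3f}s  stdev={statistics.stdev(timings):.3f}s")
```

Running this once on main and once on the branch (ideally interleaving the runs) gives the numbers compared below.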

Results

[Screenshot of benchmark results, 2023-04-21]



@boring-cyborg boring-cyborg bot added area:CLI provider:cncf-kubernetes Kubernetes provider related issues area:Scheduler including HA (high availability) scheduler labels Apr 19, 2023
@o-nikolas o-nikolas marked this pull request as draft April 19, 2023 01:07
@o-nikolas o-nikolas force-pushed the onikolas/kubernetes_executor_load_time_optimizations branch 5 times, most recently from 3b41453 to 8c31199 Compare April 24, 2023 18:37
@o-nikolas o-nikolas marked this pull request as ready for review April 24, 2023 19:26
@o-nikolas
Contributor Author

@potiuk @uranusjr This one is now green and ready for review. More executor load time optimizations, similar to what you reviewed in #30361, but this one is targeted towards the kubernetes executor

@o-nikolas o-nikolas force-pushed the onikolas/kubernetes_executor_load_time_optimizations branch 4 times, most recently from 91c7c9c to 02183fa Compare April 27, 2023 20:17
@o-nikolas
Contributor Author

Just resolved the latest conflicts and got the build green again. I'd like to get this one merged soon, since the refactoring creates conflicts every time someone edits the kubernetes executor in main.

@potiuk and @uranusjr if you have time please take a look, thanks!

@potiuk
Member

potiuk commented Apr 29, 2023

Not an easy one to review, even if it is just a refactor :(

@potiuk
Member

potiuk commented Apr 29, 2023

Just a thought, @o-nikolas. MAYBE there is a way to split this one into two (at least) PRs -> one extracting/renaming stuff/methods, maybe extracting the common K8s executor types, and another moving stuff around between files and splitting it?

This way the "move" will be easier to review, as it will be basically the same code here and there, just moved around.

I think that is what makes it difficult to review: it is all-in-one. It might also make it easier in the future to find any problems if you do it this way.

This is the technique I used in the "base_job" refactoring case and I think it makes a lot of sense. Would it be possible for you to give it a try?

I am not trying to make more work for you, BTW, just thinking about the case where we find some bugs in the future and want to track the changes for them.

@o-nikolas
Contributor Author

Hey @potiuk, thanks for the review! 😃

Just a thought, @o-nikolas. MAYBE there is a way to split this one into two (at least) PRs -> one extracting/renaming code/methods, maybe extracting the common K8s executor types, and another moving stuff around between files and splitting it?

The extracted method and the modified code unrelated to the splitting amount to a very small portion of the change; most changes relate to the moved code, so I decided to leave it all in one.

This is the technique I used in the "base_job" refactoring case and I think it makes a lot of sense. Would it be possible for you to give it a try?

Yupp, I agree with this philosophy; the overall code I'm delivering is already broken up. By the end it will be about 3 or 4 PRs, with one already merged, this one, and some more to come after. The process becomes very grueling if I separate them all even more; I would end up with 6-8 PRs, which is especially frustrating when reviews are hard to come by and I must constantly manage the conflicts from main.

However, if I have still not convinced you I can try separate out what I can on Monday. Let me know what you think. Thanks again for the review I really do appreciate it 🙏

@potiuk
Member

potiuk commented Apr 29, 2023

Which is especially frustrating when reviews are hard to come by and I must constantly manage the conflicts from main.

I know - been there, done that, but let's say that now, since I asked for it, I might push myself to do it quickly.

@o-nikolas
Contributor Author

In this case, I really do feel strongly that splitting this particular PR further is unwarranted. There have been many PRs before with pages-long refactored code. This change set is really just moving code from kubernetes_executor.py to two other modules (without changing any of the functionality). Some moved imports (closer to use) and a single helper function are the only modifications to kubernetes_executor.py, and they are easy to tell apart because they are the green additions in the diff for that module (all the other big red deletions are the aforementioned migration of code).

@uranusjr gave it a 👍 (other than a small nit on module name which I'm waiting to hear back on), hopefully some others can have a look. Perhaps @jedcunningham, since he's worked on the kubernetes executor lately and will be familiar with the code.

return cls._instance


class KubernetesJobWatcher(multiprocessing.Process, LoggingMixin):
Member

What do folks think about moving the core executor into kubernetes_executor/__init__.py, and moving the stuff we are pulling out into files under that new folder? I'm not sure I love the kubernetes_executor_x.py pattern we are establishing here.

Contributor Author

That to me felt a bit invasive, especially since executors are public-ish (people modify them with plugins or base new executors off of them).

We plan to move the executors to their own provider packages very soon so module names/locations will all be shuffled then. Can we agree to defer this piece of feedback until that time? I've already gotten feedback that this PR is too large, so if I could keep the two decoupled that'd be awesome!

Member

I'm not sure that new approach would have any more impact than the current one, on the public-ish-ness front. If folks are literally patching the executor, sure, but 🤷‍♂️.

My concern is how soon "very soon" is in reality. I'd hate to land 2.7 with this pattern in it. I'd be happier to say we do the refactoring in a follow up than tie it to the provider move.

Contributor Author

@o-nikolas o-nikolas May 1, 2023

I'm not sure that new approach would have any more impact than the current one, on the public-ish-ness front.

That's my bad, I should have been clearer. Let me take another hack at it: what I meant was, let's not break things twice for users by changing the import path of the executor now and then changing it again when it's moved to its own provider package. Not that the impact will be any different, just that we should package the path changes all at once.

I'd be happier to say we do the refactoring in a follow up than tie it to the provider move.

If you still disagree with the above then that's fair (though I'm curious to hear!) and I'd agree to cutting a ticket to move it after this PR 👍

Member

what I meant by that was let's not break things twice for users by changing the import path of the executor now

That's the beauty of it though, users would still:

from airflow.executors.kubernetes_executor import KubernetesExecutor

So the stuff that stays in the "main" file just moves on disk, how folks would import it wouldn't change at all. (We can ignore the stuff being moved like KubernetesJobWatcher, those are changing either way)
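
Roughly, the layout I have in mind would look like this (the name of the extracted module is just illustrative):

```python
# Hypothetical layout:
#
#   airflow/executors/kubernetes_executor/__init__.py   <- core KubernetesExecutor
#   airflow/executors/kubernetes_executor/kubernetes_executor_utils.py
#       <- KubernetesJobWatcher, AirflowKubernetesScheduler, ResourceVersion
#
# Because the core class would live in the package's __init__.py, the existing
# import path keeps working unchanged:
from airflow.executors.kubernetes_executor import KubernetesExecutor
```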

Am I overlooking something here (very possible)?

Contributor Author

Am I overlooking something here (very possible)?

Ah nope, I see what you mean now! Yeah, I think that would cover most interactions that users have with patching and base classing the executor. I'm happy to make those changes, but I'd request that it'd be in a follow-up PR.

Contributor Author

I'm not sure I love the kubernetes_executor_x.py pattern

But I will note that I'd still name the module something quite similar to this, even in its own dir, since generic module names like utils.py can be difficult to work with due to the shared namespace.

Contributor

That to me felt a bit invasive, especially since executors are public-ish (people modify them with plugins or base new executors off of them).

I think executor user-facing behavior is public but executor code is not public.

I think this was the consensus we arrived at, more or less, when @potiuk brought it up on the list, though I don't recall if it ever made it into "policy" via a PR.

So if you subclass an executor, you do so very much at your own risk. Therefore we can change the path of an executor or its helpers without worrying about backcompat, and refactor / move executor code at will as long as it doesn't break user-facing behavior (e.g. some radical breaking change w.r.t. user-supplied executor_config).

@o-nikolas o-nikolas requested a review from jedcunningham May 2, 2023 19:52
@pierrejeambrun
Member

My two cents:

We have no policy for optimisation PRs, but we have had issues lately with some of them breaking things unexpectedly. I am definitely in favour of what has been said -> optimisation PRs go into minor releases from now on.

@o-nikolas if you experience many annoying conflicts while keeping this open, this is exactly what will happen for the RM. So we can either do what Jarek suggests with PARKING it, but then the burden is on you and that's not fun, OR we can merge this knowing that it "might" cause painful cherry-picking, and if it does, release 2.7.0 earlier. (That would go in favour of releasing more often and easing the pain.) Is there a specific reason that we only do a few minor releases during the year? Can we just make one on an "as-necessary" basis?

@potiuk
Member

potiuk commented Jun 13, 2023

All for "Release more often" :).

@o-nikolas
Contributor Author

Hey @pierrejeambrun, thanks for weighing in! Most of the items have been discussed above, but here are some quick replies:

We have no policy for optimisation PRs, but we have had issues lately with some of them breaking things unexpectedly. I am definitely in favour of what has been said -> optimisation PRs go into minor releases from now on.

Yupp, I was never trying to get this merged into a patch release in the first place. I think that was maybe assumed at some point, but it was never my intention.
BTW, can you link an Issue/Slack thread for the optimization issue you're referencing? I'm curious!

@o-nikolas if you experience many annoying conflicts while keeping this open, this is exactly what will happen for the RM. So we can either do what Jarek suggests with PARKING it, but then the burden is on you and that's not fun, OR we can merge this knowing that it "might" cause painful cherry-picking, and if it does, release 2.7.0 earlier. (That would go in favour of releasing more often and easing the pain.)

Yeah, as I said above, if I'm resolving conflicts anyway, I'm more than happy to resolve any conflicts that the RM encounters. In fact, I'm more than happy to start doing releases altogether 😄

Is there a specific reason that we only do a few minor releases during the year? Can we just make one on an "as-necessary" basis?

+1 to release early and release often! (as long as we're sure they're stable releases for our users of course).

@o-nikolas
Contributor Author

LGTM. I think we are close enough to merge it. Can you please rebase and check if it still succeeds?

Did the rebase yesterday and resolved the conflicts. Looks like the build is still green 👍

@potiuk
Member

potiuk commented Jun 13, 2023

Yeah, as I said above, if I'm resolving conflicts anyway, I'm more than happy to resolve any conflicts that the RM encounters. In fact, I'm more than happy to start doing releases altogether 😄

+100

@o-nikolas
Contributor Author

Re: #29055 (comment)

Are folks here okay if we merge this now that 2.6.2 has been cut?

CC: @potiuk @pierrejeambrun @jedcunningham

@potiuk
Member

potiuk commented Jun 27, 2023

Yes.

@o-nikolas
Contributor Author

Okay, I'll take silence from the other folks as approval 😆

Looks like there's another conflict though, and I'm just about to head out on vacation for a week. I'll rebase and resolve the conflict once I'm back and then merge.

Member

@hussein-awala hussein-awala left a comment

Nice work!

I have always assumed that patch releases primarily focus on bugfixes to enhance reliability and address issues. Consequently, non-fix changes that have the potential to cause problems should be deferred and not included in patch releases.

I respectfully disagree. IMO, internal enhancements should be welcomed in patch versions as well.

However, it is worth noting that until version 2.5.3, the executors were considered part of the public API. In version 2.6.0, we clarified that only the executor interface is public, while instances are not (#29200). For that reason, I think we should merge this PR when we cut 2.7.0 to avoid introducing a breaking change for some users.

@potiuk
Member

potiuk commented Jul 5, 2023

I respectfully disagree. IMO, internal enhancements should be welcomed in patch versions as well.

Nice discussion :). I think I agree with both statements. I think the key is "potential to cause problems". There is a class of improvements that should be welcome: automated refactorings, build improvements, speed optimizations that are easily verifiable and reviewable. Yes.

But the moment you have one person scratching their head saying "You know, I am not sure we thought of all the consequences, I think this optimisation is a bit risky" - that's a clear sign of a non-patch-level release IMHO. This is really a matter of the trust and communication we build with our users. If we want to convince our users to follow the SemVer approach, they cannot be expected to deal with regressions in patch-level releases, so we should make sure to avoid them, and there should certainly be no regressions coming from something that is not a "fix" to an issue.

And yes, it's blurry and more of a quantum thing, where we are talking about the probability of something happening rather than 0/1 decisions. Generally speaking, if a change is very unlikely to cause a problem, it's fine to consider it.

However, it is worth noting that until version 2.5.3, the executors were considered part of the public API. In version 2.6.0, we clarified that only the executor interface is public, while instances are not (#29200). For that reason, I think we should merge this PR when we cut 2.7.0 to avoid introducing a breaking change for some users.

Not sure if I understand - do you expect it to wait until AFTER we cut 2.7.0? I am not sure we need that. The clarification is not really tied to a specific version. It was merely documenting what we did not have documented but implicitly knew. We already "broke" a number of those in the past - for example DB structure changes, which we never explicitly stated as an "internal detail" before. I think we never really said "you can extend the Kubernetes Executor and it will remain as it is"; we said "you can write your own executor" - those are different things. I think in the case of the "Public Interface" docs we just clarified the intentions we had when it comes to compatibility; we never changed the intentions, they were just poorly (or rather not) documented.

@potiuk
Member

potiuk commented Jul 11, 2023

Hey @o-nikolas -> I think it's high time to merge this one. We are close enough to 2.7.0, and 2.6.3 has just been released - we expect at most one more bugfix release, 2.6.4, and that one will be rather small and localized (and I will be running it, so I am happy to handle the risk of problems with cherry-picking).

I want to move the Kubernetes executor and related code to the provider as a follow-up to this one, but I will wait until this one is merged and until the Celery one is merged: #32526

@o-nikolas
Contributor Author

Hey @o-nikolas -> I think it's high time to merge this one. We are close enough to 2.7.0, and 2.6.3 has just been released - we expect at most one more bugfix release, 2.6.4, and that one will be rather small and localized (and I will be running it, so I am happy to handle the risk of problems with cherry-picking).

I want to move the Kubernetes executor and related code to the provider as a follow-up to this one, but I will wait until this one is merged and until the Celery one is merged: #32526

@potiuk Yupp, agreed! I was on holiday last week and am just back at the keyboard now. I'll try to get to this rebase/conflict resolution today or tomorrow 😃

I'm excited to see the executors moving as well. I'll try to review the Celery PR

@potiuk
Member

potiuk commented Jul 11, 2023

@potiuk Yupp, agreed! I was on holiday last week and am just back at the keyboard now. I'll try to get to this rebase/conflict resolution today or tomorrow 😃

Seems like perfect timing !

I'm excited to see the executors moving as well. I'll try to review the Celery PR

Absolutely! :).

Import optimizations to decrease the amount of time it takes to
load/import the kubernetes executor.
Move some expensive typing related imports to be under TYPE_CHECKING.

Refactor expensive classes (other than the KubernetesExecutor) out of the
kubernetes_executor.py module into a utils module so they are only
loaded at runtime.

Also move some imports to runtime imports, closer to their usage.
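
As an illustration of that last point, a runtime-local import looks roughly like this (the helper below is a hypothetical sketch, not code from this change):

```python
def _get_kube_client():
    # Importing the kubernetes client here, at call time, keeps it off the
    # module-level import path, so importing the executor module stays cheap.
    from kubernetes import client, config

    config.load_kube_config()
    return client.CoreV1Api()
```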
@o-nikolas o-nikolas force-pushed the onikolas/kubernetes_executor_load_time_optimizations branch from 8153230 to 33b4cd1 Compare July 11, 2023 18:27
@eladkal
Contributor

eladkal commented Jul 11, 2023

We can also decide to extract the executor as-is, release it with the provider wave, and then apply this optimization patch directly to the provider code.

I think that minimizes the challenges, and it will be handled like any provider code change (users will also have a fallback version if we decide to yank for some reason).

@potiuk
Member

potiuk commented Jul 11, 2023

We can also decide to extract the executor as-is, release it with the provider wave, and then apply this optimization patch directly to the provider code.

It will be easier the other way round. I have made no changes to it yet, and @o-nikolas has already rebased his changes like 10 times, so it would not be fair to do it the other way round and add more work for @o-nikolas when I have not even started yet.

@potiuk
Member

potiuk commented Jul 11, 2023

I think that minimizes the challenges, and it will be handled like any provider code change (users will also have a fallback version if we decide to yank for some reason).

Nope. This code will be gone from Airflow. So there will be no going back.

@eladkal
Contributor

eladkal commented Jul 11, 2023

It will be easier the other way round.

Great!

scheduler_job_id: str,
kube_config: Any,
) -> str | None:
self.log.info("Event: and now my watch begins starting at resource_version: %s", resource_version)


I love this reference :)
