
Kubernetes Executor Load Time Optimizations #30727

Conversation

o-nikolas
Contributor

@o-nikolas o-nikolas commented Apr 19, 2023

Overview

This PR aims to improve the time it takes to load/import the kubernetes executor module.

Motivation

Executors are imported in more places now that various compatibility checks are in core Airflow code (re: AIP-51). Also, decreasing import times is very important for the work of executors vending CLI commands (see #29055), since the CLI code in Airflow is particularly sensitive to slow imports (because all code is loaded fresh each time you run an individual Airflow CLI command).

The changes

  1. Move some expensive typing-related imports under TYPE_CHECKING (see the sketch below).
  2. Refactor expensive classes (other than the KubernetesExecutor) out of the kubernetes_executor.py module into a utils module so they are only loaded at runtime. Classes moved include KubernetesJobWatcher, AirflowKubernetesScheduler, and ResourceVersion.
  3. Also move some imports closer to their usage.
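
To make change (1) concrete, here is a minimal sketch of the TYPE_CHECKING pattern; the describe_pod helper is hypothetical and only illustrates the shape of the change, not the actual diff:

```python
from __future__ import annotations

from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Only evaluated by static type checkers; skipped at runtime, so the
    # relatively expensive kubernetes client import no longer runs when the
    # executor module is loaded.
    from kubernetes.client import models as k8s


def describe_pod(pod: k8s.V1Pod) -> str:
    # With `from __future__ import annotations`, the hint above stays a plain
    # string at runtime, so no kubernetes import is needed to call this.
    return pod.metadata.name
```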

Testing

I benchmarked these changes by writing a script to import the executor module in a fresh python runtime and timing how long that takes (you can test this yourself quickly from a bash shell by doing something like time python -c 'from airflow.executors.local_executor import KubernetesExecutor'). Then doing that in a loop for several samples (with some randomness in the order for fairness) both on main and on my development branch.
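
For reference, a minimal sketch of that kind of timing harness (the sample count, stats, and structure here are assumptions, not the exact script used):

```python
import statistics
import subprocess
import sys
import time

MODULE = "airflow.executors.kubernetes_executor"
SAMPLES = 10  # assumption: the PR does not state how many samples were taken


def time_fresh_import(module: str) -> float:
    """Import `module` in a brand-new interpreter and return wall-clock seconds."""
    start = time.perf_counter()
    subprocess.run([sys.executable, "-c", f"import {module}"], check=True)
    return time.perf_counter() - start


timings = [time_fresh_import(MODULE) for _ in range(SAMPLES)]
print(f"mean={statistics.mean(timings):.3f}s  stdev={statistics.stdev(timings):.3f}s")
```

Running this once on main and once on the branch (ideally interleaving the runs) gives the numbers compared below.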

Results

[Screenshot of benchmark results, 2023-04-21]



@boring-cyborg boring-cyborg bot added area:CLI provider:cncf-kubernetes Kubernetes provider related issues area:Scheduler including HA (high availability) scheduler labels Apr 19, 2023
@o-nikolas o-nikolas marked this pull request as draft April 19, 2023 01:07
@o-nikolas o-nikolas force-pushed the onikolas/kubernetes_executor_load_time_optimizations branch 5 times, most recently from 3b41453 to 8c31199 Compare April 24, 2023 18:37
@o-nikolas o-nikolas marked this pull request as ready for review April 24, 2023 19:26
@o-nikolas
Contributor Author

@potiuk @uranusjr This one is now green and ready for review. More executor load time optimizations, similar to what you reviewed in #30361, but this one is targeted towards the kubernetes executor

@o-nikolas o-nikolas force-pushed the onikolas/kubernetes_executor_load_time_optimizations branch 4 times, most recently from 91c7c9c to 02183fa Compare April 27, 2023 20:17
@o-nikolas
Contributor Author

Just resolved the latest conflicts and got the build green again. I'd like to get this one merged soon, since the refactoring creates conflicts every time someone edits the kubernetes executor in main.

@potiuk and @uranusjr if you have time please take a look, thanks!

@potiuk
Member

potiuk commented Apr 29, 2023

Not an easy one to review, even if it is just a refactor :(

@potiuk
Member

potiuk commented Apr 29, 2023

Just a thought, @o-nikolas. MAYBE there is a way to split this one into two (at least) PRs -> one extracting/renaming stuff/methods, maybe extracting the common K8s executor types, and another moving stuff around between files and splitting it?

This way the "move" will be easier to review, as it will be basically the same code here and there, just moved around.

I think that is what makes it difficult to review: it is all-in-one. It might also make it easier in the future to find any problems if you do it this way.

This is the technique I used in the "base_job" refactoring case and I think it makes a lot of sense. Would it be possible for you to give it a try?

I am not trying to make more work for you, BTW, just thinking about the case where we find some bugs in the future and want to track the changes for them.

@o-nikolas
Contributor Author

Hey @potiuk, thanks for the review! 😃

Just a thought, @o-nikolas. MAYBE there is a way to split this one into two (at least) PRs -> one extracting/renaming code/methods, maybe extracting the common K8s executor types, and another moving stuff around between files and splitting it?

The extracted method and the modified code unrelated to the splitting amount to a very small portion of the change; most changes relate to the moved code, so I decided to leave it all in one.

This is the technique I used in the "base_job" refactoring case and I think it makes a lot of sense. Would it be possible for you to give it a try?

Yupp, I agree with this philosophy; the overall code I'm delivering is already broken up. By the end it will be about 3 or 4 PRs, with one already merged, this one, and some more to come after. The process becomes very grueling if I separate them all even more; I would end up with 6-8 PRs, which is especially frustrating when reviews are hard to come by and I must constantly manage the conflicts from main.

However, if I have still not convinced you I can try separate out what I can on Monday. Let me know what you think. Thanks again for the review I really do appreciate it 🙏

@potiuk
Member

potiuk commented Apr 29, 2023

Which is especially frustrating when reviews are hard to come by and I must constantly manage the conflicts from main.

I know - been there, done that, but let's say that now, since I asked for it, I might push myself to do it quickly.

@o-nikolas
Contributor Author

In this case, I really do feel strongly that splitting this particular PR further is unwarranted. There have been many PRs before with pages-long refactored code. This change set is really just moving code from kubernetes_executor.py to two other modules (without changing any of the functionality). Some moved imports (closer to use) and a single helper function are the only modifications to kubernetes_executor.py, and they are easy to tell apart because they are the green additions in the diff for that module (all the other big red deletions are the aforementioned migration of code).

@uranusjr gave it a 👍 (other than a small nit on module name which I'm waiting to hear back on), hopefully some others can have a look. Perhaps @jedcunningham, since he's worked on the kubernetes executor lately and will be familiar with the code.

return cls._instance


class KubernetesJobWatcher(multiprocessing.Process, LoggingMixin):
Member

What do folks think about moving the core executor into kubernetes_executor/__init__.py, and moving the stuff we are pulling out into files under that new folder? I'm not sure I love the kubernetes_executor_x.py pattern we are establishing here.

Contributor Author

That to me felt a bit invasive, especially since executors are public-ish (people modify them with plugins or base new executors off of them).

We plan to move the executors to their own provider packages very soon so module names/locations will all be shuffled then. Can we agree to defer this piece of feedback until that time? I've already gotten feedback that this PR is too large, so if I could keep the two decoupled that'd be awesome!

Member

I'm not sure that new approach would have any more impact than the current one, on the public-ish-ness front. If folks are literally patching the executor, sure, but 🤷‍♂️.

My concern is how soon "very soon" is in reality. I'd hate to land 2.7 with this pattern in it. I'd be happier to say we do the refactoring in a follow up than tie it to the provider move.

Contributor Author

@o-nikolas o-nikolas May 1, 2023

I'm not sure that new approach would have any more impact than the current one, on the public-ish-ness front.

That's my bad, I should have been clearer. Let me take another hack at it: what I meant was, let's not break things twice for users by changing the import path of the executor now and then changing it again when it's moved to its own provider package. Not that the impact will be any different, just that we should package the path changes all at once.

I'd be happier to say we do the refactoring in a follow up than tie it to the provider move.

If you still disagree with the above then that's fair (though I'm curious to hear!) and I'd agree to cutting a ticket to move it after this PR 👍

Member

what I meant by that was let's not break things twice for users by changing the import path of the executor now

That's the beauty of it though, users would still:

from airflow.executors.kubernetes_executor import KubernetesExecutor

So the stuff that stays in the "main" file just moves on disk, how folks would import it wouldn't change at all. (We can ignore the stuff being moved like KubernetesJobWatcher, those are changing either way)
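
Roughly, the layout I have in mind would look like this (the name of the extracted module is just illustrative):

```python
# Hypothetical layout:
#
#   airflow/executors/kubernetes_executor/__init__.py   <- core KubernetesExecutor
#   airflow/executors/kubernetes_executor/kubernetes_executor_utils.py
#       <- KubernetesJobWatcher, AirflowKubernetesScheduler, ResourceVersion
#
# Because the core class would live in the package's __init__.py, the existing
# import path keeps working unchanged:
from airflow.executors.kubernetes_executor import KubernetesExecutor
```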

Am I overlooking something here (very possible)?

Contributor Author

Am I overlooking something here (very possible)?

Ah nope, I see what you mean now! Yeah, I think that would cover most interactions that users have with patching and base classing the executor. I'm happy to make those changes, but I'd request that it'd be in a follow-up PR.

Contributor Author

I'm not sure I love the kubernetes_executor_x.py pattern

But I will note that I'd still name the module something quite similar to this, even in its own dir, since generic module names like utils.py can be difficult to work with due to the shared namespace.

Contributor

That to me felt a bit invasive, especially since executors are public-ish (people modify them with plugins or base new executors off of them).

I think executor user-facing behavior is public but executor code is not public.

I think this was the consensus we arrived at, more or less, when @potiuk brought it up on the list, though I don't recall if it ever made it into "policy" via a PR.

So if you subclass an executor, you do so very much at your own risk. Therefore we can change the path of an executor or its helpers without worrying about backcompat, and refactor / move executor code at will as long as it doesn't break user-facing behavior (e.g. some radical breaking change w.r.t. user-supplied executor_config).

@o-nikolas o-nikolas requested a review from jedcunningham May 2, 2023 19:52
@pierrejeambrun
Member

My two cents:

We have no policy for optimisation PRs, but we have had issues lately with some of them breaking things unexpectedly. I am definitely in favour of what has been said -> optimisation PRs go into minor releases from now on.

@o-nikolas if you experience many annoying conflicts while keeping this open, this is exactly what will happen for the RM. So we can either do what Jarek suggests with PARKING it, but then the burden is on you and that's not fun, OR we can merge this knowing that it "might" cause painful cherry-picking, and if it does, release 2.7.0 earlier. (That would go in favour of releasing more often and easing the pain.) Is there a specific reason that we only do a few minor releases during the year? Can we just make one on an "as-necessary" basis?

@potiuk
Member

potiuk commented Jun 13, 2023

All for "Release more often" :).

@o-nikolas
Contributor Author

Hey @pierrejeambrun, thanks for weighing in! Most of the items have been discussed above, but here are some quick replies:

We have no policy for optimisation PRs, but we have had issues lately with some of them breaking things unexpectedly. I am definitely in favour of what has been said -> optimisation PRs go into minor releases from now on.

Yupp, I was never trying to get this merged into a patch release in the first place. I think that was maybe assumed at some point, but it was never my intention.
BTW, can you link an Issue/Slack thread for the optimization issue you're referencing? I'm curious!

@o-nikolas if you experience many annoying conflicts while keeping this open, this is exactly what will happen for the RM. So we can either do what Jarek suggests with PARKING it, but then the burden is on you and that's not fun, OR we can merge this knowing that it "might" cause painful cherry-picking, and if it does, release 2.7.0 earlier. (That would go in favour of releasing more often and easing the pain.)

Yeah, as I said above, if I'm resolving conflicts anyway, I'm more than happy to resolve any conflicts that the RM encounters. In fact, I'm more than happy to start doing releases altogether 😄

Is there a specific reason that we only do a few minor releases during the year? Can we just make one on an "as-necessary" basis?

+1 to release early and release often! (as long as we're sure they're stable releases for our users of course).

@o-nikolas
Contributor Author

LGTM. I think we are close enough to merge it. Can you please rebase and check if it still succeeds?

Did the rebase yesterday and resolved the conflicts. Looks like the build is still green 👍

@potiuk
Member

potiuk commented Jun 13, 2023

Yeah, as I said above, if I'm resolving conflicts anyway, I'm more than happy to resolve any conflicts that the RM encounters. In fact, I'm more than happy to start doing releases altogether 😄

+100

@o-nikolas
Contributor Author

Re: #29055 (comment)

Are folks here okay if we merge this now that 2.6.2 has been cut?

CC: @potiuk @pierrejeambrun @jedcunningham

@potiuk
Member

potiuk commented Jun 27, 2023

Yes.

@o-nikolas
Contributor Author

Okay, I'll take silence from the other folks as approval 😆

Looks like there's another conflict though, and I'm just about to head out on vacation for a week. I'll rebase and resolve the conflict once I'm back and then merge.

Member

@hussein-awala hussein-awala left a comment

Nice work!

I have always assumed that patch releases primarily focus on bugfixes to enhance reliability and address issues. Consequently, non-fix changes that have the potential to cause problems should be deferred and not included in patch releases.

I respectfully disagree. IMO, internal enhancements should be welcomed in patch versions as well.

However, it is worth noting that until version 2.5.3, the executors were considered part of the public API. In version 2.6.0, we clarified that only the executor interface is public, while instances are not (#29200). For that reason, I think we should merge this PR when we cut 2.7.0 to avoid introducing a breaking change for some users.

@potiuk
Member

potiuk commented Jul 5, 2023

I respectfully disagree. IMO, internal enhancements should be welcomed in patch versions as well.

Nice discussion :). I think I agree with both statements. I think the key is "potential to cause problems". There is a class of improvements that should be welcome: automated refactorings, build improvements, speed optimizations that are easily verifiable and reviewable. Yes.

But the moment you have one person scratching their head saying "You know, I am not sure we thought of all the consequences, I think this optimisation is a bit risky" - that's a clear sign of a non-patch-level release IMHO. This is really a matter of the trust and communication we build with our users. If we want to convince our users to follow the SemVer approach, they cannot be expected to deal with regressions in patch-level releases, so we should make sure to avoid them, and there should certainly be no regressions coming from something that is not a "fix" to an issue.

And yes, it's blurry and more of a quantum thing, where we are talking about the probability of something happening rather than 0/1 decisions. Generally speaking, if a change is very unlikely to cause a problem, it's fine to consider it.

However, it is worth noting that until version 2.5.3, the executors were considered part of the public API. In version 2.6.0, we clarified that only the executor interface is public, while instances are not (#29200). For that reason, I think we should merge this PR when we cut 2.7.0 to avoid introducing a breaking change for some users.

Not sure if I understand - do you expect it to wait until AFTER we cut 2.7.0? I am not sure we need that. The clarification is not really tied to a specific version. It was merely documenting what we did not have documented but implicitly knew. We already "broke" a number of those in the past - for example DB structure changes, which we never explicitly stated as an "internal detail" before. I think we never really said "you can extend the Kubernetes Executor and it will remain as it is"; we said "you can write your own executor" - those are different things. I think in the case of the "Public Interface" docs we just clarified the intentions we had when it comes to compatibility; we never changed the intentions, they were just poorly (or rather not) documented.

@potiuk
Member

potiuk commented Jul 11, 2023

Hey @o-nikolas -> I think it's high time to merge this one. We are close enough to 2.7.0, and 2.6.3 has just been released - we expect at most one more bugfix release, 2.6.4, and that one will be rather small and localized (and I will be running it, so I am happy to handle the risk of problems with cherry-picking).

I want to move the Kubernetes executor and related code to the provider as a follow-up to this one, but I will wait until this one is merged and until the Celery one is merged: #32526

@o-nikolas
Contributor Author

Hey @o-nikolas -> I think it's high time to merge this one. We are close enough to 2.7.0, and 2.6.3 has just been released - we expect at most one more bugfix release, 2.6.4, and that one will be rather small and localized (and I will be running it, so I am happy to handle the risk of problems with cherry-picking).

I want to move the Kubernetes executor and related code to the provider as a follow-up to this one, but I will wait until this one is merged and until the Celery one is merged: #32526

@potiuk Yupp, agreed! I was on holiday last week and am just back at the keyboard now. I'll try to get to this rebase/conflict resolution today or tomorrow 😃

I'm excited to see the executors moving as well. I'll try to review the Celery PR

@potiuk
Member

potiuk commented Jul 11, 2023

@potiuk Yupp, agreed! I was on holiday last week and am just back at the keyboard now. I'll try to get to this rebase/conflict resolution today or tomorrow 😃

Seems like perfect timing !

I'm excited to see the executors moving as well. I'll try to review the Celery PR

Absolutely! :).

Import optimizations to decrease the amount of time it takes to
load/import the kubernetes executor.
Move some expensive typing related imports to be under TYPE_CHECKING.

Refactor expensive classes (other than the KubernetesExecutor) out of the
kubernetes_executor.py module into a utils module so they are only
loaded at runtime.

Also move some imports to runtime imports, closer to their usage.
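
As an illustration of that last point, a runtime-local import looks roughly like this (the helper below is a hypothetical sketch, not code from this change):

```python
def _get_kube_client():
    # Importing the kubernetes client here, at call time, keeps it off the
    # module-level import path, so importing the executor module stays cheap.
    from kubernetes import client, config

    config.load_kube_config()
    return client.CoreV1Api()
```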
@o-nikolas o-nikolas force-pushed the onikolas/kubernetes_executor_load_time_optimizations branch from 8153230 to 33b4cd1 Compare July 11, 2023 18:27
@eladkal
Contributor

eladkal commented Jul 11, 2023

We can also decide to extract the executor as-is, release it with the provider wave, and then apply this optimization patch directly to the provider code.

I think that minimizes the challenges, and it will be handled like any provider code change (users will also have a fallback version if we decide to yank for some reason).

@potiuk
Member

potiuk commented Jul 11, 2023

We can also decide to extract the executor as-is, release it with the provider wave, and then apply this optimization patch directly to the provider code.

It will be easier the other way round. I have made no changes to it yet, and @o-nikolas has already rebased his changes like 10 times, so it would not be fair to do it the other way round and add more work for @o-nikolas when I have not even started yet.

@potiuk
Member

potiuk commented Jul 11, 2023

I think that minimizes the challenges, and it will be handled like any provider code change (users will also have a fallback version if we decide to yank for some reason).

Nope. This code will be gone from Airflow. So there will be no going back.

@eladkal
Contributor

eladkal commented Jul 11, 2023

It will be easier the other way round.

Great!

scheduler_job_id: str,
kube_config: Any,
) -> str | None:
self.log.info("Event: and now my watch begins starting at resource_version: %s", resource_version)


I love this reference :)
