Kubernetes Executor Load Time Optimizations #30727
Conversation
Not an easy one to review, even if it is just a refactor :(
Just a thought, @o-nikolas. MAYBE there is a way to split that one into (at least) two PRs -> one extracting/renaming stuff/methods, maybe extracting the common K8s executor types, and another moving stuff around between files and splitting it? This way the "move" will be easier to review, as it will be basically the same code here and there, just moved around. I think that is what makes it difficult to review: it is all-in-one. It might also make it easier to find any problems in the future. This is the technique I used in the "base_job" refactoring case, and I think it makes a lot of sense. Would it be possible for you to give it a try? I am not trying to make more work for you, BTW. Just thinking about the case when we find some bugs in the future and want to track the changes behind them.
Hey @potiuk, thanks for the review! 😃
The extracted method and the modified code unrelated to the splitting amount to very little code; most changes are related to the moved code, so I decided to leave it all in one.
Yupp, I agree with this philosophy; the overall code I'm delivering is already broken up. By the end it will be about 3 or 4 PRs, with one already merged, this one, and some more to come after. The process becomes very grueling if I separate them even further: I will end up with 6-8 PRs, which is especially frustrating when reviews are hard to come by and I must constantly manage the conflicts from main. However, if I still have not convinced you, I can try to separate out what I can on Monday. Let me know what you think. Thanks again for the review, I really do appreciate it 🙏
I know - been there, done that. But let's say that since I asked for it, I might push myself harder to review it quickly.
In this particular case, I really do feel strongly that splitting this particular PR further is unwarranted. There have been many PRs before with refactored code that are pages long, and this change set is really just moving code from one module to another. @uranusjr gave it a 👍 (other than a small nit on the module name, which I'm waiting to hear back on); hopefully some others can have a look. Perhaps @jedcunningham, since he's worked on the Kubernetes executor lately and will be familiar with the code.
```python
return cls._instance


class KubernetesJobWatcher(multiprocessing.Process, LoggingMixin):
```
What do folks think about moving the core executor into `kubernetes_executor/__init__.py`, and moving the stuff we are pulling out into files under that new folder? I'm not sure I love the `kubernetes_executor_x.py` pattern we are establishing here.
That to me felt a bit invasive, especially since executors are public-ish (people modify them with plugins or base new executors off of them).
We plan to move the executors to their own provider packages very soon so module names/locations will all be shuffled then. Can we agree to defer this piece of feedback until that time? I've already gotten feedback that this PR is too large, so if I could keep the two decoupled that'd be awesome!
I'm not sure that new approach would have any more impact than the current one, on the public-ish-ness front. If folks are literally patching the executor, sure, but 🤷♂️.
My concern is how soon "very soon" is in reality. I'd hate to land 2.7 with this pattern in it. I'd be happier to say we do the refactoring in a follow up than tie it to the provider move.
> I'm not sure that new approach would have any more impact than the current one, on the public-ish-ness front.

That's my bad, I should have been clearer. Let me take another hack at it: what I meant was, let's not break things twice for users by changing the import path of the executor now and then changing it again when it's moved to its own provider package. Not that it will be any different, just package the path changes all at once.
> I'd be happier to say we do the refactoring in a follow up than tie it to the provider move.
If you still disagree with the above then that's fair (though I'm curious to hear!) and I'd agree to cutting a ticket to move it after this PR 👍
> what I meant by that was let's not break things twice for users by changing the import path of the executor now
That's the beauty of it though, users would still:

```python
from airflow.executors.kubernetes_executor import KubernetesExecutor
```

So the stuff that stays in the "main" file just moves on disk; how folks import it wouldn't change at all. (We can ignore the stuff being moved, like `KubernetesJobWatcher`; those are changing either way.)
Am I overlooking something here (very possible)?
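For illustration, a rough sketch of the layout being proposed (the helper file name is just a placeholder, not something decided in this thread):

```python
# Hypothetical layout: kubernetes_executor.py becomes a package, so code can
# move on disk while the public import path stays exactly the same.
#
#   airflow/executors/kubernetes_executor/__init__.py                    # KubernetesExecutor stays here
#   airflow/executors/kubernetes_executor/kubernetes_executor_utils.py   # placeholder name for the moved helpers

# User code keeps working unchanged either way:
from airflow.executors.kubernetes_executor import KubernetesExecutor
```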
> Am I overlooking something here (very possible)?
Ah nope, I see what you mean now! Yeah, I think that would cover most interactions that users have with patching and base classing the executor. I'm happy to make those changes, but I'd request that it'd be in a follow-up PR.
> I'm not sure I love the kubernetes_executor_x.py pattern

But I will note that I'd still name the module something quite similar to this, even in its own dir, since generic module names like `utils.py` can be difficult to work with due to the shared namespace.
> That to me felt a bit invasive, especially since executors are public-ish (people modify them with plugins or base new executors off of them).

I think executor user-facing behavior is public, but executor code is not public.
I think this was the consensus we arrived at, more or less, when @potiuk brought it up on the list, though I don't recall if it ever made it into "policy" via a PR.
So if you subclass an executor, you do so very much at your own risk. We can therefore change the path of an executor or its helpers without worrying about backcompat, and refactor / move executor code at will, as long as it doesn't break user-facing behavior (e.g. some radical breaking change w.r.t. user-supplied executor_config).
My two cents: We have no policy for optimisation PRs, but we have had issues lately with some of them breaking things unexpectedly, so I am definitely in favour of what has been said -> optimisation PRs go into minor releases from now on. @o-nikolas, if you experience many annoying conflicts while keeping this open, that is exactly what will happen for the RM. So we can either do what Jarek suggests with PARKING it, but then the burden is on you and that's not fun, OR we can merge this knowing that it "might" cause painful cherry-picking, and if it does, release 2.7.0 earlier. (That would go in favour of releasing more often and easing the pain.) Is there a specific reason that we only do a few minor releases during the year? Can we just make one on an as-necessary basis?
All for "Release more often" :).
Hey @pierrejeambrun, thanks for weighing in! Most of the items have been discussed above, but here are some quick replies:
Yupp, I was never trying to get this merged into a patch release. I think that was maybe assumed at some point, but it was never my intention.
Yeah, as I said above, if I'm resolving conflicts anyway, I'm more than happy to resolve any conflicts that the RM encounters. In fact, I'm more than happy to start doing releases altogether 😄
+1 to release early and release often! (as long as we're sure they're stable releases for our users, of course)
Did the rebase yesterday and resolved the conflicts. Looks like the build is still green 👍
+100
Re: #29055 (comment) Are folks here okay if we merge this now that 2.6.2 has been cut?
Yes.
Okay, I'll take silence from the other folks as approval 😆 Looks like there's another conflict though, and I'm just about to head out on vacation for a week. I'll rebase and resolve the conflict once I'm back, and then merge.
Nice work!
> I have always assumed that patch releases primarily focus on bugfixes to enhance reliability and address issues. Consequently, non-fix changes that have the potential to cause problems should be deferred and not included in patch releases.
I respectfully disagree. IMO, internal enhancements should be welcomed in patch versions as well.
However, it is worth noting that until version 2.5.3, the executors were considered part of the public API. In version 2.6.0, we clarified that only the executor interface is public, while instances are not (#29200). For that reason, I think we should merge this PR when we cut 2.7.0, to avoid introducing a breaking change for some users.
Nice discussion :). I think I agree with both statements. The key is "potential to cause problems". There is a class of improvements that should be welcome: automated refactorings, build improvements, some speed optimisations that are easily verifiable and reviewable. Yes. But the moment you have one person scratching their head saying "You know, I am not sure if we thought of all the consequences; I think this optimisation is a bit risky" - that's a clear sign of a non-patchlevel release IMHO. This is really a matter of the trust and communication we build with our users. If we want to convince our users to follow the SemVer approach, they should not expect regressions in a patchlevel release, so we should make sure to avoid them, and certainly there should be no regressions coming from something that is not a "fix" to an issue. And yes, it's blurry and more quantum stuff, where we are talking about the probability of something happening, not 0/1 decisions. Generally speaking, if a change is very unlikely to cause a problem, it's fine to consider it.
Not sure if I understand? Do you expect it to wait until AFTER we cut 2.7.0? I am not sure we need that. The clarification is not really tied to a specific version. It was merely documenting what we had not documented but implicitly knew. We already "broke" a number of those in the past - for example, DB structure changes, which we had never explicitly stated were an "internal detail" before. I think we never really said "you can extend the Kubernetes executor and it will remain as it is"; we said "you can write your own executor" - those are different things. In the case of the "Public Interface" docs, I think we just clarified the intentions we had regarding compatibility; we never changed the intentions, they were just poorly (or rather not) documented.
Hey @o-nikolas -> I think it's high time to merge that one. We are close enough to 2.7.0, and 2.6.3 has just been released - we expect at most one more bugfix release, 2.6.4, but that one will be rather small and localized (and I will be running it, so I am happy to handle the risks of problems with cherry-picking). I want to move the Kubernetes executor and related code to the provider as a follow-up to this one, but I will wait until this one is merged and until the Celery one is merged: #32526
@potiuk Yupp, agreed! I was on holiday last week and am just back at the keyboard now. I'll try to get to this rebase/conflict resolution today or tomorrow 😃 I'm excited to see the executors moving as well. I'll try to review the Celery PR.
Seems like perfect timing!
Absolutely! :).
Import optimizations to decrease the amount of time it takes to load/import the kubernetes executor. Move some expensive typing-related imports under TYPE_CHECKING. Refactor expensive classes (other than KubernetesExecutor) out of the kubernetes_executor.py module into a utils module so they are only loaded at runtime. Also move some imports to runtime imports, closer to their usage.
We can also decide to extract the executor as-is, release it with a provider wave, and then apply this optimization patch directly to the provider code. I think that minimizes the challenges and will be handled like any other provider code change (users will also have a fallback version if we decide to yank for some reason).
It will be easier the other way round. I have done
Nope. This code will be gone from Airflow. So there will be no going back.
Great!
```python
    scheduler_job_id: str,
    kube_config: Any,
) -> str | None:
    self.log.info("Event: and now my watch begins starting at resource_version: %s", resource_version)
```
I love this reference :)
Overview
This PR aims to improve the time it takes to load/import the kubernetes executor module.
Motivation
Executors are imported in more places now that various compatibility checks are in core Airflow code (re: AIP-51). Also, decreasing import times is very important for the work of executors vending CLI commands (see #29055), since the CLI code in Airflow is particularly sensitive to slow imports (because all code is loaded fresh each time you run an individual Airflow CLI command).
The changes
- Move some expensive typing-related imports under `TYPE_CHECKING` (see the sketch after this list).
- Refactor expensive classes (other than `KubernetesExecutor`) out of the `kubernetes_executor.py` module into a utils module so they are only loaded at runtime. Classes moved include `KubernetesJobWatcher`, `AirflowKubernetesScheduler`, and `ResourceVersion`.
- Move some imports to runtime imports, closer to their usage.
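A minimal sketch of the `TYPE_CHECKING` pattern described above (the function and the specific `kubernetes` imports here are illustrative assumptions, not this PR's actual code):

```python
from __future__ import annotations

from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Evaluated only by static type checkers; skipped at runtime, so importing
    # this module no longer pays for the kubernetes client import.
    from kubernetes.client import models as k8s


def pod_to_dict(pod: k8s.V1Pod) -> dict:
    # Runtime import, deferred until first call and kept close to its usage.
    from kubernetes.client import ApiClient

    return ApiClient().sanitize_for_serialization(pod)
```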
Testing
I benchmarked these changes by writing a script that imports the executor module in a fresh Python runtime and times how long that takes (you can test this yourself quickly from a bash shell with something like `time python -c 'from airflow.executors.kubernetes_executor import KubernetesExecutor'`), then does that in a loop for several samples (with some randomness in the order, for fairness), both on main and on my development branch. A sketch of such a script follows below.
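A rough sketch of what such a benchmark script could look like (the sample count and reporting are illustrative assumptions, not the exact script used for these measurements):

```python
import statistics
import subprocess
import sys
import time

SNIPPET = "from airflow.executors.kubernetes_executor import KubernetesExecutor"


def sample_import_time() -> float:
    # Spawn a fresh interpreter per sample so no modules stay cached between runs.
    start = time.perf_counter()
    subprocess.run([sys.executable, "-c", SNIPPET], check=True)
    return time.perf_counter() - start


samples = [sample_import_time() for _ in range(10)]
print(f"mean: {statistics.mean(samples):.3f}s, stdev: {statistics.stdev(samples):.3f}s")
```

Results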