Add "Optimizing" chapter to dynamic-dags section #25121
This is genius! 🚀
I will wait for others to improve the English wording probably :) |
Very interesting, I think this will help a lot of people 💪 |
76087c6 to 13a85f0
Hey @BasPH, @eladkal I took your comments in and rephrased it a bit. I removed "worker" and "k8s" references and replaced it with more generic "task execution". I also added a few improvements to make the description and tests more "robust":
|
ab8cccd to 9eae1bf
Converted to draft to iterate a bit more on it |
Yep. This is what you get when you run "getproctitle()" in your task:
Maybe not perfect but it might do for now until we add some more robust way. |
9eae1bf to 4305d5c
I think this will do for now, but in the future we might add a more robust way |
1e5251e to 55d766c
I'm a little torn on documenting this in the official docs. Almost feels like if we are going to officially suggest it, we should have a better DAG authoring experience for it.
Overall LGTM, nice that the example works for both new and forked processes. Made some smaller comments.
55d766c to 968b998
I addressed all comments I think -> Would love one more pass @BasPH @jedcunningham |
Few minor comments, otherwise LGTM!
ba3ef1b to 9a34ded
9a34ded to 6e0b4da
Actually I found (while implementing proper context) that another variant of proctitle is possible :) 🤯 |
6e0b4da to 060c33e
There is a nice way for our users to optimize dynamic DAG generation. Adding a short chapter linking to an example of how this can be done might guide them to take a similar approach.
We handle several cases here:
* starting task via Python interpreter
* starting task via forking
* running "airflow tasks test" command
The detection here is rather complex, and in a follow-up PR we will add a more robust detection mechanism (but it will only be available as of Airflow 2.4).
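For readers following along, the three cases listed above can be sketched roughly as below. This is only an illustration, not the code from this PR: the function names `detect_current_dag` and `dags_to_build`, the two process-title prefixes, and the assumed argument layout (`airflow tasks run|test <dag_id> ...`) are all assumptions, and the real strings may differ between Airflow versions. In a real DAG file the title would come from `getproctitle()` in the `setproctitle` package; here it is passed in as a plain string so the sketch stays self-contained.

```python
import ast
from typing import List, Optional, Sequence

# Assumed process-title prefixes (hypothetical; check your Airflow version).
SUPERVISOR_PREFIX = "airflow task supervisor: "
RUNNER_PREFIX = "airflow task runner: "


def detect_current_dag(argv: Sequence[str], proctitle: str = "") -> Optional[str]:
    """Return the DAG id being executed, or None when it cannot be determined."""
    # Task started via a new Python interpreter, or "airflow tasks test":
    # assumed layout is [<executable>, "tasks", <subcommand>, <dag_id>, ...].
    if len(argv) > 3 and argv[1] == "tasks":
        return argv[3]
    # Task started via forking: the arguments are only visible in the title.
    try:
        if proctitle.startswith(SUPERVISOR_PREFIX):
            # Assumed to hold the repr of the argument list.
            args = ast.literal_eval(proctitle[len(SUPERVISOR_PREFIX):])
            if len(args) > 3 and args[1] == "tasks":
                return args[3]
        elif proctitle.startswith(RUNNER_PREFIX):
            # Assumed to hold "<dag_id> <task_id> ..." separated by spaces.
            args = proctitle[len(RUNNER_PREFIX):].split(" ")
            if args and args[0]:
                return args[0]
    except (ValueError, SyntaxError, TypeError):
        # Unparseable title: fall through and build everything.
        pass
    return None


def dags_to_build(all_dag_ids: Sequence[str], current_dag: Optional[str]) -> List[str]:
    """Keep only the DAG being executed; keep all of them when it is unknown."""
    return [d for d in all_dag_ids if current_dag is None or d == current_dag]
```

In a DAG file, the generating loop would then iterate over `dags_to_build(...)` instead of the full list. Returning `None` on any doubt is deliberate: when detection fails, every DAG is still generated, so the worst case is the unoptimized behaviour rather than a missing DAG.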
240feeb to 4db24d8
I know @jedcunningham you had reservations about putting that out now. I thought a bit about it and I added one more chapter about the "experimental" status of the practice, with lots of reservations and disclaimers. If you think this is fine now @jedcunningham @BasPH, I will look at the robust solution in #25161 (and then we can think about whether we want to add any kind of future approach). I think there is very little harm in adding the documentation about it even now with the "complex" code, as this will help some users and they will at least know that such an optimization is possible. The problem is real and serious: the more dynamic DAGs people use, the more similar solutions will be needed, so I would even prefer to get some issues from people who tried and failed, so that we can see whether we can optimize it even further or maybe even add some other solutions in the future. |
The new paragraph looks good, and it does give us something to point at to say "we told you it was experimental", but I'm not sure it changes much. I'm not exactly sure how to articulate my thoughts, but I'll try. This feels to me like we are helping hide the fact that people didn't architect their DAGs properly. If you've hit the point where this is necessary, then you've already strayed out of the norm. Putting out a complex chunk of code that'll be copy-pasted around, getting out of date, what have you, is just adding an extra burden to maintenance and support. I can totally imagine an upgrade down the road breaking an early version of this and us getting hard-to-diagnose bug reports because of it. Worse, if the users' DAGs are big/complex enough to need this, they likely can't/won't share them as a reproduction, so that adds an extra burden. There are other ways to DRY up a DAG than to have them all come from a single loop 🤷‍♂️. I have no problem with folks opting in to this, knowing (hopefully) exactly the risks that come with it. However, once we toss it in the official docs (even with a disclaimer paragraph), it becomes "officially blessed" to an extent, and I'm not sure we should do that for this iteration. tl;dr: very clever solution, but too esoteric/brittle to be officially documented for end users imo. |
I think you managed very well :).
Yeah, I know exactly what you mean. I have reservations myself, looking at the brittleness of the solution. I actually think I will do it differently. I will write a short blog post myself, accompanying the one from @itayB, where I add a more complete solution, and I will also publish it in the Airflow Publication.
Should we maybe then mention THIS instead? Maybe the better approach is to warn users not to do it?
Question: would you also object if we just get the "official" approach from #25161? Or is it mostly the brittleness? |
I like this idea. We should do it, even if/when we do add a better short circuit for the loop approach.
👍 |
Closing this one then - blog post is being reviewed :) |
There is a nice way for our users to optimize dynamic DAG
generation. Adding a short chapter linking to an example of how
this can be done might guide them to take a similar approach.
We handle several cases here:
* starting task via Python interpreter
* starting task via forking
* running "airflow tasks test" command
The detection here is rather complex, and in a follow-up PR
we will add a more robust detection mechanism (but it will
only be available as of Airflow 2.4).