Convert DagFileProcessor.execute_callbacks to Internal API #28900

vincbeck · 2023-01-12T20:39:49Z

Closes #28269
Closes #28784

Add the following methods to the internal API component:

DAG.fetch_callback
DAG.fetch_dagrun
SerializedDagModel.get_serialized_dag
TaskInstance.get_task_instance
TaskInstance.fetch_handle_failure_context
TaskInstance.save_to_db

…e_context is not yet converted

vincbeck · 2023-01-12T20:40:26Z

@mhenc

airflow/api_internal/actions/dag.py

airflow/dag_processing/processor.py

vincbeck · 2023-01-25T20:42:43Z

@mhenc and @potiuk whenever you get a chance if you can take a look and give your thoughts please :) There is a lot of moving and refactoring code here so I'll definitely need your help :)

mhenc

The general approach looks good.

airflow/models/dag.py

potiuk · 2023-09-22T04:40:54Z

airflow/models/taskinstance.py

    """
    Stop non-teardown tasks in dag.

    :meta private:
    """
-    tis = self.dag_run.get_task_instances(session=session)
+    assert task_instance.dag_run is not None


We only use asserts in special circumstances (TYPE_CHECKING for example) - I suggest changing it to

if not task_instance.dag_run: raise ValueError("task_instance must have dag_run set")

or similar.
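A minimal sketch of the suggested pattern, using a hypothetical stand-in class (`FakeTaskInstance` and `require_dag_run` are invented for illustration and are not Airflow code). The practical difference is that `assert` statements are stripped when Python runs with `-O`, while an explicit check always runs:

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class FakeTaskInstance:
    """Hypothetical stand-in for a task instance; not Airflow's class."""

    task_id: str
    dag_run: object | None = None


def require_dag_run(task_instance: FakeTaskInstance) -> object:
    """Return the dag_run, raising instead of asserting when it is missing."""
    # An assert here would be skipped under `python -O`; this check is not.
    if not task_instance.dag_run:
        raise ValueError("task_instance must have dag_run set")
    return task_instance.dag_run
```

Callers then get a clear `ValueError` regardless of interpreter flags.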

potiuk

Trusting that the code extracted to methods has been unchanged (hard to check).

o-nikolas · 2023-09-22T17:11:36Z

Congrats on getting this one merged @vincbeck, it was a heavy lift! 🥳

dstandish · 2023-12-12T05:10:17Z

airflow/models/dag.py

    @provide_session
-    def handle_callback(self, dagrun, success=True, reason=None, session=NEW_SESSION):
+    def fetch_callback(


@vincbeck @potiuk @mhenc @uranusjr just curious what your thoughts are on backward compatibility on this one. technically it (handle_callback) was part of a public class and not marked :meta private: in the docstring so.... technically it was probably public and therefore subject to backcompat.

that said, considering it as part of the public API also seems absurd. do you think we should put in our "public API" some "cover your ass" type of language that sort of expresses that .... methods which are clearly not for public use, even if not marked internal, are internal. maybe we could add some language that explains what that means. like methods not related to the dag authoring interface etc -- not sure.

but in any case, we should mark fetch_callback as private by either prefixing with underscore or adding :meta private:.
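For illustration, the two options could look like this on a hypothetical class (`DocExample` and its method bodies are made up; the only real conventions shown are the `:meta private:` docstring field that Sphinx autodoc recognizes and the leading underscore):

```python
class DocExample:
    """Hypothetical class used only to illustrate the two conventions."""

    def fetch_callback(self, success: bool = True) -> str:
        """
        Pick the callback name for a finished run.

        :meta private:
        """
        # Public-looking name, but the docstring field hides it from the docs.
        return "on_success" if success else "on_failure"

    def _fetch_callback(self, success: bool = True) -> str:
        """Same helper, marked internal by the leading underscore instead."""
        return self.fetch_callback(success)
```

Either marker signals intent; neither prevents callers from using the method.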

vincbeck · 2023-12-12

It looks backward compatible to me. GitHub makes it confusing, but handle_callback is not removed; look at line 1423.
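A rough sketch of how an old entry point can stay in place while the work is split into a fetch step and an execute step. The class, signatures, and bodies below are invented for illustration and are not the actual Airflow implementation; only the method names echo the discussion:

```python
class ExampleDag:
    """Hypothetical class; names mirror the discussion, bodies are invented."""

    def __init__(self, on_success=None, on_failure=None):
        self.on_success = on_success
        self.on_failure = on_failure

    def fetch_callback(self, success=True, reason=None):
        """Return (callback, context) without running anything."""
        callback = self.on_success if success else self.on_failure
        if callback is None:
            return None
        return callback, {"reason": reason}

    @staticmethod
    def execute_callback(callback, context):
        """Run a previously fetched callback."""
        return callback(context)

    def handle_callback(self, success=True, reason=None):
        """Old entry point, kept so existing callers continue to work."""
        fetched = self.fetch_callback(success=success, reason=reason)
        if fetched is None:
            return None
        return self.execute_callback(*fetched)
```

Existing code that calls `handle_callback` keeps working, while new code can call the two steps separately.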

potiuk · 2023-12-18

Agree it's not "disappearing" now, so not an issue.

But I also would like to have a bit of philosophical rant here.

I think we are approaching backwards compatibility here as "0 / 1". And I keep on repeating it's totally wrong. Hyrum's Law is totally right here: https://www.hyrumslaw.com/ - in a sufficiently complex system ANY change is breaking. You would have to stop changing things to stop things from breaking. There is no way around it. And we cannot describe super precise rules about it upfront. We cannot really say:

any time we change this and that, we are technically breaking things

That would make us slow and very shortly we would lose any flexibility. I think we will never achieve 100% correctness and a set of rules that will clearly say for each change "breaking/not breaking" in an automated way. We can approach it and get closer to it by adding more and more rules and descriptions - but we will at most asymptotically get closer to certainty, never achieving it, and at some point adding more and more rules will make things more, not less, confusing and contradictory. So we should strive for a "good enough" and "pretty correct" set of rules - but also accept the fact that there will be exceptions and room for interpretation and even for arbitrary decisions that others (including some of our users) might not agree with.

As I see it (and what I think SemVer also explains), backwards compatibility and SemVer are NOT about following certain rules such as "adding a parameter is breaking, renaming any method not marked as private is breaking". IMHO this is about three things:

What is the INTENTION we had when we created the code - were we INTENDING for it to be relied on? Was it described and explained that users were supposed to rely on it? Or was their reliance on certain methods and fields accidental, where the fact that the method was there was just taken to mean they could rely on it?

How likely is it that many of our users made such assumptions if it was not clearly documented and explained - or, even if they could get the impression it was, how likely is it that we are breaking something serious?

How difficult is it for our users to recover? If the system is failing immediately and all the user needs to do is flip a flag to bring back the old behaviour - is it breaking or not? If the system is not failing, but the change in behaviour is neither persistent nor dangerous and the user can bring it back with a flip of a flag - is it breaking or not?

And yes - it means we will sometimes have to make arbitrary decisions based on gut feelings, not data or precisely followed rules. And yes - it means that sometimes there will be individual angry users who will tell us "but you promised backwards compatibility - bring it back NOW", and there will be cases where we - maintainers - disagree among ourselves about what is backwards compatible and what is not, and we will have to vote on it eventually. And yes - sometimes it will mean we make a wrong decision and break too many workflows of too many users, and we will have to quickly release a bugfix that reverts it.

All this. And more. And we will remain humans making sometimes flawed and imperfect decisions based on our instincts, intentions and gut feelings, not data and strict rules - rather than robots following precise rules and prescribed algorithms. I think this is why we - as maintainers - are still needed in the project: to make such decisions.

Sorry if I've gotten a bit too philosophical, but I do think we are quite often trying to make things crystal clear and free ourselves of decision-making so that we don't have to, well, make decisions.

It's needed in many cases - that's why I am also adding a lot of rules on how we approach things - for example the providers' maintenance lifecycle. But I treat them more as a communication tool to write down our intentions and, where possible, leave enough room for interpretation and decision making.

Where we can - yes, we should make clear rules. But where we can't, we should state our intentions, communicate general principles, and simply try - as best as we can - to fulfill those stated intentions (and we should make an effort to communicate those intentions so that our users are aware of them).

vandonr · 2024-03-15T13:18:08Z

airflow/models/taskinstance.py

+    if task_instance.next_method:
+        if task_instance.next_method:


hello, coming long after the fact, but I think that during the move, this check was duplicated

vandonr · 2024-03-15T13:19:14Z

airflow/models/taskinstance.py

-        # then we need to pick the right method to come back to, otherwise
-        # we go for the default execute
-        execute_callable_kwargs = {}
-        if self.next_method:


this is where the code from my comment above is coming from, there is only one check here

potiuk · 2024-03-20

Yeah. No harm, but it would be great to fix it. Care to open a PR?

Fundamentally what's going on here is that we need a TaskInstance object instead of a Row object when sending over the wire in an RPC call. But the full story on this one is actually somewhat complicated. It was back in 2.2.0, in #25312, that we converted to querying with the column attrs instead of the TI object (#28900 only refactored this logic into a function). The reason was to avoid locking the dag_run table, since TI newly had a dag_run relationship attr. Now this causes a problem with AIP-44 because the RPC API does not know how to serialize a Row object. This PR switches back to querying a TaskInstance object, but avoids locking dag_run by using the lazy_load option. Meanwhile, since try_number is a horrible attribute (which gives you a different answer depending on the state), we have to switch it back to look at the underlying private attr instead of the public accessor.
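To make the try_number remark concrete, here is a simplified stand-in (not Airflow's TaskInstance). The exact rule shown, the raw counter while running and the counter plus one otherwise, is an assumption based on my recollection of the Airflow 2.x property, so treat it as illustrative only:

```python
class FakeTI:
    """Simplified stand-in; only models the state-dependent accessor."""

    def __init__(self, state: str, _try_number: int = 0):
        self.state = state
        self._try_number = _try_number

    @property
    def try_number(self) -> int:
        # Assumed rule: raw counter while running, next attempt otherwise.
        if self.state == "running":
            return self._try_number
        return self._try_number + 1


ti = FakeTI(state="running", _try_number=1)
running_view = ti.try_number  # read while the task is running
ti.state = "failed"
failed_view = ti.try_number  # read again after the state changed
```

The stored `_try_number` never changes between the two reads, yet the public accessor reports different values, which is why code that compares or serializes the value reads the private attribute instead.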

Start converting _execute_task_callbacks to internal API. get_templat…

609aebb

…e_context is not yet converted

vincbeck requested review from jedcunningham, ephraimbuddy, kaxil, XD-DENG and ashb as code owners January 12, 2023 20:39

boring-cyborg bot added the area:Scheduler including HA (high availability) scheduler label Jan 12, 2023

vincbeck marked this pull request as draft January 12, 2023 20:40

mhenc reviewed Jan 13, 2023

airflow/api_internal/actions/dag.py (outdated)

airflow/dag_processing/processor.py (outdated)

vincbeck and others added 8 commits January 16, 2023 15:21

Move get_serialized_dag and get_task_instance methods to model classes

ac48800

Convert handle_failure() method to internal API

b57ddca

Merge branch 'main' into vincbeck/execute_callbacks

917abfb

Remove comment

6dfebce

Migrate _execute_dag_callbacks to internal API

1867db4

Merge branch 'main' into vincbeck/execute_callbacks

6b110ea

Merge branch 'main' into vincbeck/execute_callbacks

c21fb81

Use cls.logger()

d92fa83

mhenc reviewed Feb 2, 2023

airflow/models/dag.py (outdated)

airflow/models/dag.py (outdated)

airflow/models/dag.py (outdated)

mhenc mentioned this pull request Feb 2, 2023

AIP-44 Migrate DagFileProcessorManager._fetch_callbacks to Internal API #28784

Closed

vincbeck and others added 9 commits February 2, 2023 16:34

Remove todo

cc52f47

Merge branch 'main' into vincbeck/execute_callbacks

ee27659

Add back if callbacks

68bfc72

Merge branch 'main' into vincbeck/execute_callbacks

839dc6c

Remove comments

30d6ef9

Merge branch 'main' into vincbeck/execute_callbacks

9fb937d

Refactore _fetch_callback

81ee61b

Fix TaskInstance.get_task_instance

3c300a7

Fix unit tests

dc6031e

vincbeck added 3 commits September 13, 2023 10:47

Revert "Replace save_to_db() to finish_task()"

989672e

This reverts commit f76bc56.

Revert "Fix set_end_date"

d51371e

This reverts commit 975eba8.

Revert "Remove _set_duration and introduce set_end_date"

2b9014b

This reverts commit b2d5bc9.

vincbeck force-pushed the vincbeck/execute_callbacks branch from beed957 to 2b9014b Compare September 13, 2023 19:22

Add dataset property to DatasetEventPydantic

c6573c7

potiuk reviewed Sep 22, 2023

potiuk approved these changes Sep 22, 2023

Remove assert

0653a33

vincbeck merged commit 541c9ad into apache:main Sep 22, 2023
1 check passed

vincbeck deleted the vincbeck/execute_callbacks branch September 22, 2023 16:36

ephraimbuddy added the changelog:skip Changes that should be skipped from the changelog (CI, tests, etc..) label Oct 3, 2023

ephraimbuddy modified the milestone: Airflow 2.8.0 Oct 3, 2023

dstandish reviewed Dec 12, 2023

vandonr reviewed Mar 15, 2024

dstandish mentioned this pull request Mar 27, 2024

Remove select_column option in TaskInstance.get_task_instance #38571

Merged

potiuk mentioned this pull request Jul 27, 2024

AIP-44 Test setup #41068

Closed

vincbeck commented Jan 12, 2023 •

edited by eladkal

dstandish Dec 12, 2023

vincbeck Dec 12, 2023

potiuk Dec 18, 2023 (edited)

vandonr Mar 15, 2024

vandonr Mar 15, 2024

potiuk Mar 20, 2024

vincbeck Mar 20, 2024
