Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Convert DagFileProcessor.execute_callbacks to Internal API #28900

Merged
merged 118 commits into from
Sep 22, 2023

Conversation

vincbeck
Copy link
Contributor

@vincbeck vincbeck commented Jan 12, 2023

Closes #28269
Closes #28784

Add the following methods to the internal API component:

  • DAG.fetch_callback
  • DAG.fetch_dagrun
  • SerializedDagModel.get_serialized_dag
  • TaskInstance.get_task_instance
  • TaskInstance.fetch_handle_failure_context
  • TaskInstance.save_to_db

@boring-cyborg boring-cyborg bot added the area:Scheduler including HA (high availability) scheduler label Jan 12, 2023
@vincbeck vincbeck marked this pull request as draft January 12, 2023 20:40
@vincbeck
Copy link
Contributor Author

@mhenc

airflow/api_internal/actions/dag.py Outdated Show resolved Hide resolved
airflow/dag_processing/processor.py Outdated Show resolved Hide resolved
@vincbeck
Copy link
Contributor Author

@mhenc and @potiuk whenever you get a chance if you can take a look and give your thoughts please :) There is a lot of moving and refactoring code here so I'll definitely need your help :)

Copy link
Contributor

@mhenc mhenc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The general approach looks good.

airflow/models/dag.py Outdated Show resolved Hide resolved
airflow/models/dag.py Outdated Show resolved Hide resolved
airflow/models/dag.py Outdated Show resolved Hide resolved
@vincbeck vincbeck force-pushed the vincbeck/execute_callbacks branch from beed957 to 2b9014b Compare September 13, 2023 19:22
"""
Stop non-teardown tasks in dag.

:meta private:
"""
tis = self.dag_run.get_task_instances(session=session)
assert task_instance.dag_run is not None
Copy link
Member

@potiuk potiuk Sep 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We only use asserts in in special circumstances (TYPE_CHECKING for example) - I suggest to change it to

if not task_instance_.dag_run:
   raise ValueError("task_instance must have dag_run set")

or similar.

Copy link
Member

@potiuk potiuk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Trusting that the code extracted to methods has been unchanged (hard to check).

@vincbeck vincbeck merged commit 541c9ad into apache:main Sep 22, 2023
1 check passed
@vincbeck vincbeck deleted the vincbeck/execute_callbacks branch September 22, 2023 16:36
@o-nikolas
Copy link
Contributor

Congrats on getting this one merged @vincbeck, it was a heavy lift! 🥳

@ephraimbuddy ephraimbuddy added the changelog:skip Changes that should be skipped from the changelog (CI, tests, etc..) label Oct 3, 2023
@ephraimbuddy ephraimbuddy modified the milestone: Airflow 2.8.0 Oct 3, 2023
@provide_session
def handle_callback(self, dagrun, success=True, reason=None, session=NEW_SESSION):
def fetch_callback(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vincbeck @potiuk @mhenc @uranusjr just curious what your thoughts are on backward compatibility on this one. technically it (handle_callback) was part of a public class and not marked :meta private: in the docstring so.... technically it was probably public and therefore subject to backcompat.

that said, considering it as part of the public API also seems absurd. do you think we should put in our "public API" some "cover your ass" type of language that sort of expresses that .... methods which are clearly not for public use, even if not marked internal, are internal. maybe we could add some language that explains what that means. like methods not related to the dag authoring interface etc -- not sure.

but in any case, we should mark fetch_callback as private by either prefixing with underscore or adding :meta private:.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks backward compatible to me, Github makes it confusing but handle_callback is not removed, look at line 1423

Copy link
Member

@potiuk potiuk Dec 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree it's not "disappearing" now so not an issue.

But I also would like to have a bit of philosophical rant here.

I thnk we are approaching backwards compatibility here as "0 / 1". And I keep on repeating it's totally wrong. Hyrum's right is totallly right here: https://www.hyrumslaw.com/ - in sufficiantly complex system ANY change is braaking. You would have to stop changing things to stop things from breaking. There is no way around it. And we cannot describe super precise rules about it upfront. We cannot really say:

  • any time we change this and that, we are technically breaking things

That would make us slow and very shortly we would lose any flexibliity. I think we will never achieve 100% correctness and sets of rules that will clearly say for each change "breaking/not breaking" in automated way. We can approach it and get closer to it by adding more and more rules and description - but we will at most asymptotically get closer to the certainty - never achieving it and at some point in time, adding more and more rules will make it more not less confusing and contradicting. So we should strive for 'good enough" and "pretty correct" set of rules - but also accept the fact that there will be exceptions and room for interpretation and even for arbitrary decisions that others (including some of our users) might not agree with.

As I see it (and what I think SemVer also explains) Backwards Compatiility ad Semver is NOT about following certain rules "adding a parameter is breaking, renamig any method no marked as private is breaking". IMHO this is about three things:

  1. what is the INTENTION we had when we created the code - were we INTENDING to make it relied on? Was described and explained that users were supposed to rely on it ? Or was their reliance on certain methods and fields accidental and the fact that method was there was just "assumed" they can rely on it?

  2. How likely it is tha many of our users made such assumptions if it was not clearly documented, and explained - or even if they could take the impression it was, how likely it is we are breaking something sersious.

  3. How difficult it is to recover for our users. If the system is failing immediately and what the user needs to do is flipping the flag to bring back the old behaviour - is it breaking or not? If the system is not failing but the change in behaviour is not persistent nor dangerous and the user might bring it back with a flip of a flag - is it breaking or not?

And yes - it means we will sometimes have to make arbitrary decisions based on gut feelings not data nor precise rules followed. And yes - it means that sometimes there will be individual angry users who will tells us "but you promised backwards compatibiliity - bring it back NOW", and there will be cases where we disageree between ourselves - maintainers - what is backwards compatible and what is. not and we will have to vote on it eventually. And yes - sometimes it will mean we will take a wrong decisiion and break too many workflows of too many users and we will have to quickly release a bugfix that will revert it.

All this. And more. And we will remains humans making sometimes flawed and imperfect decisions based on our insticts and intentions and gut feelings not data and strict rules - rather than robots following precise rules and prescribed algorithms. I think this is why we - as maintainers are still needed in the project - to make such decisions.

Sorry If I've gotten a bit too philosophical, but I do think we are quite too often trying to make things crystal clear and be free of making the decisions so that we don't have to well, make decisions.

It's needed in many cases - that's why I am also adding a lot of rules on how we approach things - for example provider's maintenance lifecycale. But I treat it more as communication tool and write down our intentions and where possible leave enough room for interpretation and decision making.

Where we can - yes we should make clear rule. But when we can't we should state our intentions, communicate general principles, and simply try - as best as we can - to fulfill those stated intentions (but we should attempt to communicate those intentions so that our users are aware of them).

Comment on lines +401 to +402
if task_instance.next_method:
if task_instance.next_method:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hello, coming long after the fact, but I think that during the move, this check was duplicated

# then we need to pick the right method to come back to, otherwise
# we go for the default execute
execute_callable_kwargs = {}
if self.next_method:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is where the code from my comment above is coming from, there is only one check here

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. No harm but would be great to fix it. Care for opening PR?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dstandish added a commit to astronomer/airflow that referenced this pull request Mar 30, 2024
Fundamentally what's going on here is we need a TaskInstance object instead of a Row object when sending over the wire in RPC call.  But the full story on this one is actually somewhat complicated.
It was back in 2.2.0 in apache#25312 when we converted to query with the column attrs instead of the TI object (apache#28900 only refactored this logic into a function).  The reason was to avoid locking the dag_run table since TI newly had a dag_run relationship attr.  Now, this causes a problem with AIP-44 because the RPC api does not know how to serialize a Row object.
This PR switches back to querying a TaskInstance object, but avoids locking dag_run by using lazy_load option.  Meanwhile, since try_number is a horrible attribute (which gives you a different answer depending on the state), we have to switch it back to look at the underlying private attr instead of the public accesor.
dstandish added a commit that referenced this pull request Apr 2, 2024
Fundamentally what's going on here is we need a TaskInstance object instead of a Row object when sending over the wire in RPC call.  But the full story on this one is actually somewhat complicated.
It was back in 2.2.0 in #25312 when we converted to query with the column attrs instead of the TI object (#28900 only refactored this logic into a function).  The reason was to avoid locking the dag_run table since TI newly had a dag_run relationship attr.  Now, this causes a problem with AIP-44 because the RPC api does not know how to serialize a Row object.
This PR switches back to querying a TaskInstance object, but avoids locking dag_run by using lazy_load option.  Meanwhile, since try_number is a horrible attribute (which gives you a different answer depending on the state), we have to switch it back to look at the underlying private attr instead of the public accesor.
@potiuk potiuk mentioned this pull request Jul 27, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
AIP-44 Airflow Internal API area:Scheduler including HA (high availability) scheduler changelog:skip Changes that should be skipped from the changelog (CI, tests, etc..) use public runners Makes sure that Public runners are used even if commiters creates the PR (useful for testing)
Projects
None yet
8 participants