Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[AIP-51] Executors vending CLI commands #29055

Merged
merged 6 commits into from
Jul 29, 2023

Conversation

o-nikolas
Copy link
Contributor

@o-nikolas o-nikolas commented Jan 20, 2023

This PR addresses executor coupling 5a and 5b in Airflow CLI. See this issue (#27932) for more context.

Please read before reviewing:

Change summary:

  1. Updates to the executor interface (i.e. add stub methods to the base executor and the two hybrid executors which don't inherit directly from base) to vend cli commands and make small changes in the cli_parser.py module to read in commands from this new method. Add tests for the interface.

  2. Move executor related cli commands (for Celery and Kubernetes) to their respective executor modules utilizing the new interface from step 2. All original tests for these CLI commands still apply.

NOTE

After this PR, CLI commands will only be displayed for the currently configured executor. This solves some technical issues around parsing and import cycles, as well as finally puts to rest many strange behaviours people would have with using CLI commands for executors they do not have installed which often yielded broken behaviour due to missing dependencies.


^ Add meaningful description above

Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named {pr_number}.significant.rst or {issue_number}.significant.rst, in newsfragments.

@boring-cyborg boring-cyborg bot added area:CLI provider:cncf-kubernetes Kubernetes provider related issues area:Scheduler including HA (high availability) scheduler labels Jan 20, 2023
@o-nikolas o-nikolas added the AIP-51 AIP-51: Remove executor coupling from Core label Jan 20, 2023
@o-nikolas o-nikolas changed the title Onikolas/aip 51/executor cli vending [AIP-51] Executor CLI command vending Jan 20, 2023
@o-nikolas o-nikolas changed the title [AIP-51] Executor CLI command vending [AIP-51] Executors vending CLI commands Jan 20, 2023
@o-nikolas o-nikolas marked this pull request as draft January 20, 2023 01:51
@o-nikolas o-nikolas marked this pull request as ready for review January 20, 2023 03:24
@o-nikolas o-nikolas marked this pull request as draft January 20, 2023 03:27
@o-nikolas o-nikolas force-pushed the onikolas/aip-51/executor_cli_vending branch from 31e4b9a to 689cd36 Compare January 20, 2023 03:46

airflow_commands = core_commands
for executor in ExecutorLoader.import_all_executor_classes(silent=True):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The implication of moving the executor cli commands to the executor modules is that if the executor module fails to load, then its cli commands won't be included in Help output. I think this is actually a fair behaviour for third party executors and even the core executors that are built into airflow. However, it is not the current behavior we have today, which is that the executor cli commands will still show up in help if they are not importable, but you'll get either warnings or stacktraces when you actually try to use them (since at that point you'll hit the importing issues). We try to catch those traces and log in this DefaultHelpParser._check_value method, but it only looks for the celery dependency (and there could be many other ways that the module fails to load).

I'm not yet personally decided on what the right approach is here. I'd appreciate input from other folks

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One way to mitigate this, in a generalized way, would be to catch import errors for the executors as we loop through them looking for CLI commands, and then log to the user that they could not be imported, which may mean some cli commands are not available for those executors and the user should first fix the issues if they wish to use those commands.

@potiuk @pierrejeambrun @eladkal any thoughts?

Copy link
Member

@potiuk potiuk Mar 4, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is fine. We already have precedent which is similar but even worse. When we have no CeleryExecutor configured, running "celery" command group yields this:

[jarek:~/code/airflow] [airflow-3.7] fix-infinite-nofiles-from-containerd+ 2s 130 ± airflow celery            
Usage: airflow [-h] GROUP_OR_COMMAND ...

Positional Arguments:
  GROUP_OR_COMMAND

    Groups:
      celery         Celery components
      config         View configuration

.....

Optional Arguments:
  -h, --help         show this help message and exit

airflow command error: argument GROUP_OR_COMMAND: celery subcommand works only with CeleryExecutor, CeleryKubernetesExecutor and executors derived from them, your current executor: SequentialExecutor, subclassed from: BaseExecutor, see help above.

I found it annoying when testing some stuff but I never got to a good solution. And when I think about it - I am more annoyed by the fact that I see the command group but I cannot actually see what subcommands it has than anything else. And this is more of a maintainer problem than user problem.

And I even thik in this case your proposal combined with moving the celery executors to providers - we could get rid of the above warning (or rather move it to each of the subcommands) and that would solve the "annoyance" I had. This way you would only see celery command when you have "celery provider" installed (once we move the executor to celery) and not when you have it configured (and I'd love it). This is IMHO much more consistent approach

I think this is also a nice way to actually finally decouple executors - this is a nice thing that the built-in executor CLIs will add new CLI commands.

We still have the documenttion generated for all the built-in executors, so even if they are missing in the main CLI that's more than enough IMHO.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really appreciate the time to read the thread and send a thoughtful reply @potiuk 🙏

Unless anyone else disagrees I'll go ahead with this approach.
There will also need to be a followup PR which removes some of the error handling we attempt to do (basically the thing that produces that message you ran into). Core airflow should not be attempting to do this, I think we agree on that, but correct me if not.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Core airflow should not be attempting to do this, I think we agree on that, but correct me if not.

Yes. we agree.

@o-nikolas o-nikolas force-pushed the onikolas/aip-51/executor_cli_vending branch 6 times, most recently from c47b170 to 18937e4 Compare January 27, 2023 17:49
@o-nikolas o-nikolas marked this pull request as ready for review January 28, 2023 01:11
@o-nikolas o-nikolas force-pushed the onikolas/aip-51/executor_cli_vending branch from 18937e4 to 7c6ab0d Compare January 30, 2023 21:30
@potiuk
Copy link
Member

potiuk commented Feb 18, 2023

(sorry for looong delay). Yes. All that is cool - and I think we should simply split it into three PRs to do a final review :)

@o-nikolas
Copy link
Contributor Author

(sorry for looong delay). Yes. All that is cool - and I think we should simply split it into three PRs to do a final review :)

Thanks for checking in on this one @potiuk. And that's a fair callout. I usually try to keep PRs small and tactical, but this one felt cohesive. BUT, I think I can at least separate the first comment from the second two and they would still make sense to merge that way.

BTW, did you have a read through the comment thread above about the CLI Help output? I'm curious to hear your perspective on that.

@potiuk
Copy link
Member

potiuk commented Mar 4, 2023

BTW. It's ok to merge it as a single squashed PR I think

@o-nikolas o-nikolas reopened this Jul 28, 2023
except Exception:
executor_name = ExecutorLoader.get_default_executor_name()
log.exception("Failed to load CLI commands from executor: %s", executor_name)
log.error("Ensure all dependencies are met and try again")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be good to at least print the mesage of the exception that caused it. Otherwise the user will be completely unaware what might be the reason and when the user raises issue "my airflow does not work, what should I do?" - not even the user but also whoever looks at it in GitHub issues will have no clue either, because the actual exception will be swallowed and we will have no clue how to help the user.

Ideally we should also separately handle ImportError specifically, separately. Having an ImportError is a clear indication that the user has no right provider installed or no provider at alll. This will be probably 9X% of import errors here, so we can give the user a much more specific error:

  1. first of all print the exact exception message that might give the exact import error.
  2. We can add a generic advice: "If your executor is based on Celery - you need to install 3.3.0+ version of provider, if it is Kubernetes - yoy need to install 7.4.0+ of cncf.kubernetes. While we cannot be sure if this is the reason, this will be it in vast majority of cases and might significantly incrase the number of cases where the users will be able to install provider and solve the problem on their own rather than opening an issue to Airflow.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be good to at least print the mesage of the exception that caused it.

Yeah, that's fair, and I did try it with this configuration (as well as printing the full stack trace) but it really makes the output of the CLI commands ugly, and if the user doesn't even care about executor commands they're going to see the exception and a large warning every time which is a bad user experience as well.

But I can try add it back in again and see how it looks.

We can add a generic advice: "If your executor is based on Celery - you need to install 3.3.0+ version of provider, if it is Kubernetes - yoy need to install 7.4.0+ of cncf.kubernetes. While we cannot be sure if this is the reason, this will be it in vast majority of cases and might significantly incrase the number of cases where the users will be able to install provider and solve the problem on their own rather than opening an issue to Airflow.

It feels a bit wrong to me to add this much celery/kubernetes specific hardcoding into this generic executor code. I'll try update the general exception message I print here to mention the celery/kube versions, but I think we should remove those soon in the future.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, actually, after reviewing this I'm realizing I actually already solved this by using log.exception. That includes the exception message along with the log message. So I think we're all good on that front.

But I'll add a bit more info to the following message I display to have some info about celery/kubernetes
@potiuk

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is a screenshot of how it will look. Some things to note:

  1. I start by uninstalling a dependency of Celery
  2. Notice how if the executor is something like LocalExecutor, we get no warnings and everything works just fine 😃
  3. If we switch to CeleryExecutor, we get:
    1. A message saying we failed to load CLI commands from executor
    2. The traceback and Exception message
    3. And the helpful tip (now updated with more details from your suggestion @potiuk)
  4. The CLI command still works despite the issue

Let me know if you think this suffices
Screenshot from 2023-07-28 13-46-38

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. That loooks cool. Even if ti is a bit ugly, this is a very good message to act on by the user (and if the user will create an issue - which will inevitably happen - "my command failed, what do I do" - we can simply respond -" just follow what is written in the message" and close such issue. (And I am not complaining, this is a fact of life that no matter what you do, some users will not read those error messages - but at least they will cop&paste it so that you can read it and ask the user to read it)l

Copy link
Member

@potiuk potiuk Jul 29, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also: note @o-nikolas how much the "specific" message is needed.

Put yourself in the shoes of a new user, who read somewhere that in order to use Celery executor, they need to set the env variable (but did not have celery provider installed and has no idea it needs to be added - because they have not read and digested the docs, they are just experimenting).

If you just print the exception and no extra message, the user would be at a total loss on what to do because the import error would not be telling anythong to the user who would not know that they have to installl celery provider. They would just see import error and no further explanation. With the extra message, we make it very clear what the problem very likely is and give the user clear instruction what to do to fix the problem.

Without it, we invite the uer to simply open an issue to Airflow and say "Airflow does not work - I have that exception, what do I do ?". With this extra message we - in vast majority of cases - prevent that question even being asked because we provide an answer to it before the user asks it.

@potiuk
Copy link
Member

potiuk commented Jul 28, 2023

Added my comments - looks good, but I think we should slightly improve diagnostics of problems and typing hints.

@potiuk
Copy link
Member

potiuk commented Jul 28, 2023

BTW. I really like how it makes the core even more "lean" without provider's discovery. That's really good observation that we ONLY need chosen executor's command set and "other's" are not needed.

BTW2. You will also have to handle conditional imports of the cli_config which apparently was not present in Airflow 2.4. This is a bit unneeded because the _executor* classes are going to be only used in Airflow 2.7, so maybe the best way to handle that would be to exclude them from the import check we do when verifying provider's 2.4 compatibility.

The easiest way to handle it @o-nikolas - is to try/except Import error the imports and check if airfow version is < 2.7.0 and raise AirflowOptionalProviderFeatureException. The AirflowOptionalProviderFeatureException is ignored when we verify if providers are OK for old versions of airflow.

@o-nikolas
Copy link
Contributor Author

BTW. I really like how it makes the core even more "lean" without provider's discovery. That's really good observation that we ONLY need chosen executor's command set and "other's" are not needed.

😃

BTW2. You will also have to handle conditional imports of the cli_config which apparently was not present in Airflow 2.4. This is a bit unneeded because the _executor* classes are going to be only used in Airflow 2.7, so maybe the best way to handle that would be to exclude them from the import check we do when verifying provider's 2.4 compatibility.

The easiest way to handle it @o-nikolas - is to try/except Import error the imports and check if airfow version is < 2.7.0 and raise AirflowOptionalProviderFeatureException. The AirflowOptionalProviderFeatureException is ignored when we verify if providers are OK for old versions of airflow.

Okay, I'll try whip this up today. What message should we add in the AirflowOptionalProviderFeatureException? Or is this really only for our import checks and is unlikely to be hit by a user?

@potiuk potiuk merged commit fcbbf47 into apache:main Jul 29, 2023
@potiuk
Copy link
Member

potiuk commented Jul 29, 2023

Fantastic!

@potiuk
Copy link
Member

potiuk commented Jul 29, 2023

Okay, I'll try whip this up today. What message should we add in the AirflowOptionalProviderFeatureException? Or is this really only for our import checks and is unlikely to be hit by a user?

What you proposed was good :)

potiuk added a commit to potiuk/airflow that referenced this pull request Jul 29, 2023
Small follow-up after apache#29055 - for better diagnostic of potential
future problems, it would be good to re-raise the original import
error, otherwise if the Import error results from some other issue
than Airflow version, we will get quite a bit of head scratching
trying to diagnose some of the resulting aftermath.
potiuk added a commit that referenced this pull request Jul 29, 2023
Small follow-up after #29055 - for better diagnostic of potential
future problems, it would be good to re-raise the original import
error, otherwise if the Import error results from some other issue
than Airflow version, we will get quite a bit of head scratching
trying to diagnose some of the resulting aftermath.
potiuk added a commit to potiuk/airflow that referenced this pull request Aug 4, 2023
The apache#29055 moved relevant CLI command definition to executors but
it automatically removed the command from generated documentation.

This PR brings it back for both Celery and Cncf Kubernetes provider

Closes: apache#27932
potiuk added a commit that referenced this pull request Aug 4, 2023
The #29055 moved relevant CLI command definition to executors but
it automatically removed the command from generated documentation.

This PR brings it back for both Celery and Cncf Kubernetes provider

Closes: #27932
ephraimbuddy pushed a commit that referenced this pull request Aug 8, 2023
The #29055 moved relevant CLI command definition to executors but
it automatically removed the command from generated documentation.

This PR brings it back for both Celery and Cncf Kubernetes provider

Closes: #27932
(cherry picked from commit 879fd34)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
AIP-51 AIP-51: Remove executor coupling from Core area:CLI area:Scheduler including HA (high availability) scheduler changelog:skip Changes that should be skipped from the changelog (CI, tests, etc..) provider:cncf-kubernetes Kubernetes provider related issues use public runners Makes sure that Public runners are used even if commiters creates the PR (useful for testing)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants