Extend papermill operator to support remote kernels #34840

Merged: 1 commit, Nov 13, 2023

akshaychitneni
Contributor


This PR adds support for running the papermill operator against kernels that are managed externally by other systems. This is useful for running the operator in cloud environments and also helps with running Spark or Scala notebooks.

It extends papermill with a new engine, registered via entry_points as described here

It adds unit tests and a system test that runs in CI environments.

Validated using the steps below in the Breeze environment:

* breeze ci-image build --upgrade-to-newer-dependencies
* breeze start-airflow
* pytest --system papermill tests/system/providers/papermill/example_papermill_remote_verify.py

@boring-cyborg

boring-cyborg bot commented Oct 9, 2023

Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about anything please check our Contribution Guide (https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst)
Here are some useful points:

  • Pay attention to the quality of your code (ruff, mypy and type annotations). Our pre-commits will help you with that.
  • In case of a new feature add useful documentation (in docstrings or in docs/ directory). Adding a new operator? Check this short guide. Consider adding an example DAG that shows how users should use it.
  • Consider using Breeze environment for testing locally, it's a heavy docker but it ships with a working Airflow and a lot of integrations.
  • Be patient and persistent. It might take some time to get a review or get the final approval from Committers.
  • Please follow ASF Code of Conduct for all communication including (but not limited to) comments on Pull Requests, Mailing list and Slack.
  • Be sure to read the Airflow Coding style.
  • Always keep your Pull Requests rebased, otherwise your build might fail due to changes not related to your commits.
    Apache Airflow is a community-driven project and together we are making it better 🚀.
    In case of doubts contact the developers at:
    Mailing List: dev@airflow.apache.org
    Slack: https://s.apache.org/airflow-slack

@akshaychitneni akshaychitneni changed the title from "Add remote kernel support for papermill operator" to "Extend papermill operator to support remote kernels" Oct 9, 2023
Contributor

@Taragolis Taragolis left a comment

The situation around Papermill is not good, this provider might be suspended in the future (or even soon), see:

cc: @eladkal @potiuk

airflow/providers/papermill/operators/papermill.py (outdated, resolved)
airflow/providers/papermill/operators/__init__.py (outdated, resolved)
@eladkal
Contributor

eladkal commented Oct 11, 2023

The situation around Papermill is not good, this provider might be suspended in the future (or even soon), see:

Yes but in the meantime we can accept PRs

@akshaychitneni akshaychitneni force-pushed the remote_kernel branch 2 times, most recently from 89b463c to a810cb1, October 15, 2023 20:26
@akshaychitneni
Contributor Author

The situation around Papermill is not good, this provider might be suspended in the future (or even soon), see:

cc: @eladkal @potiuk

I am happy to contribute/maintain papermill.

@akshaychitneni
Contributor Author

@Taragolis @eladkal Could you please take a look?

@akshaychitneni akshaychitneni force-pushed the remote_kernel branch 3 times, most recently from acd40ab to 9594640, October 24, 2023 17:46
@eladkal eladkal requested a review from Taragolis October 25, 2023 06:17
Contributor

@Taragolis Taragolis left a comment

  • Need to fix Static Checks
  • Move generic classes to appropriate places
  • Create Connection documentation
  • Additional unit tests

airflow/providers/papermill/operators/papermill.py (outdated, resolved)
airflow/providers/papermill/operators/papermill.py (outdated, resolved)

if TYPE_CHECKING:
    from airflow.utils.context import Context

REMOTE_KERNEL_ENGINE = "remote_kernel_engine"
Contributor

Could you explain why this should be a constant value?

Contributor Author

We are registering a remote kernel engine (a custom papermill engine) so that the operator can work with remote kernels.
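
To illustrate the point about registration, here is a minimal sketch of why a single shared constant helps: the same name is used once to register the engine and again to select it at execution time. This uses a stand-in registry and does not import papermill; `EngineRegistry`, `RemoteKernelEngine`, and `run_notebook` are hypothetical names for illustration, and only the constant's value comes from the diff above.

```python
from typing import Dict

# Value taken from the diff; everything else below is hypothetical.
REMOTE_KERNEL_ENGINE = "remote_kernel_engine"


class EngineRegistry:
    """Stand-in for a name -> engine lookup table (not papermill's own class)."""

    def __init__(self) -> None:
        self._engines: Dict[str, type] = {}

    def register(self, name: str, engine: type) -> None:
        self._engines[name] = engine

    def get(self, name: str) -> type:
        if name not in self._engines:
            raise KeyError(f"No engine named {name!r} is registered")
        return self._engines[name]


class RemoteKernelEngine:
    """Hypothetical engine that would attach to an already-running kernel."""

    @classmethod
    def execute(cls, notebook: str) -> str:
        return f"executed {notebook} on a remote kernel"


registry = EngineRegistry()
# Registration and lookup both go through the constant, so the two
# call sites cannot drift apart through a typo in a string literal.
registry.register(REMOTE_KERNEL_ENGINE, RemoteKernelEngine)


def run_notebook(notebook: str, engine_name: str) -> str:
    """Look up the engine by name and run the notebook with it."""
    return registry.get(engine_name).execute(notebook)


result = run_notebook("report.ipynb", REMOTE_KERNEL_ENGINE)
```

In this sketch an unregistered name fails loudly with a `KeyError` rather than silently falling back to a default engine.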

airflow/providers/papermill/provider.yaml (outdated, resolved)
Comment on lines 57 to 58
def shutdown_kernel(self, now: bool = False, restart: bool = False) -> None:
    pass
Contributor

Why is there no implementation here?

Contributor Author

The operator does not manage the lifecycle of the kernel; it only connects to a remote kernel if one is configured via the hook.
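
A minimal sketch of that design choice, assuming a kernel whose process is owned by another system. `ExternalKernel` and `RemoteKernelManager` are hypothetical stand-ins, not the classes from this PR; only the `shutdown_kernel` signature mirrors the diff above.

```python
class ExternalKernel:
    """Hypothetical stand-in for a kernel started and owned by another system."""

    def __init__(self) -> None:
        self.alive = True


class RemoteKernelManager:
    """Hypothetical manager that attaches to a kernel but never owns it."""

    def __init__(self, kernel: ExternalKernel) -> None:
        self.kernel = kernel

    def start_kernel(self) -> None:
        # Nothing is launched here: we only verify that the externally
        # managed kernel is available to connect to.
        if not self.kernel.alive:
            raise RuntimeError("Remote kernel is not running")

    def shutdown_kernel(self, now: bool = False, restart: bool = False) -> None:
        # Intentionally a no-op: stopping the kernel is the responsibility
        # of whichever system started it.
        pass


kernel = ExternalKernel()
manager = RemoteKernelManager(kernel)
manager.start_kernel()
manager.shutdown_kernel(now=True)
still_alive = kernel.alive
```

After the call to `shutdown_kernel`, the external kernel is untouched, so other clients of the same kernel are not affected when the task finishes.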

airflow/providers/papermill/hooks/kernel.py (outdated, resolved)
airflow/providers/papermill/hooks/kernel.py (outdated, resolved)
airflow/providers/papermill/hooks/kernel.py (resolved)
airflow/providers/papermill/hooks/kernel.py (resolved)
airflow/providers/papermill/operators/papermill.py (outdated, resolved)
@akshaychitneni akshaychitneni force-pushed the remote_kernel branch 3 times, most recently from 05823c1 to 125ddbf, November 7, 2023 18:12
@akshaychitneni
Contributor Author

@Taragolis Thanks for your comments and suggestions. I have updated my PR to address your feedback. Please take a look.

Contributor

@Taragolis Taragolis left a comment

I think you still need to fix the Static Checks; most of them can be fixed by running the pre-commit hooks.

In addition, I think some fixes might be required in the documentation formatting. I've added my assumptions, but it is better to run
breeze build-docs papermill, look at which errors occur, and try to fix them. Some additional useful links which might help to set up a local development environment:

If you have any problems, you can always ask for help in the Slack channel #development-first-pr-support

@akshaychitneni akshaychitneni force-pushed the remote_kernel branch 2 times, most recently from 047ab5b to 14ca569, November 7, 2023 21:03
@akshaychitneni
Contributor Author

akshaychitneni commented Nov 7, 2023

I think you still need to fix the Static Checks; most of them can be fixed by running the pre-commit hooks.

In addition, I think some fixes might be required in the documentation formatting. I've added my assumptions, but it is better to run breeze build-docs papermill, look at which errors occur, and try to fix them. Some additional useful links which might help to set up a local development environment:

If you have any problems, you can always ask for help in the Slack channel #development-first-pr-support

@Taragolis Updated, and verified that the docs build. Thanks

@akshaychitneni akshaychitneni force-pushed the remote_kernel branch 4 times, most recently from 90f52a6 to dcf69aa, November 10, 2023 18:03
@bolkedebruin bolkedebruin merged commit 18dac61 into apache:main Nov 13, 2023
71 checks passed

boring-cyborg bot commented Nov 13, 2023

Awesome work, congrats on your first merged pull request! You are invited to check our Issue Tracker for additional contributions.

@ephraimbuddy ephraimbuddy added the changelog:skip label (Changes that should be skipped from the changelog: CI, tests, etc.) Nov 20, 2023
@ephraimbuddy ephraimbuddy added this to the Airflow 2.8.0 milestone Nov 20, 2023
Labels
area:providers area:system-tests changelog:skip Changes that should be skipped from the changelog (CI, tests, etc..) provider:papermill
5 participants