Extend papermill operator to support remote kernels #34840
Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about anything, please check our Contribution Guide (https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst)
ca6a3ce to c105472
The situation around Papermill is not good; this provider might be suspended in the future (or even soon), see:
Yes, but in the meantime we can accept PRs.
89b463c to a810cb1
I am happy to contribute to and maintain papermill.
@Taragolis @eladkal Could you please take a look?
acd40ab to 9594640
- Need to fix Static Checks
- Move generic classes to appropriate places
- Create Connection documentation
- Additional unit tests
if TYPE_CHECKING:
    from airflow.utils.context import Context

REMOTE_KERNEL_ENGINE = "remote_kernel_engine"
Could you explain why this should be a constant value?
We are registering a remote kernel engine (a custom papermill engine) so the operator can work with remote kernels.
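One way to picture why a shared constant helps: the same name is used once to register the engine and again to select it. The following is a minimal, stdlib-only sketch of that pattern. Only `REMOTE_KERNEL_ENGINE` and its value come from the diff; `EngineRegistry`, the body of `RemoteKernelEngine`, and the notebook name are hypothetical stand-ins and are not papermill's or the provider's actual API.

```python
from typing import Dict

# Value taken from the diff; everything else below is a hypothetical stand-in.
REMOTE_KERNEL_ENGINE = "remote_kernel_engine"


class EngineRegistry:
    """Toy name -> engine lookup, standing in for papermill's engine registry."""

    def __init__(self) -> None:
        self._engines: Dict[str, type] = {}

    def register(self, name: str, engine: type) -> None:
        # Store the engine class under the given name.
        self._engines[name] = engine

    def get(self, name: str) -> type:
        # Fail loudly if the name was never registered (e.g. a typo).
        if name not in self._engines:
            raise KeyError(f"No engine named {name!r} is registered")
        return self._engines[name]


class RemoteKernelEngine:
    """Hypothetical engine that would run a notebook on an external kernel."""

    @staticmethod
    def execute(notebook: str) -> str:
        return f"executed {notebook} on a remote kernel"


registry = EngineRegistry()
# Registration and lookup share one constant, so the two cannot drift apart.
registry.register(REMOTE_KERNEL_ENGINE, RemoteKernelEngine)

engine = registry.get(REMOTE_KERNEL_ENGINE)
result = engine.execute("report.ipynb")
```

With a bare string literal at each call site, a typo in one place would only surface at run time as a missing engine; the constant keeps both sides in sync.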
def shutdown_kernel(self, now: bool = False, restart: bool = False) -> None:
    pass
Why is there no implementation here?
The operator does not manage the lifecycle of the kernel; it only connects to a remote kernel if one is configured via the hook.
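The idea can be sketched as follows, assuming a kernel that some other system has already started. `ExternalKernel`, `RemoteKernelManager`, the address, and the port are hypothetical names and values for illustration; only the `shutdown_kernel` signature is taken from the diff.

```python
class ExternalKernel:
    """Hypothetical stand-in for a kernel whose lifecycle another system owns."""

    def __init__(self, ip: str, shell_port: int) -> None:
        self.ip = ip
        self.shell_port = shell_port
        self.alive = True  # started elsewhere, before the operator runs


class RemoteKernelManager:
    """Connects to an existing kernel and never starts or stops it."""

    def __init__(self, kernel: ExternalKernel) -> None:
        self.kernel = kernel

    def start_kernel(self, **kwargs) -> None:
        # Nothing to start: the kernel is already running elsewhere.
        pass

    def shutdown_kernel(self, now: bool = False, restart: bool = False) -> None:
        # Intentionally a no-op: stopping the kernel is the owner's job.
        pass

    @property
    def has_kernel(self) -> bool:
        return self.kernel.alive


kernel = ExternalKernel(ip="10.0.0.5", shell_port=60316)
manager = RemoteKernelManager(kernel)

manager.start_kernel()
manager.shutdown_kernel(now=True)
still_alive = manager.has_kernel  # the external kernel is untouched
```

Because the caller expects a manager with start and shutdown methods, the methods must exist, but an empty body is the correct behaviour when the kernel belongs to another system.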
05823c1 to 125ddbf
@Taragolis Thanks for your comments and suggestions. I have updated my PR to address your feedback. Please take a look.
I think you still need to fix Static Checks; most of them are fixed by running the pre-commit hooks.
In addition, I think some fixes might be required in the documentation formatting. I've added my assumptions, but it is better to run `breeze build-docs papermill`, see what errors happen, and try to fix them. Some additional useful links which might help to set up a local development environment:
If you have any problems, you can always ask for help in the Slack channel #development-first-pr-support
docs/apache-airflow-providers-papermill/connections/jupyter_kernel.rst
047ab5b to 14ca569
@Taragolis Updated, and verified that the docs build. Thanks.
90f52a6 to dcf69aa
dcf69aa to 6364081
Awesome work, congrats on your first merged pull request! You are invited to check our Issue Tracker for additional contributions.
This PR extends the papermill operator so that it can connect to kernels managed externally by other systems. This is useful for running the operator in cloud environments and also for running Spark or Scala notebooks.
It extends papermill to support a new engine using entry_points, as described here
It adds unit tests and also a system test to run in CI environments.
Validated using below steps in breeze environment: