-
Notifications
You must be signed in to change notification settings - Fork 311
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Pandera integration using new plugin system #354
Pandera integration using new plugin system #354
Conversation
plugins/tests/pandera/test_wf.py
Outdated
import pandera | ||
import pytest | ||
|
||
import flytekitplugins.pandera |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does this line pass lint? it's not needed in code right? it's just needed to trigger everything.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we actually have the same problem with the regular pandas library. Tasks and workflows that take in and return pandas dataframes need to trigger the SchemaTransformer, but unless they explicitly import flytekit.types.schema, the transformer isn't registered and it doesn't run. Can you think of a way around this?
@@ -13,10 +13,14 @@ | |||
from flytekit.annotated.task import reference_task, task | |||
from flytekit.annotated.workflow import WorkflowFailurePolicy, reference_workflow, workflow | |||
from flytekit.loggers import logger | |||
from flytekit.types import schema |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we actually have the same problem with the regular pandas library. Tasks and workflows that take in and return pandas dataframes need to trigger the SchemaTransformer, but unless they explicitly import flytekit.types.schema, the transformer isn't registered and it doesn't run. Can you think of a way around this?
@wild-endeavor this is one workaround: importing the relevant types in the top-level __init__.py
file such that the pandas dataframe transformer is loaded anytime user imports flytekit
.
As for plugins, see the import_plugins
function in plugins/__init__.py
, which is called in this file. It just tries to import all plugins available in the new plugin system. This won't work for third-party flytekit plugins that aren't part of the plugin microlib.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ya, for schema we should be able to do this, as we have now made it a requirement
Codecov Report
@@ Coverage Diff @@
## master #354 +/- ##
=======================================
Coverage 96.00% 96.00%
=======================================
Files 2 2
Lines 75 75
Branches 8 8
=======================================
Hits 72 72
Misses 1 1
Partials 2 2 Continue to review full report at Codecov.
|
729364d
to
60b7068
Compare
hey can you give me write access to your fork? i want to make a PR into your PR. |
done! |
Thanks. let me know what you think. cosmicBboy#1 |
Pandera integration using new plugin system
This PR implements a pandera plugin to support
pandera.SchemaModel
s as an alternative way of expressing dataframe types in flyte tasks and workflows.Type
Are all requirements met?
Complete description
How did you fix the bug, make the feature etc. Link to any design docs etc
Tracking Issue
https://github.com/lyft/flyte/issues/
Follow-up issue
NA
OR
https://github.com/lyft/flyte/issues/