Pandera integration using new plugin system #354

Merged
12 commits merged into flyteorg:master on Feb 6, 2021

@cosmicBboy cosmicBboy commented Jan 29, 2021

Pandera integration using new plugin system

This PR implements a pandera plugin that supports pandera.SchemaModel subclasses as an alternative way of expressing dataframe types in Flyte tasks and workflows.

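# Imported only for its side effect: loading the plugin registers the pandera type support.
import flytekitplugins.pandera

import pandera as pa
from flytekit import task
from pandera.typing import DataFrame, Series


class Schema(pa.SchemaModel):
    col1: Series[int]
    col2: Series[float]


@task
def my_task(df: DataFrame[Schema]):
    ...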

Type

  • Bug Fix
  • Feature
  • Plugin

Are all requirements met?

  • Code completed
  • Smoke tested
  • Unit tests added
  • Code documentation added
  • Any pending items have an associated Issue

Complete description

How did you fix the bug, make the feature etc. Link to any design docs etc

Tracking Issue

https://github.com/lyft/flyte/issues/

Follow-up issue

NA
OR
https://github.com/lyft/flyte/issues/

import pandera
import pytest

import flytekitplugins.pandera
Contributor

Does this line pass lint? It's not used in the code, right? It's only needed to trigger the plugin registration.

Contributor

we actually have the same problem with the regular pandas library. Tasks and workflows that take in and return pandas dataframes need to trigger the SchemaTransformer, but unless they explicitly import flytekit.types.schema, the transformer isn't registered and it doesn't run. Can you think of a way around this?
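
To make the problem in this comment concrete, here is a minimal sketch of registration by import side effect. It does not use flytekit's real API: `TRANSFORMERS`, `register`, `transformer_for`, `FakeFrame`, and `import_schema_module` are all hypothetical names, and the module import is simulated by a plain function call.

```python
from typing import Callable, Dict, Optional

# Hypothetical registry standing in for a type engine; not flytekit's real API.
TRANSFORMERS: Dict[type, Callable[[object], str]] = {}


def register(python_type: type, transformer: Callable[[object], str]) -> None:
    """Record which transformer handles a given Python type."""
    TRANSFORMERS[python_type] = transformer


def transformer_for(python_type: type) -> Optional[Callable[[object], str]]:
    """Look up a transformer; None means nothing has registered one yet."""
    return TRANSFORMERS.get(python_type)


class FakeFrame:
    """Stand-in for a dataframe type."""


def import_schema_module() -> None:
    """Simulates importing a module whose top-level code registers a transformer."""
    register(FakeFrame, lambda value: "serialized FakeFrame")


# Before the simulated import, the type has no transformer.
before = transformer_for(FakeFrame)

import_schema_module()

# After the simulated import, the lookup succeeds.
after = transformer_for(FakeFrame)
```

In this sketch, user code that only mentions `FakeFrame` never triggers the registration, which is the same gap the comment describes for dataframes and the schema module.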

@@ -13,10 +13,14 @@
from flytekit.annotated.task import reference_task, task
from flytekit.annotated.workflow import WorkflowFailurePolicy, reference_workflow, workflow
from flytekit.loggers import logger
from flytekit.types import schema
Contributor Author

> we actually have the same problem with the regular pandas library. Tasks and workflows that take in and return pandas dataframes need to trigger the SchemaTransformer, but unless they explicitly import flytekit.types.schema, the transformer isn't registered and it doesn't run. Can you think of a way around this?

@wild-endeavor this is one workaround: import the relevant types in the top-level __init__.py file so that the pandas dataframe transformer is loaded whenever a user imports flytekit.

As for plugins, see the import_plugins function in plugins/__init__.py, which is called in this file. It simply tries to import every plugin available in the new plugin system. This won't work for third-party flytekit plugins that aren't part of the plugin microlib.
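
A rough sketch of the "try to import every known plugin" idea follows. The real `import_plugins` in plugins/__init__.py is not shown in this thread, so the signature, the return value, and the module names below are assumptions made for illustration only.

```python
import importlib
from typing import Iterable, List


def import_plugins(module_names: Iterable[str]) -> List[str]:
    """Try to import each named module and return the names that loaded.

    Hypothetical signature; the function in the PR may look different.
    """
    loaded = []
    for name in module_names:
        try:
            importlib.import_module(name)
        except ImportError:
            # Plugin not installed: skip it rather than fail the parent import.
            continue
        loaded.append(name)
    return loaded


# "json" ships with Python; the second name is a made-up, uninstalled module.
loaded = import_plugins(["json", "hypothetical_plugin_that_is_not_installed"])
```

Because a function like this only walks a fixed list of names, a plugin published outside that list is never imported, which matches the third-party limitation noted above.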

Contributor

ya, for schema we should be able to do this, as we have now made it a requirement

@codecov-io

codecov-io commented Jan 31, 2021

Codecov Report

Merging #354 (b05f0f2) into master (84c6b52) will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##           master     #354   +/-   ##
=======================================
  Coverage   96.00%   96.00%           
=======================================
  Files           2        2           
  Lines          75       75           
  Branches        8        8           
=======================================
  Hits           72       72           
  Misses          1        1           
  Partials        2        2           

@wild-endeavor
Contributor

hey can you give me write access to your fork? i want to make a PR into your PR.

@cosmicBboy
Contributor Author

> hey can you give me write access to your fork? i want to make a PR into your PR.

done!

@wild-endeavor
Contributor

Thanks. let me know what you think. cosmicBboy#1

@wild-endeavor wild-endeavor merged commit cefbc80 into flyteorg:master Feb 6, 2021
@cosmicBboy cosmicBboy mentioned this pull request Feb 9, 2021
8 tasks
max-hoffman pushed a commit to dolthub/flytekit that referenced this pull request May 11, 2021