Fix sql alchemy plugin error module 'pandas.io' has no attribute 'common' #2249

Merged

Conversation

Future-Outlier
Member

@Future-Outlier Future-Outlier commented Mar 8, 2024

Tracking issue

flyteorg/flyte#749

Why and What?

  1. Why do we restrict pandas to <= 2.1.4 for the sql_alchemy plugin?
    With the latest pandas version, running the example fails, either because of a bug or because an interface changed (not sure which yet).
  2. Why do we use from pandas.io.common import is_fsspec_url?
    Following @pingsutw's advice, we found that some pandas submodules need to be imported explicitly so that they are fully loaded.
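
The second point can be illustrated with a minimal, self-contained sketch. It does not use pandas or flytekit: it builds a throwaway package on disk (the names `demo_pkg`, `is_url_like`, `make_demo_package`, and `run_demo` are all hypothetical) whose `io/__init__.py` does not import its `common` submodule. Under that assumption, attribute access on the parent fails in the same way as the error in the PR title, and an explicit submodule import resolves it. Whether pandas fails for exactly this reason in the plugin is not confirmed here.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def make_demo_package(root: Path) -> None:
    """Write a hypothetical package demo_pkg/io/common.py under root."""
    io_dir = root / "demo_pkg" / "io"
    io_dir.mkdir(parents=True)
    (root / "demo_pkg" / "__init__.py").write_text("")
    (io_dir / "__init__.py").write_text("")  # deliberately does NOT import .common
    (io_dir / "common.py").write_text(
        "def is_url_like(path):\n"
        "    return isinstance(path, str) and '://' in path\n"
    )


def run_demo() -> tuple:
    """Return (submodule visible before explicit import, visible after)."""
    with tempfile.TemporaryDirectory() as tmp:
        make_demo_package(Path(tmp))
        sys.path.insert(0, tmp)
        try:
            importlib.invalidate_caches()
            demo_io = importlib.import_module("demo_pkg.io")
            # Only the parent package is loaded at this point, so demo_io.common
            # would raise: module 'demo_pkg.io' has no attribute 'common'.
            before = hasattr(demo_io, "common")

            # Importing the submodule explicitly loads it and binds it
            # as an attribute on the parent package.
            from demo_pkg.io.common import is_url_like

            after = hasattr(demo_io, "common") and is_url_like("s3://bucket/key")
            return before, after
        finally:
            # Leave the interpreter as we found it.
            sys.path.remove(tmp)
            for name in [m for m in sys.modules if m.split(".")[0] == "demo_pkg"]:
                del sys.modules[name]
```

Calling `run_demo()` returns a pair of booleans: whether `common` was reachable as an attribute before the explicit import, and whether it was reachable (and usable) afterwards. The explicit `from pandas.io.common import is_fsspec_url` described above follows the same pattern of naming the submodule in the import instead of relying on attribute access.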

How was this patch tested?

  1. Use a Python example:
from flytekit import kwtypes, task, workflow
from flytekit.types.schema import FlyteSchema
from flytekitplugins.sqlalchemy import SQLAlchemyConfig, SQLAlchemyTask

DATABASE_URI = "postgresql://reader:[email protected]:5432/pfmegrnargs"

# Here we define the schema of the expected output of the query, which we then re-use in the `get_mean_length` task.
DataSchema = FlyteSchema[kwtypes(sequence_length=int)]

sql_task = SQLAlchemyTask(
    "rna",
    query_template="""
        select len as sequence_length from rna
        where len >= {{ .inputs.min_length }}
        and len <= {{ .inputs.max_length }}
        limit {{ .inputs.limit }}
    """,
    inputs=kwtypes(min_length=int, max_length=int, limit=int),
    output_schema_type=DataSchema,
    task_config=SQLAlchemyConfig(uri=DATABASE_URI),
    container_image="localhost:30000/flytekit:sql",
)

@task
def get_mean_length(data: DataSchema) -> float:
    dataframe = data.open().all()
    return dataframe["sequence_length"].mean().item()

@workflow
def my_wf(min_length: int, max_length: int, limit: int) -> float:
    return get_mean_length(data=sql_task(min_length=min_length, max_length=max_length, limit=limit))


if __name__ == "__main__":
    print(f"Running {__file__} main...")
    print(my_wf(min_length=50, max_length=200, limit=5))
  2. Build an image to test it.

Setup process

Screenshots

Check all the applicable boxes

  • I updated the documentation accordingly.
  • All new and existing tests passed.
  • All commits are signed-off.

Related PRs

Docs link

@dosubot dosubot bot added the size:S This PR changes 10-29 lines, ignoring generated files. label Mar 8, 2024
@Future-Outlier Future-Outlier marked this pull request as draft March 8, 2024 04:02

codecov bot commented Mar 8, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 83.44%. Comparing base (64b8468) to head (ea0be7d).
Report is 4 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #2249      +/-   ##
==========================================
- Coverage   86.01%   83.44%   -2.57%     
==========================================
  Files         322      290      -32     
  Lines       24376    22956    -1420     
  Branches     3689     3479     -210     
==========================================
- Hits        20966    19155    -1811     
- Misses       2754     3180     +426     
+ Partials      656      621      -35     

@eapolinario
Collaborator

Can you add more details to the description? For example, what's the error? Also, can you link to a GitHub issue on the sqlalchemy project?

@Future-Outlier
Member Author

Can you add more details to the description? For example, what's the error? Also, can you link to a GitHub issue on the sqlalchemy project?

Yes, will update more information here.

@Future-Outlier Future-Outlier changed the title from "Fix sql alchemy plugin error with specifiying pandas version > 2.1.4" to "Fix sql alchemy plugin error by specifiying pandas version <= 2.1.4" Mar 8, 2024
@Future-Outlier
Member Author

It is super weird that I can't reproduce the error in the container's python env.

Error in the container when running the example with the pyflyte run command:

from flytekit import kwtypes, task, workflow
from flytekit.types.schema import FlyteSchema
from flytekitplugins.sqlalchemy import SQLAlchemyConfig, SQLAlchemyTask

DATABASE_URI = "postgresql://reader:[email protected]:5432/pfmegrnargs"

# Here we define the schema of the expected output of the query, which we then re-use in the `get_mean_length` task.
DataSchema = FlyteSchema[kwtypes(sequence_length=int)]

sql_task = SQLAlchemyTask(
    "rna",
    query_template="""
        select len as sequence_length from rna
        where len >= {{ .inputs.min_length }}
        and len <= {{ .inputs.max_length }}
        limit {{ .inputs.limit }}
    """,
    inputs=kwtypes(min_length=int, max_length=int, limit=int),
    output_schema_type=DataSchema,
    task_config=SQLAlchemyConfig(uri=DATABASE_URI),
    container_image="localhost:30000/flytekit:sql",
)

@task
def get_mean_length(data: DataSchema) -> float:
    dataframe = data.open().all()
    return dataframe["sequence_length"].mean().item()

@workflow
def my_wf(min_length: int, max_length: int, limit: int) -> float:
    return get_mean_length(data=sql_task(min_length=min_length, max_length=max_length, limit=limit))


if __name__ == "__main__":
    print(f"Running {__file__} main...")
    print(my_wf(min_length=50, max_length=200, limit=5))

example reference: https://docs.flyte.org/en/latest/flytesnacks/examples/sql_plugin/sql_alchemy.html#sql-alchemy

No error in the container (python terminal)

@Future-Outlier Future-Outlier changed the title from "Fix sql alchemy plugin error by specifiying pandas version <= 2.1.4" to "[WIP] Fix sql alchemy plugin error by specifiying pandas version <= 2.1.4" Mar 8, 2024
@Future-Outlier Future-Outlier changed the title from "[WIP] Fix sql alchemy plugin error by specifiying pandas version <= 2.1.4" to "[WIP] Fix sql alchemy plugin error module 'pandas.io' has no attribute 'common'" Mar 8, 2024
Signed-off-by: Future-Outlier <[email protected]>
Signed-off-by: Future-Outlier <[email protected]>
Signed-off-by: Future-Outlier <[email protected]>
@Future-Outlier Future-Outlier marked this pull request as ready for review March 10, 2024 08:46
@dosubot dosubot bot added size:XS This PR changes 0-9 lines, ignoring generated files. and removed size:S This PR changes 10-29 lines, ignoring generated files. labels Mar 10, 2024
Signed-off-by: Future-Outlier <[email protected]>
@dosubot dosubot bot added the lgtm This PR has been approved by maintainer label Mar 10, 2024
@Future-Outlier Future-Outlier changed the title from "[WIP] Fix sql alchemy plugin error module 'pandas.io' has no attribute 'common'" to "Fix sql alchemy plugin error module 'pandas.io' has no attribute 'common'" Mar 10, 2024
@pingsutw pingsutw merged commit d5bc878 into master Mar 10, 2024
43 of 44 checks passed
austin362667 pushed a commit to austin362667/flytekit that referenced this pull request Mar 16, 2024
fiedlerNr9 pushed a commit that referenced this pull request Jul 25, 2024
…mmon' ` (#2249)

Signed-off-by: Future-Outlier <[email protected]>
Signed-off-by: Jan Fiedler <[email protected]>