[BUG] [flytekit] StructuredDataset handling fails when using Azure Blob Storage #2709
Comments
Unfortunately, quite a few of our workflows are blocked by this and we're unable to upgrade to a newer version. As this is quite a time-sensitive issue for us, we're willing to provide a patch ourselves if that would help. Skimming through flytekit, I assume we might want to wait for flyteorg/flytekit#1107 to be merged and base our changes on top of it, so the default protocol handling works for ABS as well.
@MorpheusXAUT I'm trying to fix it ASAP.
Yes, you are right. You can help us add
@MorpheusXAUT To work around the error, could you try adding the code below to your workflow script and running it again?

```python
from flytekit.types.structured import basic_dfs
from flytekit.types.structured.structured_dataset import StructuredDatasetTransformerEngine

# Register pandas <-> Parquet encoder/decoder handlers for the Azure "abfs" protocol.
# override=True replaces any handler already registered for the same type/protocol/format.
StructuredDatasetTransformerEngine.register(basic_dfs.PandasToParquetEncodingHandler("abfs"), default_for_type=True, override=True)
StructuredDatasetTransformerEngine.register(basic_dfs.ParquetToPandasDecodingHandler("abfs"), default_for_type=True, override=True)
```
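Why registering an `"abfs"` handler helps can be pictured with a toy model. Roughly, the transformer engine selects an encoder/decoder by Python type, storage protocol, and file format, so writing a `pd.DataFrame` to an `abfs://` location fails until a handler for that protocol exists. The sketch below is an assumption-laden illustration, not flytekit's implementation: `Handler`, `ToyTransformerEngine`, and `FakeDataFrame` are hypothetical names, the set of pre-registered protocols is made up, and `default_for_type` is not modeled.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple, Type


@dataclass(frozen=True)
class Handler:
    """Hypothetical stand-in for a flytekit encoder/decoder handler."""

    python_type: Type
    protocol: str
    supported_format: str = "parquet"


@dataclass
class ToyTransformerEngine:
    """Toy registry keyed by (Python type, protocol, format).

    NOT flytekit's StructuredDatasetTransformerEngine: it only models the
    idea that a handler must exist for the storage protocol in use.
    """

    handlers: Dict[Tuple[Type, str, str], Handler] = field(default_factory=dict)

    def register(self, h: Handler, override: bool = False) -> None:
        key = (h.python_type, h.protocol, h.supported_format)
        # Without override=True, re-registering the same key is an error.
        if key in self.handlers and not override:
            raise ValueError(f"Handler already registered for {key}")
        self.handlers[key] = h

    def lookup(self, python_type: Type, uri: str, fmt: str = "parquet") -> Handler:
        # Use the scheme before "://" as the protocol; bare paths count as "file".
        protocol = uri.split("://", 1)[0] if "://" in uri else "file"
        key = (python_type, protocol, fmt)
        if key not in self.handlers:
            raise ValueError(
                f"No handler for {python_type.__name__} "
                f"with protocol {protocol!r} and format {fmt!r}"
            )
        return self.handlers[key]


class FakeDataFrame:
    """Placeholder for pd.DataFrame so the sketch runs without pandas."""


engine = ToyTransformerEngine()
# In this toy setup, handlers are registered for these protocols only, not "abfs".
for proto in ("s3", "gs", "file"):
    engine.register(Handler(FakeDataFrame, proto))

uri = "abfs://container/dir/df.parquet"
try:
    engine.lookup(FakeDataFrame, uri)
except ValueError as err:
    print(err)  # No handler for FakeDataFrame with protocol 'abfs' and format 'parquet'

# Analogous to the workaround: register an extra handler for "abfs".
engine.register(Handler(FakeDataFrame, "abfs"), override=True)
print(engine.lookup(FakeDataFrame, uri).protocol)  # abfs
```

The lookup failing on a missing key mirrors the reported symptom; the fix in either case is adding a handler for the missing protocol rather than changing the data itself.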
@pingsutw Thanks for looking into this so quickly, appreciated!
Sure thing, will do! I'll open a PR once flyteorg/flytekit#1107 has been merged 🙂
👍 Works for our regression test. I'll pass that along as a workaround for our other workflows that are failing, thanks!
@pingsutw I've got a fix ready (based on flyteorg/flytekit#1107) that solves this according to our internal tests. Would you prefer that I open a PR now, based on flyteorg/flytekit#1107, or wait for the other PR to be merged before submitting it?
@MorpheusXAUT Could you wait for the other PR to be merged before submitting it? I'll ping you once we merge it.
Merging the other PR soon.
Merged.
Describe the bug
The `StructuredDataset` implementation enabled by default in flyteorg/flytekit#885 lacks support for Azure Blob Storage, resulting in an error when trying to handle, e.g., a `pd.DataFrame` while using Azure-backed storage. The workflows in question use flytekit v1.0.5.
Expected behavior
`StructuredDataset`s/`pd.DataFrame`s are handled correctly on Azure using the `abfs` protocol/`adlfs` (via `stow`). Looking at #2684, users should also not have to supply the protocol themselves when using ABS; however, I assume that's covered by the suggested changes in the other issue as well.
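On the #2684 point (not having to supply the protocol explicitly when using ABS), one way to picture default protocol handling is deriving the protocol from the URI scheme. This is a minimal sketch under stated assumptions: `infer_protocol` and its `"file"` fallback are hypothetical and not flytekit's API; it uses only the standard library's `urllib.parse`.

```python
from urllib.parse import urlparse


def infer_protocol(uri: str, default: str = "file") -> str:
    """Return the storage protocol for a URI (hypothetical helper).

    "abfs://container/dir/df.parquet" -> "abfs"
    "s3://bucket/key"                 -> "s3"
    "/tmp/df.parquet"                 -> "file" (no scheme)
    """
    # urlparse lower-cases the scheme and returns "" when there is none.
    scheme = urlparse(uri).scheme
    # A one-letter "scheme" is most likely a Windows drive letter (C:\...).
    if len(scheme) <= 1:
        return default
    return scheme


for u in ("abfs://container/dir/df.parquet", "s3://bucket/key", "/tmp/df.parquet"):
    print(u, "->", infer_protocol(u))
```

With a helper like this in place, a handler lookup keyed by protocol could pick `"abfs"` from the output location instead of requiring users to pass it.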
Additional context to reproduce
Handle `pd.DataFrame`s using flytekit v1.0.5.
Example regression test we've added to our Flyte test suite, covering our common use cases:
All three tasks above will fail with the mentioned error message.