Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pandas import error when merging tables #2112

Closed
trhaarman opened this issue Jan 23, 2024 · 2 comments · Fixed by #2116
Closed

Pandas import error when merging tables #2112

trhaarman opened this issue Jan 23, 2024 · 2 comments · Fixed by #2116
Labels
binding/python Issues for the Python package bug Something isn't working

Comments

@trhaarman
Copy link

Environment

Local

Delta-rs version:
v0.15.1

Binding:
Python

Environment:

  • OS: Windows

Bug

What happened:
Trying to get the merging example from the docs, but using pandas instead of pyarrow for the tabkle. This gives a NameError that pandas is not defined, as in the table.py pandas is only imported when type checking:

elif isinstance(source, pandas.DataFrame):

As per my limited understanding of this construct, pandas is only imported for static type checking. But seeing as it is actually used in the code, should it be imported normally?

What you expected to happen:
That the pandas dataframe gets merged in the existing delta table :).

How to reproduce it:
Using the example from the docs, but with a pandas DataFrame instead of a pyarrow table:

from deltalake import DeltaTable, write_deltalake
import pandas as pd

data = pd.DataFrame({"x": [1, 2, 3], "y": [4, 5, 6]})
write_deltalake("tmp", data)
dt = DeltaTable("tmp")
new_data = pd.DataFrame({"x": [1], "y": [7]})

(
     dt.merge(
         source=new_data,
         predicate="target.x = source.x",
         source_alias="source",
         target_alias="target")
     .when_matched_update(updates={"x": "source.x", "y": "source.y"})
     .execute()
)

dt.to_pandas()

results in:
NameError: name 'pandas' is not defined

More details:
I'm not very familiar with static type checking or this code, so please let me know if I did something wrong and this is not a bug!

@trhaarman trhaarman added the bug Something isn't working label Jan 23, 2024
@ion-elgreco
Copy link
Collaborator

ion-elgreco commented Jan 23, 2024

Yeah, the top-level import is missing apparently so if you import pandas without alias it works, but this needs to be fixed by adding the same lines as in the writer and then if _has_pandas and isinstance(df, pl.Dataframe):

try:
    import pandas as pd  # noqa: F811
except ModuleNotFoundError:
    _has_pandas = False
else:
    _has_pandas = True

If you want, you can open a PR for it :)

@trhaarman
Copy link
Author

Alright, thanks! I'll have a quick look at it tonight.

@ion-elgreco ion-elgreco added the binding/python Issues for the Python package label Jan 25, 2024
ion-elgreco added a commit that referenced this issue Jan 27, 2024
# Description
Pandas was only imported for static type checking, but was actually used
in the code of the module leading to import errors. Added the import
similarly to `writer.py`, and updated the pandas alias in the type hints
from 'pandas' to 'pd'.

# Related Issue(s)
- closes #2112 

# Documentation
x

Co-authored-by: Tim Haarman <[email protected]>
Co-authored-by: Ion Koutsouris <[email protected]>
RobinLin666 pushed a commit to RobinLin666/delta-rs that referenced this issue Feb 2, 2024
# Description
Pandas was only imported for static type checking, but was actually used
in the code of the module leading to import errors. Added the import
similarly to `writer.py`, and updated the pandas alias in the type hints
from 'pandas' to 'pd'.

# Related Issue(s)
- closes delta-io#2112 

# Documentation
x

Co-authored-by: Tim Haarman <[email protected]>
Co-authored-by: Ion Koutsouris <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
binding/python Issues for the Python package bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants