Update to pandas 2.1.2 #108
Did you file an issue for this with pandas? It seems bad that this should happen with just a patch version increase (or at all, really: pandas shouldn't be doing anything in place any more unless you explicitly ask). Note that I didn't actually look at the code to see why it failed; I am just commenting on your comment here.
Thanks, good point! I did some debugging. The following raises an error (in pandas 2.1.2, prior to this PR):

```python
import pandas as pd
from strictly_typed_pandas import DataSet


class B:
    id: int
    age: int


df1 = pd.DataFrame(
    dict(
        id=[1, 2, 3],
        name=['a', 'b', 'c'],
    )
)
df2 = DataSet[B](
    pd.DataFrame(
        dict(
            id=[1, 2, 3],
            age=[10, 20, 30],
        ),
    ).set_index('age')
)
df3 = df1.merge(df2, on='id')
```

Error:
Let's reproduce that without stp. And let's see if there are any side-effects (due to in-place modifications):

```python
import pandas as pd

df1 = pd.DataFrame(
    dict(
        id=[1, 2, 3],
        name=['a', 'b', 'c'],
    )
)
df2 = pd.DataFrame(
    dict(
        id=[1, 2, 3],
        age=[10, 20, 30],
    ),
).set_index('age')
df3 = df1.merge(df2, on='id')
df2
```

Appears like there are no side-effects. So I guess pandas does an in-place modification of a copy. Anyhow, it seems like we got a workaround. I'll merge this now!
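The "no side-effects" observation above can also be checked programmatically rather than by eye. A minimal sketch, assuming a deep copy taken before the merge is a fair baseline; the variable name `before` is only for illustration and is not part of the original discussion:

```python
import pandas as pd

df1 = pd.DataFrame({"id": [1, 2, 3], "name": ["a", "b", "c"]})
df2 = pd.DataFrame({"id": [1, 2, 3], "age": [10, 20, 30]}).set_index("age")

# Snapshot of df2 before the merge (hypothetical helper variable).
before = df2.copy(deep=True)

df3 = df1.merge(df2, on="id")

# Raises AssertionError if the merge changed df2's values, columns or index.
pd.testing.assert_frame_equal(df2, before)
```

If the assertion passes, whatever in-place work pandas does during the merge did not leak into the caller's `df2`.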
* Update to pandas 2.1.2
* Update to pandas 2.1.2
* import from typing_extensions
* remove args and kwargs type annotations, they do not work with typeguard for older versions of python
* convert dataset to dataframe when calling a dataframe function, such that the immutable option does not become a problem
Main objective of this PR is to update the pandas requirement to 2.1.2.
Unfortunately, the CI failed initially. It seems that some functions within pandas 2.1.2 use in-place modification. When such a function is called on a `DataSet`, it raises an error. For more details, see for example this run.

The solution I take here is to automatically convert a `DataSet` to a `DataFrame` when calling a `DataFrame` function, such that the immutable option does not become a problem. This shouldn't change behaviour, since all `DataFrame` functions called on a `DataSet` would convert the `DataSet` to a `DataFrame` anyway. Now we just do it upfront.
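To illustrate the "convert upfront" idea, here is a rough sketch. It is not the actual strictly_typed_pandas implementation: `FrozenFrame` is a hypothetical stand-in for an immutable wrapper, and forwarding through `__getattr__` is just one assumed way to do the conversion before any `DataFrame` function runs.

```python
import pandas as pd


class FrozenFrame:
    """Hypothetical immutable wrapper (NOT the real DataSet class)."""

    def __init__(self, df: pd.DataFrame) -> None:
        # Bypass our own __setattr__ guard to store a private copy.
        object.__setattr__(self, "_df", df.copy())

    def __setattr__(self, name, value):
        raise NotImplementedError("FrozenFrame is immutable")

    def __setitem__(self, key, value):
        raise NotImplementedError("FrozenFrame is immutable")

    def to_dataframe(self) -> pd.DataFrame:
        # Hand out a fresh, mutable DataFrame.
        return self._df.copy()

    def __getattr__(self, name):
        # Only reached for attributes not defined on FrozenFrame itself.
        if name.startswith("_"):
            raise AttributeError(name)
        # Convert upfront: the DataFrame function runs on a plain copy,
        # so any in-place work inside pandas never hits the immutable object.
        return getattr(self.to_dataframe(), name)


frozen = FrozenFrame(pd.DataFrame({"id": [1, 2, 3], "age": [10, 20, 30]}))
other = pd.DataFrame({"id": [1, 2, 3], "name": ["a", "b", "c"]})

merged = frozen.merge(other, on="id")  # plain DataFrame result
frozen.sort_values("age", ascending=False, inplace=True)  # sorts a throwaway copy
```

The trade-off in this sketch is an extra copy per call, in exchange for never having to know which pandas functions modify their input internally.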