Add compatibility for pandas 2 #1184
tests/integration/fixtures.py
Outdated
if cudf:
    # cudf doesn't have support for timezoned datetime data
    df = datetime_table.copy()
    df["timezone"] = df["timezone"].dt.tz_localize(None)
    df["utc_timezone"] = df["utc_timezone"].dt.tz_localize(None)
    return cudf.from_pandas(df)
return None
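For reference, a minimal pandas-only sketch of what the `tz_localize(None)` calls above do. The frame built here is a hypothetical stand-in for the `datetime_table` fixture (its real contents are not shown in this thread), and cudf is left out so the snippet runs on CPU only.

```python
import pandas as pd

# Hypothetical stand-in for the datetime_table fixture: one tz-aware
# column in a non-UTC zone and one in UTC.
datetime_table = pd.DataFrame(
    {
        "timezone": pd.date_range(
            "2014-08-01 09:00", periods=3, freq="D", tz="Europe/Berlin"
        ),
        "utc_timezone": pd.date_range(
            "2014-08-01 09:00", periods=3, freq="D", tz="UTC"
        ),
    }
)

# Work on a copy so the original (CPU) frame keeps its timezone-aware data.
df = datetime_table.copy()
for column in ("timezone", "utc_timezone"):
    # tz_localize(None) drops the timezone but keeps the local wall-clock time.
    df[column] = df[column].dt.tz_localize(None)
```

Copying first is what keeps the CPU frame timezone-aware; only the copy destined for the GPU loses its timezone information.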
Before this change, we were actually overwriting the datetime_table
CPU fixture such that it didn't contain timezone-aware data.
Now that we're no longer doing that, a failure has been exposed in test_filter_cast_timestamp
because filtering a timezone-aware column by a timestamp would require non-UTC timezone handling for literals on the Python end:
dask-sql/dask_sql/physical/rex/core/literal.py
Lines 181 to 182 in 8991706
if timezone and timezone != "UTC":
    raise ValueError("Non UTC timezones not supported")
There are a few ways we could handle this, but all of them generally involve front-facing changes to timestamp literal handling, so I figure it might make sense to break that off into a separate PR.
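As a rough illustration of one such option (not a proposal for the actual implementation), the sketch below converts a literal to the requested timezone instead of raising. `timestamp_literal` is a hypothetical helper, and the assumption that the literal arrives as nanoseconds since the Unix epoch plus an optional timezone name is mine, not something confirmed by `literal.py`.

```python
from typing import Optional

import pandas as pd


def timestamp_literal(epoch_ns: int, timezone: Optional[str] = None) -> pd.Timestamp:
    """Hypothetical helper, not dask-sql's actual API.

    Assumes the literal arrives as nanoseconds since the Unix epoch
    plus an optional timezone name.
    """
    if timezone is None:
        # No timezone requested: return a tz-naive timestamp.
        return pd.Timestamp(epoch_ns, unit="ns")
    # Interpret the epoch value as UTC, then convert to the requested zone
    # instead of raising for anything other than "UTC".
    return pd.Timestamp(epoch_ns, unit="ns", tz="UTC").tz_convert(timezone)


# Filter a timezone-aware column by a literal in the same zone.
literal = timestamp_literal(1_406_934_000_000_000_000, "Europe/Berlin")
column = pd.Series(
    pd.date_range("2014-08-01", periods=3, freq="D", tz="Europe/Berlin")
)
mask = column >= literal
```

Because the literal carries the same timezone as the column, the comparison goes through without a naive/aware mismatch. Whether this is the right user-facing behavior is exactly the question for the follow-up PR.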
Okay with xfailing that test for now to get this in? cc @ayushdg @jdye64
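For concreteness, the xfail could look roughly like the sketch below. The test body is a placeholder rather than the real test, and the `raises=ValueError` argument assumes the failure surfaces as the `ValueError` quoted above; drop it if the error ends up wrapped in something else.

```python
import pytest


@pytest.mark.xfail(
    reason="Non-UTC timezones are not yet supported for timestamp literals",
    raises=ValueError,  # assumption: the failure surfaces as this exception
)
def test_filter_cast_timestamp():
    # Placeholder body: stands in for the real test, which filters a
    # timezone-aware column by a timestamp literal.
    raise ValueError("Non UTC timezones not supported")
```

Leaving `strict` at its default means the test is reported as XPASS rather than failing once literal handling is fixed, which is a reminder to remove the marker.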
Codecov Report
@@            Coverage Diff             @@
##             main    #1184      +/-   ##
==========================================
+ Coverage   81.72%   81.95%   +0.23%
==========================================
  Files          78       78
  Lines        4519     4539      +20
  Branches      831      837       +6
==========================================
+ Hits         3693     3720      +27
+ Misses        643      633      -10
- Partials      183      186       +3
... and 5 files with indirect coverage changes
Minor suggestions but nothing blocking for the merge.
Looks like the upstream constraints on pandas in our CI environment have been removed such that we're now pulling in pandas 2, which is raising some new errors.
Noting that after unblocking these failures, we should probably explore a combination of upstream testing for pandas along with erroring on certain warnings in our Python tests, which should allow us to follow these breaking changes more gradually as they're introduced (IIUC this is the approach that Dask/Distributed take).
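A minimal sketch of the "erroring on certain warnings" idea, using only the standard library. `deprecated_call` and `run_strict` are hypothetical names for illustration; the warning categories chosen here are an assumption about which ones we would want to promote.

```python
import warnings


def deprecated_call():
    # Hypothetical stand-in for library code hitting a pandas deprecation.
    warnings.warn("this behavior will change in a future version", FutureWarning)
    return "ok"


def run_strict(func):
    """Run func with FutureWarning and DeprecationWarning promoted to errors."""
    with warnings.catch_warnings():
        warnings.simplefilter("error", FutureWarning)
        warnings.simplefilter("error", DeprecationWarning)
        return func()


try:
    run_strict(deprecated_call)
    raised = False
except FutureWarning:
    # The warning was promoted to an exception, so the call fails loudly.
    raised = True
```

In a pytest suite the same effect is usually configured through the `filterwarnings` ini option (for example an `error::FutureWarning` entry) rather than wrapped by hand, so deprecations fail CI as soon as they first appear upstream.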