chore: Support Python 3.10 and bump pandas 1.4 and pyarrow 6 #21002
Codecov Report
@@            Coverage Diff             @@
##           master   #21002      +/-   ##
==========================================
- Coverage   66.34%   66.25%   -0.10%
==========================================
  Files        1767     1770       +3
  Lines       67312    67526     +214
  Branches     7144     7182      +38
==========================================
+ Hits        44656    44737      +81
- Misses      20828    20953     +125
- Partials     1828     1836       +8
@hughhhh @betodealmeida Can you review it?
This looks awesome! But I'm very concerned with how in the unit tests some of the NaNs are now being returned as zeros, since it would lead to wrong results. Any idea why that is happening here?
How should I fix this test?
Taking another look, I guess 0 makes sense from a contribution point of view. It should be fine in this case.
I took another look and have a few questions.
@betodealmeida @villebro Can you review again?
LGTM, thanks for all the iterations!
LGTM! Thanks @EugeneTorap and @villebro
This is great! Thanks for the work, @EugeneTorap!
def get_example_url(filepath: str) -> str:
    return f"{BASE_URL}{filepath}?raw=true"
Nice!
@@ -49,6 +49,9 @@ def contribution(
    """
    contribution_df = df.copy()
    numeric_df = contribution_df.select_dtypes(include=["number", Decimal])
    # TODO: copy needed due to following regression in 1.4, remove if not needed:
    # https://github.com/pandas-dev/pandas/issues/48090
    numeric_df = numeric_df.copy()
Awesome!
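For readers following the `contribution` hunk above, here is a minimal sketch of the pattern it uses: select the numeric columns, take an explicit copy, and only then compute on the copy. The function name `row_contribution` and the row-wise normalization are assumptions made for illustration. This is not Superset's actual implementation, and the sketch does not reproduce the linked pandas regression.

```python
import pandas as pd


def row_contribution(df: pd.DataFrame) -> pd.DataFrame:
    """Return each numeric cell as a fraction of its row total (hypothetical helper)."""
    numeric_df = df.select_dtypes(include=["number"])
    # Explicit copy, mirroring the workaround in the diff, so later
    # operations never touch data shared with the caller's frame.
    numeric_df = numeric_df.copy()
    row_totals = numeric_df.sum(axis=1)
    # Divide every row by its own total.
    return numeric_df.div(row_totals, axis=0)


df = pd.DataFrame({"a": [1.0, 3.0], "b": [3.0, 1.0], "label": ["x", "y"]})
result = row_contribution(df)
```

Non-numeric columns such as `label` are dropped by `select_dtypes`, and the input frame is left unchanged.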
Nice work! Going to test this out very soon. I know that there used to be a problem where an empty result set from SQLAlchemy caused an exception in pandas when using PyArrow 6.0 and higher, leading to unfriendly error messages in Explore (and charts on dashboards) instead of the friendly "No data" message.
Fixes #19986: installing Superset with Python 3.10 fails because pyarrow 5.0.0 doesn't have a wheel for Python 3.10.
SUMMARY
To use Python 3.10 in Superset we need to bump PyArrow (from 5.0.0 to 6.0.1).
This PR also bumps Pandas to the latest minor release (from 1.3.4 to 1.4.3).
Pandas 1.4 added a wheel for Python 3.9, Apple Silicon
Pandas 1.4 introduced support for using pyarrow as an engine for reading CSVs, which brings performance improvements (see https://pandas.pydata.org/docs/whatsnew/v1.4.0.html#multi-threaded-csv-reading-with-a-new-csv-engine-based-on-pyarrow for details). Therefore engine="pyarrow" has been added everywhere we're calling pd.read_csv.
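As a rough illustration of the `engine="pyarrow"` change described above, the sketch below reads a small CSV with the pyarrow engine when the `pyarrow` package is importable and otherwise falls back to pandas' default C parser. The helper name `read_csv_with_best_engine` and the fallback are assumptions added here so the snippet runs in either environment. The PR itself simply passes `engine="pyarrow"` to `pd.read_csv`.

```python
import importlib.util
import os
import tempfile

import pandas as pd


def read_csv_with_best_engine(path: str) -> pd.DataFrame:
    """Read a CSV, preferring the pyarrow engine when pyarrow is installed."""
    # engine="pyarrow" needs pandas >= 1.4 and the pyarrow package;
    # without pyarrow, use pandas' default C parser instead.
    has_pyarrow = importlib.util.find_spec("pyarrow") is not None
    engine = "pyarrow" if has_pyarrow else "c"
    return pd.read_csv(path, engine=engine)


# Write a small CSV to a temporary file and read it back.
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as tmp:
    tmp.write("name,value\nalpha,1\nbeta,2\n")
    csv_path = tmp.name

try:
    df = read_csv_with_best_engine(csv_path)
finally:
    os.remove(csv_path)
```

The pyarrow engine does not support every `read_csv` option, so call sites that rely on less common parameters may need to stay on the default engine.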
BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF
TESTING INSTRUCTIONS
ADDITIONAL INFORMATION