-
Notifications
You must be signed in to change notification settings - Fork 4.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[BEAM-13605] Add support for pandas 1.4.0 #16590
Conversation
Almost done. Will tag you for final review when finished. I opened this PR against master b/c I was having trouble pushing to your fork/branch. As a result, the title is the same as your original PR. 😆 Will work on pushing to branch to consolidate. |
oh, rather than trying to push to my branch, how about we just go ahead and merge #16571? If you LGTM it I can merge it. |
I see a bunch of I also see doctest errors for the Edit: I think I know why. It's because we haven't updated the version of pandas to 1.4.0 in the precommit tasks / |
Yeah I think these are safe to ignore. Sometimes the GHA checks flake with lots of
I think your edit is correct, but that is by design. We will still verify with pandas <1.4.0 in the Python PreCommit, because we want the API to work with multiple minor versions of pandas. Ideally we will find a way to modify the implementation that is still compatible with pandas <1.4.0 |
Fixed the backwards compatibility issue, and it now passes Python 3.6 and 3.7 now. The PR is now in a state for your review and/or other additions. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! This looks good, I just have a few suggestions around value_counts
. I'll work on the grouby.apply issue this week and we can put these together.
Alright! Now that #16706 is merged I think this is good to go! Could you rebase or merge master? This is probably something we should mention in CHANGES.md for the 2.37.0 release too |
Just rebased! Also added two small commits before merge:
|
Codecov Report
@@ Coverage Diff @@
## master #16590 +/- ##
==========================================
+ Coverage 74.60% 83.64% +9.03%
==========================================
Files 654 452 -202
Lines 82029 62168 -19861
==========================================
- Hits 61201 52001 -9200
+ Misses 19841 10167 -9674
+ Partials 987 0 -987
Continue to review full report at Codecov.
|
Sounds good!
That's right, it won't be in 2.36 since the branch was already cut. I'll be cutting the 2.37 release branch next Wednesday though. It's fine to do it in the same PR. It looks like
We could consider just dropping support for pyarrow < 1.0, but technically the non-dataframe ParquetIO will still work with it. So I think it's better to just skip this test. |
Thanks for your help on this! It's great that we'll have this ready for 2.37.0 |
Could this have broken Py3.8 postcommits? See: https://ci-beam.apache.org/job/beam_PostCommit_Python38/2251 ? |
Yes this must be the culprit for those failures. It looks a version mismatch, it should be resolved by upgrading pandas to 1.4.0 in python 3.8 containers. |
This PR includes all changes necessary for the
pandas==1.4.0
update, except theapply()
operation and its dependenciesdrop_duplicates()
andnunique()
.Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:
R: @username
).[BEAM-XXX] Fixes bug in ApproximateQuantiles
, where you replaceBEAM-XXX
with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.CHANGES.md
with noteworthy changes.See the Contributor Guide for more tips on how to make review process smoother.
ValidatesRunner
compliance status (on master branch)Examples testing status on various runners
Post-Commit SDK/Transform Integration Tests Status (on master branch)
Pre-Commit Tests Status (on master branch)
See .test-infra/jenkins/README for trigger phrase, status and link of all Jenkins jobs.
GitHub Actions Tests Status (on master branch)
See CI.md for more information about GitHub Actions CI.