-
-
Notifications
You must be signed in to change notification settings - Fork 18.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
COMPAT: Pypy tweaks #17351
COMPAT: Pypy tweaks #17351
Conversation
pandas/tests/groupby/test_groupby.py
Outdated
@@ -2031,12 +2031,12 @@ def test_mutate_groups(self): | |||
|
|||
def f_copy(x): | |||
x = x.copy() | |||
x['rank'] = x.val.rank(method='min') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These warnings will get fixed by #17298
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good, I will remove 5425137 from the pull request, it wasn't really complete
Just for my understanding on 2744c5c , you're saying this would be true on PyPy? >>> a = 1.2
>>> b = 1.2
>>> a is b
False |
pandas/_libs/parsers.pyx
Outdated
@@ -454,7 +454,8 @@ cdef class TextReader: | |||
if callable(usecols): | |||
self.usecols = usecols | |||
else: | |||
self.usecols = set(usecols) | |||
self.usecols = list(set(usecols)) | |||
self.usecols.sort() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there an existing test that highlights where this matters?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry, this commit is bogus. I'm not sure why but somehow this changes the dtype in one of the subtests in TestCParserHighMemory.test_usecols_with_parse_dates_and_usecol_names
on PyPy, col 'a' gets the correct datetime64[ns]
when usercols is [0, 2, 3] but object
dtype when usercols is [3, 2, 0]. Something else is going on, I will back this out and look for the real culprit.
@chris-b1 about is-ness, yes that is True
on cpython both are False |
6baebb2
to
fa5ba91
Compare
@@ -1714,6 +1714,7 @@ def _set_noconvert_columns(self): | |||
# A set of integers will be converted to a list in | |||
# the correct order every single time. | |||
usecols = list(self.usecols) | |||
usecols.sort() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
self.usecols is a set
of the original usecols. On PyPy set
s are not sorted so "every single time" requires a sort.
>>>> list(set([3, 0, 2]))
[3, 0, 2]
Running tests with this line replaces by ``usecols.reverse() will show the dependency on the list being sorted:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no averse to this, but why does this matter?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add a test that exercises the sortedness?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there are tests, but they all pass on CPython since sets of ints are always sorted, ie pandas/tests/io/parsers/usecol.py, lines 199, 270, 296 which all failed on PyPy before this change. Not sure how I could construct a failing test for CPython
Codecov Report
@@ Coverage Diff @@
## master #17351 +/- ##
==========================================
- Coverage 91.03% 90.99% -0.05%
==========================================
Files 162 162
Lines 49567 49568 +1
==========================================
- Hits 45125 45105 -20
- Misses 4442 4463 +21
Continue to review full report at Codecov.
|
Codecov Report
@@ Coverage Diff @@
## master #17351 +/- ##
==========================================
- Coverage 91.15% 91.14% -0.02%
==========================================
Files 163 163
Lines 49587 49588 +1
==========================================
- Hits 45203 45195 -8
- Misses 4384 4393 +9
Continue to review full report at Codecov.
|
doc/source/whatsnew/v0.21.0.txt
Outdated
@@ -332,6 +332,9 @@ Performance Improvements | |||
Bug Fixes | |||
~~~~~~~~~ | |||
|
|||
- Compatibility with PyPy increased by removing the assumption that sets are sorted in ``usecols`` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why don;t you make a separate bug fix sub-section for pypy fixes (IIRC a couple of more in here)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you also don't need to be as detailed, give these from a user perspective, rather just refer to the issue (or PR in this case)
@@ -1714,6 +1714,7 @@ def _set_noconvert_columns(self): | |||
# A set of integers will be converted to a list in | |||
# the correct order every single time. | |||
usecols = list(self.usecols) | |||
usecols.sort() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no averse to this, but why does this matter?
@@ -1714,6 +1714,7 @@ def _set_noconvert_columns(self): | |||
# A set of integers will be converted to a list in | |||
# the correct order every single time. | |||
usecols = list(self.usecols) | |||
usecols.sort() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add a test that exercises the sortedness?
Hello @mattip! Thanks for updating the PR. Cheers ! There are no PEP8 issues in this Pull Request. 🍻 Comment last updated on September 06, 2017 at 18:50 Hours UTC |
@@ -3,8 +3,10 @@ | |||
import os | |||
import pandas.util.testing as tm | |||
|
|||
from pandas import read_csv, read_table | |||
from pandas import read_csv, read_table, DataFrame, Index |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like the Index
import isn't used: https://travis-ci.org/pandas-dev/pandas/jobs/270584864#L1803
|
||
|
||
class TestUnsortedUsecols(object): | ||
def test_override__set_noconvert_columns(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hah you are basically mocking the parser here....ok. I didn't mean go that far, but this is ok.
thanks @mattip always happy to have patches from pypy! |
I don't think this had any issues to close? |
a set of tweaks that I discovered when getting PyPY to pass tests, some could cause issues on CPython as well:
float('nan') is not np.nan
and like cases by implementation, on PyPy theis
comparison always checks the value of struct.pack('d', val) for equality.rank
andmean
as keys (the warnings are ignored?) and adds a trivial assert where a test was simply making a callI left these as seperate commits for ease of discussion and possible rejection, if desired I can squash them to a single commit