COMPAT: Pypy tweaks #17351

mattip · 2017-08-27T20:19:04Z

a set of tweaks that I discovered when getting PyPY to pass tests, some could cause issues on CPython as well:

496cc3a clears up an assumption that sets are sorted
d161b08 makes sure PyList_GET_SIZE is used only on PyListObject, if used on an ndarray it will return the value of the data pointer, which is not the intention at all
2744c5c separates the cases where float('nan') is not np.nan and like cases by implementation, on PyPy the is comparison always checks the value of struct.pack('d', val) for equality.
5425137 cleans up the use of rank and mean as keys (the warnings are ignored?) and adds a trivial assert where a test was simply making a call

I left these as seperate commits for ease of discussion and possible rejection, if desired I can squash them to a single commit

chris-b1 · 2017-08-28T21:26:53Z

pandas/tests/groupby/test_groupby.py

@@ -2031,12 +2031,12 @@ def test_mutate_groups(self):

        def f_copy(x):
            x = x.copy()
-            x['rank'] = x.val.rank(method='min')


These warnings will get fixed by #17298

Good, I will remove 5425137 from the pull request, it wasn't really complete

chris-b1 · 2017-08-28T21:30:00Z

Just for my understanding on 2744c5c , you're saying this would be true on PyPy?

>>> a = 1.2
>>> b = 1.2
>>> a is b
False

chris-b1 · 2017-08-28T21:31:20Z

pandas/_libs/parsers.pyx

@@ -454,7 +454,8 @@ cdef class TextReader:
            if callable(usecols):
                self.usecols = usecols
            else:
-                self.usecols = set(usecols)
+                self.usecols = list(set(usecols))
+                self.usecols.sort()


Is there an existing test that highlights where this matters?

sorry, this commit is bogus. I'm not sure why but somehow this changes the dtype in one of the subtests in TestCParserHighMemory.test_usecols_with_parse_dates_and_usecol_names on PyPy, col 'a' gets the correct datetime64[ns] when usercols is [0, 2, 3] but object dtype when usercols is [3, 2, 0]. Something else is going on, I will back this out and look for the real culprit.

mattip · 2017-08-28T21:58:37Z

@chris-b1 about is-ness, yes that is True

>>>> a = float('nan')
>>>> b = float('nan')
>>>> a is b
True
>>>> a=1.2
>>>> b=1.2
>>>> a is b
True

on cpython both are False

mattip · 2017-08-28T23:55:03Z

pandas/io/parsers.py

@@ -1714,6 +1714,7 @@ def _set_noconvert_columns(self):
            # A set of integers will be converted to a list in
            # the correct order every single time.
            usecols = list(self.usecols)
+            usecols.sort()


self.usecols is a set of the original usecols. On PyPy sets are not sorted so "every single time" requires a sort.

>>>> list(set([3, 0, 2])) [3, 0, 2]

Running tests with this line replaces by ``usecols.reverse() will show the dependency on the list being sorted:

no averse to this, but why does this matter?

can you add a test that exercises the sortedness?

there are tests, but they all pass on CPython since sets of ints are always sorted, ie pandas/tests/io/parsers/usecol.py, lines 199, 270, 296 which all failed on PyPy before this change. Not sure how I could construct a failing test for CPython

codecov · 2017-08-29T12:17:36Z

Codecov Report

Merging #17351 into master will decrease coverage by 0.04%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #17351      +/-   ##
==========================================
- Coverage   91.03%   90.99%   -0.05%     
==========================================
  Files         162      162              
  Lines       49567    49568       +1     
==========================================
- Hits        45125    45105      -20     
- Misses       4442     4463      +21

Flag	Coverage Δ
#multiple	`88.77% <100%> (-0.03%)`	⬇️
#single	`40.24% <0%> (-0.07%)`	⬇️

Impacted Files	Coverage Δ
pandas/io/parsers.py	`95.46% <100%> (ø)`	⬆️
pandas/io/gbq.py	`25% <0%> (-58.34%)`	⬇️
pandas/plotting/_converter.py	`63.23% <0%> (-1.82%)`	⬇️
pandas/core/frame.py	`97.72% <0%> (-0.1%)`	⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 6bab9d1...b1c79c4. Read the comment docs.

codecov · 2017-08-29T12:17:37Z

Codecov Report

Merging #17351 into master will decrease coverage by 0.01%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master   #17351      +/-   ##
==========================================
- Coverage   91.15%   91.14%   -0.02%     
==========================================
  Files         163      163              
  Lines       49587    49588       +1     
==========================================
- Hits        45203    45195       -8     
- Misses       4384     4393       +9

Flag	Coverage Δ
#multiple	`88.92% <ø> (ø)`	⬆️
#single	`40.25% <ø> (-0.07%)`	⬇️

Impacted Files	Coverage Δ
pandas/io/parsers.py	`95.46% <ø> (ø)`	⬆️
pandas/io/gbq.py	`25% <0%> (-58.34%)`	⬇️
pandas/core/frame.py	`97.72% <0%> (-0.1%)`	⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d457791...bb7ef63. Read the comment docs.

jreback · 2017-08-29T13:00:43Z

doc/source/whatsnew/v0.21.0.txt

@@ -332,6 +332,9 @@ Performance Improvements
 Bug Fixes
 ~~~~~~~~~

+- Compatibility with PyPy increased by removing the assumption that sets are sorted in ``usecols``


why don;t you make a separate bug fix sub-section for pypy fixes (IIRC a couple of more in here)

you also don't need to be as detailed, give these from a user perspective, rather just refer to the issue (or PR in this case)

jreback · 2017-08-29T13:02:48Z

pandas/io/parsers.py

@@ -1714,6 +1714,7 @@ def _set_noconvert_columns(self):
            # A set of integers will be converted to a list in
            # the correct order every single time.
            usecols = list(self.usecols)
+            usecols.sort()


no averse to this, but why does this matter?

jreback · 2017-08-29T13:03:25Z

pandas/io/parsers.py

@@ -1714,6 +1714,7 @@ def _set_noconvert_columns(self):
            # A set of integers will be converted to a list in
            # the correct order every single time.
            usecols = list(self.usecols)
+            usecols.sort()


can you add a test that exercises the sortedness?

pep8speaks · 2017-08-31T21:00:16Z

Hello @mattip! Thanks for updating the PR.

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on September 06, 2017 at 18:50 Hours UTC

mattip · 2017-09-05T15:00:37Z

@jreback ping, test committed in 2f653f1

TomAugspurger · 2017-09-05T23:52:17Z

pandas/tests/io/parser/test_parsers.py

@@ -3,8 +3,10 @@
 import os
 import pandas.util.testing as tm

-from pandas import read_csv, read_table
+from pandas import read_csv, read_table, DataFrame, Index


Looks like the Index import isn't used: https://travis-ci.org/pandas-dev/pandas/jobs/270584864#L1803

jreback · 2017-09-07T11:55:07Z

pandas/tests/io/parser/test_parsers.py

+
+
+class TestUnsortedUsecols(object):
+    def test_override__set_noconvert_columns(self):


hah you are basically mocking the parser here....ok. I didn't mean go that far, but this is ok.

jreback · 2017-09-07T11:56:46Z

thanks @mattip always happy to have patches from pypy!

jreback · 2017-09-07T11:57:24Z

I don't think this had any issues to close?

gfyoung added the Compat pandas objects compatability with Numpy or Python functions label Aug 27, 2017

chris-b1 reviewed Aug 28, 2017

View reviewed changes

mattip force-pushed the pypy-tweaks branch 2 times, most recently from 6baebb2 to fa5ba91 Compare August 28, 2017 23:48

mattip commented Aug 28, 2017

View reviewed changes

mattip force-pushed the pypy-tweaks branch from fa5ba91 to b1c79c4 Compare August 29, 2017 12:17

jreback reviewed Aug 29, 2017

View reviewed changes

mattip added 4 commits August 29, 2017 16:22

BUG, COMPAT: list(set(self.usecols)) may not be sorted

0e4b763

BUG, COMPAT: PyList_GET_SIZE works only by chance on ndarray

e01dd48

COMPAT: when "nan is nan" is not consistent across implementations

7003f2d

document the changes

2886a0e

mattip force-pushed the pypy-tweaks branch from b1c79c4 to 2886a0e Compare August 29, 2017 14:30

mattip force-pushed the pypy-tweaks branch from 8f54770 to 2f653f1 Compare August 31, 2017 21:02

TomAugspurger reviewed Sep 5, 2017

View reviewed changes

add test that fails before sorting usecols

bb7ef63

mattip force-pushed the pypy-tweaks branch from 2f653f1 to bb7ef63 Compare September 6, 2017 18:50

jreback reviewed Sep 7, 2017

View reviewed changes

jreback added this to the 0.21.0 milestone Sep 7, 2017

jreback merged commit ee6185e into pandas-dev:master Sep 7, 2017

jbrockmendel pushed a commit to jbrockmendel/pandas that referenced this pull request Sep 10, 2017

COMPAT: Pypy tweaks (pandas-dev#17351)

aafa941

alanbato pushed a commit to alanbato/pandas that referenced this pull request Nov 10, 2017

COMPAT: Pypy tweaks (pandas-dev#17351)

c142a61

No-Stream pushed a commit to No-Stream/pandas that referenced this pull request Nov 28, 2017

COMPAT: Pypy tweaks (pandas-dev#17351)

655cace

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

COMPAT: Pypy tweaks #17351

COMPAT: Pypy tweaks #17351

mattip commented Aug 27, 2017

chris-b1 Aug 28, 2017

mattip Aug 28, 2017

chris-b1 commented Aug 28, 2017

chris-b1 Aug 28, 2017

mattip Aug 28, 2017

mattip commented Aug 28, 2017

mattip Aug 28, 2017

jreback Aug 29, 2017

jreback Aug 29, 2017

mattip Aug 29, 2017

codecov bot commented Aug 29, 2017

codecov bot commented Aug 29, 2017 •

edited

Loading

jreback Aug 29, 2017

jreback Aug 29, 2017

jreback Aug 29, 2017

jreback Aug 29, 2017

pep8speaks commented Aug 31, 2017 •

edited

Loading

mattip commented Sep 5, 2017

TomAugspurger Sep 5, 2017

jreback Sep 7, 2017

jreback commented Sep 7, 2017

jreback commented Sep 7, 2017



		class TestUnsortedUsecols(object):
		def test_override__set_noconvert_columns(self):

COMPAT: Pypy tweaks #17351

COMPAT: Pypy tweaks #17351

Conversation

mattip commented Aug 27, 2017

Choose a reason for hiding this comment

Choose a reason for hiding this comment

chris-b1 commented Aug 28, 2017

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mattip commented Aug 28, 2017

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

codecov bot commented Aug 29, 2017

Codecov Report

codecov bot commented Aug 29, 2017 • edited Loading

Codecov Report

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

pep8speaks commented Aug 31, 2017 • edited Loading

Comment last updated on September 06, 2017 at 18:50 Hours UTC

mattip commented Sep 5, 2017

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jreback commented Sep 7, 2017

jreback commented Sep 7, 2017

codecov bot commented Aug 29, 2017 •

edited

Loading

pep8speaks commented Aug 31, 2017 •

edited

Loading