Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

COMPAT: Pypy tweaks #17351

Merged
merged 5 commits into from
Sep 7, 2017
Merged

COMPAT: Pypy tweaks #17351

merged 5 commits into from
Sep 7, 2017

Conversation

mattip
Copy link
Contributor

@mattip mattip commented Aug 27, 2017

a set of tweaks that I discovered when getting PyPY to pass tests, some could cause issues on CPython as well:

  • 496cc3a clears up an assumption that sets are sorted
  • d161b08 makes sure PyList_GET_SIZE is used only on PyListObject, if used on an ndarray it will return the value of the data pointer, which is not the intention at all
  • 2744c5c separates the cases where float('nan') is not np.nan and like cases by implementation, on PyPy the is comparison always checks the value of struct.pack('d', val) for equality.
  • 5425137 cleans up the use of rank and mean as keys (the warnings are ignored?) and adds a trivial assert where a test was simply making a call

I left these as seperate commits for ease of discussion and possible rejection, if desired I can squash them to a single commit

@gfyoung gfyoung added the Compat pandas objects compatability with Numpy or Python functions label Aug 27, 2017
@@ -2031,12 +2031,12 @@ def test_mutate_groups(self):

def f_copy(x):
x = x.copy()
x['rank'] = x.val.rank(method='min')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These warnings will get fixed by #17298

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good, I will remove 5425137 from the pull request, it wasn't really complete

@chris-b1
Copy link
Contributor

Just for my understanding on 2744c5c , you're saying this would be true on PyPy?

>>> a = 1.2
>>> b = 1.2
>>> a is b
False

@@ -454,7 +454,8 @@ cdef class TextReader:
if callable(usecols):
self.usecols = usecols
else:
self.usecols = set(usecols)
self.usecols = list(set(usecols))
self.usecols.sort()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there an existing test that highlights where this matters?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry, this commit is bogus. I'm not sure why but somehow this changes the dtype in one of the subtests in TestCParserHighMemory.test_usecols_with_parse_dates_and_usecol_names on PyPy, col 'a' gets the correct datetime64[ns] when usercols is [0, 2, 3] but object dtype when usercols is [3, 2, 0]. Something else is going on, I will back this out and look for the real culprit.

@mattip
Copy link
Contributor Author

mattip commented Aug 28, 2017

@chris-b1 about is-ness, yes that is True

>>>> a = float('nan')
>>>> b = float('nan')
>>>> a is b
True
>>>> a=1.2
>>>> b=1.2
>>>> a is b
True

on cpython both are False

@mattip mattip force-pushed the pypy-tweaks branch 2 times, most recently from 6baebb2 to fa5ba91 Compare August 28, 2017 23:48
@@ -1714,6 +1714,7 @@ def _set_noconvert_columns(self):
# A set of integers will be converted to a list in
# the correct order every single time.
usecols = list(self.usecols)
usecols.sort()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

self.usecols is a set of the original usecols. On PyPy sets are not sorted so "every single time" requires a sort.

>>>> list(set([3, 0, 2]))
[3, 0, 2]

Running tests with this line replaces by ``usecols.reverse() will show the dependency on the list being sorted:

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no averse to this, but why does this matter?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add a test that exercises the sortedness?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there are tests, but they all pass on CPython since sets of ints are always sorted, ie pandas/tests/io/parsers/usecol.py, lines 199, 270, 296 which all failed on PyPy before this change. Not sure how I could construct a failing test for CPython

@codecov
Copy link

codecov bot commented Aug 29, 2017

Codecov Report

Merging #17351 into master will decrease coverage by 0.04%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #17351      +/-   ##
==========================================
- Coverage   91.03%   90.99%   -0.05%     
==========================================
  Files         162      162              
  Lines       49567    49568       +1     
==========================================
- Hits        45125    45105      -20     
- Misses       4442     4463      +21
Flag Coverage Δ
#multiple 88.77% <100%> (-0.03%) ⬇️
#single 40.24% <0%> (-0.07%) ⬇️
Impacted Files Coverage Δ
pandas/io/parsers.py 95.46% <100%> (ø) ⬆️
pandas/io/gbq.py 25% <0%> (-58.34%) ⬇️
pandas/plotting/_converter.py 63.23% <0%> (-1.82%) ⬇️
pandas/core/frame.py 97.72% <0%> (-0.1%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 6bab9d1...b1c79c4. Read the comment docs.

@codecov
Copy link

codecov bot commented Aug 29, 2017

Codecov Report

Merging #17351 into master will decrease coverage by 0.01%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #17351      +/-   ##
==========================================
- Coverage   91.15%   91.14%   -0.02%     
==========================================
  Files         163      163              
  Lines       49587    49588       +1     
==========================================
- Hits        45203    45195       -8     
- Misses       4384     4393       +9
Flag Coverage Δ
#multiple 88.92% <ø> (ø) ⬆️
#single 40.25% <ø> (-0.07%) ⬇️
Impacted Files Coverage Δ
pandas/io/parsers.py 95.46% <ø> (ø) ⬆️
pandas/io/gbq.py 25% <0%> (-58.34%) ⬇️
pandas/core/frame.py 97.72% <0%> (-0.1%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d457791...bb7ef63. Read the comment docs.

@@ -332,6 +332,9 @@ Performance Improvements
Bug Fixes
~~~~~~~~~

- Compatibility with PyPy increased by removing the assumption that sets are sorted in ``usecols``
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why don;t you make a separate bug fix sub-section for pypy fixes (IIRC a couple of more in here)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you also don't need to be as detailed, give these from a user perspective, rather just refer to the issue (or PR in this case)

@@ -1714,6 +1714,7 @@ def _set_noconvert_columns(self):
# A set of integers will be converted to a list in
# the correct order every single time.
usecols = list(self.usecols)
usecols.sort()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no averse to this, but why does this matter?

@@ -1714,6 +1714,7 @@ def _set_noconvert_columns(self):
# A set of integers will be converted to a list in
# the correct order every single time.
usecols = list(self.usecols)
usecols.sort()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add a test that exercises the sortedness?

@pep8speaks
Copy link

pep8speaks commented Aug 31, 2017

Hello @mattip! Thanks for updating the PR.

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on September 06, 2017 at 18:50 Hours UTC

@mattip
Copy link
Contributor Author

mattip commented Sep 5, 2017

@jreback ping, test committed in 2f653f1

@@ -3,8 +3,10 @@
import os
import pandas.util.testing as tm

from pandas import read_csv, read_table
from pandas import read_csv, read_table, DataFrame, Index
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like the Index import isn't used: https://travis-ci.org/pandas-dev/pandas/jobs/270584864#L1803



class TestUnsortedUsecols(object):
def test_override__set_noconvert_columns(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hah you are basically mocking the parser here....ok. I didn't mean go that far, but this is ok.

@jreback jreback added this to the 0.21.0 milestone Sep 7, 2017
@jreback jreback merged commit ee6185e into pandas-dev:master Sep 7, 2017
@jreback
Copy link
Contributor

jreback commented Sep 7, 2017

thanks @mattip always happy to have patches from pypy!

@jreback
Copy link
Contributor

jreback commented Sep 7, 2017

I don't think this had any issues to close?

jbrockmendel pushed a commit to jbrockmendel/pandas that referenced this pull request Sep 10, 2017
alanbato pushed a commit to alanbato/pandas that referenced this pull request Nov 10, 2017
No-Stream pushed a commit to No-Stream/pandas that referenced this pull request Nov 28, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Compat pandas objects compatability with Numpy or Python functions
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants