Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pandas 0.13.0 tests fail against numpy 1.8.x #5997

Closed
juliantaylor opened this issue Jan 18, 2014 · 41 comments
Closed

pandas 0.13.0 tests fail against numpy 1.8.x #5997

juliantaylor opened this issue Jan 18, 2014 · 41 comments
Labels
Testing pandas testing functions or related to the test suite
Milestone

Comments

@juliantaylor
Copy link

several test errors of this type when run against the maintenance/1.8.x branch of numpy.

ERROR: test_qcut_specify_quantiles (pandas.tools.tests.test_tile.TestCut)
  File "hashtable.pyx", line 776, in pandas.hashtable.PyObjectHashTable.unique (pandas/hashtable.c:12071)
TypeError: unhashable type: 'numpy.ndarray'

FAIL: test_series_describe_multikey (pandas.tests.test_groupby.TestGroupBy)
    assert a == b, "%s: %r != %r" % (msg.format(a,b), a, b)
AssertionError: attr is not equal [dtype]: dtype('O') != dtype('float64')

it seems you have coded a regression of 1.8 into the tests, it is intended to fix this in numpy 1.8.1 but now that pandas relies on it makes the situation a little tricky.

see numpy/numpy#4109

How bad would it be for pandas if we release 1.8.1 with the regression fixed?

@jreback
Copy link
Contributor

jreback commented Jan 18, 2014

we test again 1.8; these are failing because of changes in 1.8 (that will actually appear in 1.8.1)?

@seberg
Copy link
Contributor

seberg commented Jan 18, 2014

I just realized, this thing also fixed what I think is a bug with structured types. Since pandas makes a lot of use for that, this part is likely what breaks pandas and the regression fix is OK. So I think we can undo the structured type fix part for 1.8.x, but it would be nice to have input/know that you are aware that there should be a change for 1.9.x in numpy here.

I will have a look at it later.

@juliantaylor
Copy link
Author

pandas tests work with numpy 1.7.2, so there is probably something additional in that changeset that breaks it.

@seberg
Copy link
Contributor

seberg commented Jan 18, 2014

Can you try if this fixes it:

--- a/doc/sphinxext
+++ b/doc/sphinxext
@@ -1 +1 @@
-Subproject commit 447dd0b59c2fe91ca9643701036d3d04919ddc7e
+Subproject commit 447dd0b59c2fe91ca9643701036d3d04919ddc7e-dirty
diff --git a/numpy/core/src/multiarray/mapping.c b/numpy/core/src/multiarray/map
index cf03f83..d83686f 100644
--- a/numpy/core/src/multiarray/mapping.c
+++ b/numpy/core/src/multiarray/mapping.c
@@ -1129,7 +1129,7 @@ array_subscript_fromobject(PyArrayObject *self, PyObject *
                 return NULL;
             }
             PyArray_ENABLEFLAGS((PyArrayObject*)obj, NPY_ARRAY_WARN_ON_WRITE);
-            return obj;
+            return PyArray_Return(obj);
         }
     }

It should undo the non-regression part of the fix.

@juliantaylor
Copy link
Author

no the patch does not help

@ghost
Copy link

ghost commented Jan 19, 2014

Thanks for the heads-up @juliantaylor. I'm not sure what you mean by "coded a regression of 1.8 into the tests".
as our CI setup tests against 1.6.1, 1.7.1 and pypi 1.8.0 and so whatever changed broke
with all those numpy versions. That's the opposite meaning of regression as I know it.

I started looking at the damage here and found at least this change in numpy behaviour,
which off-hand seems like a bug:

# np 1.7.2
In [1]: a=np.array([1,2,3])
   ...: a[1.1]
Out[1]: 2
#1.8 tip 49cff2b
In [5]: a=np.array([1,2,3])
   ...: a[1.1]
Out[5]: array(2)

@ghost
Copy link

ghost commented Jan 19, 2014

We'll be releasing 13.1 in a couple of weeks, would be nice to sync up to minimize the breakage
window if there has to be one. Talk to us.

@juliantaylor
Copy link
Author

hm yes that is a bug
it seems to be a broken backport

with current git master unfortunately the tests just crash in:
test_combine_first_dt64 (pandas.sparse.tests.test_sparse.TestSparseSeries)
I'll check what causes that.

@ghost
Copy link

ghost commented Jan 19, 2014

It's a little troublesome to sort out which tests are affected by what. If you push a fix for
that, ping me and I'll make another pass to see if there's something we're actually misusing.

@jreback
Copy link
Contributor

jreback commented Jan 19, 2014

@y-p maybe we should have the 2.7 locale build run more tests and hit numpy master (rather than change another build)?

@ghost
Copy link

ghost commented Jan 19, 2014

just what I was thinking, but... we want to keep our build times short, we've put some effort into that.

@juliantaylor , we have a network location that stores wheel files of our compiled dependencies
including numpy to speedup travis builds. I'm sure numpy has a fancy CI setup of their own,
if you can put up a regularly updated git-master whl at a public http location, we can probably
move one of our builds to use that and give us both better confidence when making changes.

You'll then be able to check pandas integration easily through our new CI status page.

@ghost
Copy link

ghost commented Jan 19, 2014

In fact, if numpy puts up travis-compat wheels for all it's version branches that would
be useful to pydata projects at large, compilation times on travis are a killer.

@juliantaylor
Copy link
Author

the master failure is due to a recent np.where improvement, 10 times faster, but also apparently broken for object arrays :(
I'll see to get it fixed.

@seberg
Copy link
Contributor

seberg commented Jan 19, 2014

Jeesh... yes, subscript_simple needs to be channeled through PyArray_Return, because in some corner cases (probably exactly this one, a single non-integer but integer-like scalar) it can return a scalar array which should be converted to a scalar, too. So it needs the same stuff like fancy indexing has right now...

@cpcloud
Copy link
Member

cpcloud commented Jan 19, 2014

man numpy travis compat wheels would be friggin sweet

@jreback
Copy link
Contributor

jreback commented Jan 22, 2014

@juliantaylor related (prob the same) on mavericks, #6036

@ghost
Copy link

ghost commented Jan 23, 2014

@juliantaylor , we're getting ready to release 0.13.1 soon, any news on this?
otherwise any required changes in pandas will have to wait several months until
the next major release.

@seberg
Copy link
Contributor

seberg commented Jan 23, 2014

I am not sure if this is all about the a[0.] thing... But I have a pull request to fix that, but maybe we can check if this really covers things and the other changes are not a problem?

@ghost
Copy link

ghost commented Jan 23, 2014

Sure, what's the numpy PR for me to test?

@seberg
Copy link
Contributor

seberg commented Jan 23, 2014

Sorry, it is numpy/numpy#4223 (against 1.8.x branch)

@juliantaylor
Copy link
Author

tested 0.13 against he PR and it now works, so closing the issue.
If I understand correctly it was float indexing that broke, note that using floats as indices is deprecated. You will get warnings if you run your tests with -Wall

concerning wheels for CI, I would like this, maybe you can propose something to the numpy mailing list? I'd help out implementing something.

@jreback
Copy link
Contributor

jreback commented Jan 23, 2014

thanks @juliantaylor

we have the same float indexer deprecation (prob in 0.14) as well.....a source of lots of confusion

#4892

@ghost
Copy link

ghost commented Jan 23, 2014

Sorry, I actually ran the tests in another window against that PR a few hours ago
and forgotten about it. Green here too.

We used a vagrant ubuntu box as build env and if it were in the proper AWS region,
the (potentially heavy?) travis traffic would be gratis.

I'm loath to get into a long discussion on the ML, so go ahead. If you want help building
a useful resource for the community (numpy wheels, scipy wheels, whatever), give a shout.

@juliantaylor
Copy link
Author

I don't know much about wheels and how they are distributed, but I could provide a ubuntu PPA with a daily build relatively quickly if that would work too.

@ghost
Copy link

ghost commented Jan 23, 2014

wheels are a recent addition to python packaging, it's a binary format and stable pip supports it.
It's backed by PEP427. We've been using it succesfully for months.
a PPA would work too, we just found that making wheels is much simpler then deb/rpm packaging
and it's easier to integrate with our existing pip requirements, rather then resorting back to apt-get
since we manage multiple sets of requirements. All you need is a web server to serve them.

@ghost
Copy link

ghost commented Jan 28, 2014

cc @rgommers, @cgohlke.

You've raised the issue of making numpy wheels available on the ML recently.
The discussion above may interest you.

In short, we've been fetching wheels off a private webserver from travis
jobs to reduce CI runtime by more then half. We've been doing this since last July.

Travis runs on ubuntu, making numpy wheels (and associate packages such as scipy)
available from a public server would be beneficial to every package that relies on numpy
and makes use of travis.

@rgommers
Copy link
Contributor

rgommers commented Feb 3, 2014

@y-p that's a good point, reducing Travis runtime is worth spending some effort on.

Linux wheels aren't (yet) allowed on PyPi though, so we'd have to set up a separate server for it and document it in all the places people may look for it, because pip install --use-wheel numpy isn't going to cut it then.

@ghost
Copy link

ghost commented Feb 3, 2014

You do need a server, pypi will take a url to a server the lists directory contents
and work just fine if the wheels are there. That's what we've done.
Optimally, you'd need to provide a complete vertical stack of wheels
for things to work: numpy, scipy, numexpr, pytables and so on.

That's one thing conda does for you if you buy into it, though I still find wheels
simpler to use.

As I said, we've done this for a while. Look at http://cache27diy-cpycloud.rhcloud.com
#6227

@rgommers
Copy link
Contributor

rgommers commented Feb 4, 2014

Can't look at the rhcloud link (blocked by the great firewall) but the PR you link to is informative. To clarify my comment, I meant that the exact command pip install numpy won't work and that stuff like

PIP_ARGS+=" -I --use-wheel --find-links=$base_url/$wheel_box/ --allow-external --allow-insecure"

is what needs documenting then in order for people to find the option to use the wheel server and use it.

@rgommers
Copy link
Contributor

rgommers commented Feb 4, 2014

pandas is probably used as a dependency a lot also. So if what we want here is to provide numpy, scipy, pandas, etc. wheels, would it be possible to reuse what you have already set up?

@ghost
Copy link

ghost commented Feb 4, 2014

Reuse the wheels, sure. But another server is required, that one
runs on a free tier, on a tiny slice of a box, we won't want the world
banging on it.

It's not complicated to setup an ubuntu vm/lxc that will automatically generate
the wheels corresponding to requirements files and upload them to
wherever.

@rgommers
Copy link
Contributor

rgommers commented Feb 4, 2014

OK clear, thanks. Administering the wherever is the not-so-much-fun part, the rest is indeed not complicated.

@ghost
Copy link

ghost commented Feb 4, 2014

Generally true, but wherever could be s3 or similar, in which case things can be pretty simple
after initial setup (except billing).

@rgommers
Copy link
Contributor

rgommers commented Feb 4, 2014

I don't enjoy server admin, but if someone is willing to do this: we have a hosted mac mini that could be used (billing is taken care of). The plan was anyway to use that box for scipy-stack integration testing on OS X in addition to using it to debug mac issues, so this fits its purpose nicely.

@ghost
Copy link

ghost commented Feb 4, 2014

I offered to help early in the thread. Define "this".

@rgommers
Copy link
Contributor

rgommers commented Feb 4, 2014

Sorry missed that - help would be great. "this" --> set up the OS X box such that numpy/scipy/pandas/... devs can upload wheels to it when given pw/ssh access, and where pip can retrieve them when it is given the correct --find-links= address. Creating accounts for you and other devs who need them is something @cdeil can help with, he has been administering that machine for a while.

@ghost
Copy link

ghost commented Feb 4, 2014

I volunteer.

If we could also use the box as a low-frequency CI server for pandas
(we have no OSX coverage: ScatterCI), I vehemently volunteer.

@rgommers
Copy link
Contributor

rgommers commented Feb 4, 2014

Yes, definitely use it for CI if that helps you. Doesn't even have to be low-frequency, it's a dedicated hosted machine for use of core scipy stack projects and it's under-used right now. Multiple builds per day on a couple of Python versions should be no problem.

@cdeil can you give @y-p an admin account on the Mac Mini?

@ghost
Copy link

ghost commented Feb 4, 2014

Goodness all around. I'll go further and set up an example .travis repo for other projects
to quickly get up and running. Then we just need to spread the word.

@rgommers
Copy link
Contributor

rgommers commented Feb 4, 2014

Sounds great.

@cdeil
Copy link
Contributor

cdeil commented Feb 4, 2014

@y-p : please contact me at [email protected] and let me know what account name you'd like (or if you want several, e.g. for your CI and for you personally).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Testing pandas testing functions or related to the test suite
Projects
None yet
Development

No branches or pull requests

6 participants