Python panda test failed. #3622

Closed
trivialfis opened this issue Aug 22, 2018 · 6 comments

Comments

@trivialfis
Member

commit: 4912c1f
Python: 3.6 virtualenv
pandas: 0.23.4
numpy: 1.15.0

The failure is in this assertion:

assert cv.columns.equals(exp)

The actual result has the 'train-' columns ordered before the 'test-' columns.

I looked into the training code and found this:

for k, v in sorted(cvmap.items(), key=lambda x: (x[0].startswith('test'), x[0])):

If x[0].startswith('test') is True, the key compares greater than one where it is False. Since sorted sorts in ascending order, keys starting with "test" are placed after keys starting with "train", which is the opposite of the expected result.

It's not entirely clear to me what this code does, so I could be completely wrong. I also don't know why Travis didn't fail; I cleaned my virtualenv to make sure nothing else was getting in the way.
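To make the ordering concrete, here is a minimal sketch of how that sort key behaves. The `cvmap` contents below are invented for illustration and are not the real values from the training code. The second sort only shows what a key would look like if "test" first were the intended order, which this thread has not established.

```python
# Hypothetical stand-in for cvmap; keys and values are made up for illustration.
cvmap = {"test-error": [0.1], "train-error": [0.05]}


def sort_key(item):
    name = item[0]
    # Same shape as the quoted key: (bool, name).
    # False sorts before True, so names NOT starting with "test" come first.
    return (name.startswith("test"), name)


ordered = [k for k, _ in sorted(cvmap.items(), key=sort_key)]
print(ordered)  # ['train-error', 'test-error']

# Negating the flag would put the "test" keys first instead.
test_first = [
    k
    for k, _ in sorted(
        cvmap.items(), key=lambda x: (not x[0].startswith("test"), x[0])
    )
]
print(test_first)  # ['test-error', 'train-error']
```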

Any insight? :)

@hcho3
Collaborator

hcho3 commented Aug 28, 2018

Maybe we should re-write the tests so that they won't be flaky.

@trivialfis
Member Author

trivialfis commented Aug 28, 2018

I'm just not sure whether the problem is in the tests or in the core code, or whether I've misunderstood something important.
Which order of the returned columns is actually expected?
The last commit touching this code was about 2 to 3 years ago, so how have the tests kept passing? Is the problem on my side?

@hcho3
Collaborator

hcho3 commented Aug 28, 2018

@trivialfis Let me look at this.

@ksangeek

ksangeek commented Mar 25, 2019

I am also able to reproduce this problem in xgboost v0.82.

pytest tests/python
platform linux -- Python 3.7.1, pytest-4.0.2, py-1.7.0, pluggy-0.8.0

Failure details

>       assert cv.columns.equals(exp)
E       AssertionError: assert False
E        +  where False = <bound method Index.equals of Index(['train-error-mean', 'train-error-std', 'test-error-mean',\n       'test-error-std'],\n      dtype='object')>(Index(['test-error-mean', 'test-error-std', 'train-error-mean',\n       'train-error-std'],\n      dtype='object'))
E        +    where <bound method Index.equals of Index(['train-error-mean', 'train-error-std', 'test-error-mean',\n       'test-error-std'],\n      dtype='object')> = Index(['train-error-mean', 'train-error-std', 'test-error-mean',\n       'test-error-std'],\n      dtype='object').equals
E        +      where Index(['train-error-mean', 'train-error-std', 'test-error-mean',\n       'test-error-std'],\n      dtype='object') =    train-error-mean       ...        test-error-std\n0          0.046522       ...              0.007642\n1          0.0...    0.001092       ...              0.001338\n9          0.000682       ...              0.001338\n\n[10 rows x 4 columns].columns

tests/python/test_with_pandas.py:130: AssertionError

The test compares cv.columns with exp, both pd.Index objects, and the assertion fails because the two contain the same elements in a different order.
I made a patch that sorts both before comparing, which gets past the failure:

$ diff test_with_pandas.orig.py test_with_pandas.py
130c130
<         assert cv.columns.equals(exp)
---
>         assert sorted(cv.columns) == sorted(exp)
138c138
<         assert cv.columns.equals(exp)
---
>         assert sorted(cv.columns) == sorted(exp)
144c144
<         assert cv.columns.equals(exp)
---
>         assert sorted(cv.columns) == sorted(exp)
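As a rough illustration of what the patched assertion checks, the sketch below uses plain Python lists holding the column names from the failure output above, rather than real pd.Index objects. It is only meant to show the difference between an order-sensitive and an order-insensitive comparison.

```python
# Column names copied from the failure output; plain lists stand in for pd.Index.
actual = ["train-error-mean", "train-error-std", "test-error-mean", "test-error-std"]
expected = ["test-error-mean", "test-error-std", "train-error-mean", "train-error-std"]

# Order-sensitive comparison: fails, like the original assertion.
same_order = actual == expected

# Order-insensitive comparison: passes, like the patched assertion.
same_names = sorted(actual) == sorted(expected)

print(same_order, same_names)  # False True
```

The trade-off is that the sorted comparison no longer verifies column order at all, so it would also pass if the order changed again later.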

@hcho3 Do you think this is the right fix?

@trivialfis
Member Author

@ksangeek Let me try it later. Oddly, it runs completely fine with Python 2, so I wasn't sure.

hcho3 closed this as completed Apr 23, 2019
@hcho3
Collaborator

hcho3 commented Apr 23, 2019

Addressed in #4395

lock bot locked as resolved and limited conversation to collaborators Jul 22, 2019