
TST: make tests involving use_numexpr more robust #44601

Merged
17 commits merged into pandas-dev:master on Nov 26, 2021

Conversation

jorisvandenbossche
Member

There have been some failures recently in the "Test experimental data manager (not slow and not network and not clipboard)" build.

@jorisvandenbossche
Member Author

Does anybody know how you can get print statements in the tests to be printed in the actual logs of GitHub Actions?
I added some prints to help debug why the tests are failing, but I don't see anything in the logs.

@mroeschke
Member

Does anybody know how you can get print statements in the tests to be printed in the actual logs of GitHub Actions?

Just a guess: maybe specifying flush=True?

@jorisvandenbossche
Member Author

Apparently, not running the tests in parallel makes the output show up as well.
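
For reference, a minimal sketch of the two workarounds mentioned above: flushing explicitly, and running pytest without xdist parallelism so captured output reaches the job log (the exact invocation below is illustrative, not the actual pandas CI command):

import pandas as pd
from pandas.core.computation import expressions as expr

# flush=True so the output is written immediately and isn't lost in the CI log
print("DEBUG use_numexpr:", pd.get_option("compute.use_numexpr"), expr.USE_NUMEXPR, flush=True)

# and/or run single-process with output capturing disabled, e.g.:
#   pytest pandas/tests/computation/test_eval.py -v -n 0 -s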

@jorisvandenbossche
Member Author

jorisvandenbossche commented Nov 25, 2021

What I have learned so far:

  • The tests can also fail with BlockManager, so it doesn't seem to be ArrayManager-specific (we just don't run the normal tests in this specific environment).
  • In the last build, pandas/tests/computation/test_eval.py::test_equals_various failed, and there the file-level USE_NUMEXPR is True while the pandas global option is False. That is what I assumed was happening, but the question remains why the value changed in some other test (and it's also strange that this only happens in this build ...).

@jorisvandenbossche
Member Author

So it is this test that changes the USE_NUMEXPR value:

@td.skip_if_no_ne
@pytest.mark.parametrize(
    ("use_numexpr", "expected"),
    (
        (True, "numexpr"),
        (False, "python"),
    ),
)
def test_numexpr_option_respected(use_numexpr, expected):
    # GH 32556
    from pandas.core.computation.eval import _check_engine

    with pd.option_context("compute.use_numexpr", use_numexpr):
        result = _check_engine(None)
        assert result == expected

From the log:

...
pandas/tests/computation/test_eval.py::test_invalid_engine 
DEBUG-START:  True True True
PASSED
DEBUG-END:  True True True
pandas/tests/computation/test_eval.py::test_numexpr_option_respected[True-numexpr] 
DEBUG-START:  True True True
PASSED
DEBUG-END:  False True True
pandas/tests/computation/test_eval.py::test_numexpr_option_respected[False-python] 
DEBUG-START:  False True True
PASSED
DEBUG-END:  False True True
...
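
(The DEBUG-START / DEBUG-END lines come from the temporary debug prints mentioned earlier, wrapped around each test. A hypothetical reconstruction of such an autouse fixture is sketched below; which three booleans were actually printed is an assumption, as they are not identified in the log itself.)

import pytest
import pandas as pd
from pandas.core.computation import expressions as expr

@pytest.fixture(autouse=True)
def debug_use_numexpr():
    # print the numexpr-related state before and after each test; flush so it reaches the CI log
    print("DEBUG-START: ", pd.get_option("compute.use_numexpr"), expr.USE_NUMEXPR, expr.NUMEXPR_INSTALLED, flush=True)
    yield
    print("DEBUG-END:  ", pd.get_option("compute.use_numexpr"), expr.USE_NUMEXPR, expr.NUMEXPR_INSTALLED, flush=True)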

For some reason, the with pd.option_context("compute.use_numexpr", True): block is resetting it to False afterwards, which suggests that the default setting somehow changed after test collection.

@jorisvandenbossche
Member Author

Going through our tests, I don't see any place where we use set_use_numexpr directly without setting it back, or without using it in an option_context. And going through the tests that are no longer skipped after #44571, I also don't see one involving numexpr. But there is a test_downstream.py test involving dask that is no longer skipped, and dask.dataframe seems to set numexpr use to False ...
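
A quick way to check that hypothesis locally (assuming dask is installed; the snippet is illustrative only and not part of this PR):

import pandas as pd
from pandas.core.computation import expressions as expr

print(pd.get_option("compute.use_numexpr"), expr.USE_NUMEXPR)   # True True by default when numexpr is installed
# importing dask.dataframe is suspected to disable numexpr as a side effect
import dask.dataframe  # noqa: F401
print(pd.get_option("compute.use_numexpr"), expr.USE_NUMEXPR)   # False False if the hypothesis holds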

We have only one other CI env that includes dask, and that is actions-38-db.yaml, but that one only runs the pandas/tests/io tests and thus doesn't run into this.

@jorisvandenbossche
Member Author

Oh boy, this is such a mess:

In [4]: from pandas.core.computation import expressions as expr

In [5]: expr.USE_NUMEXPR
Out[5]: True

In [6]: pd.set_option("compute.use_numexpr", False)

In [7]: expr.USE_NUMEXPR
Out[7]: False

In [8]: expr.set_use_numexpr(True)

In [9]: expr.USE_NUMEXPR
Out[9]: True

In [10]: pd.get_option("compute.use_numexpr")
Out[10]: False

In [11]: with pd.option_context("compute.use_numexpr", True):
    ...:     pass
    ...: 

In [12]: expr.USE_NUMEXPR
Out[12]: False

So setting the option through set_option triggers a callback that changes the actual value (expr.USE_NUMEXPR), but the value itself is also stored in the options machinery (getting the value doesn't check expr). If you then change it afterwards with the internal expr.set_use_numexpr (which is what we do in the tests), that changes the actual value but not what is stored in the options, so get_option still returns the old one.
As a result, using option_context (which restores the "old" value it gets from get_option) actually changes the true expr.USE_NUMEXPR, by calling the callback with that stale "old" value.
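
In other words, tests should only toggle numexpr through the layer that option_context restores from. A minimal sketch of the robust pattern (illustrative; not necessarily the exact change made in this PR):

import pandas as pd

# go through the public option, so the value stored in the options machinery and the
# module-level expr.USE_NUMEXPR flag stay in sync, and option_context restores the
# correct previous state on exit
with pd.option_context("compute.use_numexpr", False):
    ...  # exercise the code under test with numexpr disabled
# here both the option and expr.USE_NUMEXPR are back to their previous values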

@jorisvandenbossche changed the title from "[DON'T MERGE] Try debug CI (test_eval.py)" to "TST: make tests involving use_numexpr more robust" Nov 25, 2021
@jreback
Contributor

jreback left a comment

lgtm (and prob should make an issue in the dask side to use the option itself)

@jreback added this to the 1.4 milestone Nov 25, 2021
@jreback added the Testing label Nov 25, 2021
@jreback
Contributor

jreback commented Nov 25, 2021

Is this testing only, or do we think there is an actual problem in user code?

@jorisvandenbossche
Member Author

No, this is only our testing. Dask is actually using the public option.

@jorisvandenbossche
Member Author

It was the mixed usage of the public option (pd.set_option("compute.use_numexpr", bool)) and the internal way (expr.set_use_numexpr(bool)) which caused the option and the internal setting to get out of sync (the option has a callback to change the internal setting, but not the other way around).
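
Roughly speaking, the coupling works like the simplified model below (an illustration of the one-way synchronization described above, not actual pandas source):

# toy model: the option registry pushes changes into the module-level flag via a
# callback, but nothing pushes changes from the flag back into the registry
_options = {"use_numexpr": True}
USE_NUMEXPR = True  # the "internal" module-level flag


def _use_numexpr_callback(value):
    global USE_NUMEXPR
    USE_NUMEXPR = value


def set_option(key, value):
    _options[key] = value
    _use_numexpr_callback(value)  # option -> flag: kept in sync


def get_option(key):
    return _options[key]          # reads the stored value, never consults the flag


def set_use_numexpr(value):
    global USE_NUMEXPR
    USE_NUMEXPR = value           # flag only: the stored option is now stale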

@jorisvandenbossche merged commit 5ee8cf1 into pandas-dev:master Nov 26, 2021
@jorisvandenbossche deleted the ci-eval-debug branch November 26, 2021 06:28