test_pandas is flaky #8438

Closed
juliangilbey opened this issue Jan 3, 2024 · 2 comments · Fixed by #8440
Labels: flaky test (intermittent failures on CI)

Comments

@juliangilbey
Contributor

Describe the issue:

When building dask.distributed version 2023.12.1 (and the matching dask version) and running the tests with Python 3.11 or Python 3.12 in a clean environment, the test distributed/diagnostics/tests/test_memory_sampler.py::test_pandas appears to be flaky. The [False] parameterisation sometimes succeeds and sometimes fails, but the [True] one (almost) always fails. I've done a bit of digging and don't understand how it could ever succeed. Here is a typical error output:

=================================== FAILURES ===================================
______________________________ test_pandas[True] _______________________________

c = <Client: No scheduler connected>
s = <Scheduler 'tcp://127.0.0.1:39983', workers: 0, cores: 0, tasks: 0>
a = <Worker 'tcp://127.0.0.1:45483', name: 0, status: closed, stored: 0, running: 0/1, ready: 0, comm: 0, waiting: 0>
b = <Worker 'tcp://127.0.0.1:41405', name: 1, status: closed, stored: 0, running: 0/2, ready: 0, comm: 0, waiting: 0>
align = True

    @gen_cluster(client=True)
    @pytest.mark.parametrize("align", [False, True])
    async def test_pandas(c, s, a, b, align):
        pd = pytest.importorskip("pandas")
        pytest.importorskip("matplotlib")
    
        ms = MemorySampler()
        async with ms.sample("foo", measure="managed", interval=0.15):
            f = c.submit(lambda: 1)
            await f
            await asyncio.sleep(0.7)
    
        assert ms.samples["foo"][0][1] == 0
        assert ms.samples["foo"][-1][1] > 0
    
        df = ms.to_pandas(align=align)
        assert isinstance(df, pd.DataFrame)
        if align:
            assert isinstance(df.index, pd.TimedeltaIndex)
            assert df["foo"].iloc[0] == 0
            assert df["foo"].iloc[-1] > 0
            assert df.index[0] == pd.Timedelta(0, unit="s")
            assert pd.Timedelta(0, unit="s") < df.index[1]
            assert df.index[1] < pd.Timedelta(1.5, unit="s")
        else:
            assert isinstance(df.index, pd.DatetimeIndex)
            assert pd.Timedelta(0, unit="s") < df.index[1] - df.index[0]
            assert df.index[1] - df.index[0] < pd.Timedelta(1.5, unit="s")
    
>       plt = ms.plot(align=align, grid=True)

/usr/lib/python3/dist-packages/distributed/diagnostics/tests/test_memory_sampler.py:104: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
/usr/lib/python3/dist-packages/distributed/diagnostics/memory_sampler.py:173: in plot
    return df.plot(
/usr/lib/python3/dist-packages/pandas/plotting/_core.py:1032: in __call__
    return plot_backend.plot(data, kind=kind, **kwargs)
/usr/lib/python3/dist-packages/pandas/plotting/_matplotlib/__init__.py:71: in plot
    plot_obj.generate()
/usr/lib/python3/dist-packages/pandas/plotting/_matplotlib/core.py:453: in generate
    self._make_plot()
/usr/lib/python3/dist-packages/pandas/plotting/_matplotlib/core.py:1409: in _make_plot
    ax.set_xlim(left, right)
/usr/lib/python3/dist-packages/matplotlib/_api/deprecation.py:454: in wrapper
    return func(*args, **kwargs)
/usr/lib/python3/dist-packages/matplotlib/axes/_base.py:3686: in set_xlim
    return self.xaxis._set_lim(left, right, emit=emit, auto=auto)
/usr/lib/python3/dist-packages/matplotlib/axis.py:1137: in _set_lim
    _api.warn_external(
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

message = 'Attempting to set identical low and high xlims makes transformation singular; automatically expanding.'
category = None

    def warn_external(message, category=None):
        """
        `warnings.warn` wrapper that sets *stacklevel* to "outside Matplotlib".
    
        The original emitter of the warning can be obtained by patching this
        function back to `warnings.warn`, i.e. ``_api.warn_external =
        warnings.warn`` (or ``functools.partial(warnings.warn, stacklevel=2)``,
        etc.).
        """
        frame = sys._getframe()
        for stacklevel in itertools.count(1):  # lgtm[py/unused-loop-variable]
            if frame is None:
                # when called in embedded context may hit frame is None
                break
            if not re.match(r"\A(matplotlib|mpl_toolkits)(\Z|\.(?!tests\.))",
                            # Work around sphinx-gallery not setting __name__.
                            frame.f_globals.get("__name__", "")):
                break
            frame = frame.f_back
>       warnings.warn(message, category, stacklevel)
E       UserWarning: Attempting to set identical low and high xlims makes transformation singular; automatically expanding.

/usr/lib/python3/dist-packages/matplotlib/_api/__init__.py:363: UserWarning

So the warning from matplotlib causes the test to fail.
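
Evidently the test suite promotes warnings to errors, since the UserWarning surfaces as the failing E line in the traceback above. Here is a standalone sketch of that mechanism, using warnings.simplefilter in place of whatever pytest filterwarnings configuration the suite actually uses:

import warnings

# Promote every warning to an exception, the same effect a pytest
# "filterwarnings = error" setting has on a test suite.
warnings.simplefilter("error")

try:
    warnings.warn(
        "Attempting to set identical low and high xlims makes "
        "transformation singular; automatically expanding."
    )
except UserWarning as exc:
    print(f"warning escalated to a failure: {exc}")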

I then put in some diagnostic output:

@gen_cluster(client=True)
@pytest.mark.parametrize("align", [False, True])
async def test_pandas(c, s, a, b, align):
    pd = pytest.importorskip("pandas")
    pytest.importorskip("matplotlib")

    ms = MemorySampler()
    async with ms.sample("foo", measure="managed", interval=0.15):
        f = c.submit(lambda: 1)
        await f
        await asyncio.sleep(0.7)

    assert ms.samples["foo"][0][1] == 0
    assert ms.samples["foo"][-1][1] > 0

    df = ms.to_pandas(align=align)
    print("ms.to_pandas:")
    print(df)
    df2 = df.resample("1s")
    print("resampled:")
    print(df2)
    df3 = df2.nearest()
    print("nearest:")
    print(df3)
    df4 = df3 / 2**30
    print("scaled:")
    print(df4)

    assert isinstance(df, pd.DataFrame)
    [...]

and then a final assert False so that it would always fail. An output from the [False] case, when it succeeded (up to the final assert False), was:

ms.to_pandas:
                               foo
0                                 
2024-01-03 11:17:00.436826112    0
2024-01-03 11:17:00.587768064   28
2024-01-03 11:17:00.736993024   28
2024-01-03 11:17:00.887495168   28
2024-01-03 11:17:01.037432064   28
resampled:
DatetimeIndexResampler [freq=<Second>, axis=0, closed=left, label=left, convention=start, origin=start_day]
nearest:
                     foo
0                       
2024-01-03 11:17:00    0
2024-01-03 11:17:01   28
scaled:
                              foo
0                                
2024-01-03 11:17:00  0.000000e+00
2024-01-03 11:17:01  2.607703e-08

and from the [True] case (or the [False] case when it failed):

ms.to_pandas:
                           foo
0                             
0 days 00:00:00              0
0 days 00:00:00.150902784   28
0 days 00:00:00.300422912   28
0 days 00:00:00.450218752   28
0 days 00:00:00.601036800   28
resampled:
TimedeltaIndexResampler [freq=<Second>, axis=0, closed=left, label=left, convention=start, origin=start_day]
nearest:
        foo
0          
0 days    0
scaled:
        foo
0          
0 days  0.0

Because the sampling runs for less than 1 second (await asyncio.sleep(0.7)), the resampling will only ever produce one sample, and plotting a single sample always triggers this matplotlib warning: the x-axis limits are identical. The [False] case succeeds only when its wall-clock sampling window happens to straddle a second boundary (as in the successful output above), which is why it fails intermittently; the aligned [True] index always starts at zero, so a sub-second run always collapses to a single bin. Changing the 0.7 to 1.5 (or even 1.2) causes both tests to succeed.
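
To see this concretely, here is a standalone pandas sketch (with made-up values chosen to match the outputs above):

import pandas as pd

# align=True: the TimedeltaIndex always starts at zero, so a 0.6 s window
# collapses into a single 1 s bin.
tdi = pd.to_timedelta([0.0, 0.15, 0.30, 0.45, 0.60], unit="s")
aligned = pd.DataFrame({"foo": [0, 28, 28, 28, 28]}, index=tdi)
print(aligned.resample("1s").nearest())
#         foo
# 0 days    0

# align=False: the same window can straddle a wall-clock second boundary
# (as in the successful [False] output above), yielding two bins.
dti = pd.to_datetime("2024-01-03 11:17:00.44") + tdi
wallclock = pd.DataFrame({"foo": [0, 28, 28, 28, 28]}, index=dti)
print(wallclock.resample("1s").nearest())
#                      foo
# 2024-01-03 11:17:00    0
# 2024-01-03 11:17:01   28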

Environment:

  • Dask version: 2023.12.1
  • Python version: 3.11 or 3.12
  • Operating System: Debian unstable
  • Install method (conda, pip, source): source
@juliangilbey
Contributor Author

I've found the source of this changed behaviour: 163165b. This commit introduces the resampling step before the graph is plotted. Presumably there is some reason why, on the GitHub CI, this test takes longer than 1 second to run, and so the test passes there.
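
Putting the pieces together, the plot path presumably does something like the following before handing the frame to df.plot (a hypothetical reconstruction from the traceback line memory_sampler.py:173 and the diagnostics above, not the actual source of MemorySampler.plot):

import pandas as pd

def plot_like_memory_sampler(df: pd.DataFrame, **kwargs):
    # Hypothetical reconstruction: resample to 1 s bins (the step introduced
    # by 163165b) and rescale bytes to GiB before plotting.
    df = df.resample("1s").nearest() / 2**30
    # A sub-second sample window leaves a single row here, so matplotlib
    # sets identical x-limits and emits the UserWarning seen above.
    return df.plot(**kwargs)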

juliangilbey pushed a commit to juliangilbey/distributed that referenced this issue Jan 3, 2024
@hendrikmakait added the flaky test label and removed the needs triage label Jan 8, 2024
@hendrikmakait
Member

This commit introduces the resampling step before the graph is plotted. Presumably there is some reason why, on the GitHub CI, this test takes longer than 1 second to run, and so the test passes there.

For context, the CI workers aren't known to be fast, so I am not surprised that the test takes longer there and "accidentally" works.
