test_pandas is flaky #8438

Closed
juliangilbey opened this issue Jan 3, 2024 · 2 comments · Fixed by #8440
Labels: flaky test (intermittent failures on CI)

Comments

@juliangilbey
Contributor

Describe the issue:

When building dask.distributed version 2023.12.1 (and the matching dask version) and running the tests with Python 3.11 or Python 3.12 in a clean environment, the test distributed/diagnostics/tests/test_memory_sampler.py::test_pandas appears to be flaky. The [False] parameterisation sometimes succeeds and sometimes fails, but the [True] one (almost) always fails. I've done a bit of digging and don't understand how it could ever succeed. Here is a typical error output:

=================================== FAILURES ===================================
______________________________ test_pandas[True] _______________________________

c = <Client: No scheduler connected>
s = <Scheduler 'tcp://127.0.0.1:39983', workers: 0, cores: 0, tasks: 0>
a = <Worker 'tcp://127.0.0.1:45483', name: 0, status: closed, stored: 0, running: 0/1, ready: 0, comm: 0, waiting: 0>
b = <Worker 'tcp://127.0.0.1:41405', name: 1, status: closed, stored: 0, running: 0/2, ready: 0, comm: 0, waiting: 0>
align = True

    @gen_cluster(client=True)
    @pytest.mark.parametrize("align", [False, True])
    async def test_pandas(c, s, a, b, align):
        pd = pytest.importorskip("pandas")
        pytest.importorskip("matplotlib")
    
        ms = MemorySampler()
        async with ms.sample("foo", measure="managed", interval=0.15):
            f = c.submit(lambda: 1)
            await f
            await asyncio.sleep(0.7)
    
        assert ms.samples["foo"][0][1] == 0
        assert ms.samples["foo"][-1][1] > 0
    
        df = ms.to_pandas(align=align)
        assert isinstance(df, pd.DataFrame)
        if align:
            assert isinstance(df.index, pd.TimedeltaIndex)
            assert df["foo"].iloc[0] == 0
            assert df["foo"].iloc[-1] > 0
            assert df.index[0] == pd.Timedelta(0, unit="s")
            assert pd.Timedelta(0, unit="s") < df.index[1]
            assert df.index[1] < pd.Timedelta(1.5, unit="s")
        else:
            assert isinstance(df.index, pd.DatetimeIndex)
            assert pd.Timedelta(0, unit="s") < df.index[1] - df.index[0]
            assert df.index[1] - df.index[0] < pd.Timedelta(1.5, unit="s")
    
>       plt = ms.plot(align=align, grid=True)

/usr/lib/python3/dist-packages/distributed/diagnostics/tests/test_memory_sampler.py:104: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
/usr/lib/python3/dist-packages/distributed/diagnostics/memory_sampler.py:173: in plot
    return df.plot(
/usr/lib/python3/dist-packages/pandas/plotting/_core.py:1032: in __call__
    return plot_backend.plot(data, kind=kind, **kwargs)
/usr/lib/python3/dist-packages/pandas/plotting/_matplotlib/__init__.py:71: in plot
    plot_obj.generate()
/usr/lib/python3/dist-packages/pandas/plotting/_matplotlib/core.py:453: in generate
    self._make_plot()
/usr/lib/python3/dist-packages/pandas/plotting/_matplotlib/core.py:1409: in _make_plot
    ax.set_xlim(left, right)
/usr/lib/python3/dist-packages/matplotlib/_api/deprecation.py:454: in wrapper
    return func(*args, **kwargs)
/usr/lib/python3/dist-packages/matplotlib/axes/_base.py:3686: in set_xlim
    return self.xaxis._set_lim(left, right, emit=emit, auto=auto)
/usr/lib/python3/dist-packages/matplotlib/axis.py:1137: in _set_lim
    _api.warn_external(
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

message = 'Attempting to set identical low and high xlims makes transformation singular; automatically expanding.'
category = None

    def warn_external(message, category=None):
        """
        `warnings.warn` wrapper that sets *stacklevel* to "outside Matplotlib".
    
        The original emitter of the warning can be obtained by patching this
        function back to `warnings.warn`, i.e. ``_api.warn_external =
        warnings.warn`` (or ``functools.partial(warnings.warn, stacklevel=2)``,
        etc.).
        """
        frame = sys._getframe()
        for stacklevel in itertools.count(1):  # lgtm[py/unused-loop-variable]
            if frame is None:
                # when called in embedded context may hit frame is None
                break
            if not re.match(r"\A(matplotlib|mpl_toolkits)(\Z|\.(?!tests\.))",
                            # Work around sphinx-gallery not setting __name__.
                            frame.f_globals.get("__name__", "")):
                break
            frame = frame.f_back
>       warnings.warn(message, category, stacklevel)
E       UserWarning: Attempting to set identical low and high xlims makes transformation singular; automatically expanding.

/usr/lib/python3/dist-packages/matplotlib/_api/__init__.py:363: UserWarning

So the warning from matplotlib causes the test to fail.
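
Evidently the test suite promotes warnings to errors, since the UserWarning surfaces as the failing E line in the traceback above. Here is a standalone sketch of that mechanism, using warnings.simplefilter in place of whatever pytest filterwarnings configuration the suite actually uses:

import warnings

# Promote every warning to an exception, the same effect a pytest
# "filterwarnings = error" setting has on a test suite.
warnings.simplefilter("error")

try:
    warnings.warn(
        "Attempting to set identical low and high xlims makes "
        "transformation singular; automatically expanding."
    )
except UserWarning as exc:
    print(f"warning escalated to a failure: {exc}")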

I then put in some diagnostic output:

@gen_cluster(client=True)
@pytest.mark.parametrize("align", [False, True])
async def test_pandas(c, s, a, b, align):
    pd = pytest.importorskip("pandas")
    pytest.importorskip("matplotlib")

    ms = MemorySampler()
    async with ms.sample("foo", measure="managed", interval=0.15):
        f = c.submit(lambda: 1)
        await f
        await asyncio.sleep(0.7)

    assert ms.samples["foo"][0][1] == 0
    assert ms.samples["foo"][-1][1] > 0

    df = ms.to_pandas(align=align)
    print("ms.to_pandas:")
    print(df)
    df2 = df.resample("1s")
    print("resampled:")
    print(df2)
    df3 = df2.nearest()
    print("nearest:")
    print(df3)
    df4 = df3 / 2**30
    print("scaled:")
    print(df4)

    assert isinstance(df, pd.DataFrame)
    [...]

and then a final assert False so that it would always fail. An output from the [False] case, when it succeeded (up to the final assert False), was:

ms.to_pandas:
                               foo
0                                 
2024-01-03 11:17:00.436826112    0
2024-01-03 11:17:00.587768064   28
2024-01-03 11:17:00.736993024   28
2024-01-03 11:17:00.887495168   28
2024-01-03 11:17:01.037432064   28
resampled:
DatetimeIndexResampler [freq=<Second>, axis=0, closed=left, label=left, convention=start, origin=start_day]
nearest:
                     foo
0                       
2024-01-03 11:17:00    0
2024-01-03 11:17:01   28
scaled:
                              foo
0                                
2024-01-03 11:17:00  0.000000e+00
2024-01-03 11:17:01  2.607703e-08

and from the [True] case (or the [False] case when it failed):

ms.to_pandas:
                           foo
0                             
0 days 00:00:00              0
0 days 00:00:00.150902784   28
0 days 00:00:00.300422912   28
0 days 00:00:00.450218752   28
0 days 00:00:00.601036800   28
resampled:
TimedeltaIndexResampler [freq=<Second>, axis=0, closed=left, label=left, convention=start, origin=start_day]
nearest:
        foo
0          
0 days    0
scaled:
        foo
0          
0 days  0.0

Because the sampling runs for less than 1 second (await asyncio.sleep(0.7)), the resampling will only ever produce one sample, and plotting a single sample always triggers this matplotlib warning: the x-axis limits are identical. The [False] case succeeds only when its wall-clock sampling window happens to straddle a second boundary (as in the successful output above), which is why it fails intermittently; the aligned [True] index always starts at zero, so a sub-second run always collapses to a single bin. Changing the 0.7 to 1.5 (or even 1.2) causes both tests to succeed.
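
To see this concretely, here is a standalone pandas sketch (with made-up values chosen to match the outputs above):

import pandas as pd

# align=True: the TimedeltaIndex always starts at zero, so a 0.6 s window
# collapses into a single 1 s bin.
tdi = pd.to_timedelta([0.0, 0.15, 0.30, 0.45, 0.60], unit="s")
aligned = pd.DataFrame({"foo": [0, 28, 28, 28, 28]}, index=tdi)
print(aligned.resample("1s").nearest())
#         foo
# 0 days    0

# align=False: the same window can straddle a wall-clock second boundary
# (as in the successful [False] output above), yielding two bins.
dti = pd.to_datetime("2024-01-03 11:17:00.44") + tdi
wallclock = pd.DataFrame({"foo": [0, 28, 28, 28, 28]}, index=dti)
print(wallclock.resample("1s").nearest())
#                      foo
# 2024-01-03 11:17:00    0
# 2024-01-03 11:17:01   28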

Environment:

  • Dask version: 2023.12.1
  • Python version: 3.11 or 3.12
  • Operating System: Debian unstable
  • Install method (conda, pip, source): source
@juliangilbey
Contributor Author

I've found the source of this changed behaviour: 163165b. This commit introduces the resampling step before the graph is plotted. Presumably there is some reason why, on the GitHub CI, this test takes longer than 1 second to run, and so the test passes there.
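
Putting the pieces together, the plot path presumably does something like the following before handing the frame to df.plot (a hypothetical reconstruction from the traceback line memory_sampler.py:173 and the diagnostics above, not the actual source of MemorySampler.plot):

import pandas as pd

def plot_like_memory_sampler(df: pd.DataFrame, **kwargs):
    # Hypothetical reconstruction: resample to 1 s bins (the step introduced
    # by 163165b) and rescale bytes to GiB before plotting.
    df = df.resample("1s").nearest() / 2**30
    # A sub-second sample window leaves a single row here, so matplotlib
    # sets identical x-limits and emits the UserWarning seen above.
    return df.plot(**kwargs)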

juliangilbey pushed a commit to juliangilbey/distributed that referenced this issue Jan 3, 2024
@hendrikmakait added the flaky test label and removed the needs triage label Jan 8, 2024
@hendrikmakait
Member

This commit introduces the resampling step before the graph is plotted. Presumably there is some reason why, on the GitHub CI, this test takes longer than 1 second to run, and so the test passes there.

For context, the CI workers aren't known to be fast, so I am not surprised that the test takes longer there and "accidentally" works.
