
improve SteadyTimer tests #1129

Closed
wants to merge 1 commit

Conversation

dirk-thomas
Member

Rebased and squashed #1120 in the hope that it further reduces the flakiness of the SteadyTimer test.

@dirk-thomas
Member Author

I triggered five builds of the PR job (266 - 270). Four of the five passed, but one still failed the test. That already makes the test less flaky, but it would be nice to make it even less so...

Instead of checking when the callback was actually called, check when it was added to the callback queue.

This *should* make the test more reliable.
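
A minimal sketch of the idea (not the actual patch from #1120), assuming a custom queue installed via ros::NodeHandle::setCallbackQueue; all class and member names here are made up:

```cpp
// Sketch only: record the time at which each callback is *queued*, so that
// scheduler/spinner delays in actually running it do not count against the timer.
#include <ros/callback_queue.h>
#include <ros/time.h>
#include <boost/thread/mutex.hpp>
#include <vector>

class EnqueueTimeRecordingQueue : public ros::CallbackQueue
{
public:
  virtual void addCallback(const ros::CallbackInterfacePtr& callback, uint64_t owner_id = 0)
  {
    {
      boost::mutex::scoped_lock lock(mutex_);
      enqueue_times_.push_back(ros::SteadyTime::now());
    }
    ros::CallbackQueue::addCallback(callback, owner_id);
  }

  std::vector<ros::SteadyTime> enqueue_times_;

private:
  boost::mutex mutex_;
};
```

The test would then compare enqueue_times_ against the expected firing times instead of the times at which the callbacks eventually ran.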
@dirk-thomas force-pushed the improve_steady_timer_tests branch from 8b13dbb to a371c9f on August 10, 2017 17:44
@dirk-thomas
Member Author

I increased the late-threshold from 0.2s to 0.5s. All newly triggered builds (271 - 278) passed (one was unstable but due to a different flaky test). Calling it a day. Thanks @flixr for the original patch.

@dirk-thomas
Member Author

And the ninth and final build failed again 😞 @flixr Please look into this test and propose a patch which passes reliably. This is rather urgent since Lunar is waiting for ros_comm to be re-released and I only want to do that with a green CI.

@kmactavish
Contributor

Is there any point to the late check now? It has been relaxed so much it might as well be removed. I vote we just get rid of it. The build farm is simply not real-time enough.

The whole test is by nature stochastic, which is not ideal. An improved (but extreme) test would be to make sure that SteadyTimer does not trigger early when the system time is intentionally jumped back.
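
For reference, a rough sketch of the clock-level property such an "extreme" test would rely on; jumping CLOCK_REALTIME needs root/CAP_SYS_TIME, which is exactly why it is impractical on a shared buildfarm. A real test would wrap this around a SteadyTimer and assert that the callback still fires on schedule:

```cpp
// Sketch only: CLOCK_MONOTONIC (the basis of SteadyTime) must be unaffected by
// a backwards jump of CLOCK_REALTIME (the basis of WallTime).
#include <assert.h>
#include <stdio.h>
#include <time.h>

int main()
{
  timespec mono_before, mono_after, wall;
  clock_gettime(CLOCK_MONOTONIC, &mono_before);

  // Jump the wall clock back by one hour (requires privileges).
  clock_gettime(CLOCK_REALTIME, &wall);
  wall.tv_sec -= 3600;
  if (clock_settime(CLOCK_REALTIME, &wall) != 0)
  {
    perror("clock_settime");
    return 77;  // bail out if we lack permission to change the clock
  }

  clock_gettime(CLOCK_MONOTONIC, &mono_after);
  // The monotonic clock must not have gone backwards.
  assert(mono_after.tv_sec >= mono_before.tv_sec);
  return 0;
}
```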

@dirk-thomas
Member Author

With the introduction of SteadyTime the TimerManager has been switched to use that instead of WallTime. I would expect the test coverage for the new SteadyTimer to be at least as good as the existing test coverage for WallTimer, so just removing the test is imo not an option. In fact, the tests for these two timers should be identical (and pass) in order to show that this change likely doesn't introduce any regressions.

@kmactavish
Contributor

kmactavish commented Aug 10, 2017

Right, I meant just the late check, not the whole test.

I just saw that there is the WallTimerHelper, which is essentially the same. Ideally there would be less code duplication between those two test helpers so that they would be nearly identical.
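
A hypothetical sketch of how that deduplication could look, templating the helper on the time primitives and instantiating it for both clocks (the *Sketch names are made up and the logic is only illustrative):

```cpp
#include <ros/ros.h>
#include <vector>

// Sketch only: one templated helper shared by the wall-clock and steady-clock
// tests, so both timer types are exercised by identical logic.
template <typename TimeT, typename EventT>
struct TimerHelperSketch
{
  void callback(const EventT& /*event*/)
  {
    call_times_.push_back(TimeT::now());  // the shared timing checks would live here
  }

  std::vector<TimeT> call_times_;
};

typedef TimerHelperSketch<ros::WallTime, ros::WallTimerEvent> WallTimerHelperSketch;
typedef TimerHelperSketch<ros::SteadyTime, ros::SteadyTimerEvent> SteadyTimerHelperSketch;
```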

That raises the question: why is SteadyTimer failing and WallTimer not? The WallTimer test has been there for 7 years. Are the builds using Docker, or virtual machines? (edit: I see it's Docker, there goes that theory)

@flixr
Contributor

flixr commented Aug 11, 2017

@dirk-thomas I could never reproduce a test failure on my machine, so this is a bit hard to investigate.
Is there a way to get more detailed test failure output on Jenkins?
Or where can I find the exact Dockerfile and the steps to run it in the same configuration as on the buildfarm?

@dirk-thomas
Member Author

Is there a way to get more detailed test failure output on Jenkins?

I am not sure what detailed information you are referring to besides the console output and the JUnit result file. Maybe you can clarify what you are looking for.

Or where can I find the exact Dockerfile and the steps to run it in the same configuration as on the buildfarm?

You can run the devel job locally following these instructions: https://github.com/ros-infrastructure/ros_buildfarm/blob/master/doc/jobs/devel_jobs.rst#run-the-devel-job-locally

@flixr
Contributor

flixr commented Aug 11, 2017

I am not sure what detailed information you are referring to besides the console output and the JUnit result file. Maybe you can clarify what you are looking for.

I only found the standard output. The test contains lines like
ROS_ERROR("Call came at wrong time (%f vs. %f)", expected_next_call_.toSec(), start.toSec());
that would help, but I couldn't find this output anywhere...
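
For context, roughly the shape of the check around that ROS_ERROR, reconstructed from the quoted line; apart from expected_next_call_, the names and the 0.5s late threshold (the value this PR settles on) are assumptions, not the exact test source:

```cpp
#include <ros/console.h>
#include <ros/time.h>

// Reconstructed sketch of the helper's timing check, not the actual test code.
struct SteadyTimerCheckSketch
{
  ros::SteadyTime expected_next_call_;
  ros::WallDuration period_;
  bool failed_;

  void callback()
  {
    ros::SteadyTime start = ros::SteadyTime::now();
    // Too early, or later than the relaxed 0.5s threshold -> failure.
    if (start < expected_next_call_ ||
        start > expected_next_call_ + ros::WallDuration(0.5))
    {
      ROS_ERROR("Call came at wrong time (%f vs. %f)", expected_next_call_.toSec(), start.toSec());
      failed_ = true;
    }
    expected_next_call_ = expected_next_call_ + period_;
  }
};
```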

You can run the devel job locally following these instructions: https://github.com/ros-infrastructure/ros_buildfarm/blob/master/doc/jobs/devel_jobs.rst#run-the-devel-job-locally

Thanks, will check this out...

@mikepurvis
Member

I don't have much to contribute here, except to say that we run our tests in a heavily virtualized environment and experience a ton of this kind of thing, particularly as the tests grow in complexity:
when your thread sleeps and you're suddenly at the mercy of both the Linux scheduler and an external hypervisor, you really have no idea when you're going to wake up again.

And yet, these kinds of tests still do have value. Basically, I think this reality has to be accepted, and should be fixed at the tooling layer: specifically, rostest should learn to execute N retries on a failing test to truly prove the failure, rather than giving up the first time.

@dirk-thomas
Member Author

@flixr I only found the standard output. The test contains lines like ROS_ERROR("Call came at wrong time (%f vs. %f)", expected_next_call_.toSec(), start.toSec()); that would help, but I couldn't find this output anywhere...

I think this is a problem with rostest, which doesn't capture stderr correctly in all cases.

@mikepurvis rostest should learn to execute N retries on a failing test to truly prove the failure, rather than giving up the first time.

While that might be valuable in some cases, I disagree that it should be applied generally. A test which fails only sometimes might still indicate a real bug and shouldn't be swept under the carpet. Imo the tests should be written in a way that makes them resilient to these kinds of timing differences.

@kmactavish
Contributor

kmactavish commented Aug 11, 2017

@mikepurvis Hypervisor scheduling plus clock adjustment was my thought if the slaves were VMs, but Docker is pretty close to bare metal... What is strange is that there has never been a problem with CLOCK_REALTIME (WallTime), only with CLOCK_MONOTONIC (SteadyTime), in nearly identical tests. You would think that WallTime would be just as susceptible if the failures were caused by slow wakeup from the sleep. It does seem likely that it has to do with virtualization/containerization, since the tests seem to pass fine on PCs.

@flixr If you're going to try to debug locally, you could try testing under high load (to simulate parallel builds) with stress, and tighten the thresholds to be more sensitive.

@mikepurvis
Member

@kmactavish Yeah, but the dockerized buildfarm slaves are themselves running on Amazon's infrastructure: http://build.ros.org/computer/

So yeah, tests that are perfectly reliable running on someone's laptop or a robot computer can absolutely fall to pieces when attempted by the buildfarm machines. Having at least a per-test option in CMake for "please retry N times" would be great, and would help avoid what's happened here, where a test's timing constraints are relaxed to the point where it is no longer all that meaningful.

@flixr
Contributor

flixr commented Aug 11, 2017

I think this is a problem with rostest, which doesn't capture stderr correctly in all cases.

@dirk-thomas Is there an issue for this? Any idea how to fix that?
It is impossible to tell what is going wrong, or where, without that info.
If we can get #1130 to fail with this info, it should at least give us some more insight.

In general I think it is also impossible to make these tests 100% reliable, as @mikepurvis pointed out.
But it is indeed curious that there didn't seem to be a problem with the multipleWallTimeCallbacks test so far...

@dirk-thomas
Member Author

Is there an issue for this?

No.

Any idea how to fix that?

No.

It is impossible to tell what is going wrong, or where, without that info.
If we can get #1130 to fail with this info, it should at least give us some more insight.

I have looked at it in the past and it didn't seem like an easy problem to address.

In general I think it is also impossible to make these tests 100% reliable, as @mikepurvis pointed out.
But it is indeed curious that there didn't seem to be a problem with the multipleWallTimeCallbacks test so far...

The fact that this did work and pass its test before and doesn't now is alarming enough for me not to release the current state. Maybe it is not that important and only a "test issue", but I don't want to take a risk with this.

@kmactavish
Contributor

kmactavish commented Aug 11, 2017

Update: I can get the tests to fail for both multipleSteadyTimeCallbacks and multipleWallTimeCallbacks on my i7 laptop:

  • by reducing the threshold to ~0.012s late, under normal load,
  • by reducing the threshold to ~0.07s late (only 30% tighter than now), and running:
    stress --cpu 8 --io 4 --vm 4 --vm-bytes 128M.

In short, these tests have always been flaky; SteadyTimer just doubled the number of them, increasing the chance of a failure. It also runs first of the two, so it's the first failure you see.

Proposed fix: keep the tight check for the timer being called early, and relax the late check for both helpers to 1s or even 10s, since it is unreliable in high-load, containerized test environments.

P.S. @flixr I added a cout above the ROS_ERROR to see this. It looks like ROS_ERROR is only captured for the first test, and not subsequent ones.

@dirk-thomas
Member Author

Replaced by #1132.

@dirk-thomas deleted the improve_steady_timer_tests branch on August 11, 2017 21:30