fix flakiness in TestTimersManager unit-test #2468

Merged: 2 commits, merged on Mar 28, 2024

rclcpp/test/rclcpp/test_timers_manager.cpp: 50 changes (41 additions, 9 deletions)
@@ -367,22 +367,23 @@ TEST_F(TestTimersManager, check_one_timer_cancel_doesnt_affect_other_timers)
   auto timers_manager = std::make_shared<TimersManager>(
     rclcpp::contexts::get_global_default_context());

-  size_t t1_runs = 0;
+  std::atomic<size_t> t1_runs = 0;
+  const size_t cancel_iter = 5;
   std::shared_ptr<TimerT> t1;
   // After a while cancel t1. Don't remove it though.
   // Simulates typical usage in a Node where a timer is cancelled but not removed,
   // since typical users aren't going to mess around with the timer manager.
   t1 = TimerT::make_shared(
     1ms,
-    [&t1_runs, &t1]() {
+    [&t1_runs, &t1, cancel_iter]() {
       t1_runs++;
-      if (t1_runs == 5) {
+      if (t1_runs == cancel_iter) {
         t1->cancel();
       }
     },
     rclcpp::contexts::get_global_default_context());

-  size_t t2_runs = 0;
+  std::atomic<size_t> t2_runs = 0;
   auto t2 = TimerT::make_shared(
     1ms,
     [&t2_runs]() {
@@ -397,11 +398,42 @@ TEST_F(TestTimersManager, check_one_timer_cancel_doesnt_affect_other_timers)
   // Start timers thread
   timers_manager->start();

-  std::this_thread::sleep_for(15ms);
+  // Wait for t1 to be canceled
+  auto loop_start_time = std::chrono::high_resolution_clock::now();
+  while (!t1->is_canceled()) {

Review comment (Contributor): Can we put an overall timeout on this, something like 30 seconds? It just ensures that if there is a bug, this test won't hang our CI forever.

Reply (Collaborator Author):
Done in e06bd91
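For context on the fix, the removed std::this_thread::sleep_for(15ms) assumed that the two 1ms timers would fire often enough within 15ms for the removed EXPECT_LT(t1_runs + 5, t2_runs) check to pass. With t1 stopped at 5 runs, that check needs t2 to run more than 10 times in roughly 15ms, and a loaded CI machine does not guarantee this. The new loops poll for the condition itself and, as requested in this thread, give up after 30 seconds. If more waits of this kind are needed, the pattern could be factored into a helper. The following is one hypothetical way to do it. wait_until is an illustrative name and is not part of rclcpp, GoogleTest, or this PR. The sketch uses std::chrono::steady_clock instead of the diff's high_resolution_clock, because high_resolution_clock may be an alias of the non-monotonic system_clock (as it is in libstdc++).

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical helper, not part of rclcpp or of this PR.
// Polls `predicate` every `poll_period` until it returns true or until
// `timeout` has elapsed. Returns true if the predicate became true and
// false if the deadline passed first.
bool wait_until(
  const std::function<bool()> & predicate,
  std::chrono::milliseconds timeout,
  std::chrono::milliseconds poll_period = std::chrono::milliseconds(3))
{
  // A non-positive period would turn the loop into a busy-wait.
  assert(poll_period > std::chrono::milliseconds::zero());

  // steady_clock is monotonic, so clock adjustments cannot stretch or
  // shorten the deadline.
  const auto deadline = std::chrono::steady_clock::now() + timeout;
  while (!predicate()) {
    if (std::chrono::steady_clock::now() >= deadline) {
      return false;  // timed out; the caller decides how to report it
    }
    std::this_thread::sleep_for(poll_period);
  }
  return true;
}
```

With such a helper, the first loop could shrink to ASSERT_TRUE(wait_until([&t1]() {return t1->is_canceled();}, std::chrono::seconds(30))) << "timeout waiting for t1 to be canceled";. A failing ASSERT_TRUE, like FAIL(), returns from the test body, so no separate break is needed.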

+    auto now = std::chrono::high_resolution_clock::now();
+    if (now - loop_start_time >= std::chrono::seconds(30)) {
+      FAIL() << "timeout waiting for t1 to be canceled";
+      break;
+    }
+    std::this_thread::sleep_for(3ms);
+  }
+
+  EXPECT_TRUE(t1->is_canceled());
+  EXPECT_FALSE(t2->is_canceled());
+  EXPECT_EQ(t1_runs, cancel_iter);
+
+  // Verify that t2 is still being invoked
+  const size_t start_t2_runs = t2_runs;
+  const size_t num_t2_extra_runs = 6;
+  loop_start_time = std::chrono::high_resolution_clock::now();
+  while (t2_runs < start_t2_runs + num_t2_extra_runs) {

Review comment (Contributor): Same here.

+    auto now = std::chrono::high_resolution_clock::now();
+    if (now - loop_start_time >= std::chrono::seconds(30)) {
+      FAIL() << "timeout waiting for t2 to do some runs";
+      break;
+    }
+    std::this_thread::sleep_for(3ms);
+  }
+
+  EXPECT_TRUE(t1->is_canceled());
+  EXPECT_FALSE(t2->is_canceled());
+  // t1 hasn't run since before
+  EXPECT_EQ(t1_runs, cancel_iter);
+  // t2 has run the expected additional number of times
+  EXPECT_GE(t2_runs, start_t2_runs + num_t2_extra_runs);
+  // the t2 runs are strictly more than the t1 runs
+  EXPECT_GT(t2_runs, t1_runs);

-  // t1 has stopped running
-  EXPECT_NE(t1_runs, t2_runs);
-  // Check that t2 has significantly more calls
-  EXPECT_LT(t1_runs + 5, t2_runs);
   timers_manager->stop();
}
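The diff also turns t1_runs and t2_runs from size_t into std::atomic<size_t>. The timer callbacks run on the thread started by timers_manager->start(), while the new polling loops read the counters from the test thread. With plain size_t, those concurrent reads and writes would be a data race, which is undefined behavior in C++. The following self-contained sketch reproduces the shape of the test without rclcpp, under clearly hypothetical assumptions. FakeTimers is an illustrative stand-in that fires two 1 ms "timers" from one worker thread, with t1 canceling itself after cancel_iter runs. It is not a model of how TimersManager is implemented.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <thread>

// Hypothetical stand-in for the timers thread (this is NOT rclcpp code):
// one worker thread "fires" two 1 ms timers. t1 cancels itself after
// cancel_iter runs; t2 keeps firing until the object is destroyed.
struct FakeTimers
{
  // Written by the worker thread and read by the test thread, so they must be
  // atomic: unsynchronized access to a plain size_t would be a data race.
  std::atomic<std::size_t> t1_runs{0};
  std::atomic<std::size_t> t2_runs{0};
  std::atomic<bool> t1_canceled{false};
  std::atomic<bool> stop{false};
  std::thread worker;

  explicit FakeTimers(std::size_t cancel_iter)
  {
    // With cancel_iter == 0 the equality below would not hold for any
    // realistic run count, so t1 would never cancel itself.
    assert(cancel_iter > 0);
    worker = std::thread([this, cancel_iter]() {
        while (!stop) {
          if (!t1_canceled) {
            // Mirrors the t1 callback: count the run, cancel on the Nth one.
            if (++t1_runs == cancel_iter) {
              t1_canceled = true;
            }
          }
          ++t2_runs;  // t2 is never canceled
          std::this_thread::sleep_for(std::chrono::milliseconds(1));
        }
      });
  }

  ~FakeTimers()
  {
    stop = true;
    worker.join();
  }
};
```

A test against this stand-in would follow the same steps as the new test body. It waits, with a deadline, until t1 has canceled itself, checks that t1_runs equals cancel_iter, and then waits for a few extra t2 runs while t1_runs stays put.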