Fix nodes spins rather than blocking and waiting, using 100% CPU [melodic] #1651
@ros-pull-request-builder retest this please
This fixes #1558
@dirk-thomas I am not sure, but I think the remaining unit test failures in http://build.ros.org/job/Mpr_ds__ros_comm__debian_stretch_amd64/669/ for package On the other hand I can reproduce the same test failures in
But I do not see how this should be connected to the patch. There have been quite a few patches to For my local test run, it looks like two instances of rostest-test_random_record.xml are running in parallel (why two?) and both write to the same output bag at If I run the unit tests single-threaded with
Does the build farm strictly serialize unit tests or can different tests run in parallel?
It seems that this is a CMake issue: The above-mentioned mailing suggests serializing the two targets that depend on the same target.
to Anyway, adding this dependency might also fix occasional failures of the
…lel builds See ros#1651 (comment) for details.
@ros-pull-request-builder retest this please
LGTM
@dirk-thomas friendly ping..
Seems like the fixes are still open? Any plan to check them in? I encountered the same issue in our ROS system. Would be nice to have them in.
On my embedded linux system (custom kernel), when running an empty Obviously, this isn't a supported ROS platform or anything, I'm just sharing further anecdotal evidence that this patch has significant improvements to performance even with Boost > 1.62. Looking forward to having these changes merged in.
@dirk-thomas Is there anything outstanding that you'd like to be done before this PR can be merged?
@dirk-thomas friendly ping again... could we please get this merged?
@dirk-thomas another gentle ping request to ask if this can be merged ... I'm also seeing the same issue here (w/Boost 1.58) and this pull request fixes everything nicely.
This is fixing issues on my systems as well. +1 for merging
Do you mean compiling roscpp + your stuff with clang, or compiling roscpp with default settings (i.e. gcc) and your stuff with clang?
compiling roscpp + my stuff with clang
…lel builds (#1877) See #1651 (comment) for details. Co-authored-by: Johannes Meyer <[email protected]>
…tion_variable (fix ros#1343) ros#1014 and ros#1250 introduced a backported version of boost::condition_variable, where support for steady (monotonic) clocks has been added in version 1.61. But the namespace of the backported version was not changed and the symbol names might clash with the original version. This matters because the underlying clock used for the condition_variable is set in the constructor and must be consistent with the expectations within member variables. The compiler might choose to inline one or the other or both, and is more likely to do so for optimized Release builds. But if it does not, the symbol ends up in the symbol table of roscpp, and depending on which other libraries are linked into the process it is unpredictable which of the two versions will actually be called in the end. In case the constructor defined in `/usr/include/boost/thread/pthread/condition_variable.hpp` was called and did not configure the internal pthread condition variable for monotonic clock, each call to the backported do_wait_until() method with a monotonic timestamp will return immediately and hence cause `CallbackQueue::callOne(timeout)` or `CallbackQueue::callAvailable(timeout)` to return immediately. This patch changes the namespace of the backported condition_variable implementation to boost_161. This removes the ambiguity with the original definition if both are used in the same process.
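The effect of moving the backported class into its own namespace can be sketched in plain C++. This is only an illustration, not the roscpp code: the namespaces `original_sketch` and `boost_161_sketch` and the `uses_monotonic_clock` member are hypothetical stand-ins for the two `condition_variable` definitions described in this commit message.

```cpp
#include <type_traits>

// Hypothetical stand-in for the condition_variable of an older Boost:
// its constructor does not configure a monotonic clock.
namespace original_sketch {
struct condition_variable {
  bool uses_monotonic_clock;
  condition_variable() : uses_monotonic_clock(false) {}
};
}  // namespace original_sketch

// Hypothetical stand-in for the backported 1.61 implementation. Because it
// lives in its own namespace, its constructor is a different symbol and
// cannot be mixed up with the one above when libraries are linked together.
namespace boost_161_sketch {
struct condition_variable {
  bool uses_monotonic_clock;
  condition_variable() : uses_monotonic_clock(true) {}
};
}  // namespace boost_161_sketch

// The two definitions are distinct types, so the choice is never ambiguous.
static_assert(!std::is_same<original_sketch::condition_variable,
                            boost_161_sketch::condition_variable>::value,
              "separate namespaces keep the two implementations apart");
```

With both classes under the same qualified name, which constructor runs would depend on inlining and link order; with separate namespaces, each call site names the one it wants.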
…ed timed_wait() This fixes ROS timers in combination with 2c18b9f. The timer callbacks were not called because the TimerManager's thread function blocked indefinitely on boost::condition_variable::timed_wait(). Relative timed_wait() uses the system clock (boost::get_system_time()) unconditionally to calculate the absolute timestamp for do_wait_until(). If the condition variable has been initialized with BOOST_THREAD_HAS_CONDATTR_SET_CLOCK_MONOTONIC, it compares this timestamp with the monotonic clock and therefore blocks. This issue has been reported in https://svn.boost.org/trac10/ticket/12728 and will not be fixed. The timed_wait interface is apparently deprecated.
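The idea of waiting on an absolute deadline taken from the monotonic clock, rather than on a relative timeout, can be illustrated with the standard library. The sketch below uses `std::condition_variable` instead of the Boost type discussed here, and `wait_ready()` is a hypothetical helper, so it shows the pattern only and not the roscpp implementation.

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>

// Hypothetical helper: blocks until `ready` is true or until `timeout` has
// elapsed on the monotonic clock. Returns the final value of `ready`.
inline bool wait_ready(std::mutex& mutex, std::condition_variable& cv,
                       const bool& ready, std::chrono::milliseconds timeout) {
  // Compute the absolute deadline once, on the steady (monotonic) clock,
  // so that jumps of the system time cannot shorten or extend the wait.
  const std::chrono::steady_clock::time_point deadline =
      std::chrono::steady_clock::now() + timeout;
  std::unique_lock<std::mutex> lock(mutex);
  // The predicate overload re-checks `ready` after spurious wakeups.
  return cv.wait_until(lock, deadline, [&ready] { return ready; });
}
```

Because the deadline and the wait use the same clock, the mismatch described above (a system-clock timestamp compared against the monotonic clock) cannot occur in this pattern.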
…eadFunc() in steady_timer.cpp The updated generic definition in timer_manager.h should do the same with a minor update. In all cases we can call boost::condition_variable::wait_until() with an absolute time_point of the respective clock. The conversion from system_clock to steady_clock for Time and WallTime is done internally in boost::condition_variable::wait_until(lock_type& lock, const chrono::time_point<Clock, Duration>& t).
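A conversion from a system_clock deadline to a steady_clock deadline can be sketched as follows. This is one common way to do it with `std::chrono`, shown under the assumption that carrying over the remaining duration is sufficient; `to_steady()` is a hypothetical helper and is not claimed to match what `boost::condition_variable::wait_until()` does internally.

```cpp
#include <chrono>

// Hypothetical helper: maps a system_clock time_point onto the steady_clock
// by carrying over the time remaining until the deadline.
inline std::chrono::steady_clock::time_point to_steady(
    std::chrono::system_clock::time_point system_deadline) {
  // How far away is the deadline, measured on the clock it was given in?
  const auto remaining = system_deadline - std::chrono::system_clock::now();
  // Apply the same distance to the current reading of the steady clock.
  return std::chrono::steady_clock::now() +
         std::chrono::duration_cast<std::chrono::steady_clock::duration>(
             remaining);
}
```

The result is only as accurate as the two `now()` readings are close together, which is why such a conversion is usually redone each time the wait is re-entered.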
find_package(Boost) has to come before checking the Boost version. Otherwise BOOST_THREAD_HAS_CONDATTR_SET_CLOCK_MONOTONIC was not defined, which triggered the assertion in timer_manager.h:240. Since Boost 1.67, BOOST_THREAD_HAS_CONDATTR_SET_CLOCK_MONOTONIC has been the default if the platform supports it, and the macro is no longer defined. Instead, check for BOOST_THREAD_INTERNAL_CLOCK_IS_MONO.
…ITION_VARIABLE_HEADER macros by a typedef in internal_condition_variable.h
05f86d9 to b9e8f3a
I separated the test_rosbag test dependency into its own PR (#1877) and rebased this patch on top of the latest state of the target branch.
While this might not be a blocker to get this important fix merged into Melodic, I would still like to consider the effect of this PR on API/ABI compatibility. Can someone comment on the ABI compatibility of the changed condition variable type in the various headers? Is this expected to be ABI compatible? If yes or no, for which Boost versions?
We are typically building downstream packages which depend on I did a quick test with Note that with Boost 1.65 (>= 1.61)
So instead of changing the namespace and selecting which implementation of
REP 3 specifies the targeted platforms and for Melodic that is:
Since that means all relevant platforms have Boost 1.61 or above we should do exactly that for the The question still stands: does that change pose an ABI break or not? If it does, the next question is whether it is possible to fix the problem in an ABI-compatible way.
My main focus is to get the current default branch `melodic-devel` into a releasable state and release a new version for Melodic (which will also be the base for forking for Noetic). Only after that will I try to look at what changes should be backported (which can be completely different patches, like in this case) to Kinetic.
On Ubuntu 18.04 with gcc 7.4.0-1ubuntu1~18.04.1
This is the result of running the abi-compliance-checker to compare both versions: compat_report.html The renaming of the
Other platforms with Boost 1.60 or lower
Probably not an issue for
After all, ABI stability is a tricky thing. From my experience I'd say that this change is probably (likely) safe for users of the released binaries (if a user builds roscpp in Debug mode, that's a whole different story). Inline functions are ODR-violation-prone. There might be edge cases where this change breaks a user, but I don't think that they are likely to occur. But note that this change might actually have an impact on the behaviour of users who rebuild their code with these changes if they are using
Done: #1878
Yes, I misspelled: it was Boost 1.65.1 👍
This is now a duplicate of #1878 but I am keeping it open since it might serve as a template for a Kinetic-specific patch.
This PR is obsolete now with #1878 and #1932. My original PR targeting
Rebase of #1557, a proposed fix for #1343.
From #1557 (comment) (@dirk-thomas):
The patch also applies to `melodic-devel` with some minor merge conflict resolutions. I tested this branch with C++14 enabled on Ubuntu 16.04 with Boost 1.58. All unit tests in test_roscpp still pass. I have not tested the patch on Ubuntu 18.04 or other distros yet, nor whether it actually fixes the original issue causing spinning at 100% CPU after the rebase.

I am not sure whether this patch should be applied to `melodic-devel` at all. If melodic only targets distributions which normally have at least Boost 1.62 according to REP-3, the backported `condition_variable` implementation from 1.61 is not used at all and the respective cmake and preprocessor checks can be removed. Some changes could still be merged, like the removal of the unnecessary specialization of `TimerManager<SteadyTime, WallDuration, SteadyTimerEvent>::threadFunc()` in steady_timer.cpp.

Edit: According to #1651 (comment) the patch has a positive effect on ROS timers to deal with system time changes (#1558). The reason is probably that `BOOST_THREAD_HAS_CONDATTR_SET_CLOCK_MONOTONIC` is defined unconditionally for roscpp as a whole now, while before it was only set in certain compilation units (but not in `timer.cpp`).