Add EventsExecutor #256
This pull request has been mentioned on ROS Discourse. There might be relevant details there: https://discourse.ros.org/t/ros2-middleware-change-proposal/15863/20
@eboasson @k0ekk0ek William assumes CycloneDDS has the plumbing to support this. Can you check what's required?
@irobot-ros I think you will need to create a listener when e.g. the datareader is created (see https://github.com/eclipse-cyclonedds/cyclonedds/blob/75e3bbb3e2d849b3d7c6f61a95076159fdcde45d/src/core/ddsc/include/dds/dds.h#L1481). The listener callback can receive a custom type erased argument (passed at creation), and here you can set the "on data available" listener. The rest of the code should be similar to the
Even better, I think you don't need to create a listener when creating the reader, but you can attach one afterwards.
Oh dear, vacation and too much on my mind and this managed to get away from me. I'm afraid the next couple of days won't give me much time either ... But I do want to confirm what @ivanpauno writes. There are a few things I would like to note, so that if anyone is kind enough to work on this in these next couple of days, he or she has a bit more context for a few behaviours that might not be immediately obvious:
Other than that, I can't think of any complications.
Thanks for providing a more detailed answer @eboasson !!
Hi! Thanks @ivanpauno and @eboasson for the detailed guides, very useful! We have implemented listeners for all entities (with the exception of events) so now we can make use of the EventsExecutor both with inter/intra-process communication. That is, subscriptions, clients, services and guard conditions are supported. Some notes about the implementation:
A solution could be that the common listener (for both subscription and event), has the subscription setting the Another solution could be having a different data reader for the event. This can be done outside the DDS layers (rmw,rcl,rclcpp), and would simplify everything. But would involve "structural changes" outside of the scope of this PR. Let us know if you have some ideas about how to address the events situation, and if the general implementation looks correct.
This all LGTM, except for the memory leak by failing to call dds_delete_listener, but I do have a few overall comments:
- The "data on readers" event fires on a DDS Subscriber, not on a DataReader, and indeed, in the DDS specs you can't set it on a reader. It works in Cyclone by accident, because of how I implemented the propagating of events up towards the participant (as described in the spec) by propagating listeners down towards the readers and writers, combined with not having thought of preventing this. I might decide to fix this one day, so it is probably better to use "data available" in all cases.
- dds_set_listener copies the listener definitions and there is no value in keeping it around. The guard condition case is of course a bit different, but at least today the function invoked is guaranteed to be dds_listener_callback. You could get away with not calling dds_lget_data_available.
- The code for the guard condition case has made me wonder if Cyclone shouldn't introduce a listener that is invoked whenever a guard, read or query condition triggers. In the C API of Cyclone, this is pretty straightforward, but before committing to introducing it, it is wise to check what it would mean for the C++ API at least (which faithfully implements the corresponding spec).
- You can probably eliminate the mutex in user_callback_data_t because Cyclone serializes the listener invocations and serializes them with dds_set_listener calls. That is, it may be possible to first remove the listener, then update the callback in, e.g., rmw_subscription_set_listener_callback, then install it again. On the flip side, because it serializes those invocations, it is an uncontended mutex anyway, and those are cheap enough that the straightforwardness of the current code has its advantages.
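The mutex trade-off in the last bullet can be illustrated with a small, self-contained sketch. It does not use the Cyclone DDS API: CallbackData, ListenerCallback, on_data_available and set_callback are hypothetical names that only stand in for user_callback_data_t, the RMW callback type, the DDS listener callback and rmw_subscription_set_listener_callback, and the callback signature is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>

// Hypothetical stand-in for the RMW callback type; the real signature may differ.
using ListenerCallback = void (*)(const void * user_data, std::size_t number_of_events);

// Hypothetical model of the per-entity callback data discussed above.
struct CallbackData
{
  std::mutex mutex;
  ListenerCallback callback {nullptr};
  const void * user_data {nullptr};
  std::size_t unread_count {0};
};

// What a "data available" listener would do: forward the event if a
// callback is installed, otherwise remember that it happened.
inline void on_data_available(CallbackData & data)
{
  std::lock_guard<std::mutex> guard(data.mutex);
  if (data.callback) {
    data.callback(data.user_data, 1);
  } else {
    data.unread_count++;
  }
}

// Install (or clear) the callback under the same mutex, then flush any
// events that were counted while no callback was set.
inline void set_callback(
  CallbackData & data, ListenerCallback callback, const void * user_data)
{
  std::lock_guard<std::mutex> guard(data.mutex);
  data.callback = callback;
  data.user_data = user_data;
  if (callback && data.unread_count > 0) {
    callback(user_data, data.unread_count);
    data.unread_count = 0;
  }
}
```

Because both functions take the same lock, the listener never has to be removed while the callback is swapped. If the middleware already serializes listener invocations, the lock is uncontended and mostly buys simplicity.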
Thanks for the review @eboasson
The
IIUC this comment is still valid. Everything else looked fine to me.
rmw_cyclonedds_cpp/src/rmw_node.cpp
Outdated
dds_listener_t * listener = nullptr;
dds_get_listener(sub->enth, listener);
dds_delete_listener(listener);
dds_set_listener(sub->enth, NULL);
This one call suffices for removing the listeners, the preceding three lines have no effect (this applies for all occurrences).
In the latest commits we addressed the previous comments as follows:
We re-introduced the mutex in the entities (which was removed in a previous PR). The mutex is supposed to be always uncontended, thus the overhead should be negligible. Having this mutex back makes things safer, as we don't have to unset an entity's listener while we update the data. Before, we risked losing an event that happened just while we were updating the entity's listener data. This won't happen now.
Before, the listeners were attached to a reader/writer after its creation. This means we'd lose events that are generated right at reader/writer creation time. So now, the listener is attached to the reader/writer at creation time, so we can generate and handle those events properly.
Issue 1:
The Issue 2: Thanks for your time reviewing the PR!
edit: I see the issue now
mmm that sounds weird, sounds like a bug in the implementation here ....
Thinking about this again, this won't be an issue if the In that case you cannot get previous events though (at least, not in the same fashion)
I do suspect Cyclone's incorrect (in terms of spec'd behaviour) in not setting the status, but even if it is correct there's still the point that a "do nothing" listener consuming the status is also not (always) what you'd want to happen. Not installing a listener by default might work around the problem, but I think it needs to be addressed in Cyclone. (Which I am happy to do, just not entirely certain of some of the details yet.)
I agree, and I don't quite see why this would be the case from simply looking at the code. Is there a simple way to reproduce the problem? If so I can probably look at it tomorrow morning.
The problem is
I'll check about this, don't know how to reproduce it without the EventsExecutor. But I checked that the incompatible QoS listener callback was called, pushing an event. Trying to execute the event gave me a different kind of event type.
@mauropasse I figured I could try building everything from irobot/add-events-executor branches where I could find them (and master for others) and then try it, but I am running into a build error that makes me think I have mismatched commits:
So I suppose that plan is not going to work out quite so well. From looking at the code I get the impression that the change from one type of event to another type of event is something that doesn't even involve DDS, but I can't imagine you'd be deleting and creating RMW events and getting memory aliasing here. What I can imagine is that these two event types come in very rapid succession, but that requires (or should require) that in addition to the reader/writer with a QoS mismatch, there is also one that does match. Does that ring any bell?
Hi @eboasson, we are a PR behind on
That gives me other errors:
I'm sure it is just a matter of getting the correct set of commits, but with so many packages that is not trivial. You don't happen to have a
@eboasson I updated my branch. Are you using
Ah yes, this is the current version of Apple’s gadgets (clang, macOS, even the CPU is the fancy new Apple thingummy). I’ll try your updated version, if I then still run into problems, I’ll try gcc on Linux first :)
@mauropasse is this the lingering issue you were talking about in the MWWG meeting? I think other DDS APIs I've used have a notion of "start disabled", where you can create an entity in the disabled state, attach things to it, reconfigure it, etc., and then you can enable it. Perhaps it was meant to solve cases you describe here.
@mauropasse I had to make one change to get it to build/work (on macOS, that is) and now the rclcpp
Which I think means I can now reproduce the issue. 🎉
The one code change I needed is this:
diff --git a/rclcpp/src/rclcpp/executors/timers_manager.cpp b/rclcpp/src/rclcpp/executors/timers_manager.cpp
index be630988..5c3cfa11 100644
--- a/rclcpp/src/rclcpp/executors/timers_manager.cpp
+++ b/rclcpp/src/rclcpp/executors/timers_manager.cpp
@@ -62,7 +62,11 @@ void TimersManager::start()
}
timers_thread_ = std::thread(&TimersManager::run_timers, this);
+#ifdef __APPLE__
+ pthread_setname_np("TimersManager");
+#else
pthread_setname_np(timers_thread_.native_handle(), "TimersManager");
+#endif
}
void TimersManager::stop()
I still get (as I think I mentioned) some compiler warnings: stdout_stderr.log
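A note on the macOS branch of that patch: on macOS pthread_setname_np takes only the name and applies to the calling thread, so a call made from start() names whichever thread calls start() rather than timers_thread_. One possible way around this, sketched here with hypothetical helper names (set_current_thread_name, get_current_thread_name, run_named_worker) and not taken from the PR, is to let the new thread name itself:

```cpp
#include <cassert>
#include <pthread.h>
#include <string>
#include <thread>

// Name the *calling* thread. On macOS pthread_setname_np() takes only the
// name; on Linux (glibc) it also takes the thread handle. Returns 0 on success.
inline int set_current_thread_name(const char * name)
{
#ifdef __APPLE__
  return pthread_setname_np(name);
#else
  return pthread_setname_np(pthread_self(), name);
#endif
}

// Read back the calling thread's name (empty string on failure).
inline std::string get_current_thread_name()
{
  char buffer[64] = {0};
  if (pthread_getname_np(pthread_self(), buffer, sizeof(buffer)) != 0) {
    return std::string();
  }
  return std::string(buffer);
}

// Hypothetical worker: names itself first, then reports the name it sees.
inline std::string run_named_worker(const char * name)
{
  std::string observed;
  std::thread worker([&observed, name]() {
      set_current_thread_name(name);
      observed = get_current_thread_name();
    });
  worker.join();
  return observed;
}
```

On Linux the name is limited to 15 characters plus the terminating NUL, which "TimersManager" satisfies.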
@mauropasse, the problem is essentially what @ivanpauno spotted
Except in this test it isn't so much that it is being overridden but that the same value is being used for different events. The When setting the DDS listener on the subscriber, all listeners get installed. The liveliness_changed event does get triggered by the disappearance of the remote writer (I don't see it on matching, I guess that's simply down to the order in which things happen, I didn't dive into that), but because there is only a single When dequeues the event and calls
there's never been a QoS mismatch, hence I'd say it has to distinguish between the different event types. Even then you'll have a problem if multiple It is not my greatest work but the following implements that idea and makes the incompatible QoS warning go away (beware: I'm not sure my change to
diff --git a/rmw_cyclonedds_cpp/src/rmw_node.cpp b/rmw_cyclonedds_cpp/src/rmw_node.cpp
index 519f610b..de62406f 100644
--- a/rmw_cyclonedds_cpp/src/rmw_node.cpp
+++ b/rmw_cyclonedds_cpp/src/rmw_node.cpp
@@ -340,7 +340,7 @@ struct user_callback_data_t
rmw_listener_callback_t callback {nullptr};
const void * user_data {nullptr};
size_t unread_count {0};
- const void * event_data {nullptr};
+ const void * event_data[DDS_STATUS_ID_MAX+1] {nullptr};
size_t events_unread_count {0};
};
@@ -480,7 +480,7 @@ static void dds_listener_callback(dds_entity_t entity, void * arg)
}
}
-#define MAKE_DDS_EVENT_CALLBACK_FN(event_type) \
+#define MAKE_DDS_EVENT_CALLBACK_FN(event_type, EVENT_TYPE) \
static void on_ ## event_type ## _fn( \
dds_entity_t entity, \
const dds_ ## event_type ## _status_t status, \
@@ -490,24 +490,24 @@ static void dds_listener_callback(dds_entity_t entity, void * arg)
(void)entity; \
auto data = static_cast<user_callback_data_t *>(arg); \
std::lock_guard<std::mutex> guard(data->mutex); \
- if (data->callback && data->event_data) { \
- data->callback(data->event_data); \
+ if (data->callback && data->event_data[DDS_ ## EVENT_TYPE ## _STATUS_ID]) { \
+ data->callback(data->event_data[DDS_ ## EVENT_TYPE ## _STATUS_ID]); \
} else { \
data->events_unread_count++; \
} \
}
// Define event callback functions
-MAKE_DDS_EVENT_CALLBACK_FN(requested_deadline_missed)
-MAKE_DDS_EVENT_CALLBACK_FN(liveliness_lost)
-MAKE_DDS_EVENT_CALLBACK_FN(offered_deadline_missed)
-MAKE_DDS_EVENT_CALLBACK_FN(requested_incompatible_qos)
-MAKE_DDS_EVENT_CALLBACK_FN(sample_lost)
-MAKE_DDS_EVENT_CALLBACK_FN(offered_incompatible_qos)
+MAKE_DDS_EVENT_CALLBACK_FN(requested_deadline_missed, REQUESTED_DEADLINE_MISSED)
+MAKE_DDS_EVENT_CALLBACK_FN(liveliness_lost, LIVELINESS_LOST)
+MAKE_DDS_EVENT_CALLBACK_FN(offered_deadline_missed, OFFERED_DEADLINE_MISSED)
+MAKE_DDS_EVENT_CALLBACK_FN(requested_incompatible_qos, REQUESTED_INCOMPATIBLE_QOS)
+MAKE_DDS_EVENT_CALLBACK_FN(sample_lost, SAMPLE_LOST)
+MAKE_DDS_EVENT_CALLBACK_FN(offered_incompatible_qos, OFFERED_INCOMPATIBLE_QOS)
// Events of type RMW_EVENT_LIVELINESS_CHANGED are wrongly
// taken as RMW_EVENT_REQUESTED_QOS_INCOMPATIBLE events.
// So, lets temporarily disable this event type:
-// MAKE_DDS_EVENT_CALLBACK_FN(liveliness_changed)
+MAKE_DDS_EVENT_CALLBACK_FN(liveliness_changed, LIVELINESS_CHANGED)
static void listener_set_event_callbacks(dds_listener_t * l)
{
@@ -517,7 +517,7 @@ static void listener_set_event_callbacks(dds_listener_t * l)
dds_lset_liveliness_lost(l, on_liveliness_lost_fn);
dds_lset_offered_deadline_missed(l, on_offered_deadline_missed_fn);
dds_lset_offered_incompatible_qos(l, on_offered_incompatible_qos_fn);
- // dds_lset_liveliness_changed(l, on_liveliness_changed_fn);
+ dds_lset_liveliness_changed(l, on_liveliness_changed_fn);
}
extern "C" rmw_ret_t rmw_subscription_set_listener_callback(
@@ -628,6 +628,7 @@ extern "C" rmw_ret_t rmw_guard_condition_set_listener_callback(
template<typename T>
static void event_set_listener_callback(
T event,
+ rmw_event_type_t event_type,
rmw_listener_callback_t callback,
const void * user_data,
bool use_previous_events)
@@ -636,9 +637,23 @@ static void event_set_listener_callback(
std::lock_guard<std::mutex> guard(data->mutex);
+ dds_status_id_t status_id = static_cast<dds_status_id_t>(-1);
+ switch (event_type)
+ {
+ case RMW_EVENT_INVALID: return;
+ case RMW_EVENT_LIVELINESS_CHANGED: status_id = DDS_LIVELINESS_CHANGED_STATUS_ID; break;
+ case RMW_EVENT_REQUESTED_DEADLINE_MISSED: status_id = DDS_REQUESTED_DEADLINE_MISSED_STATUS_ID; break;
+ case RMW_EVENT_LIVELINESS_LOST: status_id = DDS_LIVELINESS_LOST_STATUS_ID; break;
+ case RMW_EVENT_OFFERED_DEADLINE_MISSED: status_id = DDS_OFFERED_DEADLINE_MISSED_STATUS_ID; break;
+ case RMW_EVENT_REQUESTED_QOS_INCOMPATIBLE: status_id = DDS_REQUESTED_INCOMPATIBLE_QOS_STATUS_ID; break;
+ case RMW_EVENT_OFFERED_QOS_INCOMPATIBLE: status_id = DDS_OFFERED_INCOMPATIBLE_QOS_STATUS_ID; break;
+ case RMW_EVENT_MESSAGE_LOST: status_id = DDS_SAMPLE_LOST_STATUS_ID; break;
+ }
+ assert(status_id != static_cast<dds_status_id_t>(-1));
+
// Set the user callback data
data->callback = callback;
- data->event_data = user_data;
+ data->event_data[status_id] = user_data;
if (callback && use_previous_events) {
// Push events happened before having assigned a callback
@@ -660,7 +675,7 @@ extern "C" rmw_ret_t rmw_event_set_listener_callback(
{
auto sub_event = static_cast<CddsSubscription *>(rmw_event->data);
event_set_listener_callback(
- sub_event, callback, user_data, use_previous_events);
+ sub_event, rmw_event->event_type, callback, user_data, use_previous_events);
break;
}
@@ -668,7 +683,7 @@ extern "C" rmw_ret_t rmw_event_set_listener_callback(
{
auto pub_event = static_cast<CddsPublisher *>(rmw_event->data);
event_set_listener_callback(
- pub_event, callback, user_data, use_previous_events);
+ pub_event, rmw_event->event_type, callback, user_data, use_previous_events);
break;
}
}
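The core idea of the patch, one user-data slot per event type instead of a single shared pointer, can be shown in isolation. The sketch below is a simplified model, not the rmw_cyclonedds code: EventId, EventCallbackData, set_event_data and on_event are hypothetical names, and a small enum replaces dds_status_id_t and DDS_STATUS_ID_MAX.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical event identifiers; in the patch above the index is a
// dds_status_id_t and the array has DDS_STATUS_ID_MAX + 1 entries.
enum class EventId : std::size_t
{
  LivelinessChanged = 0,
  RequestedIncompatibleQos,
  SampleLost,
  Count
};

using EventCallback = void (*)(const void * event_data);

struct EventCallbackData
{
  EventCallback callback {nullptr};
  // One user-data slot per event type instead of a single shared pointer.
  std::array<const void *, static_cast<std::size_t>(EventId::Count)> event_data {};
  std::size_t events_unread_count {0};
};

inline void set_event_data(
  EventCallbackData & data, EventId id, EventCallback cb, const void * user_data)
{
  data.callback = cb;
  data.event_data[static_cast<std::size_t>(id)] = user_data;
}

// Mirrors the macro-generated on_<event>_fn: only the slot of the event
// that actually fired is consulted.
inline void on_event(EventCallbackData & data, EventId id)
{
  const void * slot = data.event_data[static_cast<std::size_t>(id)];
  if (data.callback && slot) {
    data.callback(slot);
  } else {
    data.events_unread_count++;
  }
}
```

With this layout a liveliness-changed notification can no longer be delivered with the user data registered for the incompatible-QoS event. It is counted as unread instead, until something registers for it.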
Hi @eboasson, I tested your changes and the liveliness change event doesn't happen anymore. Nevertheless I'd like to ask you some questions to make sure that this approach will work for other events.
I'd like to give a little more context about the
In a situation where a new message arrives and also two different events happen on the subscription, the events queue will have 3 events:
Notice that all kinds of events (QoS mismatch, liveliness change, etc.) push the same event into the queue.
My assumption was that the Event Waitable should somehow identify which event to take when calling With your new approach, we would have a So, wouldn't the issue still be present? Or a modification is still needed to identify which event was pushed into the queue? Like
so
Sorry for such a long reply but wanted to make sure that we're on the same page! What do you guys think? @ivanpauno
Maybe this is what I got wrong, and it would be:
So, is there a waitable for each type of event? Currently I see that for publishers and subscribers only these types of events are set when creating them:
I'm not sure, @mauropasse 😟 I assumed it would end up like:
but I don't know where the waitable_ptr comes from. I'd assumed it would simply be the With my quick hack, the different DDS listeners map to different slots in the And again, if you'd create two rmw_events for the same event type on the same entity and call
Hi, is there any update with respect to the failing test?
Signed-off-by: Mauro Passerino <[email protected]>
Use QoS depth on subscriptions to limit callback events
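A minimal sketch of what limiting callback events by QoS depth could look like, assuming a KEEP_LAST history where the reader never holds more than the configured depth of samples. The helper name events_to_report is hypothetical and the actual commit may do this differently.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical helper: when a callback is installed after messages have
// already arrived, report at most as many events as the reader can still
// hold, since older samples would have been overwritten.
inline std::size_t events_to_report(
  std::size_t unread_count, std::size_t qos_history_depth)
{
  return std::min(unread_count, qos_history_depth);
}
```

For example, with a depth of 3 and 10 counted arrivals, only 3 events would be pushed, which avoids executing callbacks for samples that can no longer be taken.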
No more than that I have been thinking about it and how I am going to address the shortcomings in my https://github.com/eboasson/cyclonedds/tree/waitset-listener-interaction branch. In other words, it has slowly bubbled up to near the top of my to-do list.
Any update on the issue blocking this pull request?
Yes. I have something I am reasonably happy with that should solve the problems without introducing a significant performance degradation. It hasn't been turned into a PR yet, but I expect to do so shortly. There is an interesting detail that is at least of some consequence: in doing this, thinking about it and double-checking what the spec says on a related matter, I discovered that I had misremembered the spec and that this particular problem isn't caused by a bug (as in "deviation from the spec") in Cyclone but rather by the behaviour required by the spec:
The way I read this means that, based on the spec, there should be no expectation that a waitset will trigger if the application installed a listener, and that consequently the current behaviour of Cyclone matches the spec. My guess as to why I misremembered is this:
suggesting you can meaningfully combine them. Given the above, it seems the set of waitset objects to be signalled will necessarily be empty, and means this paragraph really is noise. In any case, I don't see a good alternative for the (proposed/expected/desired) change, short of installing the listeners much later with various complications in getting the initial event queue correct, and/or reimplementing the DDS waitsets in the RMW layer. (Given the RMW waitset interface, that would perhaps be good for performance, but that's a change I don't want to contemplate.) With no spec to point to, I can hardly claim that the change would constitute a bug fix. Therefore, not automatically resetting the status upon invoking the listener will likely have to become opt-in. That is something I haven't done yet.
Signed-off-by: Mauro Passerino <[email protected]>
Use new Cyclone APIs to coexist listeners/waitset
per @eboasson: Cyclone master has got everything for this. The RMW PR depends on updating the version of cyclone that ROS relies on. That’s why the PR CI fails.
That's great! Is this new feature something that should only be released to rolling?
@irobot-ros can you resolve the conflicts?
Signed-off-by: Mauro Passerino <[email protected]>
Merged master into irobot/add-events-executor
I think we're waiting on ros2/ros2#1174 before the needed changes are available in rolling.
Changes lgtm, just waiting on update to which version of cyclone is used on rolling, then I can start testing again.
Signed-off-by: Mauro Passerino <[email protected]>
Fix linter issues
Signed-off-by: Alberto Soragna <[email protected]>
* add RMW listener APIs Rename Event_callback to ExecutorEventCallback Update name Rename ExecutorEventCallback -> EventsExecutorCallback Rename set_events_executor_callback->set_listener_callback Use data types when setting callbacks Restore name Move rcutils/executor_event_types.h to rmw/ rename event types Rename executor_context->callback_context Rename callback_context->user_data Reorder APIs arguments rename rmw_listener_cb_t->rmw_listener_callback_t Events executor subscriptions support Events executor guard conditions support Add clients&services support Support events (still work to do) Ctest fixes Use dds_entity_t instead of auto. Remove comments PR suggestions Only check callback pointer validity This is the only item that is used in the RMW layer, while the others are simply forwarded. use void * to pass executor ptr Implement use previous events for guard conditions Push events happened before setting callback Add support for events unread count Rework unread count Rework all Proper init gc, clients and services callbacks Remove not needed subscription listener init Address PR comments Crete apis for clarity Rework executor callback data Use RMW renamed file Add support for events Do not set listener event callbacks check user_data before calling callback Enable support for events Fix: set listener to srv/cli -> sub, not pub. 
* Have callback & data per each event type
  Signed-off-by: Mauro Passerino <[email protected]>
* remove comments
  Signed-off-by: Mauro Passerino <[email protected]>
* Delete listeners after use, since they're copied
  Signed-off-by: Mauro Passerino <[email protected]>
* Remove rmw_event_data_type_t
  Signed-off-by: Mauro Passerino <[email protected]>
* Remove use_previous_event
  Signed-off-by: Mauro Passerino <[email protected]>
* Use unread_count as arg
  Signed-off-by: Mauro Passerino <[email protected]>
* Remove guard condition listener
  Signed-off-by: Mauro Passerino <[email protected]>
* refactor to remove listener term
  Signed-off-by: William Woodall <[email protected]>
* use correct callback for dds qos event listeners
  Signed-off-by: Alberto Soragna <[email protected]>
* Use QoS depth on subscriptions before call callback
  Signed-off-by: Mauro Passerino <[email protected]>
* Use new Cyclone APIs to coexist listeners/waitset
  Signed-off-by: Mauro Passerino <[email protected]>
* Fix linter issues
  Signed-off-by: Mauro Passerino <[email protected]>
* fix linter errors
  Signed-off-by: Alberto Soragna <[email protected]>
Co-authored-by: Mauro <[email protected]>
Co-authored-by: William Woodall <[email protected]>
Co-authored-by: Alberto Soragna <[email protected]>
This PR introduces the changes required to implement the EventsExecutor design in rmw_cyclonedds. See design and Discourse post.
The new executor uses an events queue and a timers manager as opposed to waitsets, to efficiently execute entities with work to do.
This new executor greatly reduces CPU usage of a ROS 2 application.
See the blog post for more details on the tests that we run.
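As a rough illustration of the events-queue idea (not the rclcpp implementation), middleware callbacks push small event records and a single executor thread blocks until one is available, instead of rebuilding and polling a waitset. ExecutorEvent and EventsQueue are hypothetical names used only for this sketch.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <queue>

// Hypothetical event record: which entity has work, and how many items.
struct ExecutorEvent
{
  const void * entity_id;
  std::size_t num_events;
};

class EventsQueue
{
public:
  // Called from middleware listener callbacks.
  void push(const ExecutorEvent & event)
  {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      queue_.push(event);
    }
    cv_.notify_one();
  }

  // Called from the executor thread; blocks until an event is available.
  ExecutorEvent wait_and_pop()
  {
    std::unique_lock<std::mutex> lock(mutex_);
    cv_.wait(lock, [this]() {return !queue_.empty();});
    ExecutorEvent event = queue_.front();
    queue_.pop();
    return event;
  }

  std::size_t size()
  {
    std::lock_guard<std::mutex> lock(mutex_);
    return queue_.size();
  }

private:
  std::mutex mutex_;
  std::condition_variable cv_;
  std::queue<ExecutorEvent> queue_;
};
```

The executor loop would then repeatedly call wait_and_pop() and run the callback of the entity named in the event, so it only wakes up when there is work to do.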
The bulk of the changes for this implementation are in the rclcpp layer, with some minor changes in other repositories (rcl, rmw, rmw_implementation) for forwarding entities, the declaration of some data types in rcutils, and finally some additional changes in the vendor-specific rmw implementations.
We currently implemented this only on top of the default ROS middleware fastrtps, while we provided stubs for other middlewares.
See the main PR to rclcpp ros2/rclcpp#1416.
The current implementation does not support ROS 2 actions, which will be added in a follow-up PR.
Developed by iRobot
Mauro Passerino
Lenny Story
Alberto Soragna
Connects to: