Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add rosout logging to rclrs #422

Merged
merged 14 commits into from
Nov 24, 2024
Merged

Add rosout logging to rclrs #422

merged 14 commits into from
Nov 24, 2024

Conversation

geoff-imdex
Copy link
Contributor

@geoff-imdex geoff-imdex commented Oct 21, 2024

The purpose of this PR is to add logging macros that replicate the logging interface provided by the rclcpp and rclpy interfaces, while aiming to provide an idiomatic rust interface.

This work is based on the initial work from here and here.

Examples of the interfaces provided:

log_debug!(&node.name(), "Simple message");
log_debug!(&node.name(), "Simple message {some_variable}");
log_fatal!(&node.name(), "Simple message from {}", node.name());
log_warn!(&node.name(), LogConditions::once(), "Only log this the first time");
log_error!(&node.name(),
           LogConditions::skip_first_throttle(Duration::from_millis(1000)),
           "Noisy error that we expect the first time");

log_info!(&node.name(), LogConditions { occurs: LoggingOccurrence::Always,
                                        publish_interval: Duration::from_millis(1000),
                                        log_if_true: count % 10 == 0, },
          "Manually constructed LogConditions");

The macros make use of the LogConditions struct which provides the control over the logging, e.g. throttling, occurrence (once, skip first), etc.
The most common control constructs can be instantiated using the provided builder functions, e.g.

/// Only log the message on the first occurrence
LogConditions::once() 

/// Only log the message at the specified duration
LogConditions::throttle(Duration)

/// Only log the message if the condition is true.
LogConditions::log_if_true(<some expression>)

Known Issues

1. The is currently an issue on node closure (as we process the rcl_node_fini call) where the "impl" pointer appears to be "null" when we try too fini the rosout publisher (in rcl_logging_rosout_fini_publisher_for_node). The rest of the node closure occurs successfully. Note: the rosout usage can be disabled with --ros-args --disable-rosout-logs and the error will not occur.
This issue has been addressed in the latest commit. I believe that the issue was caused by the rcl_node variable in builder.rs going out of scope during the "drop" sequence. The fix is to allocate the rcl_node on the heap (using Box).
(As a side note: technically, we may have the same issue with the Context object).
2. We use static variables to facilitate the "once", "skip_first", and throttling functionality, however, I believe it would be preferred
to use the lazy_static. At this moment, our rust compiler is 1.79 and therefore the lazy_static is not regarded as stable.

@jhdcs jhdcs requested a review from maspe36 October 23, 2024 11:45
@esteve
Copy link
Collaborator

esteve commented Oct 30, 2024

@geoff-imdex thank you so much for your PR. Given that the code is based on the work done at r2r and at rules_ros2, it'd be good to have some feedback from the authors of r2r, and maybe we could a shared crate with this code so as to not duplicate work. Legally speaking we can copy that code (both r2r and ros2-rust are license as Apache 2.0), but it's always nice to ask.

@m-dahl @kristoferB is it ok with you guys if we copy this code from r2r into ros2-rust? We'd of course attribute credit. Moreover, would you be interested in spinning off the logging code of r2r into a shared crate that both projects can use?

///
/// [1]: <https://doc.rust-lang.org/reference/destructors.html>
pub(crate) struct NodeHandle {
pub(crate) rcl_node: Mutex<rcl_node_t>,
pub(crate) rcl_node: Mutex<Box<rcl_node_t>>,
Copy link
Collaborator

@mxgrey mxgrey Oct 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can I check why this was put into a Box? The contents of rcl_node_t are just pointers so there shouldn't be any issues with moving the address of the rcl_node_t (and its parent struct is managed by an Arc, so its address shouldn't change anyway).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question - without this change it appears as though the node handle goes out of scope before we get chance to clear the logging publisher (in humble this occurs in the rcl_node_fini() call). The behaviour without the change is that you will get a segfault on node closure because the object pointed to by the rcl_node_t is no longer valid. Took a bit of debugging to get to the bottom of it - and honestly, I don't particularly like the change with the extra de-refs.
Looking at Jazzy, I believe that the issue will no longer occur due the logging publisher being managed outside the rcl_node_init/fini calls.
For reference the failure occurs here

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the explanation. I think we do want to keep supporting humble for now, so I'll try to reproduce the seg fault in humble and see if there may be alternative solution.

I'm still pretty lost on how boxing the rcl_node_t would affect its lifetime, but if I can reproduce the problem then I should be able to make sense of it.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh actually if rcl itself is storing the rcl_node_t* value somewhere and using it without consideration for its lifetime then I can see how this would happen. I'll check if that's the case, and maybe we can set things up so that we don't initialize the rcl_node_t until it's already inside the NodeHandle.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, that's it in a nutshell - the rosout code keeps a record of the node used to create it (this line specifically) and this pointed to the stack variable in NodeHandle (here). During the rcl_node_fini we call rcl_logging_rosout_fini_publisher_for_node which tries to reference the stack pointer, however, this area of memory it points to has now been mangled, i.e. the pointer still points to the same memory location but the values at that spot are garbage.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've gone ahead and put together an alternative fix here: geoff-imdex#5

Comment on lines 6 to 8
use std::ffi::CString;
use std::sync::Mutex;
use std::time::Duration;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
use std::ffi::CString;
use std::sync::Mutex;
use std::time::Duration;
use std::{ffi::CString, sync::Mutex, time::Duration};

Comment on lines 199 to 202
/// use rclrs::{log_debug, log_error, log_fatal, log_info, log_warn, LogConditions, LoggingOccurrence};
/// use std::sync::Mutex;
/// use std::time::Duration;
/// use std::env;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Condense the use declarations here, and hide them from the output docs

Suggested change
/// use rclrs::{log_debug, log_error, log_fatal, log_info, log_warn, LogConditions, LoggingOccurrence};
/// use std::sync::Mutex;
/// use std::time::Duration;
/// use std::env;
/// # use std::{env, sync::Mutex, time::Duration};
/// # use rclrs::*;
///
/// # let context = Context::new(env::args()).unwrap();
/// # let node = Node::new(&context, "test_node").unwrap();

The # is used to hide lines from the output documentation but also allows the lines to be executed so the doc test will pass.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! Thanks for the tip.

///
/// [1]: <https://doc.rust-lang.org/reference/destructors.html>
pub(crate) struct NodeHandle {
pub(crate) rcl_node: Mutex<rcl_node_t>,
pub(crate) rcl_node: Mutex<Box<rcl_node_t>>,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I checked this code out locally, reverted this change, and I was able to pass all the tests and run the examples. What problem specifically were you seeing?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've replied above but as a minimal example of how I was able to cause the segfault:

use rclrs::{
    log_debug, log_error, log_fatal, log_info, log_warn, LogConditions, LoggingOccurrence, Node,
};
static NODE_NAME: &str = "test_node";

// Minimal test code
use std::env;
use std::thread::sleep;
fn main() -> Result<(), rclrs::RclrsError> {
    log_warn!(NODE_NAME, "Starting");
    let context = rclrs::Context::new(env::args()).unwrap();
    let node = rclrs::Node::new(&context, NODE_NAME).unwrap();
    sleep(Duration::from_secs(1));
 
    log_warn!(NODE_NAME, "Finishing");
    Ok(())
}

Note: some of the use statements are unnecessary in the code - I've just hacked up an existing test app. We also had a similar test in our CI that showed the same behaviour.
Happy to go in to more detail if needed.

@mxgrey
Copy link
Collaborator

mxgrey commented Oct 30, 2024

In rclcpp there's a Logger class that bundles up parameters like the name of the logger and severity filter. It also allows hierarchical structures where you can create a Logger which is a child of another Logger and it will inherit settings like a name prefix and severity level (which can be overridden).

In this PR the name and severity seem to be loose values. In fact I don't see where severity gets filtered at all (e.g. we wouldn't want DEBUG severity being printed out unless the user asks for DEBUG level logging).

Would we be interested in having a Logger struct similar to rclcpp's? I could draft that up.

I do like the LogConditions of this implementation, though. That's a very nice mechanism.

.expect("Valid c style string required, e.g. check for extraneous null characters");
let severity = severity.to_native();

let _guard = LOG_GUARD.lock().unwrap();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is rclrs should use ENTITY_LIFECYCLE_MUTEX, which is the mutex that rclrs uses to ensure that global variables within the rcl layer and lower are protected.

rcl logging is notoriously not thread safe, so it's important to make sure we're locking appropriately.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good comment. The thought had crossed my mind on this one - I thought it more akin to the mutex in rclpp logging, but I'm happy to change it to the ENTITY_LIFECYCLE_MUTEX

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Taken care of by geoff-imdex#6

// sequence is unnecessarily complex
let allocator = rcutils_get_default_allocator();

rcl_logging_configure(&rcl_context.global_arguments, &allocator).ok()
Copy link
Collaborator

@mxgrey mxgrey Oct 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will get called every time a Context is created, but it's meant to be called only once. It's possible for users to create multiple Contexts in one application (we do this a lot in tests), so we'd be making redundant calls.

Similar to ros2/rclcpp#998 we should guard this from being called multiple times. We also need to make sure that rcl_logging_fini is only called after all Contexts are destructed.

This is a tricky problem to resolve in a sound way. I might suggest something like this:

struct LoggingConfiguration {
    lifecycle: Mutex<Weak<LoggingLifecycle>>
}

impl LoggingConfiguration {
    fn configure(args: &rcl_arguments_t) -> Result<Arc<LoggingLifecycle>, RclError> {
        static CONFIGURATION: LazyLock<Self> = LazyLock::new(|| Self { lifecycle: Mutex::new(Weak::new()) });
        let mut lifecycle = CONFIGURATION.lifecycle.lock().unwrap();
        if let Some(arc_lifecycle) = lifecycle.upgrade() {
            return Ok(arc_lifecycle);
        }
        let arc_lifecycle = Arc::new(LoggingLifecycle::new(args)?);
        *lifecycle = arc_lifecycle.downgrade();
        Ok(arc_lifecycle);
    }
}

struct LoggingLifecycle;
impl LoggingLifecycle {
    fn new(args: &rcl_arguments_t) -> Result<Self, RclError> {
        let _lifecycle_lock = ENTITY_LIFECYCLE_MUTEX.lock().unwrap();
        let allocator = rcutils_get_default_allocator();
        rcl_logging_configure(args, &allocator).ok()?;
        Self
    }
}

impl Drop for LoggingLifecycle {
    fn drop(&mut self) {
        let _lifecycle_lock = ENTITY_LIFECYCLE_MUTEX.lock().unwrap();
        rcl_logging_fini();
    }
}

Then we would add an Arc<LoggingLifecycle> to the ContextHandle. Once all ContextHandles are gone, the LoggingLifecycle will drop and call rcl_logging_fini. If a new context is made after that, rcl_logging_configure will be called again.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really good pick-up and suggestion @mxgrey - thank you. Will update the code to make use of your suggestion.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I took the liberty of opening a PR for this: geoff-imdex#6

It's almost exactly what I recommended above.

@m-dahl
Copy link
Collaborator

m-dahl commented Oct 30, 2024

@geoff-imdex thank you so much for your PR. Given that the code is based on the work done at r2r and at rules_ros2, it'd be good to have some feedback from the authors of r2r, and maybe we could a shared crate with this code so as to not duplicate work. Legally speaking we can copy that code (both r2r and ros2-rust are license as Apache 2.0), but it's always nice to ask.

@m-dahl @kristoferB is it ok with you guys if we copy this code from r2r into ros2-rust? We'd of course attribute credit. Moreover, would you be interested in spinning off the logging code of r2r into a shared crate that both projects can use?

Thanks for the mention, happy to see it used and improved. In the long term I would be in favor of a shared crate based on this work (perhaps it could be a starting point to for more code sharing), but right now I don't have much time to work on ros stuff so for now I think you should just go ahead and do what makes most sense for this crate.

@geoff-imdex
Copy link
Contributor Author

In rclcpp there's a Logger class that bundles up parameters like the name of the logger and severity filter. It also allows hierarchical structures where you can create a Logger which is a child of another Logger and it will inherit settings like a name prefix and severity level (which can be overridden).

As you can tell, I avoided this extra complexity with this implementation as I wasn't sure (naivety/ignorance) what benefit/use it would bring and was a little outside the scope of what I needed.

In this PR the name and severity seem to be loose values. In fact I don't see where severity gets filtered at all (e.g. we wouldn't want DEBUG severity being printed out unless the user asks for DEBUG level logging).

It's not obvious, but the filtering occurs in the rcutils_log function. I'm sure you're across this already, but just to be sure: the rcutils logging code basically has a look-up table that tracks the logger name and what logging level is enabled for it. It does mean that we do a certain amount of function calls for levels that aren't enabled, however, they're never published. The rclcpp implementation uses some very smart macro construction to include logs at compile time, but I'm not sure we would (or could?) want to do the same thing.

Would we be interested in having a Logger struct similar to rclcpp's? I could draft that up.

I do like the LogConditions of this implementation, though. That's a very nice mechanism.
Thanks!

@geoff-imdex
Copy link
Contributor Author

@mxgrey , @maspe36 , really appreciate you taking the time to review the code and provide some helpful feedback. I'll try and incorporate the suggestions over the next few days or so.
As to the segfault we observed locally, happy to provide more details as needed - would be interesting to note if you still the same behaviour with a minimal test app (create node + context, and shutdown).

@mxgrey
Copy link
Collaborator

mxgrey commented Nov 1, 2024

the rcutils logging code basically has a look-up table that tracks the logger name and what logging level is enabled for it.

I've never dug into how logging works at the rcl layer, so I actually didn't know this detail of the implementation. Thanks for highlighting this.

That actually makes me feel much more strongly that we should have a firm Logger struct rather than using &str. The current API in the PR would allow rclrs users to pass arbitrary names into the macros which might not be names of actual loggers that rcl recognizes, and it sounds like that could lead to misbehavior in the rcl layer. If we have the macros require a Logger object to be passed in then we can enforce that the user is always using a valid logger that rcl will know about.

I have some ideas for how to give rclrs a Logger struct that aligns with rclcpp's while also covering the capabilities of LogCondition, but implementing the ideas will take about the same amount of time as explaining them so I'll open a PR for that which targets your branch if you don't mind @geoff-imdex .

@geoff-imdex
Copy link
Contributor Author

geoff-imdex commented Nov 1, 2024

That actually makes me feel much more strongly that we should have a firm Logger struct rather than using &str. The current API in the PR would allow rclrs users to pass arbitrary names into the macros which might not be names of actual loggers that rcl recognizes, and it sounds like that could lead to misbehavior in the rcl layer. If we have the macros require a Logger object to be passed in then we can enforce that the user is always using a valid logger that rcl will know about.

Not disagreeing with the suggestion to create a logging struct, but it is worth highlighting that the rcl logging interface explicitly allows passing in arbitrary log names, e.g. the logging macros throughout the rcl code pass in a simple string. Perhaps a better way to phrase - the logging system supports existing without requiring a node to be associated - in this scenario the rcl layer uses the default log values (INFO level and logging to stdout/stderr). It is only when we want to publish to /rosout that the logging is associated with a node (via the implementation in logging_rosout.c).
To be clear, I think the suggestion to create a logging struct will add a lot of value for the nodes that use the logging interface.

I have some ideas for how to give rclrs a Logger struct that aligns with rclcpp's while also covering the capabilities of LogCondition, but implementing the ideas will take about the same amount of time as explaining them so I'll open a PR for that which targets your branch if you don't mind @geoff-imdex .

Please feel free! And thank you!

@mxgrey
Copy link
Collaborator

mxgrey commented Nov 2, 2024

it is worth highlighting that the rcl logging interface explicitly allows passing in arbitrary log names

That's good to know. In that case I'll tweak the macro so that its first argument can be anything that implements Borrow<str>. Then we can implement Borrow<str> for Node as well as the new Logger struct that I'll introduce, and users will be able to pass in any of those things, as well as &str and &String and basically anything stringy.

@mxgrey
Copy link
Collaborator

mxgrey commented Nov 3, 2024

For visibility I'll mention here that I've opened a PR targeting the branch of this one that introduces a Logger struct, adds the ability to set severity levels for the loggers, and streamlines the syntax of the logging macros, especially for setting conditions: geoff-imdex#7

@geoff-imdex
Copy link
Contributor Author

@mxgrey , the iron builds are throwing some linting errors which I'll fix up. As the changes are based from your fork, my plan is to merge in your PRs (starting with the unboxed_logging fix) and then address the linting issues in dev branches from this fork. Hopefully this process works OK for you, but please let me know if that's not the case and we can work out a way forward.

The logging params PR is really nice. As a rust newbie there's some really nice techniques in there that I'll be sure to use in the future.

@luca-della-vedova
Copy link
Collaborator

CI failures look real, specifically using LazyCell that was only stabilized in Rust 1.80 while our MSRV is lower

@geoff-imdex
Copy link
Contributor Author

CI failures look real, specifically using LazyCell that was only stabilized in Rust 1.80 while our MSRV is lower

@mxgrey has done some great work to fix the build issues (among many other improvements), unfortunately, I've not had time to review and merge in. Hoping to get to that shortly.

@mxgrey
Copy link
Collaborator

mxgrey commented Nov 8, 2024

The latest and greatest version can be found here: geoff-imdex#11

The CI is green over there since I've replaced every use of LazyLock with OnceLock which was stabilized in 1.70.

Signed-off-by: Michael X. Grey <[email protected]>
@mxgrey
Copy link
Collaborator

mxgrey commented Nov 11, 2024

@esteve @maspe36 @geoff-imdex @luca-della-vedova

Following up on the discussion in the working group meeting, would there be any preference for or against renaming interval to cooldown to make the behavior more clear?

The concern was that interval could be interpreted as "keep publishing this message with the given period" instead of "don't republish this message until the given period has passed", so the word cooldown might help make that more clear.

@geoff-imdex
Copy link
Contributor Author

@esteve @maspe36 @geoff-imdex @luca-della-vedova

Following up on the discussion in the working group meeting, would there be any preference for or against renaming interval to cooldown to make the behavior more clear?

The concern was that interval could be interpreted as "keep publishing this message with the given period" instead of "don't republish this message until the given period has passed", so the word cooldown might help make that more clear.

No strong opinions on this from my perspective. I had thought I'd followed the rcutils macro wording, but looking again I can see they use throttle - so potentially a small argument for consistency, but cooldown works just as well.

@mxgrey
Copy link
Collaborator

mxgrey commented Nov 12, 2024

The reason I moved away from throttle is because in my mind the word "throttle" implies we should provide a rate (frequency) but we actually provide a duration (cooldown interval).

That being said it's true that throttle is the terms used in ROS 1 and rclcpp, so I don't mind going back to throttle for the sake of consistency with other client libraries. I'd propose a casual vote via reaction emojis on this comment:

❤️ throttle
🚀 cooldown

Signed-off-by: Michael X. Grey <[email protected]>
Signed-off-by: Michael X. Grey <[email protected]>
Signed-off-by: Michael X. Grey <[email protected]>
Co-authored-by: Nathan Wiebe Neufeldt <[email protected]>
@mxgrey
Copy link
Collaborator

mxgrey commented Nov 18, 2024

All the intended features for logging are finished, along with tests. I've also updated the API based on the working group discussion last week.

This PR should be ready for maintainer reviews now.

Copy link
Collaborator

@jhdcs jhdcs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a single typo, otherwise this looks good to me! Thank you so much for this!

///
/// [1]: <https://doc.rust-lang.org/reference/destructors.html>
pub(crate) struct NodeHandle {
pub(crate) rcl_node: Mutex<rcl_node_t>,
pub(crate) context_handle: Arc<ContextHandle>,
/// In the humbe distro, rcl is sensitive to the address of the rcl_node_t
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// In the humbe distro, rcl is sensitive to the address of the rcl_node_t
/// In the humble distro, rcl is sensitive to the address of the rcl_node_t

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whoops, fixed by 622e887

Copy link
Collaborator

@luca-della-vedova luca-della-vedova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you merge main into this PR? There are a lot of spurious changes from other PRs that make it a bit difficult to review.
Another high level thing I would like to see is updating the examples to make sure of logging since we have it (i.e. here)
I'll try to give it a more thorough review when the diff is smaller

Comment on lines 24 to 27
pub struct Logger {
name: Box<str>,
c_name: CString,
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe it was mentioned in the WG meeting that it can be fairly useful to be able to clone the Logger so it can be used from outside the node that defined it. In that case should we derive Clone?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great point, and I can change Box<str> to Arc<str> and CString to Arc<CString> for better efficiency.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added in geoff-imdex#13 and just waiting to be merged in that PR

Comment on lines 127 to 134
// If we have a throttle duration then check if we're inside or outside
// of that interval.
let throttle = params.get_throttle();
if throttle > std::time::Duration::ZERO {
static LAST_LOG_TIME: OnceLock<Mutex<SystemTime>> = OnceLock::new();
let last_log_time = LAST_LOG_TIME.get_or_init(|| {
Mutex::new(std::time::SystemTime::now())
});
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not too sure about this implementation, from what I see it hardcodes the time to a std::time::SystemTime but in ROS 2 the clock could be a ROS time.
The rclcpp implementation also seems to allow specifying arbitrary clocks and has unit tests to verify that passing a ROS clock correctly throttles it.
I would suggest either making the clock to be used configurable or maybe easier just defaulting to the node's clock, so at least when running on simulated time we will behave consistently regardless of the speed of the simulation.

Copy link
Collaborator

@mxgrey mxgrey Nov 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting, the idea of syncing it against ROS time didn't even occur to me. That strikes me as a weird use case, but I'm not opposed to it.

We can't rely on the node clock without making it mandatory to have access to a node or at least its clock. That could make logging unusable in code regions where it shouldn't have to be.

I don't mind making this configurable by adding a .throttle_clock( _ ) option to LogParams that takes in an enum like

pub enum ThrottleClock {
    Clock(Clock),
    SystemTime,
    SteadyTime,
}

and this would default to SteadyTime (which I think is what 99.99% of use cases would want).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added in geoff-imdex#13 and just waiting to be merged in that PR

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I admit I have never used the throttling API but when I'm in doubt on "what is the usage in the ecosystem" I usually just throw a global github search and glance at the code out there.

It is probably mandated by the fact that the macro in C++ requires you to pass a clock and it is probably easier to pass the node clock rather than create a new steady_clock just for that purpose, but it seems the majority of applications do indeed pass in the node clock (which would be running in either system time or simulated time depending on parameters).
I also wonder, what's the reason for redefining a ThrottleClock enum rather than using Clock? It also offers steady time / system time enum variants

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it seems the majority of applications do indeed pass in the node clock

Are you suggesting that we make it mandatory, (e.g. instead of .throttle(duration) we only offer .throttle(clock, duration))? If so, what do you perceive as the benefit of mandating an explicit clock instead of using steady clock by default and letting the user override that?

In my mind ROS time exists primarily to make sure that system behavior can be consistent across real world usage and simulated usage. Log throttling doesn't (..or at least shouldn't..) have an effect on the behavior of the system beyond preventing log spam. It's very hard for me to imagine a case where using ROS time is preferable for reducing log spam, so I would much prefer minimizing API noise and providing a sensible default which will probably always be what users actually want.

In any case the throttle time source type is an optional parameter in rclpy's logging API with a default of steady time, which is exactly what I've done in geoff-imdex#13, so we'll be matching the API of rclpy even if we don't match the API of rclcpp.

what's the reason for redefining a ThrottleClock enum rather than using Clock? It also offers steady time / system time enum variants

The difference is ThrottleClock borrows a clock so there's no need to clone a clock. Cloning a clock would involve incrementing and decrementing an atomic int, which isn't a terrible amount of overhead but it does involve cache syncs which should be avoided whenever possible for hot paths.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you suggesting that we make it mandatory, (e.g. instead of .throttle(duration) we only offer .throttle(clock, duration))? If so, what do you perceive as the benefit of mandating an explicit clock instead of using steady clock by default and letting the user override that?

Not really, I see the benefit of the sensible default, which is why I mentioned that probably people pass in the node clock just because the API forces them to and that's the most easily available clock. I'm fine with the current API.

The difference is ThrottleClock borrows a clock so there's no need to clone a clock. Cloning a clock would involve incrementing and decrementing an atomic int, which isn't a terrible amount of overhead but it does involve cache syncs which should be avoided whenever possible for hot paths.

I see, my main point was about the underlying implementation as well, I am not quite sure but would it make sense to use a Clock with ClockKind::SystemTime / ClockKind::SteadyTime for the underlying implementation in the macro, rather than relying on Rust builtin methods?
I can see how the Rust method is probably cleaner and faster since it doesn't need to build an rcl clock or go through bindings. I however do not honestly know if the behavior of the two is guaranteed to be the same or not, and whether that might be an issue (I suppose both will do the same underlying system calls to get a steady / system time so it's probably OK?).
I'm also slightly put off by having different ways to deal with time internally (rcl / rust std), but I suppose this being purely an implementation detail and not exposed to users it is probably not a big deal?

Copy link
Collaborator

@mxgrey mxgrey Nov 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can see how the Rust method is probably cleaner and faster since it doesn't need to build an rcl clock or go through bindings.

Yes, all of this, plus the fact that going through the rcl bindings also forces a heap allocation (creating the Arc<Mutex<rcl_clock_t>>) which is something to avoid for hot paths, and the whole point of throttling a log is because that log exists in a hot path.

I do agree that there's a somewhat silly amount of duplication of clock abstractions going on here, but here's the way I would look at it:

  • ThrottleClock allows a choice of clock to be made at the point in the code where the logging is happening
  • ThrottleClock::Clock allows a choice of clock to be made elsewhere (encapsulated by a rclrs::Clock), and the logging will just abide by whatever that choice was

I suppose both will do the same underlying system calls to get a steady / system time so it's probably OK?

I would expect so as well. I'd be very surprised if they're using different syscalls.

I'm also slightly put off by having different ways to deal with time internally (rcl / rust std)

Yeah I don't love it myself. It mostly comes from rclrs trying to accommodate the full breadth of rcl's clock interface. Personally I wouldn't mind if we narrow down on what we support from the API of rcl_clock_t.. maybe even not use rcl_clock_t at all and just implement the whole abstract clock concept on the Rust side without any rcl baggage..? But that's definitely out of scope for this PR.

Signed-off-by: Michael X. Grey <[email protected]>
Signed-off-by: Michael X. Grey <[email protected]>
Signed-off-by: Michael X. Grey <[email protected]>
Co-authored-by: Nathan Wiebe Neufeldt <[email protected]>
@luca-della-vedova
Copy link
Collaborator

The diff seems to still be messed up? I wonder if updating upstream main and rebasing is the only way to solve it?

@mxgrey
Copy link
Collaborator

mxgrey commented Nov 20, 2024

@luca-della-vedova I believe the diff should be good now. Let me know if it still looks wonky for you.

@luca-della-vedova
Copy link
Collaborator

I also mentioned in my previous comment that it would be nice to use it in the examples rather than what we do currently which is just doing println.

Anyway I gave it a spin by applying the following patch locally:

$ git diff
diff --git a/examples/minimal_pub_sub/src/minimal_publisher.rs b/examples/minimal_pub_sub/src/minimal_publisher.rs
index 7200869..5931327 100644
--- a/examples/minimal_pub_sub/src/minimal_publisher.rs
+++ b/examples/minimal_pub_sub/src/minimal_publisher.rs
@@ -1,6 +1,7 @@
 use std::env;
 
 use anyhow::{Error, Result};
+use rclrs::{log, ToLogParams};
 
 fn main() -> Result<(), Error> {
     let context = rclrs::Context::new(env::args())?;
@@ -16,7 +17,7 @@ fn main() -> Result<(), Error> {
 
     while context.ok() {
         message.data = format!("Hello, world! {}", publish_count);
-        println!("Publishing: [{}]", message.data);
+        log!(node.info().skip_first().throttle(std::time::Duration::from_millis(1000)), "Publishing: [{}]", message.data);
         publisher.publish(&message)?;
         publish_count += 1;
         std::thread::sleep(std::time::Duration::from_millis(500));

Then running the publisher:

ros2 run examples_rclrs_minimal_pub_sub minimal_publisher

It seems to do the right thing

[INFO] [1732159768.496156067] [minimal_publisher]: Publishing: [Hello, world! 4]
[INFO] [1732159769.496589258] [minimal_publisher]: Publishing: [Hello, world! 6]

Do you think it's worth changing the examples here or should we just do it in a followup PR to avoid making the diff even larger? I'm happy either way but I'd rather this TODO doesn't get lost

Comment on lines +13 to +18
/// Specify a publication throttling interval for the message. A value of ZERO (0) indicates that the
/// message should not be throttled. Otherwise, the message will only be published once the specified
/// interval has elapsed. This field is typically used to limit the output from high-frequency messages,
/// e.g. if `log!(logger.throttle(Duration::from_secs(1)), "message");` is called every 10ms, it will
/// nevertheless only be published once per second.
throttle: Duration,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit, can this be an Option<Duration> instead to be a bit more idiomatic about whether a throttling setting exists or not?

Copy link
Collaborator

@mxgrey mxgrey Nov 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did consider this, but I decided to stick with simply Duration for the following reasons:

  • This field is private, so whether it's Duration or Option<Duration> is an implementation detail that should be considered invisible to users.
  • Users never need to know to set Duration::new(0, 0) to get no throttle because no throttle is implied by default. Users only need to specify positive throttle. Therefore as far as users are concerned, there is no special rule that they need to know about.
  • We need to check for zero or negative throttle anyway, even if we have an Option<Duration> because a negative duration would be invalid and a zero duration should have us skip throttling, just like we currently do, so Option<Duration> gives us no advantage for the implementation.
  • A zero duration throttle is mathematically equivalent to no throttle, so we're not introducing a nonsense meaning for the value, in contrast to a simulator using 0 mass to mean an object is unmovable, where 0 is being given the opposite of its actual physical meaning.
  • This is a very very minor concern, but Option<Duration> has a slightly larger memory footprint than Duration.

I'm open to reconsidering, but overall I would still lean towards just using Duration.

Comment on lines -1369 to +1372
let node = create_node(&ctx, "param_test_node").unwrap();
let node = create_node(&ctx, &format!("param_test_node_{}", line!())).unwrap();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are those changes to make sure we don't have name clashes when running tests?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, the tests were producing lots of spam warnings because of clashing node names while tests ran in parallel.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For some extra context, pre-Jazzy, the rcl node was responsible for creating the rosout publisher and the rosout logging code would warn if it had already created a publisher for a node with the same name.
In the Jazzy (and rolling) releases, the creation of the rosout publisher is no longer the responsibility of the rcl node and so the issue doesn't occur.
It does also mean that we will need to have Jazzy (and on) specific code for explicitly creating the rosout publisher. Following the rclcpp/rclpy implementations, this can easily be added to the node builder code. I'm happy to create a ticket for this so the work doesn't get lost if that helps?

@mxgrey
Copy link
Collaborator

mxgrey commented Nov 21, 2024

Do you think it's worth changing the examples here or should we just do it in a followup PR to avoid making the diff even larger?

I'd like to update the examples in a follow-up PR so we don't keep inflating the size of this one. I think the existing tests give enough demonstration for the reviewers of what the logging API looks like and evidence that it works as intended. Examples can be something we give more thought to after the implementation is in.

I'll open an issue for this so we don't lose sight of it.

Copy link
Collaborator

@luca-della-vedova luca-della-vedova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for iterating, I only tested it locally to verify the API / behavior is correct and looked through the code at a high level but I agree that this functionality is very important and I don't want to block it on minor nits here and there

@jhdcs jhdcs self-requested a review November 21, 2024 13:16
@mvukov mvukov mentioned this pull request Nov 24, 2024
@maspe36 maspe36 merged commit 217f42c into ros2-rust:main Nov 24, 2024
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants