NVIDIA-accelerated, deep learned stereo disparity estimation
Learn how to use this package by watching our on-demand webinar: Using ML Models in ROS 2 to Robustly Estimate Distance to Obstacles
Vision-based depth perception is useful in many areas of robotics, such as estimating the pose of a robotic arm in an object manipulation task, estimating the distance to static or moving targets in autonomous robot navigation, and tracking targets in delivery robots. Isaac ROS DNN Stereo Depth targets two Isaac applications: Isaac Manipulator and Isaac Perceptor. In Isaac Manipulator, ESS is deployed in the Isaac ROS cuMotion package as a plug-in node that provides depth perception maps for robot arm motion planning and control. In this scenario, multi-camera stereo streams of industrial robot arms performing a tabletop task are passed to ESS to obtain the corresponding depth streams. The depth streams are used to segment the relative distance of the robot arms from the corresponding objects on the table, providing signals for collision avoidance and fine-grained control.
Similarly, the Isaac Perceptor application uses several Isaac ROS packages, namely Isaac ROS Nova, Isaac ROS Visual SLAM, Isaac ROS Stereo Depth (ESS), Isaac ROS Nvblox, and Isaac ROS Image Pipeline. ESS is deployed in Isaac Perceptor to enable Nvblox to create 3D voxelized images of the robot's surroundings. Specifically, the Nova developer suite provides three stereo camera streams to Isaac Perceptor, one each from the front, left, and right cameras. In both Isaac Manipulator and Isaac Perceptor, a camera-specific image processing pipeline of GPU-accelerated operations rectifies and undistorts the input stereo images. All stereo image pairs are time-synchronized before being passed to ESS. The ESS node outputs a depth map for each of the three preprocessed image streams, and these depth images are combined with motion signals provided by the cuVSLAM module. The combined depth and motion signals are fed to the Nvblox module to produce a dense 3D volumetric reconstruction of the surrounding scene.
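The time-synchronization step mentioned above can be illustrated with a minimal sketch. This is not the Isaac ROS implementation; the function name `pair_by_timestamp` and the nanosecond tolerance parameter are hypothetical, and the sketch assumes both timestamp lists are already sorted in ascending order.

```python
def pair_by_timestamp(left_stamps, right_stamps, tolerance_ns):
    """Pair left/right frame timestamps that differ by at most tolerance_ns.

    Hypothetical helper for illustration only. Both inputs are assumed to be
    sorted ascending. Frames without a partner inside the tolerance are dropped.
    """
    pairs = []
    i = j = 0
    while i < len(left_stamps) and j < len(right_stamps):
        delta = left_stamps[i] - right_stamps[j]
        if abs(delta) <= tolerance_ns:
            # Close enough in time: treat as one stereo pair.
            pairs.append((left_stamps[i], right_stamps[j]))
            i += 1
            j += 1
        elif delta < 0:
            # Left frame is older than any remaining right frame: skip it.
            i += 1
        else:
            # Right frame is older than any remaining left frame: skip it.
            j += 1
    return pairs


if __name__ == "__main__":
    left = [0, 33, 66, 100]
    right = [1, 34, 90, 101]
    print(pair_by_timestamp(left, right, tolerance_ns=2))
```

Dropping unmatched frames rather than pairing them with the nearest neighbor keeps the two images in each pair close in time, which matters because disparity estimation assumes both views show the same instant.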
Above, the ESS node is used in a graph of nodes to produce a disparity prediction from an input left and right stereo image pair. The rectify and resize nodes pre-process the left and right frames to the appropriate resolution. Maintaining the aspect ratio of the image is recommended to avoid degrading the quality of the depth output. The graph for DNN encode, DNN inference, and DNN decode is included in the ESS node. Inference is performed using TensorRT, as the ESS DNN model is designed with optimizations supported by TensorRT.
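Two calculations implied by this pipeline can be sketched in plain Python: choosing a resize target that preserves aspect ratio, and converting a disparity value to metric depth with the standard rectified-stereo relation depth = focal length × baseline / disparity. The function names, the example resolutions, and the camera parameters below are assumptions for illustration, not values or APIs from this package.

```python
import math


def fit_within(width, height, max_width, max_height):
    """Return (w, h) scaled to fit inside max_width x max_height.

    A single scale factor is applied to both axes so the aspect ratio
    is preserved. Hypothetical helper, not part of Isaac ROS.
    """
    scale = min(max_width / width, max_height / height)
    return round(width * scale), round(height * scale)


def disparity_to_depth(disparity_px, focal_px, baseline_m):
    """Convert one disparity value (pixels) to depth (meters).

    Uses depth = focal_px * baseline_m / disparity_px for a rectified
    stereo pair. Non-positive disparity has no finite depth.
    """
    if disparity_px <= 0:
        return math.inf
    return focal_px * baseline_m / disparity_px


if __name__ == "__main__":
    # Example numbers are made up for illustration.
    print(fit_within(1920, 1080, 960, 576))
    print(disparity_to_depth(48.0, focal_px=480.0, baseline_m=0.25))
```

Note that if the images are resized before inference, the focal length in pixels scales by the same factor, so the value passed as `focal_px` should correspond to the resolution at which the disparity was computed.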
This package is powered by NVIDIA Isaac Transport for ROS (NITROS), which leverages type adaptation and negotiation to optimize message formats and dramatically accelerate communication between participating nodes.
| Sample Graph | Input Size | AGX Orin | Orin NX | x86_64 w/ RTX 4090 |
|---|---|---|---|---|
| DNN Stereo Disparity Node Full | 576p | 103 fps, 12 ms @ 30Hz | 42.1 fps, 26 ms @ 30Hz | 350 fps, 2.3 ms @ 30Hz |
| DNN Stereo Disparity Node Light | 288p | 306 fps, 5.6 ms @ 30Hz | 143 fps, 9.4 ms @ 30Hz | 350 fps, 1.6 ms @ 30Hz |
| DNN Stereo Disparity Graph Full | 576p | 33.5 fps, 25 ms @ 30Hz | 35.2 fps, 34 ms @ 30Hz | 350 fps, 5.6 ms @ 30Hz |
| DNN Stereo Disparity Graph Light | 288p | 179 fps, 14 ms @ 30Hz | 126 fps, 15 ms @ 30Hz | 350 fps, 4.4 ms @ 30Hz |
Please visit the Isaac ROS Documentation to learn how to use this repository.
Update 2024-09-26: Updated for ESS 4.1 trained on additional samples