Source Build Configuration Options #4132

VasuAgrawal · 2019-06-02T04:23:13Z

Required Info
Camera Model	D435 / D435i
Firmware Version	5.11.1.100
Operating System & Version	Linux (Ubuntu 18.04)
Kernel Version (Linux Only)	4.15, 4.9.108-tegra
Platform	PC, Nvidia Xavier
SDK Version	v2.22.0
Language	C++, Python
Segment	Robotics

Issue Description

Hi,

I'm trying to understand the tradeoffs of enabling or disabling each of the librealsense configuration options defined in the CMake/lrs_options.cmake file. Most of the options seem largely self-explanatory (or unrelated to my use case), but there are a few that I'd like to understand better, ultimately to decide whether I should enable them.

The ones that I'd like more information on are as follows:

BUILD_WITH_CUDA: Which functions in librealsense will be accelerated by using CUDA? Does the use of CUDA tend to improve performance of said functions, or does the performance remain comparable with pure-CPU implementations due to memory copy overhead? Does the answer change when switching to a Jetson platform (ARM CPU and integrated GPU / CPU memory)?
BUILD_GLSL_EXTENSIONS: How does the performance of the GLSL shader-based acceleration compare to the pure CUDA implementations on systems that support CUDA? Is there a reason to use the GLSL extensions if CUDA is available? Additionally, are all of the same features that are accelerated by CUDA accelerated by the GLSL shaders?
BUILD_WITH_OPENMP: Similarly, what features does OpenMP accelerate, and by how much (typically)?
ENABLE_ZERO_COPY: Zero-copy transport is typically good for performance, so I'm curious why this option is disabled by default. What are the tradeoffs I should understand when enabling zero copy? Also, which data uses zero-copy transport with this flag? Is there additional work a user needs to do to access the data with zero-copy transport?
HWM_OVER_XU: What are HWM commands? I didn't find anything relevant on Google.

More generally, it may be worth documenting these additional configuration options in the source-build documentation. Perhaps this question could be linked there as well, for additional information on the options?


dorodnic · 2019-06-02T05:15:46Z

Hi @VasuAgrawal
There is some documentation on the wiki, but it is perhaps not sufficient.

CUDA optimizations were specifically introduced to combat performance bottlenecks on the Jetson. There, especially when you enable high-performance mode, they make a huge difference. On regular setups they may or may not give a significant boost, and they are mostly not needed. While many operations can certainly benefit from being executed on the GPU, the GPU is better suited to maximizing bandwidth than to minimizing latency, and sending frames back and forth is not always the best solution. If you search the Pull Requests section, you will find that every optimization pull request was submitted with performance measurements.
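To make the bandwidth-versus-latency point concrete, here is a back-of-envelope sketch of the copy cost alone. The 1280x720 frame, the 2 bytes per pixel of YUY input, the 3 bytes per pixel of RGB output and the 12 GB/s effective host-to-device bandwidth are all assumptions chosen for illustration, not librealsense measurements; replace them with figures measured on the target platform.

```python
# Back-of-envelope estimate of the copy cost of offloading one frame to a GPU.
# All numbers are illustrative assumptions, not librealsense measurements.


def frame_bytes(width: int, height: int, bytes_per_pixel: int) -> int:
    """Size of an uncompressed frame in bytes."""
    return width * height * bytes_per_pixel


def round_trip_copy_ms(in_bytes: int, out_bytes: int, bandwidth_gb_s: float) -> float:
    """Time to upload the input and download the result, in milliseconds.

    Ignores kernel execution, launch overhead and synchronization, which add
    further latency on top of the pure transfer time.
    """
    total_bytes = in_bytes + out_bytes
    return total_bytes / (bandwidth_gb_s * 1e9) * 1e3


if __name__ == "__main__":
    yuy = frame_bytes(1280, 720, 2)  # assumed YUY input: 2 bytes per pixel
    rgb = frame_bytes(1280, 720, 3)  # assumed RGB8 output: 3 bytes per pixel
    ms = round_trip_copy_ms(yuy, rgb, bandwidth_gb_s=12.0)  # assumed bandwidth
    print(f"{yuy} B in, {rgb} B out, ~{ms:.3f} ms spent copying per frame")
```

At 30 fps the frame budget is about 33 ms, so a fraction of a millisecond of copying is small in throughput terms, but it is added to every frame's latency before any GPU work is counted.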
We currently have CUDA optimizations for color conversions (YUY to RGB), as well as pointcloud and align processing blocks.
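For readers unfamiliar with the color conversion being offloaded, below is a minimal CPU sketch of unpacking a YUY2 buffer (Y0 U Y1 V, i.e. two pixels per four bytes) into RGB8. It assumes the common integer BT.601 limited-range coefficients; librealsense's own CPU, OpenMP and CUDA paths may use different coefficients or rounding. Every pixel pair is converted independently, which is what makes this step a good fit for GPU and multi-threaded execution.

```python
from __future__ import annotations


def clamp_u8(value: int) -> int:
    """Clamp an integer to the 0..255 range of an 8-bit channel."""
    return max(0, min(255, value))


def yuv_to_rgb(y: int, u: int, v: int) -> tuple[int, int, int]:
    """Integer BT.601 limited-range conversion of one pixel (assumed coefficients)."""
    c, d, e = y - 16, u - 128, v - 128
    r = clamp_u8((298 * c + 409 * e + 128) >> 8)
    g = clamp_u8((298 * c - 100 * d - 208 * e + 128) >> 8)
    b = clamp_u8((298 * c + 516 * d + 128) >> 8)
    return r, g, b


def yuy2_to_rgb(buf: bytes, width: int, height: int) -> bytes:
    """Unpack a YUY2 frame (4 bytes per 2 pixels: Y0 U Y1 V) into packed RGB8."""
    if width % 2 or len(buf) != width * height * 2:
        raise ValueError("YUY2 needs an even width and 2 bytes per pixel")
    out = bytearray()
    for i in range(0, len(buf), 4):
        y0, u, y1, v = buf[i], buf[i + 1], buf[i + 2], buf[i + 3]
        # Both pixels in the pair share the same chroma (U, V) samples.
        out.extend(yuv_to_rgb(y0, u, v))
        out.extend(yuv_to_rgb(y1, u, v))
    return bytes(out)
```

For example, `yuy2_to_rgb(bytes([16, 128, 235, 128]), 2, 1)` decodes one black and one white pixel.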
GLSL extensions are new. I haven't run comparisons with CUDA yet, but the goal is slightly different. They give some limited GPU speed-up in a relatively vendor-neutral way, but they are mostly designed to interoperate with rendering operations (for instance in the Viewer). There is also the rs-gl example. Unlike the CUDA optimizations, which take over automatically when available, GLSL processing must be invoked explicitly.
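The difference in how the two paths are selected can be sketched as a dispatch pattern. Everything here is hypothetical: `BUILT_WITH_CUDA` and the converter functions are stand-ins, not librealsense names. The CUDA-style path is chosen transparently inside one factory based on how the library was built, whereas the GLSL-style path is a separate object the caller must construct deliberately, here modeled as requiring a rendering context first, since GLSL shaders run inside an OpenGL context.

```python
from typing import Callable

# Hypothetical build-time flag; in a real build this would be decided by CMake.
BUILT_WITH_CUDA = False


def cpu_convert(frame: bytes) -> str:
    return "cpu"  # stand-in for a CPU processing block


def cuda_convert(frame: bytes) -> str:
    return "cuda"  # stand-in for a CUDA-accelerated processing block


def make_default_converter() -> Callable[[bytes], str]:
    """Transparent selection: callers never name the backend."""
    return cuda_convert if BUILT_WITH_CUDA else cpu_convert


class GlslConverter:
    """Explicit opt-in: the caller constructs it once a GL context exists."""

    def __init__(self, gl_context_ready: bool) -> None:
        if not gl_context_ready:
            raise RuntimeError("a rendering context is required first")

    def __call__(self, frame: bytes) -> str:
        return "glsl"  # stand-in for a shader-based processing block
```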
OpenMP is currently used for YUY to RGB conversions and alignment. Results vary depending on your system, but generally it will increase CPU utilization significantly while reducing latency.
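An OpenMP `parallel for` splits a loop's iterations (for image processing, typically rows) into chunks that run on separate threads. The sketch below imitates a static schedule in Python with `concurrent.futures`. It is an analogy, not how librealsense is written (that code is C++ with OpenMP), and because of CPython's global interpreter lock it shows the work split rather than a real speed-up for pure-Python work. The per-row function is a hypothetical stand-in for a row conversion.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence


def static_chunks(n: int, workers: int) -> List[range]:
    """Split range(n) into `workers` contiguous, near-equal chunks (static-schedule style)."""
    base, extra = divmod(n, workers)
    chunks: List[range] = []
    start = 0
    for w in range(workers):
        size = base + (1 if w < extra else 0)
        chunks.append(range(start, start + size))
        start += size
    return chunks


def parallel_rows(
    rows: Sequence[bytes], fn: Callable[[bytes], bytes], workers: int = 4
) -> List[bytes]:
    """Apply `fn` to every row, one contiguous chunk of rows per worker thread."""
    out: List[bytes] = [b""] * len(rows)

    def run(chunk: range) -> None:
        # Each worker writes only its own indices, so no locking is needed.
        for i in chunk:
            out[i] = fn(rows[i])

    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(run, static_chunks(len(rows), workers)))  # re-raises worker errors
    return out
```

In the native case each frame finishes sooner because rows are processed concurrently, but several cores are busy at once, which is the trade-off described above: lower latency at the cost of noticeably higher CPU utilization.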
The Zero Copy feature is currently not functional. The idea was that an rs2::frame object could track the underlying kernel resource instead of making a copy, but this does not always play well with the rest of the SDK. We might re-enable it at some point, but for now there seems to be little need for it.
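The trade-off can be illustrated with plain Python buffers. In this hypothetical model a driver refills one buffer for successive frames: copying gives each frame its own snapshot at the cost of a memory copy, while a zero-copy frame only holds a view of the driver's memory, which is cheap but means its contents change once the buffer is recycled. `DriverBuffer` and the two frame helpers are illustrative stand-ins, not librealsense or kernel types.

```python
class DriverBuffer:
    """Hypothetical kernel-owned buffer that the driver refills for every new frame."""

    def __init__(self, size: int) -> None:
        self.data = bytearray(size)

    def fill(self, value: int) -> None:
        # Item-wise writes keep the buffer's size fixed, so live memoryviews stay valid.
        for i in range(len(self.data)):
            self.data[i] = value


def copying_frame(buf: DriverBuffer) -> bytes:
    """Copy-out: one memory copy per frame, but the frame owns its pixels."""
    return bytes(buf.data)


def zero_copy_frame(buf: DriverBuffer) -> memoryview:
    """Zero-copy: no memory copy, but the frame aliases the driver's memory."""
    return memoryview(buf.data)
```

After a second `fill` simulates the next frame, a frame obtained with `copying_frame` still holds the old pixels while one obtained with `zero_copy_frame` reports the new ones, so the frame must either finish with the data first or keep the buffer reserved.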
HWM_OVER_XU is somewhat "internal". HWM or Hardware Monitor is the protocol for controlling all currently supported RealSense cameras. These commands, however, can be sent to the camera in different ways. In production, we use UVC Extension Unit (XU) to pass hardware monitor commands. When there are problems at the UVC level, there is an option to switch to direct USB command transfers. This mostly happens internally when debugging new platforms / devices.
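The idea that one command protocol can travel over different transports can be sketched as a small class hierarchy. The classes, the selection flag and the command bytes below are hypothetical and do not reflect librealsense's actual hardware-monitor packet format; the point is only that a command is built once and the channel is chosen separately, which is the choice HWM_OVER_XU controls.

```python
from typing import List


class HwmTransport:
    """Hypothetical channel that delivers an opaque hardware-monitor command."""

    name = "abstract"

    def __init__(self) -> None:
        self.sent: List[bytes] = []

    def send(self, command: bytes) -> str:
        # A real transport would perform device I/O here; this sketch only records it.
        self.sent.append(command)
        return self.name


class XuTransport(HwmTransport):
    """Production-style path: command passed through a UVC extension unit."""

    name = "uvc-xu"


class UsbTransport(HwmTransport):
    """Fallback path: command transferred directly over USB."""

    name = "usb"


def make_transport(hwm_over_xu: bool) -> HwmTransport:
    """Pick the channel once; code that builds commands never needs to know which."""
    return XuTransport() if hwm_over_xu else UsbTransport()
```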

VasuAgrawal · 2019-06-02T05:59:05Z

@dorodnic,
Thanks for the rapid and detailed response! I had missed the wiki section entirely. I've added a link to it in this PR. It looks like I should just stick to enabling CUDA for use on the Xavier (at least for my application).
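For a build along those lines, the options are passed to CMake as ordinary `-D` cache variables. The helper below is a hypothetical convenience that assembles such a command line. The option names are the ones discussed in this thread; the chosen values (CUDA on for the Xavier, zero copy off since it was described as not functional, everything else left at its default by omission) and the `..` source path are assumptions to adjust.

```python
import shlex
from typing import Dict, List


def cmake_command(source_dir: str, options: Dict[str, bool]) -> List[str]:
    """Build a cmake argv, turning each option into a -DNAME=ON/OFF flag."""
    flags = [
        f"-D{name}={'ON' if enabled else 'OFF'}"
        for name, enabled in sorted(options.items())
    ]
    return ["cmake", source_dir, *flags]


if __name__ == "__main__":
    argv = cmake_command("..", {
        "BUILD_WITH_CUDA": True,    # recommended above for Jetson-class devices
        "ENABLE_ZERO_COPY": False,  # described above as not currently functional
    })
    # Print a shell-ready line; the list could also go to subprocess.run(argv, check=True).
    print(shlex.join(argv))
```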

For any future readers, you can find the list of PRs related to CUDA here. This PR in particular adds the CUDA implementation for alignment. I'm looking forward to benchmarking on the Xavier as well!

dorodnic added question software labels Jun 2, 2019

RealSenseCustomerSupport closed this as completed Jun 24, 2019

MartyG-RealSense mentioned this issue Dec 2, 2020: Any tips for keeping frames in GPU when using the python wrapper? #7824 (Closed)

RealSenseSupport mentioned this issue Mar 18, 2021: Questions about FORCE_RSUSB_BACKEND and HWM_OVER_XU, Connection issues. #8526 (Closed)
