Source Build Configuration Options #4132

Closed
VasuAgrawal opened this issue Jun 2, 2019 · 2 comments

@VasuAgrawal
Contributor


Required Info
Camera Model: D435 / D435i
Firmware Version: 5.11.1.100
Operating System & Version: Linux (Ubuntu 18.04)
Kernel Version (Linux Only): 4.15, 4.9.108-tegra
Platform: PC, Nvidia Xavier
SDK Version: v2.22.0
Language: C++, Python
Segment: Robotics

Issue Description

Hi,

I'm trying to understand the tradeoffs for enabling / disabling each of the configuration options for librealsense, as defined in the CMake/lrs_options.cmake file. The majority of the options seem largely self-explanatory (or not related to my use case), but there are a few interesting ones that I'd like to better understand the tradeoffs for, ultimately to determine if I should enable the feature.

The ones that I'd like more information on are as follows:

  • BUILD_WITH_CUDA: Which functions in librealsense will be accelerated by using CUDA? Does the use of CUDA tend to improve performance of said functions, or does the performance remain comparable with pure-CPU implementations due to memory copy overhead? Does the answer change when switching to a Jetson platform (ARM CPU and integrated GPU / CPU memory)?
  • BUILD_GLSL_EXTENSIONS: How does the performance of the GLSL shader-based acceleration compare to the pure CUDA implementations on systems that support CUDA? Is there a reason to use the GLSL extensions if CUDA is available? Additionally, are all of the same features that are accelerated by CUDA accelerated by the GLSL shaders?
  • BUILD_WITH_OPENMP: Similarly, what features does OpenMP accelerate, and by how much (typically)?
  • ENABLE_ZERO_COPY: Typically, zero copy transport is good for performance - I'm curious why this option is disabled by default. What are the tradeoffs I should understand when enabling zero copy? Also, what data uses zero copy transport with this flag? Is there additional work that a user needs to do to access the data with zero copy transport?
  • HWM_OVER_XU: What are HWM commands? Didn't find anything relevant on Google.

More generally, perhaps it's worth mentioning the additional configuration options for the source build in the documentation for the source build. Maybe this question could be linked as well, for additional information on the options?

@dorodnic
Contributor

dorodnic commented Jun 2, 2019

Hi @VasuAgrawal
There is some documentation on the wiki but perhaps not sufficient.

  • CUDA optimizations were specifically introduced to combat performance bottlenecks on the Jetson. There, especially when you enable high-performance mode, they make a huge difference. On regular setups this may or may not result in a significant boost, and is mostly not needed. While many operations can certainly benefit from being executed on the GPU, the GPU is better suited to maximizing bandwidth than to minimizing latency. Sending frames back and forth is not always the best solution. If you search the Pull Requests section, every optimization pull request was submitted with performance measurements.
    We currently have CUDA optimizations for color conversions (YUY to RGB), as well as pointcloud and align processing blocks.

  • GLSL extensions are new. I haven't run comparisons with CUDA yet, but the goal is slightly different. These give some limited GPU speed-up in a relatively vendor-neutral way, but are mostly designed to inter-operate with rendering operations (for instance in the Viewer). There is also the rs-gl example. Unlike CUDA optimizations that just take over when available, GLSL must be invoked explicitly.

  • OpenMP is currently used for YUY to RGB conversions and alignment. Results vary depending on your system, but generally it will increase CPU utilization significantly while reducing latency.

  • The Zero Copy feature is not functional for now. The idea was that the rs2::frame object could track the underlying kernel resource instead of making a copy, but this does not always play well with the rest of the SDK. We might re-enable it at some point, but for now there seems to be little need for it.

  • HWM_OVER_XU is somewhat "internal". HWM or Hardware Monitor is the protocol for controlling all currently supported RealSense cameras. These commands, however, can be sent to the camera in different ways. In production, we use UVC Extension Unit (XU) to pass hardware monitor commands. When there are problems at the UVC level, there is an option to switch to direct USB command transfers. This mostly happens internally when debugging new platforms / devices.
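
To give a feel for the kind of work the CUDA and OpenMP paths take over, here is a minimal pure-Python sketch of a YUY to RGB conversion. It is an illustration only, not the SDK's implementation: the function names are made up for this example, and the coefficients are the common BT.601 limited-range integer approximation, which is an assumption and may differ from what librealsense uses. The point it shows is that each pixel pair is converted independently of every other pair, which is why this operation parallelizes well across CPU threads or GPU threads.

```python
def clamp8(value):
    """Clamp an integer to the 0..255 range of an 8-bit channel."""
    return max(0, min(255, value))


def yuv_to_rgb(y, u, v):
    """Convert one YUV sample to an (R, G, B) tuple.

    Assumption: BT.601 limited-range integer approximation; the SDK's
    exact coefficients may differ.
    """
    c = y - 16
    d = u - 128
    e = v - 128
    r = clamp8((298 * c + 409 * e + 128) >> 8)
    g = clamp8((298 * c - 100 * d - 208 * e + 128) >> 8)
    b = clamp8((298 * c + 516 * d + 128) >> 8)
    return (r, g, b)


def yuy2_to_rgb(data):
    """Convert packed YUY2 bytes to a list of (R, G, B) tuples.

    YUY2 stores two pixels in four bytes: Y0 U Y1 V. Both pixels share
    the same U and V samples.
    """
    if len(data) % 4 != 0:
        raise ValueError("YUY2 data must be a multiple of 4 bytes")
    pixels = []
    # Each 4-byte group is independent of the others, so this loop is the
    # part a parallel implementation would split across threads.
    for i in range(0, len(data), 4):
        y0, u, y1, v = data[i:i + 4]
        pixels.append(yuv_to_rgb(y0, u, v))
        pixels.append(yuv_to_rgb(y1, u, v))
    return pixels


# One pixel pair: a black pixel followed by a white pixel.
print(yuy2_to_rgb(bytes([16, 128, 235, 128])))
```

Because no pixel pair depends on another, the trade-off described above comes down to whether the per-frame compute saved outweighs the cost of moving the frame to and from the device doing the work.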

@VasuAgrawal
Contributor Author

@dorodnic,
Thanks for the rapid and detailed response! I had missed the wiki section entirely. I've added a link to it in this PR. It looks like I should just stick to enabling CUDA for use on the Xavier (at least for my application).

For any future readers, you can find the list of PRs related to CUDA here. This PR in particular adds the CUDA implementation for alignment. I'm looking forward to benchmarking on the Xavier as well!
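
For anyone scripting their builds, the conclusion above (CUDA on for the Xavier, zero copy left off since it is not functional) could be sketched as a small helper that assembles the CMake configure command. The option names come from CMake/lrs_options.cmake as discussed in this thread; the helper name `cmake_configure_command` and the `..` source path (which assumes you run it from a build directory inside the checkout) are hypothetical choices for this example.

```python
def cmake_configure_command(source_dir, options):
    """Build a cmake argument list with one -D<NAME>=<ON|OFF> flag per option.

    `options` maps a CMake option name to a bool. Names are sorted so the
    resulting command is stable between runs.
    """
    args = ["cmake", source_dir]
    for name, enabled in sorted(options.items()):
        args.append("-D{}={}".format(name, "ON" if enabled else "OFF"))
    return args


# Assumed selection for a Jetson Xavier build, based on the discussion above.
xavier_options = {
    "BUILD_WITH_CUDA": True,     # CUDA paths were introduced for the Jetson
    "ENABLE_ZERO_COPY": False,   # reported as not functional for now
}

command = cmake_configure_command("..", xavier_options)
print(" ".join(command))
```

The returned list can be passed directly to `subprocess.run` if you want the script to invoke CMake itself rather than just print the command.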
