Running multiple Realsense D455 Cameras causes computer to crash/freeze #12016

Closed
jasatron9000 opened this issue Jul 19, 2023 · 11 comments

@jasatron9000

Required Info
Camera Model: D455
Firmware Version: 5.13.0.50 / 5.15.0.2
Operating System & Version: Ubuntu 20.04
Kernel Version (Linux Only): 5.15.0-76-generic
Platform: PC
SDK Version: 2.51.1 / 2.54.1
Language: C++, ROS
Segment: Robot

Issue Description

We are running 3 RealSense D455 cameras on our Autonomous Vehicle Platform. After a few hours of continuous operation the computer becomes unresponsive, both locally and over the network, although it stays powered up. Whenever this happens, we have to power cycle the entire system manually. We suspect the RealSense cameras are involved, because the computer does not crash like this when the cameras are turned off.

We are running the cameras with an external sync: an Arduino sends a 10 Hz pulse to all three cameras. The inter_cam_sync_mode option is set to GenLock (4). We have not tested without the sync, but we want to keep running with it.

The cameras are connected to a 4-port USB-C to USB 3.2 Hub.

These are the computer's specifications:

  • OS: Ubuntu 20.04
  • Kernel: 5.15.0-76-generic
  • CPU: Intel i7-12700F
  • Motherboard: MSI MEG Z690I UNIFY
  • RAM: 32GB

We've tried two different combinations of SDK, firmware and ROS wrapper. The first used the librealsense2 SDK built from source. These are the versions we used, which did not solve the problem:

  • librealsense2: 2.51.1
  • firmware: 5.13.0.50
  • ROS: noetic
  • ROS Wrapper: 2.3.2

For building this from source, we followed these instructions: https://github.com/IntelRealSense/librealsense/blob/v2.51.1/doc/installation.md

We then tried the updated versions, this time using the pre-built binaries:

  • librealsense2: 2.54.1
  • firmware: 5.15.0.2
  • ROS: humble
  • ROS Wrapper: 4.54.1

For the pre-built binaries, we followed these installation instructions: https://github.com/IntelRealSense/librealsense/blob/master/doc/distribution_linux.md

Prior to these crashes, these are the messages we typically see:

$ journalctl -b -1 -e
Jul 19 13:35:43 pc04 iio-sensor-prox[938]: Failed to open 'iio:device0' at /dev/iio:device0: Device or resource busy
Jul 19 13:35:44 pc04 iio-sensor-prox[938]: Failed to open 'iio:device0' at /dev/iio:device0: Device or resource busy
Jul 19 13:35:44 pc04 kernel: usb 2-5.2: Failed to query (GET_CUR) UVC control 1 on unit 3: -32 (exp. 1024).
Jul 19 13:35:44 pc04 iio-sensor-prox[938]: Failed to open 'iio:device0' at /dev/iio:device0: Device or resource busy
Jul 19 13:35:45 pc04 iio-sensor-prox[938]: Failed to open 'iio:device0' at /dev/iio:device0: Device or resource busy
Jul 19 13:35:46 pc04 iio-sensor-prox[938]: Failed to open 'iio:device0' at /dev/iio:device0: Device or resource busy
Jul 19 13:35:46 pc04 iio-sensor-prox[938]: Failed to open 'iio:device0' at /dev/iio:device0: Device or resource busy
Jul 19 13:35:47 pc04 iio-sensor-prox[938]: Failed to open 'iio:device0' at /dev/iio:device0: Device or resource busy
Jul 19 13:35:48 pc04 iio-sensor-prox[938]: Failed to open 'iio:device0' at /dev/iio:device0: Device or resource busy
Jul 19 13:35:48 pc04 iio-sensor-prox[938]: Failed to open 'iio:device0' at /dev/iio:device0: Device or resource busy
Jul 19 13:35:49 pc04 iio-sensor-prox[938]: Failed to open 'iio:device0' at /dev/iio:device0: Device or resource busy
Jul 19 13:35:50 pc04 iio-sensor-prox[938]: Failed to open 'iio:device0' at /dev/iio:device0: Device or resource busy
Jul 19 13:35:50 pc04 kernel: usb 2-5.1: Failed to query (GET_CUR) UVC control 1 on unit 3: -32 (exp. 1024).
Jul 19 13:35:51 pc04 iio-sensor-prox[938]: Failed to open 'iio:device0' at /dev/iio:device0: Device or resource busy
Jul 19 13:35:51 pc04 iio-sensor-prox[938]: Failed to open 'iio:device0' at /dev/iio:device0: Device or resource busy
Jul 19 13:35:52 pc04 iio-sensor-prox[938]: Failed to open 'iio:device0' at /dev/iio:device0: Device or resource busy
Jul 19 13:35:53 pc04 iio-sensor-prox[938]: Failed to open 'iio:device0' at /dev/iio:device0: Device or resource busy
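
To see how often each message occurs and which USB ports report the UVC warning, the journal output can be tallied with a short script. The following is a minimal sketch, not part of the original report: it assumes the syslog-style line layout shown above, and the names `summarize`, `LINE_RE` and `UVC_RE` are made up for illustration.

```python
import re
from collections import Counter

# Assumed layout, taken from the excerpt above:
# "<Mon> <day> <hh:mm:ss> <host> <source>[pid]: <message>" (the [pid] part is optional)
LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s[\d:]{8})\s(?P<host>\S+)\s"
    r"(?P<source>[^:\[\s]+)(?:\[\d+\])?:\s(?P<msg>.*)$"
)
# Extracts the USB port path (for example "2-5.2") from the kernel UVC warning.
UVC_RE = re.compile(r"usb (?P<port>[\d.\-]+): Failed to query \(GET_CUR\) UVC control")


def summarize(lines):
    """Return (message count per source, UVC GET_CUR failure count per USB port)."""
    per_source = Counter()
    uvc_per_port = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip blank or unrecognized lines
        per_source[match["source"]] += 1
        uvc = UVC_RE.search(match["msg"])
        if uvc:
            uvc_per_port[uvc["port"]] += 1
    return per_source, uvc_per_port


# Three lines copied from the excerpt above, used as a small demo input.
sample = [
    "Jul 19 13:35:43 pc04 iio-sensor-prox[938]: Failed to open 'iio:device0' at /dev/iio:device0: Device or resource busy",
    "Jul 19 13:35:44 pc04 kernel: usb 2-5.2: Failed to query (GET_CUR) UVC control 1 on unit 3: -32 (exp. 1024).",
    "Jul 19 13:35:50 pc04 kernel: usb 2-5.1: Failed to query (GET_CUR) UVC control 1 on unit 3: -32 (exp. 1024).",
]
sources, ports = summarize(sample)
print(dict(sources))
print(dict(ports))
```

Feeding it the full output of `journalctl -b -1` instead of the sample list would show whether the UVC warnings are concentrated on one port of the hub or spread across all cameras.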

Would appreciate some help with this issue. Thanks.

@MartyG-RealSense
Collaborator

Hi @jasatron9000 Regarding the message "Failed to query (GET_CUR) UVC control 1 on unit 3": in 2020 I asked my Intel RealSense colleagues about this message and was provided with the response below.


The message “uvcvideo: Failed to query (GET_CUR) UVC control” indicates a failure on (kernel) driver level to read the requested parameter, for instance if the device was busy with internal operation or that the incoming requests queue is full. In most cases this message can be disregarded as the SDK has built-in auto-retry mechanism to compensate for the temporalities.


The full case can be viewed at #5302

In your particular case though, the computer does not appear to be auto-recovering from the problem as it becomes unresponsive.

'iio-sensor-prox' may be a reference to a non-RealSense component called iio-sensor-proxy that interfaces with accelerometers, like the one that the D455's IMU has.

https://gitlab.freedesktop.org/hadess/iio-sensor-proxy/

Ubuntu apparently makes use of it to rotate the screen's orientation when an IMU device is attached to the computer, like in the RealSense case at IntelRealSense/realsense-ros#1281

If iio-sensor-prox[938] is related to iio-sensor-proxy then you could consider removing it from the computer, as suggested by a RealSense team member at IntelRealSense/realsense-ros#1281 (comment)

@jasatron9000
Author

jasatron9000 commented Jul 19, 2023

Thanks for the response, @MartyG-RealSense

Just to clarify: if we manage to limit or stop the error message uvcvideo: Failed to query (GET_CUR) UVC control from appearing in the log, will that likely help with the computer becoming unresponsive?

Looking through the case that you linked (#5302), these are the solutions that I saw that might help.

  • Reduce the FPS and/or reduce the resolution of the streams.
  • Increase the frame buffer size to 2 rather than the default value of 1.

Additionally, would using specific cables or a specific USB hub help with this issue? I've seen you talk about it in this comment, but I'm unsure whether it is related.

Can you confirm whether these are the solutions we should try in order to limit that error from occurring?
Thanks.

@MartyG-RealSense
Collaborator

MartyG-RealSense commented Jul 20, 2023

If the log messages are generating continuously at a fast rate then I can see how they could have an impact on the computer's performance.

Reducing FPS and / or resolution can sometimes reduce the frequency at which log messages are generated, though not completely eliminate them.

Increasing the frame buffer size can reduce the number of frames that are dropped (lost) from the pipeline. I do not think that it would help with your particular problem though.

It is important to use a high quality USB cable that is designed for data transfer and not just an inexpensive one that was designed only for device charging. This is because of the high volume of data that RealSense cables can transmit. If the cable were the cause of the problem, though, you would probably see errors occurring much sooner than 3 hours into the session.

You could also check the temperature of the cameras when the crash occurs. If the outer casing of a camera is hot to the touch then it has likely overheated. Problems can start to manifest once the internal temperature of a camera rises above 42 degrees C. Overheating could be caused by the heat of the environment that the camera is being used in (heating of the external casing causes some internal heating) or by an electrical glitch on the USB port or in the cable if the cable has unseen internal damage.

Incompatibilities with certain models of USB hub can also occur. There is no way to know in advance how well a particular hub will work with the camera. When Intel performed multi-camera bandwidth usage tests for their multiple camera hardware sync paper though, they tested with Amazon's own-brand AmazonBasics hub. I purchased this brand of hub myself for my RealSense cameras based on the paper's recommendation and have never had problems with it during three years of use.

https://dev.intelrealsense.com/docs/multiple-depth-cameras-configuration#2-multi-camera-considerations

@jasatron9000
Author

Thanks for the reply, @MartyG-RealSense

We'll give these solutions a try and we'll get back to you if we have any further questions for you.

@MartyG-RealSense
Collaborator

Hi @jasatron9000 Do you have an update about this case that you can provide, please? Thanks!

@MartyG-RealSense
Collaborator

Hi @jasatron9000 Do you require further assistance with this case, please? Thanks!

@jasatron9000
Author

@MartyG-RealSense Apologies for the late response. Here's an update on what we have tried so far.

We lowered the resolution of the depth streams from 1280x720 to 848x480. From our observations, this increased the time between crashes from 4 - 6 hours to 12 - 24 hours, and in some cases the system ran without issues for ~2 days straight. However, the error message uvcvideo: Failed to query (GET_CUR) UVC control is still present.
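
For a rough sense of how much this resolution change reduces the data volume, here is a back-of-the-envelope calculation. It is only a sketch under stated assumptions that the thread does not confirm: depth frames at 2 bytes per pixel, one frame per 10 Hz trigger pulse, and no color, infrared or IMU streams or USB protocol overhead counted. The function name `depth_bandwidth_mbps` is illustrative.

```python
def depth_bandwidth_mbps(width, height, fps, bytes_per_pixel=2, cameras=1):
    """Raw (uncompressed) depth payload in megabits per second."""
    return width * height * bytes_per_pixel * fps * cameras * 8 / 1e6


FPS = 10      # assumption: one depth frame per 10 Hz external sync pulse
CAMERAS = 3   # three D455 cameras share one hub

before = depth_bandwidth_mbps(1280, 720, FPS, cameras=CAMERAS)
after = depth_bandwidth_mbps(848, 480, FPS, cameras=CAMERAS)

print(f"1280x720: {before:.1f} Mbit/s")
print(f"848x480:  {after:.1f} Mbit/s")
print(f"ratio:    {after / before:.2f}")
```

Under these assumptions the lower resolution carries a bit under half of the original depth payload across the shared hub.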

There are two things that we still need to try:

  • Using the AmazonBasics Hub that was recommended for the RealSense cameras.
  • Lowering the frequency of the external sync signal, which will effectively lower the frame rate.

We also had the idea of periodically restarting the ROS RealSense nodes (perhaps once an hour). Do you think this idea is worth pursuing?

Thanks in advance.

@MartyG-RealSense
Collaborator

I would certainly recommend periodic resets as a way of dealing with problems during long-run sessions.
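
One possible way to implement such periodic resets is a small supervisor script that launches the camera nodes, stops them after a fixed interval and starts them again. The following is a minimal sketch rather than anything from the RealSense tooling: the function name `run_with_periodic_restart` and its parameters are hypothetical, and the launch command in the comment is an assumption about how the nodes are started on this system.

```python
import signal
import subprocess
import time


def run_with_periodic_restart(cmd, interval_s, max_cycles=None,
                              stop_timeout_s=10.0, restart_delay_s=1.0):
    """Run cmd, stop it after interval_s seconds, then start it again.

    Returns the number of times cmd was started. With max_cycles=None
    the loop runs until the script itself is interrupted.
    """
    cycles = 0
    while max_cycles is None or cycles < max_cycles:
        proc = subprocess.Popen(cmd)
        cycles += 1
        try:
            # Returns early if the process exits on its own.
            proc.wait(timeout=interval_s)
        except subprocess.TimeoutExpired:
            # SIGINT is what Ctrl+C sends, giving the process a chance to shut down cleanly.
            proc.send_signal(signal.SIGINT)
            try:
                proc.wait(timeout=stop_timeout_s)
            except subprocess.TimeoutExpired:
                proc.kill()  # last resort if the process ignores SIGINT
                proc.wait()
        time.sleep(restart_delay_s)  # brief pause before the next start
    return cycles


# Hypothetical usage, restarting once an hour (the command is an assumption):
# run_with_periodic_restart(
#     ["ros2", "launch", "realsense2_camera", "rs_launch.py"], interval_s=3600)
```

The pause between stop and start is a guess at giving the cameras time to release their USB resources; the right value would need to be found by testing.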

@MartyG-RealSense
Collaborator

Hi @jasatron9000 Do you require further assistance with this case, please? Thanks!

@jasatron9000
Author

Hi @MartyG-RealSense, I believe you've given us enough information to work through this problem, and at this point we don't have any more questions about it.

If we have any additional questions in the future, we'll be sure to ask. Thanks for the help.

@MartyG-RealSense
Collaborator

You are very welcome. Please do feel free to ask if you have further questions at a future date. As you do not require further assistance at this time, I will close this case. Thanks again!
