Troubleshooting the AutoRally Platform

Nolan Wagener edited this page Nov 30, 2018 · 42 revisions

This page contains troubleshooting tips and techniques for when something on a fully configured AutoRally platform is not working. This is a working document that we continue to fill in as we debug and troubleshoot our own platforms.

1. Chrony

If Chrony on an OCS computer or compute box is not looking for the correct sources, make sure that Chrony is launched on startup with the correct configuration file and that the appropriate information is entered into each file according to the Platform Configuration Instructions.

Software updates can re-enable Chrony autostart, which launches Chrony with an incorrect configuration file. If this happens, disable autostart again according to the Platform Configuration Instructions.

After starting autorally.launch, if an error of the form [ERROR] [##########.######]: 506 Cannot talk to daemon appears, Chrony on the OCS machine has not started. See the Platform Configuration Instructions.

2. GPSD

If the GPS has a position fix but Chrony does not see information from any PPS source within a minute of starting GPSD, try sudo killall gpsd and then re-run gpsd according to the Operating Procedures.
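
One way to check whether Chrony is seeing a PPS source is to look at the Reach column of chronyc sources for reference-clock rows (the rows whose mode character is #). The sketch below parses that output; the source names PPS0 and GPS0 and the sample text are illustrative assumptions, not output captured from an AutoRally platform.

```python
import subprocess

# Illustrative sample only; source names and values are assumptions.
SAMPLE = """\
MS Name/IP address         Stratum Poll Reach LastRx Last sample
===============================================================================
#? PPS0                          0   4     0     -     +0ns[   +0ns] +/-    0ns
#* GPS0                          0   4   377    12    +15us[  +20us] +/-  200us
"""


def refclock_status(chronyc_sources_output):
    """Return {name: (state_char, reach)} for reference-clock rows."""
    status = {}
    for line in chronyc_sources_output.splitlines():
        # Reference clocks (such as PPS) are listed with mode '#'.
        if not line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 5:
            continue
        state = line[1]             # '*', '+', '-', '?', 'x' or '~'
        reach = int(fields[4], 8)   # Reach is printed in octal
        status[fields[1]] = (state, reach)
    return status


def silent_refclocks(chronyc_sources_output):
    """Names of reference clocks with no recent samples (Reach == 0)."""
    status = refclock_status(chronyc_sources_output)
    return sorted(name for name, (_, reach) in status.items() if reach == 0)


def read_chronyc_sources():
    """Run `chronyc sources` on the local machine and return its text."""
    result = subprocess.run(["chronyc", "sources"],
                            capture_output=True, text=True)
    return result.stdout
```

On a live system, silent_refclocks(read_chronyc_sources()) would list the reference clocks that have produced no samples; if the PPS source is still in that list a minute after starting GPSD, restart gpsd as described above.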

Software updates can re-enable GPSD autostart, which launches GPSD with an incorrect configuration file. If this happens, disable autostart again according to the Platform Configuration Instructions.

3. GPS

  • Periodically going from solid fix to 0 satellites, then back to a solid fix. This is likely the GPS board resetting because of inadequate input voltage or a short/broken wire somewhere in the GPS wiring harness between the GPS board and the serial to USB adapters plugged into the motherboard.

  • Not dropping into any RTK fix mode. Verify that RTCM3 correction messages are being received by the XbeeNode at about 2.1Hz using the XbeeNode diagnostic topic. If not, restart the base station. If corrections are coming through, restart autorally core (autorally.launch) to restart gpsRover.

  • Not receiving communications from the GPS in the diagnostic topic. With autorally.launch shut down, open Cutecom and connect to /dev/arGPSroverPortB and check that you are receiving messages on this port. The data should be human readable with messages beginning with things like "GPGGA" or "GPGNS" followed by data. If you are not receiving data, check to see if the GPS is powered correctly by opening the GPS box and looking at the 4 lights on the GPS board. The red light should be on to indicate power, and the other 3 lights indicate the position fix status, which may or may not be lit. If all of the lights look OK, and still no data is being received, something is likely wrong in the wiring harness between the GPS board and serial to USB converters plugged into the motherboard.
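
If you save a few lines from the Cutecom session, a short script can confirm that they are well-formed NMEA sentences and not line noise. This is a minimal sketch that relies only on the standard NMEA checksum rule (the XOR of every character between $ and *, written as two hex digits); the function names are hypothetical and the GGA sentence used in the usage note is a generic example, not data from the platform.

```python
def nmea_checksum(payload):
    """XOR of all characters between '$' and '*'."""
    value = 0
    for ch in payload:
        value ^= ord(ch)
    return value


def is_valid_nmea(line):
    """True if the line looks like '$<payload>*<HH>' with a matching checksum."""
    line = line.strip()
    if not line.startswith("$") or "*" not in line:
        return False
    payload, _, given = line[1:].rpartition("*")
    if len(given) != 2:
        return False
    try:
        return nmea_checksum(payload) == int(given, 16)
    except ValueError:
        # Checksum field was not hexadecimal.
        return False


def sentence_type(line):
    """Return the leading identifier, e.g. 'GPGGA' or 'GPGNS'."""
    return line.strip()[1:].split(",", 1)[0]
```

For example, building a sentence from the payload GPGGA,123519,4807.038,N,01131.000,E,1,08,0.9,545.4,M,46.9,M,, with its computed checksum passes is_valid_nmea, while the same sentence with a corrupted checksum does not. Lines that consistently fail the check point toward a wiring or baud-rate problem, not a missing fix.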

4. IMU

  • You see IMU rejected baud rate on the command line, and no IMU diagnostic information appears in the OCS. You have to power cycle the IMU by powering down the compute box and disconnecting all power.

  • One IMU diagnostic topic (imu_filter) will always be red. If two of the three IMU diagnostic topics are red and complain about timestamps, that indicates a problem. First, verify that the system clock is synchronized to GPS time (the diagnostic topic for Chrony on the compute box is green), then restart autorally.launch. You may have to restart a couple of times for the IMU to be happy. This has something to do with the GPS PPS and timestamp timing being fed into the IMU, but we haven't figured out what exactly.

5. autorally_chassis

  • The autorally_chassis diagnostic topic will not be green unless the chassis is powered. Expected errors in this case are ESC data incomplete packets and PWM out of range.

6. OCS

  • In the diagnostics view, if a topic is purple, one of its diagnostic messages with timing information is stale (none has been received in at least 5 seconds). This is likely caused by a node crash or by events that occur infrequently. You can remove stale messages by double-clicking each stale message or by clicking the Clear Stale Messages button below the diagnostics view.

  • On startup, programs may throw errors to diagnostics and the command line because node startup order is not guaranteed. Let the system settle for 10-15 seconds after startup, then clear stale messages.

  • On occasion, the OCS will segfault (currently about once every 4 hours of running; if you get meaningful output from the crash, let us know). This takes the rest of autorally.launch down with it because the OCS is marked required in the launch file. Kill the remaining nodes with Ctrl-C and relaunch autorally.launch.
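
The stale rule described above (no message for at least 5 seconds) can be illustrated with a few lines of Python. This is only a sketch of the rule as stated on this page, not the OCS implementation; the topic names and timestamps are made up.

```python
STALE_AFTER_S = 5.0  # threshold taken from the description above


def stale_topics(last_seen, now, timeout=STALE_AFTER_S):
    """Return the names whose last message is at least `timeout` seconds old.

    last_seen maps a topic name to the time (in seconds) of its last message.
    """
    return sorted(name for name, stamp in last_seen.items()
                  if now - stamp >= timeout)


# Hypothetical example: three topics, checked at t = 101 s.
example = {"imu": 100.0, "gps": 96.0, "chassis": 95.0}
print(stale_topics(example, now=101.0))
```

A topic that reports only on rare events will cross the threshold even though its node is healthy, which is why clearing stale messages after the system settles is usually enough.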

7. imuGpsEstimator, GTSAM

If you have installed GTSAM but imuGpsEstimator does not compile, one possible cause is a permission problem. You cannot launch the stateEstimator.launch file if imuGpsEstimator is not compiled.

Run umask 0022 before sudo make install to get the right permissions.

If umask 0022 does not work, try the steps below.

  • In the ~/catkin_ws/src/autorally/autorally_core/src folder, open CMakeLists.txt and comment out lines 7 and 18:

  • #if(GTSAM_FOUND)

  • #endif()

  • In a terminal, cd to the gtsam/build folder that you made and type sudo nano install_manifest.txt. You have to change the permissions of all the files listed in the install_manifest.txt file.

  • To change the permissions, install the gksu package. In a terminal, type sudo apt-get update and sudo apt-get install gksu. After the package is installed, type gksudo nautilus. A window showing the Desktop folder will pop up. You can now change the permissions of folders and files from this window.

  • Change Access: Read and write

  • Click Change Permissions for Enclosed Files...

  • Files: Read and write

  • Folders: Create and delete files

  • After you change the permissions, do sudo make install in the gtsam/build folder.

  • cd ~/catkin_ws and do catkin_make clean then catkin_make.

  • Check that the message Built target imuGpsEstimator appears in the build output.
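
To find out which installed files are affected before changing permissions by hand, you can scan install_manifest.txt. The sketch below assumes the permission problem is that some installed files are not readable by ordinary users (which is what a restrictive umask during sudo make install would produce); that diagnosis and the function name are assumptions, so treat the result as a hint and not a definitive check.

```python
import os
import stat


def unreadable_installed_files(manifest_path):
    """Return paths listed in the manifest that exist but are not world-readable."""
    problems = []
    with open(manifest_path) as manifest:
        for line in manifest:
            path = line.strip()
            # Skip blank lines and files that are no longer present.
            if not path or not os.path.exists(path):
                continue
            mode = os.stat(path).st_mode
            if not mode & stat.S_IROTH:
                problems.append(path)
    return problems
```

Running unreadable_installed_files("install_manifest.txt") from the gtsam/build folder returns the list of files to fix. An empty list suggests the compile failure has a different cause.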

8. Xbee / GPS delay

If data seems to be delayed over XBee try:

  • cat /sys/bus/usb-serial/devices/ttyUSB0/latency_timer should print 1. If it does not, check your 99-autoRally.rules file for the line

ACTION=="add", SUBSYSTEM=="usb-serial", DRIVER=="ftdi_sio", ATTR{latency_timer}="1"

  • Here ttyUSB0 stands for the underlying device name that arGPSroverPortA/B/D or XbeeNode has before udev assigns the alias.
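
To check every USB serial adapter at once, the following sketch reads each latency_timer file under the sysfs directory used above and reports the devices whose value is not 1. The function names are hypothetical; the sysfs_root parameter exists only so the code can be pointed at a test directory.

```python
import glob
import os


def latency_timers(sysfs_root="/sys/bus/usb-serial/devices"):
    """Return {device_name: latency_timer_value} for devices exposing the attribute."""
    timers = {}
    pattern = os.path.join(sysfs_root, "*", "latency_timer")
    for path in sorted(glob.glob(pattern)):
        device = os.path.basename(os.path.dirname(path))
        with open(path) as f:
            timers[device] = int(f.read().strip())
    return timers


def slow_devices(timers, expected=1):
    """Names of devices whose latency timer differs from the expected value."""
    return sorted(name for name, value in timers.items() if value != expected)
```

Calling slow_devices(latency_timers()) on the compute box lists the adapters that the udev rule did not reach. Assuming the aliases are symlinks created by udev, os.path.realpath("/dev/arGPSroverPortB") shows which ttyUSB device an alias currently points to.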

In the OCS, the message No data from RF indicates that the XbeeNode on the compute box is not receiving any information from the XbeeCoordinator, which includes GPS RTK corrections and runstop information from the runstop box. To debug:

  • Are the runstop box and GPS base station plugged into the OCS laptop?
  • Is the base station launched?
  • Are the Xbee antennas on the compute box and runstop box tightened and undamaged?
  • Is something metal close to one of the antennas and blocking the signal?

9. USB and Serial Devices

If you cannot talk to a device but it appears to power up correctly, ensure that all data connections are correct. In particular, serial devices are connected TX to RX and vice versa, while USB devices are connected D+ to D+ and D- to D-.

10. ROS Node Crashes

If ROS nodes are crashing without informative errors (this may happen if a roslaunch script ends up remotely launching a node on another computer), run rqt_console. The console should show the informative errors.