[Bug] Gyro-balancing programs suddenly drop out in SPIKE and RI hub #232

DimitriDekvere · 2021-01-25T08:38:28Z

Describe the bug
Gyro-balancing programs suddenly drop out on the SPIKE and RI hubs. The balancing program (based on Laurens Valk's demo code) suddenly stops working and the robot falls over, but the motors keep running and the animation on the LED matrix keeps working.

To reproduce
Steps to reproduce the behavior:

Start the balancing robot with line tracking.
Wait while the robot is running until it suddenly falls over.

Expected behavior
The robot should keep working, since the program runs in an infinite loop.

Screenshots
See the video file: the SPIKE hub drops out after 3 full turns.

spike.gyro.balancer.drops.out.mp4

laurensvalk · 2021-01-25T09:00:01Z

Thanks for sharing this, and great video too!

the animation on the LED matrix keeps working.

This is good to know. It means that the firmware is not frozen.

the motors keep running and

I think they might still be moving, but no longer being updated. It seems that the script itself gets stuck.

I'm wondering if the angular_velocity read command may get stuck sometimes. It might be that the underlying I2C operation does not complete.

I think I've seen this happen a few times too. I think I've only seen it on the Inventor Hub, not the Technic Hub, but that may not be conclusive just yet. It also seemed more likely to happen when Bluetooth was connected.

laurensvalk · 2021-01-25T19:06:08Z

It just happened again, but this time the animation froze also.

ZPhilo · 2021-01-28T17:26:25Z

Got the same problem...
Here is a short video of Dimi's balancer following my blue/grey floor tiling (based on color saturation)
https://photos.app.goo.gl/5NrmgzbvfrdcwSHKA
Once the balancer stops, there is no way to restart it directly; you have to power it down first.

laurensvalk · 2021-01-31T21:08:50Z

This also happens on the Technic Hub with the IMU (#245).

OutOfTheBots · 2022-01-16T11:57:46Z

OK, I might have hit this bug. Here's what I found.
This code runs fine:

from pybricks.hubs import PrimeHub
from pybricks.tools import wait

hub = PrimeHub()

while True:
    print(hub.imu.tilt())
    wait(5)

But if I remove the wait(5), it runs for a bit and then the IMU crashes.

SpudGunMan · 2022-01-16T21:21:51Z

I also found this to be an issue with the IMU when testing beta code. Adding the wait command is how I corrected it as well.

laurensvalk · 2022-02-18T08:20:44Z

I've had an initial look at this. We seem to be hitting HAL_I2C_ERROR_AF. And after that, new operations won't work because I2C_FLAG_BUSY remains set indefinitely.

A patch is given below to reproduce this, which gives the following output.

(-15, 55)
(-15, 55)
(-15, 55)
(-15, 54)
(-15, 54)
(-15, 54)
(-15, 54)
i2c err = 4  <---- This is HAL_I2C_ERROR_AF
Traceback (most recent call last):
  File "main.py", line 7, in <module>
OSError: [Errno 5] EIO: 

Unexpected hardware input/output error with a motor or sensor:
--> Try unplugging the sensor or motor and plug it back in again.
--> To see which sensor or motor is causing the problem,
    check the line in your script that matches
    the line number given in the 'Traceback' above.
--> Try rebooting the hub/brick if the problem persists.

When starting the program again we see:


Traceback (most recent call last):
  File "main.py", line 7, in <module>
OSError: [Errno 16] EBUSY: Device or resource busy
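
For reference, the `i2c err` value printed above is a bit mask. Here is a minimal sketch for decoding it, assuming the `HAL_I2C_ERROR_*` values from ST's `stm32f4xx_hal_i2c.h`. The helper name `decode_i2c_error` is made up for illustration and is not part of the firmware.

```python
# Bit values as defined in ST's stm32f4xx_hal_i2c.h (assumed to match
# the HAL version used by the firmware).
HAL_I2C_ERRORS = {
    0x01: "HAL_I2C_ERROR_BERR",     # bus error
    0x02: "HAL_I2C_ERROR_ARLO",     # arbitration lost
    0x04: "HAL_I2C_ERROR_AF",       # acknowledge failure (NACK)
    0x08: "HAL_I2C_ERROR_OVR",      # overrun/underrun
    0x10: "HAL_I2C_ERROR_DMA",      # DMA transfer error
    0x20: "HAL_I2C_ERROR_TIMEOUT",  # timeout
}


def decode_i2c_error(code):
    """Return the names of all error bits set in a HAL I2C error code."""
    if code == 0:
        return ["HAL_I2C_ERROR_NONE"]
    return [name for bit, name in HAL_I2C_ERRORS.items() if code & bit]


print(decode_i2c_error(4))  # ['HAL_I2C_ERROR_AF']
```

Several bits can be set at once, which is why the helper returns a list rather than a single name.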

Code snippet and analysis patch

from pybricks.hubs import PrimeHub

hub = PrimeHub()

while True:
    print(hub.imu.tilt())

diff --git a/pybricks/util_pb/pb_imu.c b/pybricks/util_pb/pb_imu.c
index b010f3a7..8d6b6d68 100644
--- a/pybricks/util_pb/pb_imu.c
+++ b/pybricks/util_pb/pb_imu.c
@@ -9,6 +9,9 @@
 
 #include <lsm6ds3tr_c_reg.h>
 
+#include <pbio/error.h>
+
+#include <pybricks/util_pb/pb_error.h>
 #include <pybricks/util_pb/pb_imu.h>
 
 #include STM32_HAL_H
@@ -42,6 +45,16 @@ void HAL_I2C_MemRxCpltCallback(I2C_HandleTypeDef *hi2c) {
     _imu_dev.ctx.read_write_done = true;
 }
 
+uint32_t i2c_error = HAL_I2C_ERROR_NONE;
+pbio_error_t i2c_status = PBIO_SUCCESS;
+
+void HAL_I2C_ErrorCallback(I2C_HandleTypeDef *hi2c)
+{
+    i2c_error = HAL_I2C_GetError(hi2c);
+    i2c_status = PBIO_ERROR_IO;
+    _imu_dev.ctx.read_write_done = true;
+}
+
 // REVISIT: if there is ever an error the PT threads will stall since we aren't
 // handling the error callbacks.
 
@@ -175,6 +188,10 @@ void pb_imu_accel_read(pb_imu_dev_t *imu_dev, float_t *values) {
     struct pt pt;
     int16_t data[3];
 
+    if (__HAL_I2C_GET_FLAG(&hi2c, I2C_FLAG_BUSY)) {
+        pb_assert(PBIO_ERROR_BUSY);
+    }
+
     PT_INIT(&pt);
     while (PT_SCHEDULE(lsm6ds3tr_c_acceleration_raw_get(&pt, &imu_dev->ctx, (uint8_t *)data))) {
         nlr_buf_t nlr;
@@ -186,6 +203,13 @@ void pb_imu_accel_read(pb_imu_dev_t *imu_dev, float_t *values) {
             nlr_jump(nlr.ret_val);
         }
     }
+
+    if (i2c_status != PBIO_SUCCESS) {
+        mp_printf(&mp_plat_print, "i2c err = %d\n", i2c_error);
+    }
+
+    pb_assert(i2c_status);
+
     values[0] = data[0] * imu_dev->accel_scale;
     values[1] = data[1] * imu_dev->accel_scale;
     values[2] = data[2] * imu_dev->accel_scale;

laurensvalk · 2022-02-18T08:34:45Z

Related: https://electronics.stackexchange.com/questions/352730/stm32f4-i2c-address-timeout

Maybe we should just drop some of the HAL stuff when we rewrite it anyway for the IMU process.

https://community.st.com/s/question/0D50X0000AupXVgSQM/i2c-device-never-sets-sb-bit-after-start-is-transmitted
https://community.st.com/s/question/0D50X0000CDm8X1SQJ/st32f429-i2c-stop-bit-set-incorrectly-in-i2ccr1
https://community.st.com/s/question/0D50X00009XkhfnSAB/stm32f2xx-i2c-not-sending-address-after-start

There also seems to be an erratum for some STM32s with workarounds like these.

laurensvalk · 2022-02-18T09:00:10Z

This also happens on the Technic Hub [...]

This doesn't seem to be true. It has been running fine on the Technic Hub for a while now.

But it fails on SPIKE Prime and SPIKE Essential. It could be lucky timing but maybe there really is a difference.

Related to the links above, the following is only done in stm32f4xx_hal_i2c, not in any other:

  /*Reset I2C*/
  hi2c->Instance->CR1 |= I2C_CR1_SWRST;
  hi2c->Instance->CR1 &= ~I2C_CR1_SWRST;

But doing this again when the problem occurs didn't seem to make it work. Next up, trying some of the reset workarounds given above. EDIT: Doesn't work either. Stays locked up somehow.

OutOfTheBots · 2022-02-18T21:43:17Z

Have you tried looking at the I2C code in mainline MicroPython? The STM32F4 port of mainline MicroPython has stable I2C.

dlech · 2022-08-06T16:47:35Z

Repeating my comments from #675 (comment):

We've made some updates to the driver code so that the IMU values are read continuously in the background
I have one SPIKE Prime hub that I let run overnight with the new code that never experienced an I2C error
I have a different SPIKE Prime hub that experienced the error with the new code, but only after running for about 30 minutes

If there is a faster way to reproduce the error, that would be helpful.

laurensvalk · 2022-08-06T18:09:27Z

If there is a faster way to reproduce the error, that would be helpful.

It was quite easy to reproduce before we switched to the new process. If not much else has changed, maybe debugging can be done in the old code, and then apply the fix to the new code as well?

Reading some of the links in the post above again, some things do sound suspiciously similar:

This error gets triggered more often if there is other stuff happening as well (interrupts at critical moment or other tasks running under RTOS)

dlech · 2022-08-06T18:25:27Z

Have you noticed a difference with the new code on your hubs?

laurensvalk · 2022-08-06T18:32:08Z

It seems to run without hanging at least for a few minutes, but I'm not 100% sure this hub was affected before. Will check later.

dlech · 2022-08-06T22:28:14Z

I think I found a way to speed up reproducing the error by just printing as fast as possible in a tight loop:

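from pybricks.hubs import PrimeHub
from pybricks.tools import StopWatch

hub = PrimeHub()
timer = StopWatch()

# Print as fast as possible, with no wait() in the loop.
# temp() and err() are presumably debug-only additions, not public API.
while True:
    print(
        timer.time(),
        *hub.imu.acceleration(),
        *hub.imu.angular_velocity(),
        hub.imu.temp(),
        hub.imu.err(),
        sep=','
    )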

OutOfTheBots · 2022-08-08T03:15:15Z

Is there a way to rule out that the error is at the IMU rather than the SPIKE hub? I2C very easily gets a NAK back.

Does it only happen when printing the results quickly, or does it also happen when just reading the IMU quickly without printing the results to the serial terminal? That is, can it be narrowed down to a problem with I2C alone, or is it a problem of print conflicting with I2C?

dlech · 2022-08-08T14:45:39Z

It happens most quickly when printing in a tight loop. If a delay is added, the problem seems to go away. It also seems to happen occasionally when motors are running, even with the same delay. What printing (at least on the SPIKE hubs) and motors have in common is that they both use UART, which has a higher-priority interrupt. So maybe there is a timing problem where the I2C interrupt is not handled fast enough? This seems to be a well-known issue, e.g. https://cnnblike.com/post/stm32-iicbug/ in addition to the links Laurens already gave.

Occasionally there is an I2C bus error on STM32F13 MCUs that causes the polling processes to lock up because errors were not handled. This adds error handling with an I2C peripheral reset to recover from the error. Fixes: pybricks/support#232
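
The failure mode and the recovery described in this commit message can be illustrated with a small host-side simulation in plain Python. This is only a sketch under stated assumptions, not the firmware's actual driver: `FakeI2C` and `poll` are hypothetical names, and the NACK is injected by hand on a chosen transfer.

```python
class FakeI2C:
    """Toy I2C peripheral that stays busy after a NACK until it is reset."""

    def __init__(self, fail_on):
        self.fail_on = set(fail_on)  # transfer numbers that get a NACK
        self.transfers = 0
        self.busy = False
        self.resets = 0

    def read(self):
        """Return (ok, data). Once busy, every read fails until reset()."""
        if self.busy:
            return False, None
        self.transfers += 1
        if self.transfers in self.fail_on:
            self.busy = True  # models I2C_FLAG_BUSY remaining set
            return False, None
        return True, self.transfers

    def reset(self):
        """Models a peripheral reset that clears the stuck busy state."""
        self.busy = False
        self.resets += 1


def poll(bus, count, recover):
    """Attempt `count` reads; optionally reset the bus after each error."""
    samples = []
    for _ in range(count):
        ok, data = bus.read()
        if ok:
            samples.append(data)
        elif recover:
            bus.reset()
    return samples


stuck = poll(FakeI2C(fail_on=[3]), count=6, recover=False)
fixed = poll(FakeI2C(fail_on=[3]), count=6, recover=True)
print(len(stuck), len(fixed))  # 2 5
```

Without recovery, every read after the injected error fails, which resembles the EBUSY behaviour reported earlier in this thread. With a reset on error, only the failed sample is lost.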

DimitriDekvere added the triage Issues that have not been triaged yet label Jan 25, 2021

laurensvalk added bug Something isn't working hub: primehub/inventorhub Issues related to the LEGO SPIKE Prime hub and LEGO MINDSTORMS Robot Inventor hub topic: sensors Issues involving sensors and removed triage Issues that have not been triaged yet labels Jan 25, 2021

laurensvalk mentioned this issue Jan 31, 2021

[Bug] DriveBase does not stop when program stops #245

Closed

dlech mentioned this issue Apr 14, 2021

[Bug] Unresponsive stop button after idle for a few minutes #307

Closed

dlech added the software: pybricks-micropython Issues with Pybricks MicroPython firmware (or EV3 runtime) label Jun 11, 2022

dlech closed this as completed in pybricks/pybricks-micropython@8b0e758 Aug 8, 2022

dlech self-assigned this Aug 10, 2022
