Gyro-balancing programs suddenly drop out on SPIKE and RI hub [Bug] #232

Closed
DimitriDekvere opened this issue Jan 25, 2021 · 17 comments
Labels
bug (Something isn't working); hub: primehub/inventorhub (Issues related to the LEGO SPIKE Prime hub and LEGO MINDSTORMS Robot Inventor hub); software: pybricks-micropython (Issues with Pybricks MicroPython firmware (or EV3 runtime)); topic: sensors (Issues involving sensors)

Comments

@DimitriDekvere

Describe the bug
Gyro-balancing programs suddenly drop out on the SPIKE and RI hubs. The balancing program (based on Laurens Valk's demo code) suddenly stops working and the robot falls over, but the motors keep running and the animation on the LED matrix keeps working.

To reproduce
Steps to reproduce the behavior:

  1. Start the balancing robot with line tracking
  2. Wait while robot is working until it suddenly drops down.

Expected behavior
It should keep working, since it is in an infinite loop.

Screenshots
See the movie file: the SPIKE hub drops out after 3 full turns.

spike.gyro.balancer.drops.out.mp4
@DimitriDekvere DimitriDekvere added the triage Issues that have not been triaged yet label Jan 25, 2021
@laurensvalk
Member

Thanks for sharing this, and great video too!

the animation on the LED matrix keeps working.

This is good to know. It means that the firmware is not frozen.

the motors keep running and

I think they might still be moving, but no longer being updated. It seems that the script itself gets stuck.

I'm wondering if the angular_velocity read command may get stuck sometimes. It might be that the underlying I2C operation does not complete.

I think I've seen this happen a few times too. I think I've only seen it on the Inventor Hub, not the Technic Hub, but that may not be conclusive just yet. It also seemed more likely to happen when Bluetooth was connected.

@laurensvalk laurensvalk added bug Something isn't working hub: primehub/inventorhub Issues related to the LEGO SPIKE Prime hub and LEGO MINDSTORMS Robot Inventor hub topic: sensors Issues involving sensors and removed triage Issues that have not been triaged yet labels Jan 25, 2021
@laurensvalk
Member

It just happened again, but this time the animation also froze.

@ZPhilo

ZPhilo commented Jan 28, 2021

Got the same problem...
Here is a short video of Dimi's balancer following my blue/grey floor tiling (based on color saturation)
https://photos.app.goo.gl/5NrmgzbvfrdcwSHKA
Once the balancer stops, there is no way to restart it directly; you have to power it down first.

@laurensvalk
Member

This also happens on the Technic Hub with the IMU (#245).

@OutOfTheBots

OutOfTheBots commented Jan 16, 2022

OK, I might have hit this bug. Here's what I found:
This code runs fine:

from pybricks.hubs import PrimeHub
from pybricks.tools import wait

hub = PrimeHub()

while True:
    print(hub.imu.tilt())
    wait(5)

But if I remove the wait(5), it runs for a bit and then the IMU crashes.

@SpudGunMan

I also found this to be an issue with the IMU when testing beta code. Adding the wait command is how I corrected it as well.
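
The workaround described in the two comments above, pausing briefly between IMU reads, can be sketched as a rate-limited polling loop. This is a minimal sketch rather than Pybricks code: `poll_with_period`, `read`, and `sleep_ms` are hypothetical names, and on a hub one would presumably pass `hub.imu.tilt` and `pybricks.tools.wait` for them. The stand-in callables only make the example runnable off-hub.

```python
def poll_with_period(read, sleep_ms, period_ms, count):
    """Call read() count times, pausing period_ms after each call.

    read and sleep_ms are injected so the loop can run without a hub;
    both names are illustrative, not part of the Pybricks API.
    """
    samples = []
    for _ in range(count):
        samples.append(read())
        # The pause keeps reads from running back to back,
        # which is the role of wait(5) in the snippet above.
        sleep_ms(period_ms)
    return samples


# Stand-ins for hub.imu.tilt and pybricks.tools.wait (assumptions).
waits = []
fake_tilt = lambda: (-15, 55)
fake_wait = waits.append

samples = poll_with_period(fake_tilt, fake_wait, period_ms=5, count=3)
```

Later comments in this thread report that a delay makes the error less likely but that it can still occur while motors are running, so this hides the lock-up rather than fixing it.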

@laurensvalk
Member

I've had an initial look at this. We seem to be hitting HAL_I2C_ERROR_AF. And after that, new operations won't work because I2C_FLAG_BUSY remains set indefinitely.

A patch is given below to reproduce this, which gives the following output.

(-15, 55)
(-15, 55)
(-15, 55)
(-15, 54)
(-15, 54)
(-15, 54)
(-15, 54)
i2c err = 4  <---- This is HAL_I2C_ERROR_AF
Traceback (most recent call last):
  File "main.py", line 7, in <module>
OSError: [Errno 5] EIO: 

Unexpected hardware input/output error with a motor or sensor:
--> Try unplugging the sensor or motor and plug it back in again.
--> To see which sensor or motor is causing the problem,
    check the line in your script that matches
    the line number given in the 'Traceback' above.
--> Try rebooting the hub/brick if the problem persists.

When starting the program again we see:


Traceback (most recent call last):
  File "main.py", line 7, in <module>
OSError: [Errno 16] EBUSY: Device or resource busy
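
The `i2c err` value printed by the patch is a bitmask from the STM32 HAL. The sketch below decodes it in Python, assuming the `HAL_I2C_ERROR_*` values found in `stm32f4xx_hal_i2c.h`; the helper name `decode_i2c_error` is hypothetical, and the table should be checked against the HAL version actually in use.

```python
# Bit values assumed from stm32f4xx_hal_i2c.h (verify against your HAL).
HAL_I2C_ERRORS = {
    0x01: "HAL_I2C_ERROR_BERR",     # bus error
    0x02: "HAL_I2C_ERROR_ARLO",     # arbitration lost
    0x04: "HAL_I2C_ERROR_AF",       # acknowledge failure (NAK)
    0x08: "HAL_I2C_ERROR_OVR",      # overrun/underrun
    0x10: "HAL_I2C_ERROR_DMA",      # DMA transfer error
    0x20: "HAL_I2C_ERROR_TIMEOUT",  # timeout
}


def decode_i2c_error(code):
    """Return the names of all error bits set in code."""
    if code == 0:
        return ["HAL_I2C_ERROR_NONE"]
    return [name for bit, name in HAL_I2C_ERRORS.items() if code & bit]


decoded = decode_i2c_error(4)  # the value printed in the output above
```

Because the value is a bitmask, more than one name can come back if several error bits are set at once.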

Code snippet and analysis patch

from pybricks.hubs import PrimeHub

hub = PrimeHub()

while True:
    print(hub.imu.tilt())

diff --git a/pybricks/util_pb/pb_imu.c b/pybricks/util_pb/pb_imu.c
index b010f3a7..8d6b6d68 100644
--- a/pybricks/util_pb/pb_imu.c
+++ b/pybricks/util_pb/pb_imu.c
@@ -9,6 +9,9 @@
 
 #include <lsm6ds3tr_c_reg.h>
 
+#include <pbio/error.h>
+
+#include <pybricks/util_pb/pb_error.h>
 #include <pybricks/util_pb/pb_imu.h>
 
 #include STM32_HAL_H
@@ -42,6 +45,16 @@ void HAL_I2C_MemRxCpltCallback(I2C_HandleTypeDef *hi2c) {
     _imu_dev.ctx.read_write_done = true;
 }
 
+uint32_t i2c_error = HAL_I2C_ERROR_NONE;
+pbio_error_t i2c_status = PBIO_SUCCESS;
+
+void HAL_I2C_ErrorCallback(I2C_HandleTypeDef *hi2c)
+{
+    i2c_error = HAL_I2C_GetError(hi2c);
+    i2c_status = PBIO_ERROR_IO;
+    _imu_dev.ctx.read_write_done = true;
+}
+
 // REVISIT: if there is ever an error the PT threads will stall since we aren't
 // handling the error callbacks.
 
@@ -175,6 +188,10 @@ void pb_imu_accel_read(pb_imu_dev_t *imu_dev, float_t *values) {
     struct pt pt;
     int16_t data[3];
 
+    if (__HAL_I2C_GET_FLAG(&hi2c, I2C_FLAG_BUSY)) {
+        pb_assert(PBIO_ERROR_BUSY);
+    }
+
     PT_INIT(&pt);
     while (PT_SCHEDULE(lsm6ds3tr_c_acceleration_raw_get(&pt, &imu_dev->ctx, (uint8_t *)data))) {
         nlr_buf_t nlr;
@@ -186,6 +203,13 @@ void pb_imu_accel_read(pb_imu_dev_t *imu_dev, float_t *values) {
             nlr_jump(nlr.ret_val);
         }
     }
+
+    if (i2c_status != PBIO_SUCCESS) {
+        mp_printf(&mp_plat_print, "i2c err = %d\n", i2c_error);
+    }
+
+    pb_assert(i2c_status);
+
     values[0] = data[0] * imu_dev->accel_scale;
     values[1] = data[1] * imu_dev->accel_scale;
     values[2] = data[2] * imu_dev->accel_scale;

@laurensvalk
Member

laurensvalk commented Feb 18, 2022

This also happens on the Technic Hub [...]

This doesn't seem to be true. It has been running fine on the Technic Hub for a while now.

But it fails on SPIKE Prime and SPIKE Essential. It could be lucky timing, but maybe there really is a difference.

Related to the links above, the following is only done in stm32f4xx_hal_i2c, not in any other:

  /*Reset I2C*/
  hi2c->Instance->CR1 |= I2C_CR1_SWRST;
  hi2c->Instance->CR1 &= ~I2C_CR1_SWRST;

But doing this again when the problem occurs didn't seem to make it work. Next up, trying some of the reset workarounds given above. EDIT: Doesn't work either. Stays locked up somehow.

@OutOfTheBots

Have you tried looking at the I2C code in mainline MicroPython? Mainline MicroPython on STM32F4 has stable I2C.

@dlech dlech added the software: pybricks-micropython Issues with Pybricks MicroPython firmware (or EV3 runtime) label Jun 11, 2022
@dlech
Member

dlech commented Aug 6, 2022

Repeating my comments from #675 (comment):

  • We've made some updates to the driver code so that the IMU values are read continuously in the background
  • I have one SPIKE Prime hub that I let run overnight with the new code that never experienced an I2C error
  • I have a different SPIKE Prime hub that experienced the error with the new code, but only after running for about 30 minutes

If there is a faster way to reproduce the error, that would be helpful.
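
The first bullet, reading the IMU continuously in the background, presumably means that user code reads a cached sample instead of starting an I2C transfer on every call. A rough Python model of that idea follows; `BackgroundSampler`, `poll`, and `latest` are hypothetical names, and the real driver is C code that is not shown in this thread.

```python
class BackgroundSampler:
    """Caches the most recent sample so callers never touch the bus."""

    def __init__(self, read_sensor):
        self._read_sensor = read_sensor
        self._latest = None

    def poll(self):
        # Called periodically by a background task (assumed to exist).
        self._latest = self._read_sensor()

    def latest(self):
        # Returns the cached value; no bus transfer happens here.
        return self._latest


# Made-up readings standing in for real sensor data.
readings = iter([(0, 0, 9810), (1, 0, 9808)])
sampler = BackgroundSampler(lambda: next(readings))

sampler.poll()
first = sampler.latest()
again = sampler.latest()  # same cached sample, no new read
sampler.poll()
second = sampler.latest()
```

In this model, how fast user code calls `latest()` no longer affects how often the bus is used, which may be why the error became harder to reproduce.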

@laurensvalk
Member

If there is a faster way to reproduce the error, that would be helpful.

It was quite easy to reproduce before we switched to the new process. If not much else has changed, maybe debugging can be done in the old code, and then apply the fix to the new code as well?

Reading some of the links in the post above again, some things do sound suspiciously similar:

This error gets triggered more often if there is other stuff happening as well (interrupts at critical moment or other tasks running under RTOS)

@dlech
Member

dlech commented Aug 6, 2022

Have you noticed a difference with the new code on your hubs?

@laurensvalk
Member

laurensvalk commented Aug 6, 2022

It seems to run without hanging at least for a few minutes, but I'm not 100% sure this hub was affected before. Will check later.

@dlech
Member

dlech commented Aug 6, 2022

I think I found a way to speed up reproducing the error by just printing as fast as possible in a tight loop:

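from pybricks.hubs import PrimeHub
from pybricks.tools import StopWatch

hub = PrimeHub()
timer = StopWatch()

# temp() and err() are not defined elsewhere in this thread; presumably
# they come from a debug build of the firmware.
while True:
    print(
        timer.time(),
        *hub.imu.acceleration(),
        *hub.imu.angular_velocity(),
        hub.imu.temp(),
        hub.imu.err(),
        sep=','
    )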

@OutOfTheBots

Is there a way to rule out that the error is at the IMU rather than the SPIKE hub? I2C very easily gets a NAK back.

Does it only happen when printing the results fast, or does it also happen when just reading the IMU fast without printing the results to the serial terminal? I.e., can it be narrowed down to a problem with I2C alone, or to a problem of print conflicting with I2C?

@dlech
Member

dlech commented Aug 8, 2022

It happens most quickly when printing in a tight loop. If a delay is added, the problem seems to go away. It also seems to happen occasionally when motors are running, even with the same delay. What printing (at least on the SPIKE hubs) and motors have in common is that they both use UART, which has a higher-priority interrupt. So maybe there is a timing problem where the I2C interrupt is not handled fast enough? This seems to be a well-known issue, e.g. https://cnnblike.com/post/stm32-iicbug/ in addition to the links Laurens already gave.

dlech added a commit to pybricks/pybricks-micropython that referenced this issue Aug 8, 2022
Occasionally there is an I2C bus error on STM32F13 MCUs that causes
the polling processes to lock up because errors were not handled.

This adds error handling with a I2C peripheral reset to recover from
the error.

Fixes: pybricks/support#232
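
The approach the commit message describes (handle the error, then reset the I2C peripheral) can be illustrated with a toy model in Python. This is not the code from the commit: `FakeI2C`, `read_with_recovery`, and the failure schedule are all hypothetical, and the model only mimics the sequence reported earlier in this thread, where an `EIO` error was followed by `EBUSY` on every later read.

```python
import errno


class FakeI2C:
    """Toy bus: one scheduled read fails, then the bus stays busy until reset."""

    def __init__(self, fail_on_read):
        self.fail_on_read = fail_on_read  # 1-based index of the failing read
        self.reads = 0
        self.busy = False
        self.resets = 0

    def read(self):
        if self.busy:
            raise OSError(errno.EBUSY, "bus busy")
        self.reads += 1
        if self.reads == self.fail_on_read:
            self.busy = True  # models the BUSY flag staying set
            raise OSError(errno.EIO, "acknowledge failure")
        return (-15, 54)

    def reset(self):
        self.busy = False
        self.resets += 1


def read_with_recovery(bus, retries=1):
    """Read once; on OSError reset the bus and retry up to retries times."""
    for attempt in range(retries + 1):
        try:
            return bus.read()
        except OSError:
            if attempt == retries:
                raise
            bus.reset()


bus = FakeI2C(fail_on_read=2)
values = [read_with_recovery(bus) for _ in range(3)]
```

Without the reset step, every read after the first failure would raise `EBUSY`, matching the second traceback shown earlier. On real hardware the reset is likely more involved than clearing a flag, as the earlier failed attempts with `I2C_CR1_SWRST` suggest.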
@dlech dlech self-assigned this Aug 10, 2022