Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug] ColorDistanceSensor takes two cycles to sync up on City Hub on boot #747

Open
laurensvalk opened this issue Oct 21, 2022 · 13 comments
Labels
bug Something isn't working hub: cityhub Issues related to the LEGO Powered Up Smart hub (Hub No. 4) platform: Powered Up Issues related to LEGO Powered Up software: pybricks-micropython Issues with Pybricks MicroPython firmware (or EV3 runtime)

Comments

@laurensvalk
Copy link
Member

Describe the bug

When booting the City Hub with the Color Distance sensor plugged in, it takes two full cycles to sync up.

On the SPIKE Prime Hub, it syncs up quickly on boot, but it also needs two full cycles when plugging it in later.

See pybricks/pybricks-micropython#121 for a video and code example.

This is not critical for the beta, and also not really for the 3.2 release, so I will not add the 3.2 project tag for now.

@laurensvalk laurensvalk added bug Something isn't working platform: Powered Up Issues related to LEGO Powered Up labels Oct 21, 2022
@dlech dlech added hub: cityhub Issues related to the LEGO Powered Up Smart hub (Hub No. 4) software: pybricks-micropython Issues with Pybricks MicroPython firmware (or EV3 runtime) labels Oct 21, 2022
@dlech
Copy link
Member

dlech commented May 31, 2023

There seem to be several issue here. I made a test from pybricks/pybricks-micropython@d80df1a with the following patch to turn power to the I/O off and back on again before the device connection manager starts running:

diff --git a/lib/pbio/drv/ioport/ioport_lpf2.c b/lib/pbio/drv/ioport/ioport_lpf2.c
index e609e0b5..f605b9e4 100644
--- a/lib/pbio/drv/ioport/ioport_lpf2.c
+++ b/lib/pbio/drv/ioport/ioport_lpf2.c
@@ -466,15 +466,20 @@ PROCESS_THREAD(pbdrv_ioport_lpf2_process, ev, data) {
 
     PROCESS_BEGIN();
 
-    etimer_set(&timer, 2);
+    pbdrv_gpio_out_low(&pbdrv_ioport_lpf2_platform_data.port_vcc);
 
     for (int i = 0; i < PBDRV_CONFIG_IOPORT_LPF2_NUM_PORTS; i++) {
         init_one(i);
     }
 
+    etimer_set(&timer, 500);
+    PROCESS_WAIT_EVENT_UNTIL(ev == PROCESS_EVENT_TIMER && etimer_expired(&timer));
+
     // Turn on power on pin 4 on all ports
     pbdrv_gpio_out_high(&pbdrv_ioport_lpf2_platform_data.port_vcc);
 
+    etimer_set(&timer, 2);
+
     for (;;) {
         PROCESS_WAIT_EVENT();
 
  1. It appears that the I/O port power is turned on in the bootloader. So there is ~280 ms already spent before the device connection manager code starts running. The image below indicates where this time was measured.
    image
    Without the extra patch, the I/O device will time out and revert to 2400 baud before we can detect it. With the patch, we effectively restart the timeout because we power cycle the I/O device. The image below indicates the ~500 off time from the patch. It should also be noted that it does take some time for the I/O device to actually drain power (note how pin 6 stays high for most of the 500 ms).
    image

  2. With the patch, the baud rate request and ack at 115200 baud is working.

    image

    However, once all of the sensor info is sent, the hub doesn't reply with an ACK.

    image

    So it appears the software on the hub may not be seeing the initial ACK and switching to 2400 baud or something like that.

Full log attached. Attachment can be viewed with Saleae Logic 2.
city hub motor sync v3.3.0b5-86-gd80df1a4 plus patch.zip

@laurensvalk
Copy link
Member Author

laurensvalk commented Jun 1, 2023

Are we sure this is right?

    // To get in sync with the data stream from the sensor, we look for a valid TYPE command.
    for (;;) {
        PBIO_PT_WAIT_READY(&data->pt, err = pbdrv_uart_read_begin(data->uart, data->rx_msg, 1, EV3_UART_IO_TIMEOUT));
        if (err != PBIO_SUCCESS) {
            DBG_ERR(data->last_err = "UART Rx error during sync");
            goto err;
        }
        PBIO_PT_WAIT_READY(&data->pt, err = pbdrv_uart_read_end(data->uart));
        if (err == PBIO_ERROR_TIMEDOUT) {
            continue;
        }
        if (err != PBIO_SUCCESS) {
            DBG_ERR(data->last_err = "UART Rx error during sync");
            goto err;
        }

        if (data->rx_msg[0] == (LUMP_MSG_TYPE_CMD | LUMP_CMD_TYPE)) {
            break;
        }
    }

It seems to just read all bytes and accept any value of 64/@, but there are several of those in the sync data.

So it would often hit Bad device type id. Depending on timing, it will go through several more cycles until it happens to see the right @ at the right time.

In the current uartdev I added a workaround to go back to syncing if it sees @ with a bad device ID, because that should be expected in its current form.

@laurensvalk
Copy link
Member Author

laurensvalk commented Jun 1, 2023

Combined with your findings, and assuming this device has no way of kicking it into the first stage of sync mode (unlike the other sensors), waiting for "1.5 cycles" on average is perhaps not a bug but the best we can do.

Aside from boot (this issue), the dcm is just taking quite a bit longer since we changed the AFFIRMATIVE_COUNT, so it's perhaps almost always going to go around again.

@dlech
Copy link
Member

dlech commented Jun 1, 2023

Are we sure this is right?

Yes, it has to get 3 bytes in a row that is a valid type command with checksum, not just one byte.

@dlech
Copy link
Member

dlech commented Jun 1, 2023

has no way of kicking it into the first stage of sync mode

I'm not sure what you mean by this, can you elaborate?

@dlech
Copy link
Member

dlech commented Jun 1, 2023

since we changed the AFFIRMATIVE_MATCH_COUNT

I don't think we actually changed it, only fixed a bug where it was ignored after the first detection. We could probably get away with making it 10 instead of 20.

@laurensvalk
Copy link
Member Author

Are we sure this is right?

Yes, it has to get 3 bytes in a row that is a valid type command with checksum, not just one byte.

But it is only reading one.

@dlech
Copy link
Member

dlech commented Jun 1, 2023

Yes, but then it reads 2 more after that to validate.

@laurensvalk
Copy link
Member Author

laurensvalk commented Jun 1, 2023

Right, and then it goes into the generic error, which triggers the whole uartdev logic to stop, do another passive device check with dcm, and then comes back. That takes very long, and in some unlucky cases it keeps finding the one @ it is not supposed to find, so never sync up.

Instead I'm thinking to just go back to the sync step a few times before giving up and handling the generic error.

sync:

    // To get in sync with the data stream from the sensor, we look for a valid TYPE command.
    for (;;) {
        PBIO_PT_WAIT_READY(&data->pt, err = pbdrv_uart_read_begin(data->uart, data->rx_msg, 1, EV3_UART_IO_TIMEOUT));
        if (err != PBIO_SUCCESS) {
            DBG_ERR(data->last_err = "UART Rx error during sync");
            goto err;
        }
        PBIO_PT_WAIT_READY(&data->pt, err = pbdrv_uart_read_end(data->uart));
        if (err == PBIO_ERROR_TIMEDOUT) {
            continue;
        }
        if (err != PBIO_SUCCESS) {
            DBG_ERR(data->last_err = "UART Rx error during sync");
            goto err;
        }

        if (data->rx_msg[0] == (LUMP_MSG_TYPE_CMD | LUMP_CMD_TYPE)) {
            break;
        }
    }

    // then read the rest of the message

    PBIO_PT_WAIT_READY(&data->pt, err = pbdrv_uart_read_begin(data->uart, data->rx_msg + 1, 2, EV3_UART_IO_TIMEOUT));
    if (err != PBIO_SUCCESS) {
        DBG_ERR(data->last_err = "UART Rx error while reading type");
        goto err;
    }
    PBIO_PT_WAIT_READY(&data->pt, err = pbdrv_uart_read_end(data->uart));
    if (err != PBIO_SUCCESS) {
        DBG_ERR(data->last_err = "UART Rx error while reading type");
        goto err;
    }

    bool bad_id = data->rx_msg[1] < EV3_UART_TYPE_MIN || data->rx_msg[1] > EV3_UART_TYPE_MAX;
    checksum = 0xff ^ data->rx_msg[0] ^ data->rx_msg[1];
    bool bad_id_checksum = data->rx_msg[2] != checksum;

    if (bad_id || bad_id_checksum) {
        DBG_ERR(data->last_err = "Bad device type id or checksum");
        if (data->err_count > 10) {
            data->err_count = 0;
            goto err;
        }
        data->err_count++;
        goto sync;
    }

@dlech
Copy link
Member

dlech commented Jun 1, 2023

which triggers the whole uartdev logic to stop, do another passive device check with dcm, and then comes back

I don't think it used to do that. I think this would be enough. If the sensor is disconnected, the read will time out triggering the goto err.

diff --git a/lib/pbio/src/uartdev.c b/lib/pbio/src/uartdev.c
index 0271e208..dc8dba14 100644
--- a/lib/pbio/src/uartdev.c
+++ b/lib/pbio/src/uartdev.c
@@ -780,6 +780,7 @@ static PT_THREAD(pbio_uartdev_update(uartdev_port_data_t * data)) {
         goto err;
     }
 
+retry:
     // To get in sync with the data stream from the sensor, we look for a valid TYPE command.
     for (;;) {
         PBIO_PT_WAIT_READY(&data->pt, err = pbdrv_uart_read_begin(data->uart, data->rx_msg, 1, EV3_UART_IO_TIMEOUT));
@@ -816,13 +817,13 @@ static PT_THREAD(pbio_uartdev_update(uartdev_port_data_t * data)) {
 
     if (data->rx_msg[1] < EV3_UART_TYPE_MIN || data->rx_msg[1] > EV3_UART_TYPE_MAX) {
         DBG_ERR(data->last_err = "Bad device type id");
-        goto err;
+        goto retry;
     }
 
     checksum = 0xff ^ data->rx_msg[0] ^ data->rx_msg[1];
     if (data->rx_msg[2] != checksum) {
         DBG_ERR(data->last_err = "Bad checksum for type id");
-        goto err;
+        goto retry;
     }
 
     // if all was good, we are ready to start receiving the mode info

@laurensvalk
Copy link
Member Author

laurensvalk commented Jun 1, 2023

That's what the above does, only it doesn't keep trying forever. EDIT: I suppose the timeout handles that too.

@laurensvalk
Copy link
Member Author

I don't think it used to do that.

At least for about 3 years 😄

dlech added a commit to pybricks/pybricks-micropython that referenced this issue Jun 1, 2023
The city hub powers on the I/O ports (pin 4) during boot, so by the
time the device connection manager detected a UART device, in many
cases, it was too late to catch the device on the first sync. In the
case of the BOOST color distance sensor, this meant waiting 10 seconds
instead of 5 for the sensor to be ready.

This adds an option to enable a workaround for the quirk that cycles
power to the I/O ports immediately after boot. This is faster than
waiting for the next sync cycle.

Issue: pybricks/support#747
dlech added a commit to pybricks/pybricks-micropython that referenced this issue Jun 1, 2023
The flush needs to be immediately before the read, otherwise the read
buffer can contain garbage that was read while sending the baud rate.
Also, the ACK usually comes in < 1 ms so we don't need to wait so long.
This improves the chances of not missing data if the ACK doesn't come,
e.g. on the BOOST Color Distance sensor.

Fixes: pybricks/support#747
dlech added a commit to pybricks/pybricks-micropython that referenced this issue Jun 1, 2023
The city hub powers on the I/O ports (pin 4) during boot, so by the
time the device connection manager detected a UART device, in many
cases, it was too late to catch the device on the first sync. In the
case of the BOOST color distance sensor, this meant waiting 10 seconds
instead of 5 for the sensor to be ready.

This adds an option to enable a workaround for the quirk that cycles
power to the I/O ports immediately after boot. This is faster than
waiting for the next sync cycle.

Issue: pybricks/support#747
dlech added a commit to pybricks/pybricks-micropython that referenced this issue Jun 1, 2023
The flush needs to be immediately before the read, otherwise the read
buffer can contain garbage that was read while sending the baud rate.
Also, the ACK usually comes in < 1 ms so we don't need to wait so long.
This improves the chances of not missing data if the ACK doesn't come,
e.g. on the BOOST Color Distance sensor.

Fixes: pybricks/support#747
dlech added a commit to pybricks/pybricks-micropython that referenced this issue Jun 1, 2023
The city hub powers on the I/O ports (pin 4) during boot, so by the
time the device connection manager detected a UART device, in many
cases, it was too late to catch the device on the first sync. In the
case of the BOOST color distance sensor, this meant waiting 10 seconds
instead of 5 for the sensor to be ready.

This adds an option to enable a workaround for the quirk that cycles
power to the I/O ports immediately after boot. This is faster than
waiting for the next sync cycle.

Issue: pybricks/support#747
dlech added a commit to pybricks/pybricks-micropython that referenced this issue Jun 1, 2023
The flush needs to be immediately before the read, otherwise the read
buffer can contain garbage that was read while sending the baud rate.
Also, the ACK usually comes in < 1 ms so we don't need to wait so long.
This improves the chances of not missing data if the ACK doesn't come,
e.g. on the BOOST Color Distance sensor.

Fixes: pybricks/support#747
dlech added a commit to pybricks/pybricks-micropython that referenced this issue Jun 2, 2023
The city hub powers on the I/O ports (pin 4) during boot, so by the
time the device connection manager detected a UART device, in many
cases, it was too late to catch the device on the first sync. In the
case of the BOOST color distance sensor, this meant waiting 10 seconds
instead of 5 for the sensor to be ready.

This adds an option to enable a workaround for the quirk that cycles
power to the I/O ports immediately after boot. This is faster than
waiting for the next sync cycle.

Issue: pybricks/support#747
@dlech
Copy link
Member

dlech commented Jun 2, 2023

This is not completely fixed yet since pybricks/pybricks-micropython#175 didn't get merged.

@dlech dlech reopened this Jun 2, 2023
laurensvalk pushed a commit to pybricks/pybricks-micropython that referenced this issue Jul 10, 2023
The city hub powers on the I/O ports (pin 4) during boot, so by the
time the device connection manager detected a UART device, in many
cases, it was too late to catch the device on the first sync. In the
case of the BOOST color distance sensor, this meant waiting 10 seconds
instead of 5 for the sensor to be ready.

Something similar was happening on Technic Hub, where it would always
miss the first synchronization cycle.

This adds a workaround that cycles power to the I/O ports immediately after
boot. This is faster than waiting for the next sync cycle.

Issue: pybricks/support#747
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working hub: cityhub Issues related to the LEGO Powered Up Smart hub (Hub No. 4) platform: Powered Up Issues related to LEGO Powered Up software: pybricks-micropython Issues with Pybricks MicroPython firmware (or EV3 runtime)
Projects
None yet
Development

No branches or pull requests

2 participants