
USB Improvements #344

Merged
merged 1 commit into from
Mar 17, 2024

Hylian
Contributor

@Hylian Hylian commented Dec 28, 2023

  • Introduce shell module for basic serial shell with argument parsing
  • Introduce shell_cmd_list module for basic compile-time command registration
  • Harden USB handling to hang less and drop fewer inputs
    • Service tud_task() with periodic TC0 timer interrupt
    • Handle CDC write/reads in main app loop
    • Handle shell servicing in main app loop
    • Increase TinyUSB CDC RX/TX buffers to 128 bytes
    • Add an additional buffering layer for writes to TinyUSB
    • Add an additional circular buffering layer for reads from TinyUSB
  • Change newline prints to also send carriage return
  • Refactor filesystem commands for shell subsystem
  • Introduce new shell commands:
    • 'help' command
    • 'flash' command to reset into bootloader
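Here is a rough sketch of how a compile-time command table and whitespace argument splitting might fit together. Only the type name `shell_command_t` comes from this PR. Its fields, `shell_tokenize()`, `shell_dispatch()`, and the stub handlers are hypothetical and may not match the actual shell module.

```c
#include <stddef.h>
#include <string.h>

#define SHELL_MAX_ARGS 8

// Hypothetical layout; the PR's real shell_command_t may have other fields.
typedef struct {
    const char *name;                        // word typed at the prompt
    const char *help;                        // one-line description for 'help'
    int (*handler)(int argc, char *argv[]);  // receives argv[0] == name
} shell_command_t;

// Stub handlers for illustration: each just returns its argument count.
static int cmd_help(int argc, char *argv[])   { (void)argv; return argc; }
static int cmd_stress(int argc, char *argv[]) { (void)argv; return argc; }

// The command list is a const table fixed at compile time.
static const shell_command_t shell_commands[] = {
    { "help",   "list available commands", cmd_help },
    { "stress", "stress test CDC writes",  cmd_stress },
};

// Split a mutable line into blank-separated tokens in place.
// Returns the number of tokens stored in argv (at most max_args).
static int shell_tokenize(char *line, char *argv[], int max_args) {
    int argc = 0;
    char *p = line;
    while (*p != '\0' && argc < max_args) {
        while (*p == ' ' || *p == '\t') p++;   // skip leading blanks
        if (*p == '\0') break;
        argv[argc++] = p;                      // token starts here
        while (*p != '\0' && *p != ' ' && *p != '\t') p++;
        if (*p != '\0') *p++ = '\0';           // terminate token, step past it
    }
    return argc;
}

// Find argv[0] in the table and run its handler.
// Returns the handler's result, or -1 for an empty or unknown command.
static int shell_dispatch(char *line) {
    char *argv[SHELL_MAX_ARGS];
    int argc = shell_tokenize(line, argv, SHELL_MAX_ARGS);
    if (argc == 0) return -1;
    for (size_t i = 0; i < sizeof shell_commands / sizeof shell_commands[0]; i++) {
        if (strcmp(argv[0], shell_commands[i].name) == 0) {
            return shell_commands[i].handler(argc, argv);
        }
    }
    return -1;
}
```

Because the table is a `const` array, adding a command means adding one initializer line. No runtime registration step is needed.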

Testing:

  • Shell validated on Sensor Watch Blue w/ Linux host
  • Shell validated in emscripten emulator

@Hylian
Contributor Author

Hylian commented Dec 28, 2023

Sorry for the huge patch! USB serial was consistently hanging for me, and one thing led to another while digging into the issue. Since we're not running on an RTOS, TinyUSB seems a bit finicky about how we service it. But this setup seems fairly stable for me. Only tested against a Linux host, not macOS.

Contributor

@wryun wryun left a comment

Caveat: I have no privileges in this repo, just an interested bystander :)

This is great. I really appreciate that you refactored the command code and made the serial code actually do something useful.

Without trying it, I'm not entirely convinced the 256-byte buffer is going to solve all our problems (see comment), but this may be because I don't properly understand the constraints (I have basically no experience with this sort of low-level stuff). Any explanations gratefully received.

if (s_write_buf_len > 0) {
int32_t to_write = s_write_buf_len;
size_t written = 0;
// Write in chunks of 32 bytes at most
Contributor

This is the sort of comment that I like having a 'because' on.

It seems to me you're trying to fight against the code in tud_cdc_write, which automatically flushes at > 64 bytes (i.e. theoretically you don't need the intermediate tud_cdc_write_flushes here if this was reasonable).

Should we just be patching tinyusb?
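For readers following this thread, here is a host-runnable sketch of what the reviewed hunk appears to do. It drains a pending buffer in chunks of at most 32 bytes and flushes after each one, stopping early if the sink accepts less than it was offered. The `cdc_sink_t` function pointers only play the roles of `tud_cdc_write()` and `tud_cdc_write_flush()`. They, the fake sink, and `drain_chunked()` are assumptions for illustration, not the PR's code.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CHUNK_MAX 32u

// Function pointers standing in for tud_cdc_write() / tud_cdc_write_flush()
// so the sketch runs on a host.
typedef struct {
    uint32_t (*write)(const void *buf, uint32_t len); // returns bytes accepted
    void (*flush)(void);
} cdc_sink_t;

// Hand up to len bytes to the sink in chunks of at most CHUNK_MAX, flushing
// after each chunk. Stops early if the sink takes less than it was offered.
// Returns the number of bytes the sink accepted.
static size_t drain_chunked(const char *buf, size_t len, const cdc_sink_t *sink) {
    size_t written = 0;
    while (written < len) {
        size_t want = len - written;
        if (want > CHUNK_MAX) want = CHUNK_MAX;
        uint32_t n = sink->write(buf + written, (uint32_t)want);
        sink->flush();
        written += n;
        if (n < want) break;  // sink is full; the rest waits for a later pass
    }
    return written;
}

// Fake sink for host testing: a 128-byte store that counts flushes and
// records the largest chunk it was offered.
static char     sink_data[128];
static uint32_t sink_len, sink_flushes, sink_biggest;

static uint32_t fake_write(const void *buf, uint32_t len) {
    uint32_t room = (uint32_t)sizeof sink_data - sink_len;
    uint32_t n = len < room ? len : room;
    memcpy(sink_data + sink_len, buf, n);
    sink_len += n;
    if (len > sink_biggest) sink_biggest = len;
    return n;
}

static void fake_flush(void) { sink_flushes++; }
```

If `tud_cdc_write()` already flushes on its own once its FIFO fills, as suggested above, the explicit per-chunk flush mainly makes transfers start sooner. It is not required for correctness.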

Contributor Author

You caught me using weasel words in the comments. :) To be honest, I bashed out the USB tweaks months ago while trying to get prints working, and didn't leave any comments for my future self. It's entirely possible that this change isn't needed, but it's been stable for me, so I figured I should just get what I have into a PR for now. From what I recall, it was quite sensitive to the TC0 timer period + how much data we try to place in the FIFO, and required a lot of guess-and-checking of various parameters.

Contributor Author

Ok, I tried hooking _write() directly up to tud_cdc_write(), and it more or less seems to work. Just need to make sure CFG_TUD_CDC_TX_BUFSIZE is large enough, and need to call tud_cdc_write_flush(), or the data won't actually be sent until several characters are written. I'll play around with it and push an updated patch.

* Basic write buffer for USB CDC serial.
*/

#define CDC_WRITE_BUF_SZ (256)
Contributor

So, the implication of this is (afaik) that we can't read or write more than 256 bytes per loop, right? This seems a little low to me (e.g. it might end up interfering with debugging statements, and might make filesystem_cat malfunction?)

Contributor Author

Yep, anything longer than this will get truncated until the next loop. _write() will properly return 0 bytes written, so the caller does have a chance to handle this scenario.
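Here is a minimal sketch of the truncation behavior described here. It assumes a flat buffer sized by `CDC_WRITE_BUF_SZ` that is emptied once per loop. `CDC_WRITE_BUF_SZ` and `s_write_buf_len` appear in the PR. `write_buf_append()` and `write_buf_reset()` are hypothetical helpers, not the PR's implementation.

```c
#include <stddef.h>
#include <string.h>

#define CDC_WRITE_BUF_SZ (256)

static char   s_write_buf[CDC_WRITE_BUF_SZ];
static size_t s_write_buf_len = 0;

// Append as much of ptr as fits and return the count accepted (0 when full).
// A _write() built on this would return that count, so a short return value
// tells the caller its output was truncated.
static size_t write_buf_append(const char *ptr, size_t len) {
    size_t room = CDC_WRITE_BUF_SZ - s_write_buf_len;
    size_t n = len < room ? len : room;
    memcpy(s_write_buf + s_write_buf_len, ptr, n);
    s_write_buf_len += n;
    return n;
}

// Called once per loop after the buffered bytes were handed to TinyUSB.
static void write_buf_reset(void) { s_write_buf_len = 0; }
```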

#include "tusb.h"

/*
* Basic write buffer for USB CDC serial.
Contributor

@wryun wryun Dec 28, 2023

What would happen if we didn't have this buffer and just relied on TinyUSB's buffer like before? i.e. just did tud_cdc_write immediately in _write but increased the TinyUSB buffer size.

If the reason is that there's some kind of relationship between tud_task and cdc_task, then it might be a problem that their calling locations have now been separated.

Contributor Author

That was my initial approach, but I just wasn't able to get it working reliably. I suspect it should be possible, but tud_cdc_write() does have the side effect of potentially dropping into USB transfer code when it flushes, which I suspect was triggering some programmatic errors.

As for tud_task and cdc_task, they're intentionally separated, as it turns out the tud user-facing API should not be called from an interrupt context. I originally had it all serviced in the TC0 ISR, and it started behaving much better once I broke it out.
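Here is a host-side sketch of one common way to express this split. The timer ISR only services the USB stack and raises a flag, and the user-facing CDC reads and writes run in the main loop. The flag, `timer_isr()`, and `main_loop_service()` are hypothetical. In the firmware, the ISR body would call `tud_task()` instead of incrementing a counter.

```c
#include <stdbool.h>
#include <stdint.h>

// Written from interrupt context, read and cleared from the main loop;
// `volatile` stops the compiler from caching them across iterations.
static volatile bool     s_cdc_work_pending = false;
static volatile uint32_t s_stack_ticks = 0;

// Hypothetical timer ISR: services only the USB stack (tud_task() in the
// firmware, a counter here) and flags that CDC work may be due.
static void timer_isr(void) {
    s_stack_ticks++;
    s_cdc_work_pending = true;
}

// Main-loop side: user-facing CDC reads and writes (cdc_task() in the PR)
// run only here, never inside the ISR. Returns true if it ran this pass.
static bool main_loop_service(void) {
    if (!s_cdc_work_pending) return false;
    s_cdc_work_pending = false;
    /* cdc_task(): move data between TinyUSB FIFOs and app buffers */
    return true;
}
```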

movement/shell_cmd_list.h
@@ -78,8 +78,11 @@ int main(void) {

while (1) {
bool usb_enabled = hri_usbdevice_get_CTRLA_ENABLE_bit(USB);
bool can_sleep = app_loop();
if (usb_enabled) {
watch_handle_usb();
Contributor

I'm assuming this is added because we can easily loop faster than TC0 fires.

This does sort of make me wonder if we need to use TC0 for the usb handling at all, though I guess it defends us from dodgy watch face code...

Contributor Author

Oops, just realized I'm calling tud_task() both here and in the TC0 handler. That's definitely incorrect, stand by...

Contributor Author

Ok, yeah, only calling tud_task() from main loop immediately breaks things, and only calling it from the TC0 ISR works fine. So I think this is just a leftover hunk from testing that I forgot to clean up.

@wryun
Contributor

wryun commented Dec 29, 2023

As a data point, when I did the 'what happens if we have no buffers' experiment here it mostly seemed to work: #212

The issue was I made it a blocking write (obviously wrong), and I believe this failed if there was something that needed to be done first (e.g. if you pasted a lot into the buffer, it wouldn't let you write anything because something was still sitting in read, so it would block forever).

This makes me wonder if it would be possible to do something like a blocking write with a 'now handle any reads' inside the write. i.e. basically make sure any read-type usb operation would work. I know this seems a bit wrong from the perspective of actually running the watch code properly, but USB mode is quite different... (and doesn't have the display attached).

EDIT: actually, getting my head back into this, https://github.com/micropython/micropython/blob/d68e3b03b1053a6de0c7eb28f5989132c138364b/ports/samd/mphalport.c does not have a separate write buffer and instead uses a timeout.
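Here is a sketch of the write-with-timeout idea. It assumes injected `try_write` and `now_ms` operations so that it runs on a host. This is not MicroPython's code. It keeps offering data while the sink makes progress, then gives up after `timeout_ms` with no progress, returning a short count instead of blocking forever. `tx_ops_t`, `write_with_timeout()`, and the fakes are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

// Injected operations; in firmware these might wrap tud_cdc_write() and a
// millisecond tick. Here they are assumptions, not TinyUSB/MicroPython APIs.
typedef struct {
    size_t   (*try_write)(const char *buf, size_t len); // bytes accepted, may be 0
    uint32_t (*now_ms)(void);
} tx_ops_t;

// Keep offering data until everything is accepted or timeout_ms passes with
// no progress. Returns bytes written, which may be short on timeout.
static size_t write_with_timeout(const char *buf, size_t len,
                                 const tx_ops_t *ops, uint32_t timeout_ms) {
    size_t done = 0;
    uint32_t last_progress = ops->now_ms();
    while (done < len) {
        size_t n = ops->try_write(buf + done, len - done);
        if (n > 0) {
            done += n;
            last_progress = ops->now_ms();
        } else if ((uint32_t)(ops->now_ms() - last_progress) >= timeout_ms) {
            break;  // host is not draining; give up instead of hanging
        }
    }
    return done;
}

// Host-side fakes: a clock that advances 1 ms per read, and a sink that takes
// at most 8 bytes per call until `fake_room` runs out.
static uint32_t fake_clock = 0;
static size_t   fake_room = 0;
static uint32_t fake_now(void) { return fake_clock++; }
static size_t fake_try_write(const char *buf, size_t len) {
    (void)buf;
    size_t n = len < 8 ? len : 8;
    if (n > fake_room) n = fake_room;
    fake_room -= n;
    return n;
}
```

The deadline resets on every bit of progress, so a slow but draining host never trips the timeout. Only a stalled host does.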

@Hylian
Contributor Author

Hylian commented Dec 29, 2023

Went back in and tested various ways of calling tud read directly inside _read(). They all resulted in dropped inputs, which was the original behavior I was seeing. Not sure on the root cause. It seems happy without the extra write buffering though, so I'm inclined to remove the write buffer.

Edit: IIRC, I also messed around with adding critical sections to prevent TC0/USB interrupt servicing, and with calling some of the internal tud servicing functions in a loop to keep kicking it. I don't think either went anywhere.

// Test: Using the read API as advertised
// Result: Drops inputs, usually takes sending multiple characters for a 
//         single input to go through.
int _read(int file, char *ptr, int len) {
    (void) file;

    if (ptr == NULL || len <= 0) {
        return -1;
    }

    if (tud_cdc_available() == 0) {
        return -1;
    }
    return tud_cdc_read((void *) ptr, len);
}

// Test: Try only reading a single char in _read()
// Result: Same as above
int _read(int file, char *ptr, int len) {
    (void) file;

    if (ptr == NULL || len <= 0) {
        return -1;
    }

    if (tud_cdc_available() == 0) {
        return -1;
    }

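    // tud_cdc_read_char() returns int32_t (-1 when empty); storing it in a
    // plain char makes `ret < 0` dead code where char is unsigned (ARM).
    int32_t ret = tud_cdc_read_char();
    //tud_cdc_read_flush(); // Tested with and without flush.
    if (ret < 0) {
        return -1;
    }
    *ptr = (char) ret;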
    return 1;
}

// Test: Read a char at a time in a loop while FIFO has data
// Result: Same as above, maybe drops less frequently?
int _read(int file, char *ptr, int len) {
    (void) file;

    if (ptr == NULL || len <= 0) {
        return -1;
    }

    while (tud_cdc_available()) {
        int32_t ret = tud_cdc_read_char();  // int32_t: -1 signals "no data"
        tud_cdc_read_flush();
        if (ret < 0) {
            return -1;
        }
        *ptr = (char) ret;
        return 1;
    }
    return -1;
}

// Test: Circular buffer in cdc_task() as in original PR.
// Result: Works as expected.
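The circular buffer in the last test case can be sketched as a single-producer, single-consumer ring. The servicing task pushes bytes pulled from the USB FIFO, and `_read()` pops them. This illustrative version uses free-running indices and a power-of-two size. `RX_BUF_SZ`, `ring_t`, and the function names are assumptions. The sketch relies on a single-core MCU and adds no memory barriers.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

// Capacity must be a power of two so wrapping is a cheap mask and the
// free-running uint32_t indices stay consistent when they overflow.
#define RX_BUF_SZ 64u

typedef struct {
    uint8_t data[RX_BUF_SZ];
    volatile uint32_t head;  // next slot to write (producer: servicing task)
    volatile uint32_t tail;  // next slot to read  (consumer: _read())
} ring_t;

static size_t ring_count(const ring_t *r) { return (size_t)(r->head - r->tail); }

// Producer: store one byte pulled from the USB FIFO.
// Returns false (and drops the byte) when the ring is full.
static bool ring_push(ring_t *r, uint8_t b) {
    if (ring_count(r) == RX_BUF_SZ) return false;
    r->data[r->head & (RX_BUF_SZ - 1)] = b;
    r->head++;
    return true;
}

// Consumer: copy up to len buffered bytes into out; returns how many.
static size_t ring_pop(ring_t *r, uint8_t *out, size_t len) {
    size_t n = 0;
    while (n < len && ring_count(r) > 0) {
        out[n++] = r->data[r->tail & (RX_BUF_SZ - 1)];
        r->tail++;
    }
    return n;
}
```

Only the producer writes `head` and only the consumer writes `tail`, so neither side needs to disable interrupts to update its own index.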

@Hylian
Contributor Author

Hylian commented Dec 29, 2023

Changes:

  • Removed unneeded tud_task() call in main loop, tud_task() is serviced by TC0 ISR only
  • Renamed sShellCommand -> shell_command_t
  • Tuned CFG_TUD_CDC_TX_BUFSIZE, CFG_TUD_CDC_RX_BUFSIZE, and CFG_TUD_CDC_EP_BUFSIZE back down to 64
  • Tuned CDC_WRITE_BUF_SZ up to 1024
  • New shell command: stress
    • Stress tests CDC serial writes of various lengths, with optional delay parameter

@Hylian
Contributor Author

Hylian commented Dec 29, 2023

With the new stress command, I evaluated _write() across implementations and parameters:

  • Direct tud write call vs write buffering
  • Size of CFG_TUD_CDC_TX_BUFSIZE
  • Delay between printf() calls
  • Size of CDC write buffer

Results: https://gist.github.com/Hylian/71d6e1c30b70c1a6d23671729d0b2865

Analysis:

With direct tud writes, characters start dropping after a certain number of writes. Increasing CFG_TUD_CDC_TX_BUFSIZE or increasing the delay between writes lets it go further before characters start dropping. Because printf() is a blocking call, TinyUSB is only serviced by the TC0 ISR and by the flush() calls within _write().

With the buffered approach (+ chunked 32 byte writes), writes are extremely consistent, with no dropped characters. However, the number of bytes that you can write in one app loop is clamped by CDC_WRITE_BUF_SZ.

Given this, I think keeping the write buffer is a reasonable option, as it has consistent behavior independent of the user app loop. The tradeoff is the limit on the maximum number of characters per loop. I've adjusted the PR to bump CDC_WRITE_BUF_SZ to 1024, and bumped the TUD buffers back down to 64, as 128 isn't necessary.

EDIT: Alternative approaches would be to rate limit writes, come up with a smarter method of servicing tusb, or figure out if there's a tusb backpressure mechanism we can use (fifo size?). But I don't have a ton of time to dig into those right now, so I'd like to get this merged first.

@wryun
Contributor

wryun commented Dec 30, 2023

With the buffered approach (+ chunked 32 byte writes), writes are extremely consistent, with no dropped characters

The way I read the results is that the double-buffered approach drops more characters for the same total buffer size in all situations ;)

True, which characters it drops within a loop are more predictable, but you could get that effect without the double buffering by introducing a counter (i.e. track how many chars you've written in that loop, stop writing at a certain size, then reset this counter at the next loop). I'm not seeing that the buffering adds anything. My instinct would be to prefer the micropython approach (i.e. write directly, but add a bit of a wait so that we can handle it when the only issue is that the code is pushing faster than the baud as opposed to something else blocking our writes).
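The counter alternative suggested here might look like the following sketch. It keeps no second buffer, only a running total that caps the bytes written per loop and is reset at the top of each app loop. `LOOP_WRITE_BUDGET`, `budget_take()`, and `budget_reset()` are hypothetical names. A `_write()` built on this would pass only `budget_take(len)` bytes on to `tud_cdc_write()`.

```c
#include <stddef.h>

// Illustrative cap on bytes written per app loop.
#define LOOP_WRITE_BUDGET 1024u

static size_t s_loop_written = 0;

// Returns how many of `requested` bytes may still be written this loop,
// and charges them against the budget.
static size_t budget_take(size_t requested) {
    size_t left = LOOP_WRITE_BUDGET - s_loop_written;
    size_t n = requested < left ? requested : left;
    s_loop_written += n;
    return n;
}

// Call once at the top of every app loop.
static void budget_reset(void) { s_loop_written = 0; }
```

Like the buffered version, it makes which bytes get dropped within a loop predictable. Unlike the buffered version, it never copies the data.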

@wryun
Copy link
Contributor

wryun commented Dec 30, 2023

(very much appreciate you did the tests, though, despite my whinging!)

@Hylian
Contributor Author

Hylian commented Dec 30, 2023

Haha, fair enough, I'll stop being lazy and see if I can get some sort of backpressure working.

@Hylian
Contributor Author

Hylian commented Dec 30, 2023

Update: I did get writes working fairly well using an exponential backoff delay in _write(), but I then ran into the other issue I was seeing: if you type into the console while it's writing, it can crash the whole stack. I think that's why serializing reads and writes into cdc_task() helped prevent crashes.

@wryun
Copy link
Contributor

wryun commented Dec 31, 2023

Are you blocking indefinitely on the write now? I would back off for a bit then give up, similar to the micropython approach, which should prevent any freeze-ups.

Of course, if you're seeing some other kind of crash...

(my assumption is that not ever servicing the reads - or some other usb interaction - can lead to the inability to write at all after the buffer is filled... i.e. if we wanted a truly blocking write we'd have to service any usb 'thing' inside the blocking write)
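Here is a sketch of the bounded exponential backoff discussed in this exchange. It waits 1 ms, then 2, 4, 8, with each retry capped at 16 ms, and gives up once the total wait would exceed 50 ms. These constants and the `backoff_t` helpers are assumptions chosen for illustration, not values from the PR. A `_write()` would call `backoff_step()` whenever the TX FIFO has no room, and return a short count when it yields 0.

```c
#include <stdint.h>

#define BACKOFF_START_MS  1u
#define BACKOFF_MAX_MS    16u   // cap on any single wait
#define BACKOFF_BUDGET_MS 50u   // give up once total waiting would exceed this

typedef struct {
    uint32_t delay_ms;  // next wait to hand out
    uint32_t spent_ms;  // total waited so far
} backoff_t;

static void backoff_init(backoff_t *b) {
    b->delay_ms = BACKOFF_START_MS;
    b->spent_ms = 0;
}

// Returns how many ms to wait before retrying, or 0 meaning "give up".
static uint32_t backoff_step(backoff_t *b) {
    if (b->spent_ms + b->delay_ms > BACKOFF_BUDGET_MS) return 0;
    uint32_t wait = b->delay_ms;
    b->spent_ms += wait;
    // Double the next delay, but never beyond the cap.
    b->delay_ms = (wait * 2 > BACKOFF_MAX_MS) ? BACKOFF_MAX_MS : wait * 2;
    return wait;
}
```

With these constants the sequence is 1, 2, 4, 8, 16, 16 ms and then a give-up, so a stalled host costs at most 47 ms per write instead of a permanent hang.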

@Hylian
Copy link
Contributor Author

Hylian commented Dec 31, 2023

The issue isn't the write per-se, but the moment a read comes in. I'm not really confident about figuring it out without attaching a debugger at this point, since I can't exactly printf debug over USB. :)

Looking at micropython's implementation for nrf, they actually do something similar: https://github.com/micropython/micropython/blob/9feb0689eeaca5ce88aedcc680f997a3b4d0221c/ports/nrf/drivers/usb/usb_cdc.c#L124

They serialize reads + writes in cdc_task(), and use a circular ring buffer for tx.

@Hylian
Contributor Author

Hylian commented Dec 31, 2023

Ok, I think I've come up with a decent solution to handle reads and writes outside the app loop. I set up TC1 and ran cdc_task() there at a lower prio than TC0. Main issue was if you spammed input during large prints, and the read fifo was not read from quickly enough, it would crash. After making tx defer to rx in cdc_task() and adding a delay to _write(), it doesn't crash even in my worst case stress tests.
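The "TX defers to RX" rule can be sketched as a single servicing pass. The pass first drains any pending input, and it skips output on that pass if it read anything. The `cdc_ops_t` operations stand in for the TinyUSB CDC calls. Like the fakes and `cdc_task_tick()`, they are assumptions for illustration. In the PR, `cdc_task()` runs from the TC1 ISR at a lower priority than TC0.

```c
#include <stdint.h>

// Stand-ins for device FIFO calls (e.g. tud_cdc_available/read,
// tud_cdc_write_available/write/flush), injected so this runs on a host.
typedef struct {
    uint32_t (*rx_available)(void);
    uint32_t (*rx_read)(uint8_t *buf, uint32_t len);
    uint32_t (*tx_room)(void);
    uint32_t (*tx_write)(const uint8_t *buf, uint32_t len);
    void     (*tx_flush)(void);
} cdc_ops_t;

// One servicing pass. Returns the number of TX bytes sent this pass.
static uint32_t cdc_task_tick(const cdc_ops_t *ops,
                              uint8_t *rx_dst, uint32_t rx_cap, uint32_t *rx_len,
                              const uint8_t *tx_src, uint32_t tx_len) {
    // 1. Input first: move pending bytes into the app-side RX buffer.
    uint32_t avail = ops->rx_available();
    uint32_t space = rx_cap - *rx_len;
    uint32_t want = avail < space ? avail : space;
    if (want > 0) {
        *rx_len += ops->rx_read(rx_dst + *rx_len, want);
        return 0;  // something was read: TX waits for the next pass
    }
    // 2. Output only when there was nothing to read.
    uint32_t room = ops->tx_room();
    uint32_t n = tx_len < room ? tx_len : room;
    if (n == 0) return 0;
    n = ops->tx_write(tx_src, n);
    ops->tx_flush();
    return n;
}

// Host-side fakes that count bytes instead of touching hardware.
static uint32_t f_rx_pending, f_tx_space, f_tx_sent, f_flushes;
static uint32_t f_rx_available(void) { return f_rx_pending; }
static uint32_t f_rx_read(uint8_t *buf, uint32_t len) {
    uint32_t n = len < f_rx_pending ? len : f_rx_pending;
    for (uint32_t i = 0; i < n; i++) buf[i] = 'k';
    f_rx_pending -= n;
    return n;
}
static uint32_t f_tx_room(void) { return f_tx_space; }
static uint32_t f_tx_write(const uint8_t *buf, uint32_t len) {
    (void)buf;
    f_tx_sent += len;
    f_tx_space -= len;
    return len;
}
static void f_tx_flush(void) { f_flushes++; }
```

TX is skipped only when something was actually read, not merely when input is pending. This avoids stalling output forever if the application-side RX buffer is full.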

@wryun
Copy link
Contributor

wryun commented Dec 31, 2023

Nice!

(Re micropython, you linked to the NRF port. If you go to the SAMD one I linked earlier, it appears they don't have an intermediate buffer:
https://github.com/micropython/micropython/blob/d68e3b03b1053a6de0c7eb28f5989132c138364b/ports/samd/mphalport.c#L190
)

* Introduce shell module for basic serial shell with argument parsing
* Introduce shell_cmd_list module for basic compile-time command
  registration
* Harden USB handling to hang less and drop fewer inputs
  - Service tud_task() with periodic TC0 timer interrupt
  - Service cdc_task() with periodic TC1 timer interrupt
  - Handle shell servicing in main app loop
  - Add a circular buffering layer for reads/writes
* Change newline prints to also send carriage return
* Refactor filesystem commands for shell subsystem
* Introduce new shell commands:
  - 'help' command
  - 'flash' command to reset into bootloader
  - 'stress' command to stress CDC writes

Testing:
* Shell validated on Sensor Watch Blue w/ Linux host
* Shell validated in emscripten emulator
* Tuned by spamming inputs during `stress` cmd until stack didn't crash
@Hylian
Contributor Author

Hylian commented Jan 7, 2024

Ready for review! After inspecting the micropython code, I decided to make both reads and writes use a circular buffer. I then serviced the buffer in the TC1 timer ISR, at a lower priority than TC0. Read/write flushing is now independent from the app loop. Reads are also handled in the middle of large writes, to not crash the stack. Tuned by calling stress and spamming keyboard inputs, until it stopped crashing.

@WesleyAC
Collaborator

I don't have a lot of cycles to review this, but given that USB is AFAICT totally broken right now, and this code doesn't really touch anything outside of USB-land, I'm fairly inclined to lean towards just merging this. If anyone has concerns about that, speak up now! (cc @joeycastillo?)

@Hylian I am curious about the \r\n line endings in the output, is there a particular reason for that? If it's just preference, I'd personally prefer that we stay unixy :)

@Hylian
Copy link
Contributor Author

Hylian commented Jan 21, 2024

@WesleyAC Thanks for taking a look!

\r\n is required/idiomatic, as most serial terminals (such as minicom) will not automatically insert a carriage return upon newline. In these clients, without a CR, the cursor just moves down a row without resetting the column position, and text keeps scrolling to the right.
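Here is a sketch of the newline translation being discussed, similar in spirit to a tty's `ONLCR` output flag. Each `\n` becomes `\r\n`, and a CR LF pair is never split across the end of the output buffer. `lf_to_crlf()` is a hypothetical helper, not the PR's implementation.

```c
#include <stddef.h>

// Translate "\n" to "\r\n" into a bounded output buffer. Returns the number of
// bytes produced; *consumed is set to how many input bytes were fully
// translated, so a caller can resume with the rest later.
static size_t lf_to_crlf(const char *in, size_t in_len,
                         char *out, size_t out_cap, size_t *consumed) {
    size_t i = 0, o = 0;
    for (; i < in_len; i++) {
        if (in[i] == '\n') {
            if (o + 2 > out_cap) break;   // keep the CR LF pair together
            out[o++] = '\r';
            out[o++] = '\n';
        } else {
            if (o + 1 > out_cap) break;
            out[o++] = in[i];
        }
    }
    *consumed = i;
    return o;
}
```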

@Hylian
Contributor Author

Hylian commented Feb 10, 2024

Ping :)

Is this good to be merged? If there are any specific tests you'd like to see, I can try and run them.

Collaborator

@matheusmoreira matheusmoreira left a comment

Looks great to me! Is it OK if I merge this on my branch?

@matheusmoreira
Collaborator

matheusmoreira commented Feb 27, 2024

@WesleyAC

I am curious about the \r\n line endings in the output, is there a particular reason for that? If it's just preference, I'd personally prefer that we stay unixy :)

\r\n is actually the correct escape sequence as per teletypewriter semantics. Basically computing dates back to telegraphs, teleprinters and teletypewriters and everything thinks that they're talking to those old things literally to this day. Imagine an actual typewriter: \r is just like when you push the literal carriage back to make it return to the beginning of the paper margin while \n is like scrolling the paper down while maintaining the carriage position.

The Linux kernel's terminal subsystem has options for automatically translating between \r\n and \n for the benefit of applications -- the so-called cooked mode. This can be turned off, just like character echoing.

For more details:

TTY demystified
History of the TTY

@Hylian
Copy link
Contributor Author

Hylian commented Mar 4, 2024

@matheusmoreira Go for it! The more test time we can get on the patch, the better. :) In particular, I haven't really used it on my wrist much. None of the code should really run when not plugged into USB, but always good to check.

@matheusmoreira
Collaborator

@Hylian Great, I'll go ahead and merge this into my branch then. I plan to merge as many pull requests as possible before flashing them into my watch for daily use.

This also needs the tested-on-hw tag, since you have already tested it:

Shell validated on Sensor Watch Blue w/ Linux host

matheusmoreira added a commit to matheusmoreira/sensor-watch that referenced this pull request Mar 5, 2024
 - Change newline prints to also send carriage return
 - Introduce shell module for serial shell with argument parsing
 - Introduce shell command list for compile time command registration
 - Refactor file system commands for shell subsystem
 - Introduce new shell commands:
   - 'help' command
   - 'flash' command to reset into bootloader
   - 'stress' tests CDC serial writes of various lengths
     - optional delay parameter
 - Harden USB handling
   - Hangs less
   - Drops fewer inputs
 - Circular buffers for both reads and writes

Reported-by: Edward Shin <[email protected]>
Tested-by: Edward Shin <[email protected]>
Reviewed-by: James Haggerty <[email protected]>
Reviewed-by: Wesley Aptekar-Cassels <[email protected]>
Reviewed-by: Matheus Afonso Martins Moreira <[email protected]>
Signed-off-by: Matheus Afonso Martins Moreira <[email protected]>
GitHub-Pull-Request: joeycastillo#344
@matheusmoreira
Collaborator

Am now running this code on the watch. No issues so far.

@theAlexes theAlexes merged commit 35c0a4b into joeycastillo:main Mar 17, 2024
2 checks passed
5 participants