Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rx fifo latency fix #4328

Merged
merged 17 commits into from
Mar 8, 2018
Merged

Rx fifo latency fix #4328

merged 17 commits into from
Mar 8, 2018

Conversation

mribble
Copy link
Contributor

@mribble mribble commented Feb 8, 2018

Update uart to flush the rx fifo when checking available bytes in fifo. This gives a more correct result rather than waiting until either the fifo is full or until a serial rx timeout occurs.

…set options. Then only offer all these options for the generic modules.
… more correct result rather than waiting until either the fifo is full or until a serial rx timeout occurs.
@mribble
Copy link
Contributor Author

mribble commented Feb 8, 2018

See issue #4278 for more info.

@@ -119,6 +119,16 @@ size_t uart_rx_available(uart_t* uart)
if(uart == NULL || !uart->rx_enabled) {
return 0;
}
ETS_UART_INTR_DISABLE();
while((USS(uart->uart_nr) >> USRXC) & 0x7F){
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With this modification, bytes which do not fit into rx_buffer will be dropped when calling uart_rx_available, which is a bit unexpected. Effectively, this reduces the total FIFO size from the (hardware FIFO size) + (software FIFO size) to just (software FIFO size), if uart_rx_available is called.

Shouldn't uart_rx_available meerly be checking the number of bytes in hardware FIFO and adding it to the size of rx_buffer? Actual copying can happen in uart_read_char function, if the rx_buffer is empty.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@igrr you are right. However, I suspect that the objective of the hw buffer is to accumulate what is coming in before it's handed off upwards (i.e.: driven by the rate of incoming bytes), while the software buffer's objective is to accumulate before it's handed off to the application (driven by the app's rate of service).
The issue described is that, if the interrupt is being triggered only when the FIFO is full, then it is essentially behaving like the overflow interrupt. In other words, the interrupt is triggered only when the hw FIFO is completely full, and the next byte would drop data. The following has to happen to avoid dropping bytes at hardware level:

  • interrupt trigger
  • jump to isr
  • execute isr code to xfer data from the fifo

If it happens that the fifo is full and the interrupts are off for the moment because of something else, then the next incoming byte(s) can be dropped.

Per the document reference above, the trigger is when the number of bytes in the FIFO is greater than the configured threshold. I understand from that reading that changing that threshold doesn't change the size of the FIFO. In other words, if the threshold is set to something lower than the full FIFO size of 0x7F, it will trigger the interrupt more often, but also allow buffer space for bytes coming for the latency between interrupt trigger and isr execution.
I think that explains @mribble 's original test that seemed to work:

I started trying things. I found if I reduce UCFFT from 127 to 7 in this line of uart.c:
USC1(uart->uart_nr) = (127 << UCFFT) | (0x02 << UCTOT) | (1 <<UCTOE );

If that 127 is the threshold, then it is currently set to the full FIFO size.

A question: what happens when the number of bytes that come in is smaller than the threshold? It seems to me that the isr isn't called, so do those bytes just sit in the hw FIFO?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In other words, if the threshold is set to something lower than the full FIFO size of 0x7F, it will trigger the interrupt more often, but also allow buffer space for bytes coming for the latency between interrupt trigger and isr execution.

This is correct!

A question: what happens when the number of bytes that come in is smaller than the threshold?

There is a timeout feature — set by UCTOE and UCTOT fields. When FIFO is not empty, and no new bytes arrive in given amount of time, an interrupt fires.

Copy link
Contributor Author

@mribble mribble Feb 8, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@devyte - There is also a timeout check called UCTOT. Our code sets that to 2. I couldn't find any document that tells me the units for that 2, but I think 2 milliseconds make sense so maybe that's it. That pdf says the timeout threshold is measured just for inactivity so if my ms guess is correct that means if you don't write anything to the uart then after 2ms that timeout occurs and the rx fifo would be drained by the isr.

@igrr You make a good point about the buffer overflowing. However, I don't think just checking the bytes of data in fifo is correct. Then if you were actually trying to read those bytes you'd get the wrong results because you'd be trying to read bytes that haven't been moved into the software buffer yet.

OPTION_1
I think the simplest fix would be to copy over however many bytes from the fifo that can fit into the software buffer and then return however many bytes are in the software buffer. This means the largest number of available bytes that could be returned is 256.

Here's an example. Let's say you have 250 bytes in the software buffer and 50 bytes in the hw fifo. In the available check you'd move 6 bytes from the fifo (now has 44 bytes) and the software buffer would have 256 bytes which is what we'd report back to the user.

OPTION_2
Another option would be to return the total number of bytes in the fifo and sw buffer. However, then in read and peek functions you'd need to add code to get from the fifo if there is data there.

I think I like option 2 better. @igrr if you agree I can code up Option 2 and have you review it. Here is a summary of how I'd implement it:

  • Return the sum of hardware fifo and sw buffer bytes in uart_rx_available()
  • In both uart_peek_char() and uart_read_char() I'd add that loop to copy all the bytes from the hardware fifo to the software buffer. But we would only do this if the software buffer is empty, which would mean we'd never overflow the software buffer and would also minimize the number of times we go into that loop.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also like the 2nd option. I think the copying loop can be moved to a separate inline function, and called from ISR and read/peek functions.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm inclined to agree. Here is my take on what is needed:

  • uart_rx_available should be modified to return the sum of available bytes in the sw buffer + the number of available bytes in the hw FIFO. No transfer of bytes from FIFO to sw should be done.
  • uart_peek_char() and uart_read_char() should be modified as follows: if there are bytes in the FIFO, transfer as many bytes as can fit into the sw buffer. After that, proceed same as now.
  • reduce 127 to about 107. My reasoning: if 2ms time is taken as a guideline, which is a huge estimation for upper bound of interrupt latency, then about 20 bytes can come in at 115kbps considering 11bits/byte. So reducing the threshold by 20 bytes should give ample time to service the FIFO, and still not overflow.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks guys. I'll work on a change an upload it soon.

@devyte, we don't need to reduce the fifo size because the interrupt happens either when the fifo is full or when there is that timeout. So that code should work as it is now.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mribble this is the sequence that I think is problematic:

  1. the FIFO gets full
  2. an interrupt is triggered
  3. there can be latency between the trigger and the calling of the uart isr (e.g.: something else is going on that has interrupts disabled)
  4. during this latency, more bytes come in, but the FIFO is full, so data is dropped
  5. the isr executes and bytes are transferred from the FIFO. Too late, data has already been dropped.

Per my understanding of the document that you referenced, reducing the number from 127 to 107 doesn't reduce the FIFO size. It only changes when the isr is called. In other words, the isr is called when the FIFO has 107 bytes instead of 127 bytes. With this reduction, the problem sequence becomes this:

  1. the FIFO fills to 107 bytes
  2. an interrupt is triggered
  3. there can be latency between the trigger and the calling of the uart isr (e.g.: something else is going on that has interrupts disabled)
  4. during this latency, more bytes come in, but the FIFO still has 20 bytes of space, so that's ok
  5. the isr executes and bytes are transferred from the FIFO. No data was dropped.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@devyte - That's true of any of the stream hardware protocols (spi, i2c, uart) so you might want to consider how all those work too. In other systems I've worked with, you require IRS to both be short enough and infrequent enough that they don't cause the problem you described.

That said what you are saying seems like it would make things more robust in some case. I don't contribute much to this project though so I won't try to decide how this should be handled. @igrr is very involved so maybe he'll have an opinion. That said, this is a different feature than what I'm working on here so if you do that I think it should be done with a seperate push request.

@mribble
Copy link
Contributor Author

mribble commented Feb 8, 2018

I just checked in some code to implement what we discussed.

I realized there was an existing bug where we would drop drop bytes where we copied from the fifo to the the buffer so I fixed that by breaking out of the loop.

Here's an example of this problem. Say you have 240 bytes in the rx_buffer (we have space for 16 more bytes). Then the rx_fifo fills up. Previously we would copy over the first 16 bytes from the hw_fifo and then drop the next 112 bytes. With my change we'll copy over 16 bytes and leave 112 bytes in the fifo.

Beyond expanding the size of the rx buffer today, this would have broken uart control flow which depends on the fifo filling up. The esp8266 arduino code doesn't have a way to enable uart control flow, but the hardware supports it and the esp8266_peri.h seems to have the register defines for it too. My project is wired to use this feature, but the other chip talking to the esp8266 (a sam3x) has some uart code that would also need to enable uart control flow for me to test it. Enabling this is not something I plan to do in the short term. This would also require a UI exposed all the way out to user and then documentation explaining. It's a big change even though at this low level it should just be setting a few bits.

inline void uart_rx_copy_fifo_to_buffer(uart_t* uart) {
while(uart_rx_fifo_available(uart)){
size_t nextPos = (uart->rx_buffer->wpos + 1) % uart->rx_buffer->size;
if(nextPos != uart->rx_buffer->rpos) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please don't use inverted logic. Instead, use direct logic to detect an early return or break, and leave the 3 lines of code free-standing after that.
E.g.:
if(nextPos == rpos)
return;
uint8_t data = ...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the code would look like this:
'
inline void uart_rx_copy_fifo_to_buffer(uart_t* uart) {
while(uart_rx_fifo_available(uart)){
size_t nextPos = (uart->rx_buffer->wpos + 1) % uart->rx_buffer->size;
if(nextPos == uart->rx_buffer->rpos) {
// Stop copying if rx buffer is full
break;
}
uint8_t data = USF(uart->uart_nr);
uart->rx_buffer->buffer[uart->rx_buffer->wpos] = data;
uart->rx_buffer->wpos = nextPos;
}
}
'

I agree that code looks better, but I do have a concern. The line that initizes data with a value from the fifo also drains the fifo by 1 byte. So that line of code really can't be executed in the case we want to execute the break. I'm not sure if the compiler is allowed to rearrange that line to be before the if. If it can do that with any optimization level then we'd have a bug. So while it isn't quite as clean I'd suggest putting that code in an else (I can keep the current order though so the break is first). However, if you are sure this is legal then I can make that change.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nevermind. It's late and I was overthinking this. Of course a compile can't reorder this if it might affect results. I'll test this change with my test case and if it passes will push the change.

@devyte
Copy link
Collaborator

devyte commented Feb 9, 2018

@mribble I think you're right: if this is enough to fix your issue, then I won't pursue the threshold change. I'll just keep that in my notes for now.
Kudos on the PR, nice and structured with a function for each concept. It's soothing on the eyes :) my one comment is just to reduce complexity one level, makes it just a bit more readable.

@mribble
Copy link
Contributor Author

mribble commented Feb 9, 2018

@devyte, I think the code should have the change you requested implemented. Please review again and let me know if you want anything else changed. Thanks for helping with this!

@sticilface
Copy link
Contributor

Since the update to a hardware FIFO buffer II have really struggled to keep my adalight implementation working. It was stable as a rock prior to this, but now only when using 115200 baud and even then it will continually crash the ESP. With this pull request initial testing has it working at 2000000 baud again without crashing instantly (which it would do prior to this).

I made an issue here #2576

which i believe will all be related.

@sticilface
Copy link
Contributor

In fact, increasing the buffer to 1024 has it working almost perfectly at 2000000 which it has not done for over a year. very well done @mribble!

@mribble
Copy link
Contributor Author

mribble commented Feb 14, 2018

@devyte could you approve this change? I would like to get this code merged into master.

I am also planning to implement that other change devyte suggested where I trigger the rx interrupt earlier, but I will do that as a separate pull request (unless you guys think I should do it as part of this one).

@igrr
Copy link
Member

igrr commented Feb 14, 2018

@mribble this can be merged after 2.4.1.

@mribble
Copy link
Contributor Author

mribble commented Feb 14, 2018

Waiting sounds good to me. I agree there is some risk whenever changing low level code that is used by so many apps. That said this should really help robustness (it did for me and it sounds like @sticilface also found improvements. (I know about the issue milestone lists, but is there anything describing more of a timeline for these releases? No response needed if there isn't anything, but if there is I'd be interested in a link to it.)

Since we're waiting, I decided to just combine the other change with this one. It's a one liner:

USC1(uart->uart_nr) = (100 << UCFFT) | (0x02 << UCTOT) | (1 <<UCTOE );

So basically this will ask the esp8266 to trigger the rx fifo full buffer at 100 bytes instead of 127 bytes. This means the fifo has 28 bytes before overflow instead of 1 more byte. This makes things more robust when there are a lot of ISRs that are taking longer than they probably should. I tested this with my app and as expected it didn't make a difference. sticilface, it would be interesting to see if this new version has an impact on your test case. I would expect it to either make no difference or make things better in stressful uart case.

@tonton81
Copy link

i havnt tested this patch yet but i do run esp8266 <-> teensy running at 3 megabaud (3000000) and im able to transfer data back/forth in tight loops without crashing or data loss.

@devyte
Copy link
Collaborator

devyte commented Feb 16, 2018

@igrr I carefully traced the changed code logic, and also did a smoke test. I think this can be merged now, but I'm ok waiting until after 2.4.1 .

@devyte devyte added this to the 2.5.0 milestone Feb 17, 2018
@devyte devyte self-assigned this Feb 17, 2018
@sticilface
Copy link
Contributor

I've had a chance to test this, and it is a lot better original hardware serial. I can now use 2,000,000 baud without instant crashing! The original pull that i tested (from https://github.com/mribble/Arduino-1/commit/d1d17d2fb172bb4119aa8ec24922efd736ab260d) I think was still quite unstable. it would cause crashing (much improved) but some crashing nonetheless. the latest seems to crash a lot less but it is giving me a very jittery adalight effect. What is strange is that it works fine for the plain effects put out by hyperion. so it could be something rate related. I'm going to say, much improved but still not quite there. I'll take a look my code and see if there is anything there.

@mribble
Copy link
Contributor Author

mribble commented Feb 22, 2018

@sticilface I can't say anything with certainty, but I suspect your app is doing some bad things with serial. My changes shouldn't fix any crashes, but it should make it less likely you drop data in cases where you are streaming lots of data over serial. Without hardware flow control you can't completely fix the data dropping issue so if it is dropped data that is causing you issues then there is no way to fix it all the time for all apps. There are things you could do in your app to make sure it doesn't happen though. Like checking/draining the rx serial fifo often and making sure you don't disable interrupts for very long.

@sticilface
Copy link
Contributor

I think i might have been too quick to comment there about the pauses. I had debugging enabled for another part of the program, which outputs to serial, this will throw off the hyperion. I've removed it and now it has been running for 48min! no crashes, no glitches.

I think the crashes were a combination of running a huge amount of stuff on the ESP. I'm running web server, mqtt client, OTA, WS2812 and then the specific app which in this case is high throughput serial receiving. It was able to do all this with the software serial, but with the change to hardware serial, it just WDT all the time. I suspect it is to do with the changes you have implemented as now it seems to be running rather nicely. As with all these changes i'll need to just run them for a while to see if it works. but 45minis is a record.

@tonton81
Copy link

just so you know i use 2 and 3 megabaud with my tests, no issues, im surprised you have issues patching it while i can stream 500 byte CRC verrified packets in bursts :P

@devyte
Copy link
Collaborator

devyte commented Feb 28, 2018

@sticilface any further news from your testing?

@sticilface
Copy link
Contributor

yes. I would say that it is hugely improved! I highly recommend this pull request.

@s0170071
Copy link

doesn't fix #3869.

bryceschober pushed a commit to bryceschober/Arduino that referenced this pull request Apr 5, 2018
* Flush the rx fifo when checking available bytes in fifo.  This gives a more correct result rather than waiting until either the fifo is full or until a serial rx timeout occurs.

* When rx_avaiable is checked return rx_buffer plus rx_fifo.  Then during rx_read and rx_peek functions copy over the data in the fifo as needed.

* Clean up early out case.

* Set the rx full fifo ISR to trigger a little sooner.  This makes the uart rx isr more robust in cases where the ISR can't trigger very fast

(cherry picked from commit 7f0b9d1)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants