ethos: move bulk of state machine out of ISR context #17265
Forgot to mention: As the stdio now also needs special handling in the thread layer, this moves the |
I'm having a look, but I'm not too familiar with the ethos subsystem overall. If we already had subsystem maintainer reviews, would you consider yourself suitable as an ethos subsystem maintainer? If so, I think I can provide sufficient review, otherwise I'm still doing the same but would ask around for someone who can give this a high-level Go just from the description you provide in the PR. |
Not related to concrete code changes (at least such that GitHub would let me annotate):
- In stdio_uart,
#ifdef M_S_ETHOS
/uart_init(ETHOS_UART, ...);
remains; is that intentional? (Seems like something was missed in the split).
Other than that (and barring testing which I'd do on the complete thing), the ethos move LGTM.
Arghs forgot to remove that change. Will do ASAP. |
Incomplete review; one nit, one "wat?".
    DEBUG("ethos _recv(): inbuf doesn't contain enough bytes.\n");
    return -EIO;
}
tmp = ethos_unstuff_readbyte(ptr, (uint8_t)byte, &escaped, &frametype);
I don't understand the escaping yet (stay with me, I think I start understanding), but maybe you can clarify this easily: If a stdio_read() is started between having received an escape and the escaped data, will this not produce an erroneous read?
Say the buffer is filled with a text frame, and another frame (that won't fit in the buffer yet) is being filled in. When the buffer is full, stdio_read is called and (with speed varying depending on which high-priority threads are running) pulls out data byte by byte, creating on and off opportunity for isrpipe_write_one to push data.
Frametype stays text, data keeps flowing in and getting dropped on and off before the reader code even gets to the being-corrupted frame, and all that happens on isrpipe_write_one failure is that dev->inbuf gets touched (which is weird as I'd understand that to be network related).
The more I think of this, the more it seems to me that the puts("lost frame"); code in the ETHOS_FRAME_TYPE_TEXT case doesn't do what it should. Maybe it should set the frame type to FRAME_TYPE_ERRORED and thus ensure that the rest of the frame is discarded (rather than risking that individual characters get eaten and possibly misassembled at the stdio reading site)?
(Not sure of any of this, but it seems weird; gotta cut it short now and will continue at a later point.)
Not 100% sure I understood your scenario, so let me just describe how the buffers and escaping work:
For data, hello and hello reply packets, the frame type is determined by the head, which consists of either the escaped frame type or, for data frames, just the (escaped) data; the head is pushed to the tsrb together with the remainder of the packet. Also: at the end of the frame there is a frame delimiter character (if it appears in the frame it is escaped). So if multiple frames are in the ring buffer, one frame will be read until the frame delimiter (and the type determined by its head) and the same is true for the next.
Text frames (which in the ISR arrive in sequence with data frames) are not put into the tsrb ring buffer, but into the isrpipe. Since only text frames go there (and since writing back the escape + type seemed to be more trouble than it's worth) they are not identified to the handler thread with a head, but are delimited by the frame delimiter.
In both buffers, the data is escaped similar to SLIP, so if the frame delimiter or the escape byte appear within the packet, they are escaped with the byte sequence {escape_byte, c ^ 0x20}.
Packet types (at least as far as I understand it) are only escaped at the beginning of the packet. This is where the protocol design is somewhat flawed, because it seems a bit like a "because we can" approach and also makes it necessary to do at least some rudimentary unstuffing to determine into which buffer (data or text) the frame should go.
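To make the SLIP-like escaping above concrete, here is a minimal sketch of stuffing and unstuffing a single frame. It is not the actual ethos code: the byte values 0x7E and 0x7D and all `sketch_*` names are assumptions for illustration, and only the `{escape_byte, c ^ 0x20}` rule and the trailing frame delimiter are taken from the description.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed values for illustration; the real constants live in the ethos headers. */
#define SKETCH_DELIMITER 0x7E
#define SKETCH_ESC       0x7D

/* Stuff `len` payload bytes into `out` and append a frame delimiter.
 * Returns the number of bytes written, or 0 if `out` is too small. */
size_t sketch_stuff(const uint8_t *in, size_t len, uint8_t *out, size_t out_size)
{
    assert(in != NULL && out != NULL);
    size_t n = 0;
    for (size_t i = 0; i < len; i++) {
        uint8_t c = in[i];
        if (c == SKETCH_DELIMITER || c == SKETCH_ESC) {
            if (n + 2 > out_size) {
                return 0;
            }
            out[n++] = SKETCH_ESC;
            out[n++] = (uint8_t)(c ^ 0x20);
        }
        else {
            if (n + 1 > out_size) {
                return 0;
            }
            out[n++] = c;
        }
    }
    if (n + 1 > out_size) {
        return 0;
    }
    out[n++] = SKETCH_DELIMITER;
    return n;
}

/* Unstuff one frame from `in`, stopping at the first frame delimiter.
 * Returns the payload length and stores the number of consumed input bytes
 * (including the delimiter) in `consumed`; returns -1 if no delimiter was
 * found or `out` is too small. */
int sketch_unstuff(const uint8_t *in, size_t len, uint8_t *out, size_t out_size,
                   size_t *consumed)
{
    assert(in != NULL && out != NULL && consumed != NULL);
    size_t n = 0;
    int escaped = 0;
    for (size_t i = 0; i < len; i++) {
        uint8_t c = in[i];
        if (c == SKETCH_DELIMITER) {
            *consumed = i + 1;
            return (int)n;
        }
        if (!escaped && c == SKETCH_ESC) {
            escaped = 1;
            continue;
        }
        if (n >= out_size) {
            return -1;
        }
        out[n++] = escaped ? (uint8_t)(c ^ 0x20) : c;
        escaped = 0;
    }
    return -1; /* frame not complete yet */
}
```

Because the delimiter never appears unescaped inside a frame, a reader can call the unstuffing function repeatedly on a buffer holding several frames and advance by `consumed` each time.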
Thanks for the input, with that I'd try to make the point more precise:
- If overflow happens in _handle_char for Ethernet frames, the whole frame is dropped (that's how I understand the inbuf.{reads,writes} = 0), and thanks to the frametype being reset, further data is ignored (is it? ->frametype is also reset, but we're still in the case IN_FRAME of ethos_isr, so wouldn't the next byte trigger the default case assert?). Dropping a frame is relatively safe (a counter might be nice but adding features is not what this PR is about).
- If overflow happens in _handle_char on text frames, the network frame is dropped (?; we're not in one), but otherwise the error is ignored.
As text may be read piecemeal, we can't "cancel" the whole text frame (some of it may have been consumed, that's OK), but IMO the sane behavior would be to discard the rest of the frame -- for the alternative is that while more bytes come in, some may be accepted and some not, and the ISR (rightfully, after this PR) feeds all the characters until end of frame to the isrpipe. For example, if the data stream is "H e l l o Escape EscapedEscape", and the Escape gets dropped but the EscapedEscape does not, the stdio reader would not read the bytes "H e l l o Escape" (which there should be) but "H e l l o EscapedEscape".
I think that on the UART, losing text is OK -- receiving wrong text is not. I have a slight preference for losing chunks of text over losing individual characters, but (now that escapes are decoded after the isrpipe) losing individual characters has the downside of possibly producing wrong text, and thus, losing a chunk (the rest of the text frame) it should be.
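The "H e l l o Escape EscapedEscape" example can be reproduced with a small sketch. It assumes an escape byte of 0x7D and the `c ^ 0x20` rule; `sketch_decode_with_loss` is a hypothetical helper that decodes a stuffed byte stream while skipping one input byte, simulating a single byte that a full isrpipe refused.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_ESC 0x7D /* assumed escape byte, for illustration only */

/* Decode stuffed bytes from `in`, skipping the byte at index `drop` to
 * simulate a byte lost before decoding (pass `len` to drop nothing).
 * Returns the number of decoded bytes written to `out`. */
size_t sketch_decode_with_loss(const uint8_t *in, size_t len, size_t drop,
                               uint8_t *out, size_t out_size)
{
    assert(in != NULL && out != NULL);
    size_t n = 0;
    int escaped = 0;
    for (size_t i = 0; i < len && n < out_size; i++) {
        if (i == drop) {
            continue; /* this byte never reached the reader */
        }
        uint8_t c = in[i];
        if (!escaped && c == SKETCH_ESC) {
            escaped = 1;
            continue;
        }
        out[n++] = escaped ? (uint8_t)(c ^ 0x20) : c;
        escaped = 0;
    }
    return n;
}
```

With the wire bytes `'H','e','l','l','o',0x7D,0x5D`, decoding without loss ends in the byte 0x7D, while dropping index 5 (the escape) ends in 0x5D, i.e. a `]` that was never sent. The reader gets the same number of bytes in both cases, so the corruption is silent, which is the argument for discarding the rest of the frame instead.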
I think I start to understand what you mean. Will fix.
Addressed with the other issues pointed out above. |
drivers/ethos/ethos.c
Outdated
if (isrpipe_write_one(&ethos_stdio_isrpipe, c) < 0) {
    //puts("lost frame");
    dev->inbuf.reads = 0;
    dev->inbuf.writes = 0;
This is the location where I should have started the thread on what losing bytes on the stdio byte would do to escaping.
I hardened the error handling in the following way in febe559 ef54d1c (edit: removed the mutex_lock/mutex_unlock around the isrpipe's reset, as we are in ISR context for that function):
- The previous frametype determines which buffer (tsrb or isrpipe) is operated on.
- The buffer is reset (both reads and writes are set to 0).
- An ETHOS_FRAME_DELIMITER is added to the now reset buffer (as the buffer was now reset it will even work if the error was caused by an overflown buffer), to tell the handler thread to stop reading and just truncate the frame as read so far.
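The recovery steps above can be sketched roughly as follows. This is not the code from the commit: `sketch_rb_t` is a hypothetical stand-in for the real buffer that only mirrors the `reads` and `writes` counters mentioned in the thread, and the delimiter value 0x7E is assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_DELIMITER 0x7E /* assumed value, for illustration only */

/* Hypothetical ring buffer with free-running read/write counters. */
typedef struct {
    uint8_t buf[8];
    unsigned reads;
    unsigned writes;
} sketch_rb_t;

/* Add one byte; returns 0 on success, -1 if the buffer is full. */
int sketch_rb_add_one(sketch_rb_t *rb, uint8_t c)
{
    if ((rb->writes - rb->reads) >= sizeof(rb->buf)) {
        return -1;
    }
    rb->buf[rb->writes++ % sizeof(rb->buf)] = c;
    return 0;
}

/* Recovery: reset the buffer, then leave a lone delimiter in it so the
 * reader stops at this point and truncates the frame read so far. */
void sketch_rb_recover(sketch_rb_t *rb)
{
    rb->reads = 0;
    rb->writes = 0;
    int res = sketch_rb_add_one(rb, SKETCH_DELIMITER);
    assert(res == 0); /* cannot fail: the buffer was just emptied */
    (void)res;
}

/* What the ISR side would do per byte: try to queue it, recover on overflow. */
int sketch_isr_push(sketch_rb_t *rb, uint8_t c)
{
    if (sketch_rb_add_one(rb, c) < 0) {
        sketch_rb_recover(rb);
        return -1;
    }
    return 0;
}
```

Resetting before adding the delimiter is what makes this work even when the overflow itself was the error: after the reset there is always room for that one byte.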
febe559 to ef54d1c
a3f8fa9 to 3e30300
Rebased and adapted for the |
Ping? This should go in for 2021.01 IMHO. |
LGTM, please squash. I'll give the version to be merged a test run after squashing, and then I think this can be done. Sorry for the holdup.
This moves the following parts of ethos' state machine out of ISR context:
- Sending and replying to HELLO messages
- Byte-unstuffing
Some escape handling is still needed in the ISR handler, due to ethos' protocol design, to determine if a received byte must go into the netdev queue (tsrb) or the STDIO queue (isrpipe), but the actual unstuffing is now done in the STDIO and netdev handler threads, respectively.
3e30300 to 820a397
Squashed. |
Tested by running gnrc_border_router on a microbit-v2.
Command line interaction works, network works, and success rate on flood pings is vastly improved.
ACK, and thanks for fixing this.
Now this broke standalong
|
With this PR for some reason,
|
It's the second commit, 820a397, but I didn't have as much time to look into it as I had hoped yesterday. Is there anything obvious that could be at fault here, @miri64?
Not sure, but I observed the behavior you mention also sometimes on |
It seems to happen after flashing only, resetting the BOARD after seems to fix it though. |
Still not clear to me if this happens when you just started the board or if you hit any key. |
At start, once the shell is launched. |
Ok, that I did not observe on |
Contribution description
This tries to fix #17254 by eliminating the condition that causes the race altogether. It moves the following parts of ethos' state machine out of ISR context, inspired by the way it is done in slipdev:
- Sending and replying to HELLO messages
- Byte-unstuffing
Some escape handling is still needed in the ISR handler, due to ethos' protocol design, to determine if a received byte must go into the netdev queue (tsrb) or the STDIO queue (isrpipe), but the actual unstuffing is now done in the STDIO and netdev handler threads, respectively.
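A rough sketch of the escape handling that stays in the ISR, i.e. peeking at the frame head to pick a queue while leaving the payload stuffed. This is an illustration under assumptions, not the driver's state machine: the byte values (0x7E, 0x7D, text type 0x01), the rule that a non-data head is `{escape, type ^ 0x20}`, and every `sketch_*` name are hypothetical, and overflow handling is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed values, for illustration only. */
#define SKETCH_DELIMITER 0x7E
#define SKETCH_ESC       0x7D
#define SKETCH_TYPE_TEXT 0x01

typedef struct {
    uint8_t data[32];
    size_t len;
} sketch_queue_t;

typedef enum { SKETCH_HEAD, SKETCH_HEAD_ESC, SKETCH_BODY } sketch_state_t;

typedef struct {
    sketch_state_t state;
    sketch_queue_t *target; /* queue chosen for the current frame */
    sketch_queue_t net;     /* stands in for the netdev queue (tsrb) */
    sketch_queue_t stdio;   /* stands in for the STDIO queue (isrpipe) */
} sketch_demux_t;

static void sketch_put(sketch_queue_t *q, uint8_t c)
{
    assert(q != NULL);
    if (q->len < sizeof(q->data)) { /* overflow handling omitted */
        q->data[q->len++] = c;
    }
}

/* Called once per received byte. Only the head is inspected; everything
 * else is forwarded still stuffed, so unstuffing happens in the threads. */
void sketch_isr_byte(sketch_demux_t *d, uint8_t c)
{
    switch (d->state) {
    case SKETCH_HEAD:
        if (c == SKETCH_DELIMITER) {
            break; /* empty frame, nothing to route */
        }
        if (c == SKETCH_ESC) {
            d->state = SKETCH_HEAD_ESC;
            break;
        }
        d->target = &d->net; /* no escaped type: plain data frame */
        sketch_put(d->target, c);
        d->state = SKETCH_BODY;
        break;
    case SKETCH_HEAD_ESC:
        if ((c ^ 0x20) == SKETCH_TYPE_TEXT) {
            d->target = &d->stdio; /* head is not forwarded for text */
        }
        else {
            d->target = &d->net; /* forward the head for the netdev thread */
            sketch_put(d->target, SKETCH_ESC);
            sketch_put(d->target, c);
        }
        d->state = SKETCH_BODY;
        break;
    case SKETCH_BODY:
        sketch_put(d->target, c);
        if (c == SKETCH_DELIMITER) {
            d->state = SKETCH_HEAD;
        }
        break;
    }
}
```

Since the delimiter is forwarded along with the payload, each queue can hold several frames back to back, which matches the "multiple frames separated by the delimiter" benefit mentioned in the description.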
This causes a slight increase of ROM, as byte unstuffing needs to be now handled in both stdio and the netdev context, but we also gain some RAM, as the ethos_t members framesize, last_framesize, and accept_new are not needed anymore.
We also have the added benefits, that now multiple frames can be added to the respective pipes, separated by the FRAME_DELIMITER byte.
Testing procedure
Compile and flash examples/gnrc_border_router to a board of your choice. To be closer to the setup in the testbed, I used an iotlab-m3 and set ETHOS_BAUDRATE=500000 (but I also tested with samr21-xpro with the same ETHOS_BAUDRATE):
Get the link-layer address of the ethos interface (the one with 6-byte hardware address) using ifconfig and pound that address with multiple ping instances:
On master, packet loss is significantly worse than with this PR; the improvement comes at the cost of slightly higher latency (the cost of cleaner multiplexing, I guess):
master
This PR
Issues/PRs references
Fixes #17254