Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cpu/cc2538: Prevent underflow of the RX FIFO #13906

Merged
merged 5 commits into from
Apr 20, 2020

Conversation

MBradbury
Copy link

Contribution description

The implementation of receiving packets for CC2538 can lead to underflows of the RX FIFO when running the gnrc_border_router and gnrc_networking examples.

RFCORE_ASSERT(RFCORE_XREG_RXFIFOCNT > 0) failed at line 77 in rfcore_read_byte()!
  RFCORE_SFR_RFERRF = 0x00
RFCORE_ASSERT(NOT(flags & RXUNDERF)) failed at line 40 in isr_rfcoreerr()!
  RFCORE_SFR_RFERRF = 0x00
RFCORE_ASSERT(RFCORE_XREG_RXFIFOCNT > 0) failed at line 77 in rfcore_read_byte()!
  RFCORE_SFR_RFERRF = 0x00
RFCORE_ASSERT(NOT(flags & RXUNDERF)) failed at line 40 in isr_rfcoreerr()!
  RFCORE_SFR_RFERRF = 0x00

This PR makes the following changes:

  1. Checks that FIFOP is set indicating that the last byte of a new frame has been received [Section 23.10.1]
  2. Moves the CRC check later in receive. This was because the look ahead check could be inconsistent, where the original CRC check passed but an assert on the CRC bit in crc_corr_val later failed.
  3. Only flush the RX FIFO if it has overflowed [Section 23.10.2]. This prevents discarding partial frames that haven't finishing being received and fit into the RX FIFO.

References are to the manual available at: https://www.ti.com/lit/ug/swru191f/swru191f.pdf

Testing procedure

Tested on 4 Zolertia RE Mote Rev. Bs, with one running gnrc_border_router (with RPL enabled) and the other three running gnrc_networking. Without the patch RXUNDERF and other asserts will be printed. With the patch these asserts will not be triggered.

Issues/PRs references

Fixes original issue identified in #7020 .

@fjmolinas fjmolinas added Area: cpu Area: CPU/MCU ports Area: network Area: Networking Type: bug The issue reports a bug / The PR fixes a bug (including spelling errors) labels Apr 20, 2020
@fjmolinas
Copy link
Contributor

@MBradbury Thanks for the fix, I'll test with an opemote-b.

Moves the CRC check later in receive. This was because the look ahead check could be inconsistent, where the original CRC check passed but an assert on the CRC bit in crc_corr_val later failed.

How can this inconsistency happen?

Also regarding the commits they seem to address a mix of things:

  • f2df34c your adding and removing some asserts, specifically regarding the rssi and crc. If those are needed please have them in different commits. If its just for testing purposes, please have REMOVEME commits or something like that that can then be squashed out.

So I think it would be better for the commit history if you had the rssi, crc and overflow/underflow fixes in different commits.

Otherwise they changes make sense.

@MBradbury
Copy link
Author

@MBradbury Thanks for the fix, I'll test with an opemote-b.

Moves the CRC check later in receive. This was because the look ahead check could be inconsistent, where the original CRC check passed but an assert on the CRC bit in crc_corr_val later failed.

How can this inconsistency happen?

I'm not sure, this one has me confused. A possibility is that rfcore_peek_rx_fifo doesn't peek at the correct location of the CRC, but I think it should.

Also regarding the commits they seem to address a mix of things:

* [f2df34c](https://github.com/RIOT-OS/RIOT/commit/f2df34cc5c5216aa5590019bcb188cc76799692e) your adding and removing some asserts, specifically regarding the rssi and crc. If those are needed please have them in different commits. If its just for testing purposes, please have `REMOVEME` commits or something like that that can then be squashed out.

Sorry about these the CC2538_RF_SENSITIVITY assert was temporarily removed as I was encountering this assert on my hardware where rssi_val was slightly lower (-99, -98) than CC2538_RF_SENSITIVITY (-97).

The RSSI assert should be removed as it checks RSSI_VALID for the RSSI register [Section 23.10.3]. Whereas my understanding of Section 23.9.7 is that the RSSI in the FCS should always be valid as it is measured during receiving the len field.

So I think it would be better for the commit history if you had the rssi, crc and overflow/underflow fixes in different commits.

Otherwise they changes make sense.

Sure, I can rework the PR to do this.

Copy link
Contributor

@fjmolinas fjmolinas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A comment regarding the commit length, our CI only allows commits as long as 72 in the first line. Extra information should be placed a line below, e.g for 27c58d2.

cpu/cc2538: remove check for RSSI_VALID

The RSSI assert should be removed as it checks RSSI_VALID for the RSSI register [Section 23.10.3]. Whereas my understanding of Section 23.9.7 is that the RSSI in the FCS should always be valid as it is measured during receiving the len field.

Also regarding the underflow issue, I took a look at the datasheet and it says that:

RX underflow is indicated when the RFERRF.RXUNDERF flag is set. RX underflow is a serious error condition that should not occur in error-free software, and the RXUNDERF event should be used only for debugging or in a watchdog function. The RXUNDERF error is not generated when the read operation occurs simultaneously with the reception of a new byte

My impression is that the bug you are describing is an attempt to read the FIFO when its empty. This means that it was somehow flushed before, and a second read attempt happens. Could it be that this is because we don't make a distinction between FIFOP and RXPKTDONE ISR, they are sometimes triggering two separate ISR?

@fjmolinas
Copy link
Contributor

The implementation of receiving packets for CC2538 can lead to underflows of the RX FIFO when running the gnrc_border_router and gnrc_networking examples.

Just to clarify its when using both at the same time or with any of them?

@MBradbury
Copy link
Author

The implementation of receiving packets for CC2538 can lead to underflows of the RX FIFO when running the gnrc_border_router and gnrc_networking examples.

Just to clarify its when using both at the same time or with any of them?

I have only tested when running both at the same time (gnrc_border_router on one node and gnrc_networking on three others).

@fjmolinas
Copy link
Contributor

I have only tested when running both at the same time (gnrc_border_router on one node and gnrc_networking on three others).

Should I be able to easily reproduce this while simply pinging? When do you see the bug?

@MBradbury
Copy link
Author

I see the bug even without pinging. I flash three nodes with gnrc_networking, flash
gnrc_border_router on the fourth and run sudo ./start_network.sh /dev/ttyUSB0 tap0 ab12::/64 in RIOT/dist/tools/ethos.

I do not receive the errors on all nodes, but on one of the gnrc_networking nodes I get:

main(): This is RIOT! (Version: 2020.07-devel-108-gcac35)
RIOT network stack example application
All up, running the shell now
> RFCORE_ASSERT(rssi_val > CC2538_RF_SENSITIVITY) failed at line 338 in _recv()!
  RFCORE_SFR_RFERRF = 0x00
RFCORE_ASSERT(RFCORE_XREG_RXFIFOCNT > 0) failed at line 77 in rfcore_read_byte()!
  RFCORE_SFR_RFERRF = 0x00
RFCORE_ASSERT(NOT(flags & RXUNDERF)) failed at line 40 in isr_rfcoreerr()!
  RFCORE_SFR_RFERRF = 0x00
RFCORE_ASSERT(RFCORE_XREG_RXFIFOCNT > 0) failed at line 77 in rfcore_read_byte()!
  RFCORE_SFR_RFERRF = 0x00
RFCORE_ASSERT(NOT(flags & RXUNDERF)) failed at line 40 in isr_rfcoreerr()!
  RFCORE_SFR_RFERRF = 0x00
RFCORE_ASSERT(rssi_val > CC2538_RF_SENSITIVITY) failed at line 338 in _recv()!
  RFCORE_SFR_RFERRF = 0x00

A ping does not seem to trigger the issue.

@fjmolinas
Copy link
Contributor

fjmolinas commented Apr 20, 2020

I do not receive the errors on all nodes, but on one of the gnrc_networking nodes I get:

I only have one cc2538 based board with me, and I seem to be unable to reproduce. I'll try to get someone else to do so, but If not I'll trust your results since the code changes make sense.

@leandrolanzieri
Copy link
Contributor

I can confirm that when running examples/gnrc_border_router on a remote-reva and examples/gnrc_networking on a remote-revb the assert errors appear:

2020-04-20 18:02:50,364 # main(): This is RIOT! (Version: 2020.07-devel-111-gd8fa6)
2020-04-20 18:02:50,366 # RIOT network stack example application
2020-04-20 18:02:50,367 # All up, running the shell now
> 2020-04-20 18:02:50,391 #  RFCORE_ASSERT(rssi_val > CC2538_RF_SENSITIVITY) failed at line 338 in _recv()!
2020-04-20 18:02:50,392 #   RFCORE_SFR_RFERRF = 0x00
2020-04-20 18:02:50,394 # RFCORE_ASSERT(RFCORE_XREG_RXFIFOCNT > 0) failed at line 77 in rfcore_read_byte()!
2020-04-20 18:02:50,395 #   RFCORE_SFR_RFERRF = 0x00

This does not happen when using other board as BR (I tried with samr21-xpro) though.

Applying these changes resolves the issue, and no more assertion errors are printed.

@leandrolanzieri leandrolanzieri added the Reviewed: 3-testing The PR was tested according to the maintainer guidelines label Apr 20, 2020
@fjmolinas
Copy link
Contributor

Thanks for testing @leandrolanzieri ! @MBradbury Could you adress Travis commit length comments?

@fjmolinas fjmolinas added the CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR label Apr 20, 2020
Copy link
Contributor

@fjmolinas fjmolinas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK, changes make sense and @leandrolanzieri confirmed it fixes the issue. Thanks for contributing the fix @MBradbury!

@fjmolinas fjmolinas merged commit 31278b0 into RIOT-OS:master Apr 20, 2020
@miri64 miri64 added this to the Release 2020.07 milestone Jun 24, 2020
fjmolinas added a commit to fjmolinas/RIOT that referenced this pull request Sep 28, 2020
This commit adds NETDEV_EVENT_RX_COMPLETE, NETDEV_EVENT_TX_COMPLETE,
NETDEV_EVENT_RX_STARTE & NETDEV_EVENT_CRC_ERROR.

In RIOT-OS#13906 the crc lookahead had been moved because it would sometimes
be inconsistent. The reason why is because since the reception is
offloaded an new packet can be received before retrieving the first
pkt from the FIFO. This PR fixes it by disabling reception while
the frame is not processes i.e., `_recv` is not called. We also
now flush at the end of a succesfull reception to now where in the
fifo to probe.

cc2538_off now does not flush the FIFO this the radio could be turned
off after reception of a pkt but before this one is retrieved from
the FIFO.

This commit also adds NETOPT_STANDBY

squash! cpu/cc2538: add NETDEV_EVENTs and fix reception
fjmolinas added a commit to fjmolinas/RIOT that referenced this pull request Sep 30, 2020
This commit adds NETDEV_EVENT_RX_COMPLETE, NETDEV_EVENT_TX_COMPLETE,
NETDEV_EVENT_RX_START & NETDEV_EVENT_CRC_ERROR.

In RIOT-OS#13906 the crc lookahead had been moved because it would sometimes
be inconsistent. The reason why is because since the reception is
offloaded a new packet can be received before retrieving the first
pkt from the FIFO. This PR fixes it by disabling reception while
the frame is not processes i.e., `_recv` is not called. We also
now flush at the end of a succesfull reception to know where in the
fifo to probe.

cc2538_off now does not flush the FIFO this the radio could be turned
off after reception of a pkt but before this one is retrieved from
the FIFO.
fjmolinas added a commit to fjmolinas/RIOT that referenced this pull request Oct 1, 2020
This commit adds NETDEV_EVENT_RX_COMPLETE, NETDEV_EVENT_TX_COMPLETE,
NETDEV_EVENT_RX_START & NETDEV_EVENT_CRC_ERROR.

In RIOT-OS#13906 the crc lookahead had been moved because it would sometimes
be inconsistent. The reason why is because since the reception is
offloaded a new packet can be received before retrieving the first
pkt from the FIFO. This PR fixes it by disabling reception while
the frame is not processes i.e., `_recv` is not called. We also
now flush at the end of a succesfull reception to know where in the
fifo to probe.

cc2538_off now does not flush the FIFO this the radio could be turned
off after reception of a pkt but before this one is retrieved from
the FIFO.
fjmolinas added a commit to fjmolinas/RIOT that referenced this pull request Oct 2, 2020
This commit adds NETDEV_EVENT_RX_COMPLETE, NETDEV_EVENT_TX_COMPLETE,
NETDEV_EVENT_RX_START & NETDEV_EVENT_CRC_ERROR.

In RIOT-OS#13906 the crc lookahead had been moved because it would sometimes
be inconsistent. The reason why is because since the reception is
offloaded a new packet can be received before retrieving the first
pkt from the FIFO. This PR fixes it by disabling reception while
the frame is not processes i.e., `_recv` is not called. We also
now flush at the end of a succesfull reception to know where in the
fifo to probe.

cc2538_off now does not flush the FIFO this the radio could be turned
off after reception of a pkt but before this one is retrieved from
the FIFO.
fjmolinas added a commit to fjmolinas/RIOT that referenced this pull request Oct 2, 2020
This commit adds NETDEV_EVENT_RX_COMPLETE, NETDEV_EVENT_TX_COMPLETE,
NETDEV_EVENT_RX_START & NETDEV_EVENT_CRC_ERROR.

In RIOT-OS#13906 the crc lookahead had been moved because it would sometimes
be inconsistent. The reason why is because since the reception is
offloaded a new packet can be received before retrieving the first
pkt from the FIFO. This PR fixes it by disabling reception while
the frame is not processes i.e., `_recv` is not called. We also
now flush at the end of a succesfull reception to know where in the
fifo to probe.

cc2538_off now does not flush the FIFO this the radio could be turned
off after reception of a pkt but before this one is retrieved from
the FIFO.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area: cpu Area: CPU/MCU ports Area: network Area: Networking CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR Reviewed: 3-testing The PR was tested according to the maintainer guidelines Type: bug The issue reports a bug / The PR fixes a bug (including spelling errors)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants