Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

LoRaWAN: Mitigating reception issues at lower data rates & FCnt increment after retry exhaustion #8822

Merged
merged 7 commits into from
Dec 11, 2018

Conversation

hasnainvirk
Copy link
Contributor

@hasnainvirk hasnainvirk commented Nov 20, 2018

Description

A new algorithm is proposed and taken in use for calculation of receive window length and
timing offset in opening the said window. This algorithm performs better than the stock algorithm and consumes less power at the same time. Stock algorithm would miss packets while using low data rates as the offset calculation was not suitable enough.

Previously we had been incrementing UL frame counter for a CONFIRMED
message only when the transmission was deemed successful i.e., we would
have received an ack before all the retries would have exhausted.
Now we have opted to increment the frame counter if all the retries are
exhausted considering the fact that we essentially treat the next
message after retry exhaustion as a new packet so we should also
increment the frame counter.

Pull request type

[X] Fix
[ ] Refactor
[ ] Target update
[ ] Functionality change
[ ] Docs update
[ ] Test update
[ ] Breaking change

@hasnainvirk
Copy link
Contributor Author

@AnttiKauppila Please review.

Copy link
Contributor

@0xc0170 0xc0170 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One styling change request

},
"downlink-preamble-length": {
"help": "Number of preamble symbols need to be captured (out of 8) for successful demodulation",
"value": 5
"help": "Number of preamble symbols expected on receive path. Default: 8",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

perfect to align, but please done in the separate commit

@kjbracey
Copy link
Contributor

kjbracey commented Nov 20, 2018

Lovely long commit message, but reading it I can't figure out what the error actually was. My understanding was that the transmitted preamble was 8 symbols, but the radio only needed 5 of the 8 to lock on to the signal.

So the attempted optimisation seemed to be to target the wake-up to be in time to get at least 5 preamble symbols - the radio doesn't need to necessarily hit the first 3.

You seem to be describing my understanding, but then say "We were wrong." but I can't see why we were.

... As the code logic isn't changed, just the number, it now seems that we think we need to capture all 8 of the symbols to lock on?

"help": "Maximum timing error of the receiver in ms. The receiver will turn on in [-RxError : + RxError]",
"value": 10
"help": "Maximum timing error of the receiver in ms. The receiver will turn on in [-RxError : + RxError]",
"value": 1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is being done in a high-priority thread, I hope? Otherwise I wouldn't expect 1ms precision.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wary about this defaulting to 1ms when you are just running on normal priority thread. You will need documentation like "if you're running any normal priority thread code, this may need to be increased". Same applies to deep sleep, which is documented for 10ms wakeup time - may work on certain platforms.

@hasnainvirk
Copy link
Contributor Author

@0xc0170 The PR is updated. @kjbracey-arm Have updated the PR with our new algorithm as discussed in person.

@hasnainvirk
Copy link
Contributor Author

@0xc0170 Bump.

@hasnainvirk
Copy link
Contributor Author

@kjbracey-arm Tested the code base with Deep sleep on (UARTSerial removed), error fudge to be 1 ms and everything works pretty well.

},
"downlink-preamble-length": {
"help": "Number of preamble symbols need to be captured (out of 8) for successful demodulation",
"value": 5
"help": "Number of whole preamble symbols needed to have a firm lock on the signal. Default: 5 + 1",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggest not changing this - this new version is wrongly stated (you're thinking you need 5 whole symbols, hence 6 symbol times). Do the +1 internally.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The "+ 1" shouldn't be there.

"help": "Maximum timing error of the receiver in ms. The receiver will turn on in [-RxError : + RxError]",
"value": 10
"help": "Maximum timing error of the receiver in ms. The receiver will turn on in [-RxError : + RxError]",
"value": 1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wary about this defaulting to 1ms when you are just running on normal priority thread. You will need documentation like "if you're running any normal priority thread code, this may need to be increased". Same applies to deep sleep, which is documented for 10ms wakeup time - may work on certain platforms.

// Computed number of symbols
*window_timeout = MAX((uint32_t) ceil(((2 * min_rx_symb - 8) * t_symb + 2 * rx_error) / t_symb), min_rx_symb);
*window_offset = (int32_t) ceil((4.0 * t_symb) - ((*window_timeout * t_symb) / 2.0) - wakeup_time);
float min_rx_symbol_overlap_required = float (min_rx_symb);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggest doing the +1 to convert from whole symbols to symbol times here.

Although thinking again, seeing as we're carefully calculating to hit the end of the preamble, and rounding up, we know we're aiming to get the whole of the last symbol, and hence should be enclosing whole symbols anyway - we're not aiming into the middle of a sequence, so an extra +1 shouldn't be required. All error terms should have ensured we enclose the whole number. I think?

Copy link
Contributor

@kjbracey kjbracey Nov 28, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that's roughly correct. The +1 is the situation is landing in the middle of the preamble with effectively random phase - allow 6 symbol periods to catch 5 whole symbols. But if error is tiny, we're hitting the end quite precisely, so only need to allow 2*error+wakeup+jitter for being a bit earlier than the target, but not a whole symbol earlier. So timeout is min_rx_sym + min(1 symbol, 2*error+wakeup+jitter).

That's fine up until the point we turn on earlier than the transmitter. Then we will get its first min_rx_sym symbols, but have to wait for it to start.

To cover that the timeout is min_rx_sym + worst_case_how_long_before_it_starts, and how long is the min(0, offset - error - jitter) number in the other calculation.

So final result is min_rx_sym + max(how_long_before, min(1 symbol, 2*error+wakeup+jitter)).


// Actual window offset in ms in response to timing error fudge factor and
// radio wakeup/turned around time.
*window_offset = floor(target_rx_window_offset - error_fudge - wakeup_time);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question for @c1728p9 - does/will the sleep code compensate for the 10ms or whatever deep sleep latency, if a long sleep is requested?

Eg if wait(1000) or call_in(1000), do we get a wake from deep sleep at 990 then wake from shallow at 1000, so we hit 1000 precisely, or do we just schedule an interrupt at 1000 so we potentially wake at 1010?

This code in particular is trying to hit a 1 second delay for a receive window after a transmit with millisecond precision. Can that be achieved without holding a deep sleep lock?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As long as the RTX knows the wakeup time in advance (via some OS delay/wait call) then it should wakeup correctly with millisecond precision. If you are waking up using the low power ticker or a GPIO pin directly then deep sleep latency compensation won't help, as the OS doesn't know you are wanting to wake up in advance, and you'll need to hold the lock manually.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, that's what I hoped. I knew you'd been looking at the compensation, but couldn't recall if it was just covering "teeny" sleeps or reached out for long targets.

In this case, it's via an EventQueue::call_in, so that's a timed semaphore wake, so we're good.

This "sleep latency compensation only via the OS" thing feels like it should be documented somewhere, but not sure where to encourage people to use OS mechanisms rather than LowPowerTimers. Some sort of knowledge base article about deep sleep/tickless or something. Or maybe even a note on LowPowerTimer, which is what people might be thinking of using with deep sleep active.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kjbracey-arm can you create a tracking issue for this one?

@hasnainvirk hasnainvirk force-pushed the dr0_and_fcnt_issue branch 3 times, most recently from ab09a8f to d52861d Compare December 3, 2018 15:07
@0xc0170
Copy link
Contributor

0xc0170 commented Dec 6, 2018

@kjbracey-arm All your comments were addressed? Is this ready for CI?

Copy link
Contributor

@kjbracey kjbracey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some minor nits.

It seems this needs more investigation, but this gets closer to it, so let's take this version of the algorithm for further testing.

The ASCII art in LoRaPHY.h must be updated when this is all finalised.

},
"downlink-preamble-length": {
"help": "Number of preamble symbols need to be captured (out of 8) for successful demodulation",
"value": 5
"help": "Number of whole preamble symbols needed to have a firm lock on the signal. Default: 5 + 1",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The "+ 1" shouldn't be there.

@@ -45,7 +45,7 @@ typedef uint32_t lorawan_time_t;
#endif

// Radio wake-up time from sleep - unit ms.
#define RADIO_WAKEUP_TIME 1
#define RADIO_WAKEUP_TIME 1.0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs a f if it's going to be here, to avoid double.

@@ -32,8 +32,9 @@ SPDX-License-Identifier: BSD-3-Clause
#define BACKOFF_DC_1_HOUR 100
#define BACKOFF_DC_10_HOURS 1000
#define BACKOFF_DC_24_HOURS 10000

#define CHANNELS_IN_MASK 16
#define MAX_PREAMBLE_LENGTH 8.0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More f suffixes needed (if they need to be float at all)

{
return (8.0 / (double) phy_dr); // 1 symbol equals 1 byte
return (8.0 / (float) phy_dr); // 1 symbol equals 1 byte
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

8.0f

@hasnainvirk
Copy link
Contributor Author

@kjbracey-arm Please review again,

@cmonr
Copy link
Contributor

cmonr commented Dec 7, 2018

CI started

@mbed-ci
Copy link

mbed-ci commented Dec 7, 2018

Test run: FAILED

Summary: 1 of 1 test jobs failed
Build number : 1
Build artifacts

Failed test jobs:

  • jenkins-ci/mbed-os-ci_unittests

@cmonr
Copy link
Contributor

cmonr commented Dec 7, 2018

CI restarted, but I suspect this is an issue with the PR.

@mbed-ci
Copy link

mbed-ci commented Dec 7, 2018

Test run: FAILED

Summary: 1 of 1 test jobs failed
Build number : 2
Build artifacts

Failed test jobs:

  • jenkins-ci/mbed-os-ci_unittests

@cmonr
Copy link
Contributor

cmonr commented Dec 7, 2018

Ooph, that was quick.
And unittests don't upload their logs yet. That's a shame.

@hasnainvirk Please refer to the attached CI log.
consoleText.log

Hasnain Virk added 7 commits December 10, 2018 08:53
Previously we had been incrementing UL frame counter for a CONFIRMED
message only when the transmission was deemed successful i.e., we would
have received an ack before all the retries would have exhausted.
Now we have opted to increment the frame counter if all the retries are
exhausted considering the fact that we essentially treat the next
message after retry exhaustion as a new packet so we should also
increment the frame counter.
A new algorithm has been taken in use to calculate the receive window
length and the timing offset involved in opening of the said receive
window. This algorithm performs better than the stock algorthm and
consumes less power.
Minor style alignment.
Unittest stubs needed to be updated for the API change.
Although this API is private and internal to LoRaPHY, somehow it
appeared in the LoRaPHYStub. Updating it for consistency.
In this version we try to mitigate a situation when we start listening
right in the middle of a preamble sequence (e.g., in high SF case).
Final code cleanup and adding ascii art for the version 2 of the
algorithm.
A missing macro definition is added to the unit-test cmake file for
LoRaPHY class.
@hasnainvirk
Copy link
Contributor Author

@cmonr Whoops. Unittest framework doesn't take any of the Mbed OS config files in consideration. Wasn't aware of that. unittest_cmake file contains manual definition if any needed. Fixed it in the latest commit.

@cmonr
Copy link
Contributor

cmonr commented Dec 10, 2018

CI started

@mbed-ci
Copy link

mbed-ci commented Dec 11, 2018

Test run: SUCCESS

Summary: 11 of 11 test jobs passed
Build number : 3
Build artifacts

@cmonr cmonr merged commit 3875ac1 into ARMmbed:master Dec 11, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants