How to set non standard RX2 setting for TTN with LMIC #455
Sorry you're having problems. Are you by any chance using LMIC_setClockError()? The change right after this (42da75b) radically changed the timing for SF12, which is critical for EU868 join. It turns out that the reason we have needed LMIC_setClockError() is that it works around the error in the code. If you check #442, you'll see a description. RedwoodComm and I have tested this extensively (on different hardware of course) and confirmed that the change greatly improves performance, but it required that I remove the LMIC_setClockError hack. If this doesn't work for you, I may have to get one of your boards to see what is going on. |
I am not using it. I am using different ESP32 boards for my application, so I don't think this is hardware dependent. But using ESP32 means running LMIC in an RTOS environment, and I assume that this is causing subtle timing effects. So this is likely an application problem. But the question is what changed after #76f7bd5 to trigger this issue, because MCCI LMIC #76f7bd5 works stably and smoothly in the ESP32 environment. |
My guess is that 42da75b broke things for you. You could also use git bisect to try to narrow down the problem commit. It could be something else I did. Ideally we'd get me a repro case. I'm traveling and so it will be hard for hardware to catch up with me; but if you can find something suitable available on Amazon, send me a link. Something with a U.FL or SMA is best for testing with the analyzer. I can get it and try to test. I've got my lab with me. |
I will do some timing experiments by changing task priorities in my application. Will keep you posted here. |
If you edit If the compliance sketch works, but your sketch doesn't, the next step would be to port the logging code to your app and get the same printouts. |
Thanks for your hints. I reversed commit 42da75b but this did not affect the join behaviour. Meanwhile i can't reproduce the issue with current 852f348 . Maybe the latest commits made this happen, or it was no LMIC issue, but a TTN issue. Currently i don't see any JOIN ACCEPT messages in TTN console, thus i assume the software version on TTN site is not constant. |
Hi. |
Nothing has changed in the LMIC code in the last two days. Which LoRaWAN sample program do you mean? |
@DeveloppeurPascal which platform? ESP32, ... ? |
The problem arose here again. It looks erratic: sometimes the join works, sometimes it ends in JOIN WAIT. I need to set up a reference environment to finally track this down. |
@terrillmoore i got the compliance script running stand alone on an ESP32 system with arduino-esp32 core (latest version 1.0.4). Settings of
Here is the resulting log, showing a failed join. I used the default keys you noted above.
|
@terrillmoore I repeated the test with valid TTNv2 keys. Here are the results for a successful join, as well as a (so far) unsuccessful "JOIN WAIT" join: successful join:
unsuccessful join / "JOIN WAIT":
|
@terrillmoore this one would be suitable and available from amazon stock in USA. |
I see a problem. Pre-join, the LMIC is using the spec-compliant RX2==SF12 configuration. But TTN uses a non-standard RX2==SF9 configuration (see here). I wonder (since this comes and goes) whether TTN is sometimes using RX1 for join accept, and sometimes using RX2. This would be based on congestion and other metrics, and it would not be surprising if this decision is made in advance for the next hour or so. When network is using RX1, join would work; when network is using RX2, join would not work. (The reason for TTN choosing to run the network this way was to add capacity, and take better advantage of the channel for RX2; it doesn't have as strict a duty cycle limit as the RX1 channel.) If so, we need to find a place to add |
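The RX1/RX2 hypothesis above can be illustrated with a toy model. This is only a sketch and not LMIC code: the type and function names are hypothetical, and it assumes the only thing that matters is whether the spreading factor the device listens on in a given window matches the one the network transmits with.

```cpp
#include <cstdint>

// Toy model, not LMIC code: all names here are hypothetical.
enum class RxWindow { RX1, RX2 };

struct RxConfig {
    uint8_t rx1Sf;  // spreading factor used in RX1
    uint8_t rx2Sf;  // spreading factor used in RX2
};

// Returns true if a join accept sent by the network in `window` can be
// demodulated by a device that listens with the `device` settings.
bool joinAcceptHeard(const RxConfig &network, const RxConfig &device,
                     RxWindow window) {
    if (window == RxWindow::RX1) {
        return network.rx1Sf == device.rx1Sf;
    }
    return network.rx2Sf == device.rx2Sf;
}
```

With a network that uses SF9 in RX2 and a device that listens on SF12 in RX2, the model reports success for RX1 and failure for RX2, which matches the "comes and goes" behaviour described above.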
Yes, I have something very similar (the one that isn't stable, mentioned in #463). Maybe you can post your pin-map, BSP, and board-ID in the Arduino IDE, and I can compare to what I'm using? Then, if it all looks the same, I'll order a board. |
@terrillmoore You nailed it! :-) I added Now the question is, how to properly deal with this non standard TTN setting for EU868 in the LMIC library? |
Hmm, looks like i was too optimistic. After a series of trials with two different ESP32 boards i still see some JOIN WAITs, but it seems that probability of successful joins is now increased. |
Please send logs as before. Also, look at the TTN console gateway log to see if they really sent the join accept through your gateway. If the network is busy, they might not send them right away. When I've had many devices on the same gateway joins took a while even with the old code. EU should be worse than the US, because of the lower downlink capacity. |
@terrillmoore To enforce SF9 for TTNv2 (EU868) i modified
=== Log 1: unsuccessful join ===1. Gateway Join Request
2. Gateway Join Accept
3. LMIC compliance script log
=== Log 2: immediately successful join after gateway reset ===
|
Meanwhile we tested at a different location, with a different board and gateway (Lorix One). Same effect: "JOIN WAIT", and after a gateway reset an immediate join. This is clearly reproducible. I wonder what kind of effect a gateway reset can have? Perhaps a duty cycle budget refill? |
Yep, I’m sure resetting the gateway resets any local idea of the duty-cycle limitation.
So can we conclude that the LMIC is working properly (with the patch for RX2 matching on TTN)?
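On the duty-cycle point above, here is a minimal sketch of how a transmitter might keep a local duty-cycle budget, and why a reset would clear it. It is hypothetical bookkeeping, not taken from any packet forwarder or from LMIC, and the percentages used in the usage note are only illustrative values.

```cpp
#include <cstdint>

// Hypothetical duty-cycle bookkeeping for one sub-band.
struct DutyCycleBudget {
    uint32_t dutyCyclePercent;  // e.g. 1 for 1 %, 10 for 10 %
    uint64_t nextTxAllowedMs;   // earliest time the next TX may start

    explicit DutyCycleBudget(uint32_t percent)
        : dutyCyclePercent(percent), nextTxAllowedMs(0) {}

    bool canTransmit(uint64_t nowMs) const {
        return nowMs >= nextTxAllowedMs;
    }

    // Record a transmission of `airtimeMs` that starts at `nowMs`.
    // The band is then blocked until airtime / dutyCycle has elapsed,
    // counted from the start of the transmission.
    void recordTx(uint64_t nowMs, uint32_t airtimeMs) {
        nextTxAllowedMs =
            nowMs + static_cast<uint64_t>(airtimeMs) * 100 / dutyCyclePercent;
    }

    // A reboot loses the in-memory state, so the budget looks full again.
    void reset() { nextTxAllowedMs = 0; }
};
```

For example, after a 1000 ms transmission at 1 % the band stays blocked for 100 s from the start of that transmission, while at 10 % it is blocked for only 10 s. After `reset()` the next transmission is allowed immediately.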
|
@terrillmoore I'm not sure yet. Since this effect came in suddenly after Sep 9th 2019, something somewhere seems to have changed, and it now impacts the downlink performance. Is there a way to get reports from the LMIC stack on whether and when the radio handler has seen a downlink? I want to compare these timestamps with the log of the packet forwarder of my gateway, to ensure that no packets are dropped or missed on the device's side. |
There are two cases, JoinAccept and other messages. JoinAccept results in an EV_JOIN. Other messages cause a call to the receive-message handler registered using Hope this helps... |
@terrillmoore I tracked this down further. It seems that in my app LMIC does not switch to SF9 pre-join, for some unknown reason. This forces RX2 joins in TTN with SF12, which would explain that the issue does not recur immediately after a gateway reset, because the reset "refills" the duty cycle limit of the gateway. I'm using this code for switching to SF9 pre-join, and asserted that
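To compare device-side downlink times with a gateway log, one option is to record a timestamp in both places mentioned above (the join event and the receive-message handler) and dump the records later, outside the time-critical path. The following is a minimal sketch of such a log. All names are hypothetical and nothing here is part of LMIC; the timestamp is assumed to come from the application, for example from `millis()` on Arduino.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical downlink log; not part of LMIC.
enum class DownlinkKind : uint8_t { JoinAccept, Data };

struct DownlinkRecord {
    uint32_t timestampMs;  // supplied by the caller
    DownlinkKind kind;
};

constexpr std::size_t kLogCapacity = 8;

// Fixed-size ring buffer that keeps the most recent kLogCapacity records.
class DownlinkLog {
public:
    void add(uint32_t timestampMs, DownlinkKind kind) {
        records_[next_ % kLogCapacity] = DownlinkRecord{timestampMs, kind};
        ++next_;
    }

    std::size_t size() const {
        return next_ < kLogCapacity ? next_ : kLogCapacity;
    }

    // Index 0 is the oldest record still retained.
    DownlinkRecord at(std::size_t i) const {
        std::size_t first = next_ < kLogCapacity ? 0 : next_ - kLogCapacity;
        return records_[(first + i) % kLogCapacity];
    }

private:
    DownlinkRecord records_[kLogCapacity] = {};
    std::size_t next_ = 0;
};
```

Calling `add()` only stores two small fields, so it should disturb timing far less than printing from inside the handlers.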
But LMIC does not switch to SF9 pre-join:
I am stuck here. Do you have any hint how i can proceed to get this fixed? |
I'm an idiot; I told you to do the wrong thing. The code you want is:
Sorry for the confusion. |
Thanks for the hint!
I assured that value of Sure, that it should work this way with OTAA (not ABP)? |
Something is still weird here. I tested different settings with the OTA compliance script. Modified setupForNetwork() as follows:
If i use
If i use
|
So here's the way it should work, according to my understanding of the spec.
Without using something like a network tester, it's hard to be sure what's going on, because the network is a big variable. It can legally:
We have evidence that case 3 happens at least some of the time. |
One way to tease out some of the network behavior (on TTN) is to watch several console pages in a web browser.
My understanding is that the gateway may not even know if a packet has been dropped from the TX queue, if it's too late or something else is going on (e.g., an uplink is happening from another device on the same channel -- since EU shares uplink and downlink channels, the gateway has to decide whether to discard TXs that happen to be scheduled while a packet is being received from another device.) |
Moved a small step forward here. Found out why in my code the setting RX2 == SF9 for ttn did not work. I called So far, so good. But the actual JOIN WAIT problem persists. Probably something caused by the TTN backend. But i still wonder what triggered this, because i didn't see this problem before 9th Sep 2019. *) Because
|
I think the rest of the JOIN WAIT problem depends on external factors, like TTN backend, network congestion and duty cycle. It's not an LMIC related issue. So we can keep this issue closed and detach the label "bug" from it. |
Thanks for the good work on RX2, I had overlooked that. Filed #474, will try to address this in a general way. |
Sorry for hitchhiking this thread. But any ideas how to analyze / solve the original "JOIN WAIT" problem? It still persists, and i am completely stuck. No more ideas where to look. Any hints would be appreciated. |
@terrillmoore @manuelbl is it perhaps necessary to adjust timing constants in oslmic.h to run LMIC on the arduino-esp32 platform? e.g. |
I rarely use the Arduino platform. But if I do, I don't change any timings. In my ESP-IDF based code, an entirely different hardware abstraction layer (HAL) is used, which cannot be blocked by user code. But again, I use the same timing constants incl. OSTICKS_PER_SEC, which could be easily changed. I don't experience any problems with joining. However, my tests are limited to communication via my own gateway as I can't get a connection to other gateways (the closest ones are all indoors and with hills in-between). |
@cyberman54 I've tried again with my gateway turned off and it seems I can reach another gateway after all. In the first test, I joined via my own gateway before turning it off. That worked without problem. In the second test, I turned off my gateway before starting. It then took quite some time until the join succeeded. It succeeded on the second attempt with data rate 1 (SF 11, BW 125kHz). In the TTN Console (Applications / ... / Data), I could see activation requests much earlier. In both tests, I've used the ESP-IDF based library. With my limited understanding of the protocol, I think it's working correctly. None of the tests reached a JOIN WAIT. |
@manuelbl Thanks for your notes. I will take a look at your code. Perhaps i can change my arduino-esp32 based code to native ESP-IDF. @terrillmoore The OTAA compliance script logs millisecond deltas. What ranges of these values should be expected for stable downlink operation?
|
@terrillmoore @cyberman54 BTW: I don't set RX2 to SF9 because of this sentence (see here):
I think this implies that either RX2 is not used for joins or that SF12 is used. After the join, it automatically goes to SF9. Regarding the timing: Would it make sense to issue a warning if a callback with sensitive timing is executed too late? It would be valuable feedback on whether timing is an issue that needs improvement or not. |
@manuelbl I think there are (at least) two typical root causes which can cause timing issues:
I can't yet assess the degree of impact 1) and/or 2) can have on LMIC. Perhaps it is better to have LMIC run on a dedicated CPU and wrap it in a high-level "AT-command" API. |
@manuelbl -- TTN uses RX2 == SF9 even for JoinAccept. So dn2dr should be set when a join is attempted, otherwise the device won't hear RX2 JoinAccepts. But you're right, it should not be changed after a JoinAccept. RX2 will be set by the JoinAccept message.
Yes. Increment a counter (and log "how late"). I would not print anything out, because that further distorts timing; I'd capture this via the log mechanism. @cyberman54 In fact, after we release 3.1, I would like to look at using interrupts to trigger the following things:
(1) is easier than (2), because (2) requires that we be able to get access to the SPI bus at any time, interrupting any pending operation. This therefore involves customizing the Arduino SPI driver so that we can reserve the SPI bus when we know a real-time operation is coming. This is not hard to do, but here's the headache: how do I fund the effort for ESP32 platforms? I suppose I could start reselling an ESP32-based thing, but that's kind of a backwards approach to this. It's a week or so up front, but then has to be supported forever. I didn't have a lot of luck getting anyone to support The Things Network NY based on my support of Windows for the Raspberry Pi; I suspect that people will mostly buy the cheapest thing they can find. Maybe we can find a volunteer to port my initial work done on MCCI's BSP to the ESP32 BSP? I can try to figure out a way to subclass the SPI driver, perhaps, so that if everybody uses the subclass it will work. |
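Returning to the suggestion of counting late callbacks and logging "how late": a minimal sketch of such a counter could look like the following. It is a hypothetical helper, not part of LMIC, and it assumes the scheduled deadline and the current time are available as 32-bit tick counts in the same unit.

```cpp
#include <cstdint>

// Hypothetical helper; not part of LMIC.
struct LatenessStats {
    uint32_t lateCount = 0;     // number of callbacks that ran late
    uint32_t maxLateTicks = 0;  // worst observed lateness, in ticks

    // Call at callback entry with the scheduled deadline and the current
    // time. Nothing is printed here, so the timing is not distorted further.
    void record(uint32_t deadlineTicks, uint32_t nowTicks) {
        // Unsigned subtraction followed by a signed cast gives the right
        // answer even when the 32-bit tick counter has wrapped around.
        int32_t late = static_cast<int32_t>(nowTicks - deadlineTicks);
        if (late > 0) {
            ++lateCount;
            if (static_cast<uint32_t>(late) > maxLateTicks) {
                maxLateTicks = static_cast<uint32_t>(late);
            }
        }
    }
};
```

The two fields can then be reported from the main loop or a low-priority task once the time-critical work is done.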
@cyberman54 the timings for starting the receive window vary based on the spreading factor.
So: you're receiving JoinAccept on SF7 during RX1, and SF12 during RX2. This won't work well for The Things Network. As for timings: I have a fairly unintelligible spreadsheet that works out the delays for each spreading factor. Semtech uses the following at 125kHz (all timings in ms):
So if the window should nominally open at 5000 or 6000 ms, the actual values are:
The LMIC also applies the clock tolerance to this, making the window earlier. Since you see 4964 for SF7, I infer that your clock tolerance is set to around 0.5% for this test. However, we need to get your RX2 JoinAccept running at SF9.... |
@cyberman54 I have a hard time believing that the RTOS or the interrupt handling causes any problems on the ESP32. The LMIC code and the compliance script are a trivial load for the ESP32. If it works for an ATmega, it should easily work for the ESP32 too. For those people sticking with the Arduino framework, I don't see any reason for creating separate code for the ESP32. Do you have JOIN_WAIT problems with the unmodified compliance script? If so, which branch of LMIC are you using? I'd like to reproduce it. And what boards are you using? (I have quite a collection of ESP32 boards with integrated or external LoRa modules; I'm more restricted regarding gateways.) |
@manuelbl i can clearly reproduce the JOIN_WAIT problem with the unmodified compliance script, running with LMIC from master branch, current commit The problem seems to be board hardware independent, but it could depend on a specific local duty cycle / radio / gateway / TTN situation. I did not yet spend time on a reasonable measurement. My testing environments, all and only TTN on EU868, so far: |
@terrillmoore thanks for your - again - very helpful and clear notes. Join in RX2 on SF9 works by applying the I totally agree with you on your thoughts how to fund support for an ESP32 native LMIC port. In our commercial application project we decided to not progress with ESP32, but to use ARM/Cortex SoCs dedicated to the communication stack, like MCCI does with Catena boards. |
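As a rough aid for reading the millisecond deltas, the sketch below shows two pieces of arithmetic that feed into when a receive window opens: the standard LoRa symbol duration, T_sym = 2^SF / BW, and a simple proportional clock-error margin. The function names are hypothetical, the per-spreading-factor delay table is not reproduced, and how LMIC actually combines these terms is not shown here.

```cpp
#include <cstdint>

// LoRa symbol duration in microseconds: T_sym = 2^SF / BW.
constexpr uint32_t symbolTimeUs(uint8_t sf, uint32_t bandwidthHz) {
    return static_cast<uint32_t>(
        (static_cast<uint64_t>(1) << sf) * 1000000ULL / bandwidthHz);
}

// Assumed simple model: how much earlier (in ms) a window that nominally
// opens after `delayMs` has to open for a clock tolerance given in
// per mille (5 means 0.5 %).
constexpr uint32_t clockErrorMarginMs(uint32_t delayMs,
                                      uint32_t tolerancePerMille) {
    return delayMs * tolerancePerMille / 1000;
}
```

At 125 kHz this gives 1024 µs per symbol for SF7, 4096 µs for SF9 and 32768 µs for SF12, and a 0.5 % tolerance on a 5000 ms delay corresponds to a 25 ms margin.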
With one of the latest commits after Sep 9th 2019, joining on EU868 (tested with TTNv2) is broken. The event "Join Wait" is reported endlessly. It seems the join accept packet from the gateway is somehow not properly processed; this could be a timing or a logic problem.
Reverting to commit #76f7bd5 clearly solves this issue for me.
So, i assume this issue came in with one of the commits after Sep 9th.