Joining infinite loop #37

Open
BryanMM opened this issue Jun 3, 2021 · 26 comments

Comments

@BryanMM

BryanMM commented Jun 3, 2021

Greetings, I've been trying to use your library for quite some time, but I keep getting stuck in an infinite join loop.
Sometimes it connects once and starts sending messages, but they are usually either rejected by TTN (v2) or never reach the platform at all.
I've checked all the keys and reinstalled the component countless times, but it doesn't help.
The board I'm currently working with is Heltec's Wireless Stick Lite.
Any help with my issue or advice would be appreciated.
Thanks.

@manuelbl
Owner

manuelbl commented Jun 8, 2021

This output doesn't look familiar. I've never seen such a loop. The most suspicious part is the transmission right after the join. It's still within the window of the first join so it could easily confuse the network and reset the successful join.

Are you using some sample code for this test? If not, can you post the code?

And what region are you in?

@BryanMM
Author

BryanMM commented Jun 8, 2021

Hi, thanks for the answer. I'm using the North American region, and yes, I'm testing with the hello world sample code. I tried a different gateway (TTIG) with v2 and got better results, the only difference being that the first 3 messages after the join are sent with no payload.
I saw in a pull request in this repository that there's an issue with 8-channel gateways, and before that I was using Heltec's HT-M01. Maybe that's the issue?

@manuelbl
Owner

manuelbl commented Jun 9, 2021

The 8 channel limitation could indeed be an issue. The good news is that the underlying LMIC library has just released a new version, which supposedly improves the channel handling for regions like US. I will soon integrate the new version. Unfortunately, I can't test it as I'm in Europe and don't have a lab to simulate the US region.

@BryanMM
Author

BryanMM commented Jun 9, 2021

I implemented the changes from the pull request I saw here; that's how I got the TTIG (which is also an 8-channel gateway) working, so as you said, I think the issue must be related to that. The HT-M01 isn't working yet, though.
If needed, once the implementation is done, I can help with the testing.

@DylanGWork

DylanGWork commented Jun 24, 2021

Hi, I am having the same issue raised here and using the Hello_World example. I am using AU915.

The code hangs at this line in the TheThingsNetwork.cpp file: xQueueReceive(lmicEventQueue, &event, portMAX_DELAY);
I have managed to get two packets sent, but going by this discussion, I don't believe it was the result of any changes I made.

Is it possible to lock in a DR (SF7BW250) and lock in a channel (in case I use a single-channel gateway)?

Cheers

@manuelbl
Owner

@DylanGWork It's currently not possible to set DR. That's planned though. There are no plans however for single channel operation.

@manuelbl
Copy link
Owner

An upcoming version will support a few changes relevant for you (@BryanMM and @DylanGWork):

  • Sub-band 2 is automatically selected for regions with sub-bands (incl. US915). If that's insufficient, it can be selected explicitly.
  • The data rate can be locked by disabling ADR (ttn.setAdrEnabled(false)) and setting the data rate (e.g. ttn.setDataRate(kTTNDataRate_US915_SF7);).

The changes are in the master branch. I would appreciate it if you gave it a try.

@BryanMM
Author

BryanMM commented Jul 31, 2021

@manuelbl Got it! I'll be testing it soon.

@DylanGWork

@manuelbl Tested and working great. Is the C version new too? I recall there only being a C++ version.

Great work!

@manuelbl
Owner

@DylanGWork Thanks for testing. Yes, the C version is new too.

@BryanMM
Author

BryanMM commented Aug 15, 2021

@manuelbl I've been testing it too, with great results. I no longer need workarounds for the initial messages (previously, the first and second uplinks usually bounced until a third one was sent, and there was some probability of failure from then on).
I tested the band selection and spreading factor functions too, and they also worked great.
Good job, man.

@manuelbl
Owner

@BryanMM Cool. Thanks for testing.

@maizezoidberg

@manuelbl
I have the same problem with an infinite loop when joining. I am using the ttn_join_provisioned() method to connect. If the gateway is enabled, the method successfully returns true, but if the gateway is disabled or out of the device's range, I never get false and the method stays blocked. Can you help me understand under what conditions ttn_join_provisioned() should return false?

I (1258) ttn_prov: DevEUI, AppEUI/JoinEUI and AppKey saved in NVS storage
I (8472) ttn: event EV_JOINING
I (8534) ttn: event EV_TXSTART
I (13569) ttn: event EV_RXSTART
I (14565) ttn: event EV_RXSTART
I (14839) ttn: event EV_JOIN_TXCOMPLETE
I (78616) ttn: event EV_TXSTART
I (83650) ttn: event EV_RXSTART
I (84646) ttn: event EV_RXSTART
I (84920) ttn: event EV_JOIN_TXCOMPLETE
I (149551) ttn: event EV_TXSTART
I (154585) ttn: event EV_RXSTART
I (155581) ttn: event EV_RXSTART
I (155855) ttn: event EV_JOIN_TXCOMPLETE
I (226413) ttn: event EV_TXSTART
I (231498) ttn: event EV_RXSTART
I (232494) ttn: event EV_RXSTART
I (232768) ttn: event EV_JOIN_TXCOMPLETE
I (356974) ttn: event EV_TXSTART
I (362060) ttn: event EV_RXSTART
I (363056) ttn: event EV_RXSTART
I (363330) ttn: event EV_JOIN_TXCOMPLETE
I (483475) ttn: event EV_TXSTART
I (488560) ttn: event EV_RXSTART
I (489556) ttn: event EV_RXSTART
I (489830) ttn: event EV_JOIN_TXCOMPLETE

@manuelbl
Owner

manuelbl commented Nov 3, 2021

@maizezoidberg That's a good question indeed. The ttn_join() and similar functions mainly return false if no provisioning keys have been provided or if they are invalid. If the device cannot join immediately, it will keep trying. In particular, the spreading factor will also be increased in order to improve the chances of reaching a gateway. As the spreading factor increases, the time between retries also increases. I'm not sure if it ever gives up and returns false. Probably not.

How could we improve the library? Should we add a timeout parameter to the ttn_join() functions? If so, a realistic timeout is 10 minutes or more. Or should the function be changed to be asynchronous? It would make it easier to handle the error case but more difficult to handle the regular case.

@maizezoidberg

maizezoidberg commented Nov 3, 2021

@manuelbl,
Thanks for your quick response. In fact, the LMIC follows the https://www.thethingsnetwork.org/docs/devices/bestpractices/ specification for best practices: the device should use JOIN very rarely. Considering that the ESP32 does not have very low power consumption during operation, we could add a "use_continuous_join" flag to the ttn_join() method. If this flag is NOT set, we watch for EV_JOIN_TXCOMPLETE (which means that no JOIN response was received from the gateway) and return an error in the event_callback(...) method. But after that, we must also stop LMIC's own JOIN process. Otherwise, we will exit the ttn_join() method while the LMIC keeps trying to connect. This is one possible solution. I'm ready to test it.

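ttn_event_t ttn_event = TTN_EVENT_NONE;

if (waiting_reason == TTN_WAITING_FOR_JOIN)
{
    if (event == EV_JOINED)
    {
        // JoinAccept received: the join succeeded
        ttn_event = TTN_EVNT_JOIN_COMPLETED;
    }
    else if (event == EV_REJOIN_FAILED || event == EV_RESET || event == EV_JOIN_TXCOMPLETE)
    {
        // EV_JOIN_TXCOMPLETE: the join request was sent but no JoinAccept was received.
        // Under this proposal it is reported as a failure instead of silently retrying.
        ttn_event = TTN_EVENT_JOIN_FAILED;
    }
}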

@manuelbl
Owner

manuelbl commented Nov 3, 2021

> In fact, the LMIC follows the https://www.thethingsnetwork.org/docs/devices/bestpractices/ specification for best practices. The device should use JOIN very rarely.

That sounds like a misunderstanding. The best practices recommend avoiding rejoins by retaining the assigned DevAddr. But this case is about the initial join, and in particular about the case where the join doesn't succeed. Failed joins don't count, and this case is not covered by the best practices.

The best practices basically boil down to either not powering off your device or retaining the session settings, including the DevAddr. The former is out of LMIC's control, and the latter is not implemented. I had to go to some lengths to make it work anyway.

Your proposal of changing ttn_join() is basically to add an option to abort the join if the first try fails. There are many reasons why a join can fail: too high a data rate, RF TX collisions, radio interference, etc. There is no reliable way to detect whether a gateway is nearby. Thus I think aborting after just a single try will not be useful to many people.

The options I'm considering are:

  • Abort after the lowest data rate has failed (I think that's what the current implementation does, but it takes very long)
  • Abort after a specified time
  • Abort after a specified number of tries

I will think about it.

@DylanGWork

Hi guys, great conversation.

I have added an abort to the join process after 5 failed join attempts (I even change an LED to red to indicate this); it's a messy implementation, though.

Would be great to see this as a feature.

This may be a silly question that I can just look up, but while I'm here: Can we have the default join DR be the lowest DR, or an easy way to set it as that?

@cdrx

cdrx commented Jan 24, 2022

> How could we improve the library? Should we add a timeout parameter to the ttn_join() functions? If so, a realistic timeout is 10 minutes or more. Or should the function be changed to be asynchronous? It would make it easier to handle the error case but more difficult to handle the regular case.

An async version of ttn_join() would be really useful. Something like this:

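ttn_join_async();      // proposed: start the join without blocking
uint8_t timer = 0;     // seconds waited so far

while (ttn_is_joined() == false) {
    timer++;

    if (timer > 120) {
        ttn_join_abort();  // proposed: stop the ongoing join attempts
        break;             // give up instead of calling abort every second
    }

    vTaskDelay(pdMS_TO_TICKS(1000));  // wait 1 second
}

if (ttn_is_joined()) {
    ESP_LOGI(TAG, "joined!");
}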

Would be ideal.

For my use case; the TTN provisioning is done by writing keys to the ESP over bluetooth, from a mobile app. If the user writes incorrect keys, then ttn_join() is ultimately called but never returns (because the join will never succeed). If the user updates the provisioning keys, over bluetooth connection, I can't find a practical way to cancel an active ttn_join() and try again with new keys.

@Nightroamer

Hi All,

I have implemented the Hello World test code and also get an infinite join loop. Occasionally I see an Accept Join request on TTN, but never any payload data. Serial monitor shows:
I (33376) ttn: event EV_TXSTART
I (38716) ttn: event EV_RXSTART
I (39716) ttn: event EV_RXSTART
I (39826) ttn: event EV_JOIN_TXCOMPLETE
I (40736) ttn: event EV_TXSTART

I am using AS923 on my gateway and node (TTN setup).
I am also using AS923 in the code, set via the settings menu.

Has anyone been able to get around this?

@Nightroamer


I also managed to fix the issue by inserting the code below into TheThingsNetwork.cpp:

bool TheThingsNetwork::joinCore()
{
    if (!provisioning.haveKeys())
    {
        ESP_LOGW(TAG, "Device EUI, App EUI and/or App key have not been provided");
        return false;
    }

@manuelbl
Owner

So the problem has been solved?

BTW: If the file TheThingsNetwork.cpp contains the method joinCore(), you are using an old version of the library. This method was removed more than a year ago.

@Nightroamer

Nightroamer commented Sep 27, 2022


Yes, it is solved, but only if I add the above code to TheThingsNetwork.cpp.

I downloaded the source code from here, so how could I have ended up with the old library? In my ignorance (I'm new to this), I thought the library was supplied within.

@manuelbl
Owner

You have probably downloaded the code from the Releases. I have indeed not updated this for some time. Now it's up-to-date again.

You can download it either from the release page or with the green "Code" button on the home page.

@Nightroamer

Excellent I will try this later today.

I had blindly followed the download link in the Getting Started guide (PlatformIO, which I use, points to the same one):

https://github.com/manuelbl/ttn-esp32/archive/master.zip

@jpalumbo1981

Hi, I encountered the same infinite-loop issue when testing the 'Hello World' example on a Heltec Wireless Bridge with an ESP32 and an SX1276 transceiver. I've tried all the suggestions in this thread, without success. I receive sporadic join requests, but they are not successful. I noticed that the RSSI is -110 dBm, but when I compile the code in Arduino with the Heltec library, the RSSI is -40 dBm. Thanks for your assistance.

@Nael2311

@manuelbl, Thanks for your quick response. In fact, the LMIC follows the https://www.thethingsnetwork.org/docs/devices/bestpractices/ specification for best practices. The device should use JOIN very rarely. Considering that the ESP32 does not have very low power consumption during operation, we can set a "use_continuous_join" flag in the ttn_join() method, and if this flag is NOT set, watch for EV_JOIN_TXCOMPLETE (meaning that no JOIN response was received from the gateway) and return an error in the event_callback(...) method. But after that, we must stop LMIC's own JOIN process. Otherwise, we will exit the ttn_join() method and the LMIC will keep trying to connect. This is one of the solutions. I'm ready to test it.

ttn_event_t ttn_event = TTN_EVENT_NONE;

if (waiting_reason == TTN_WAITING_FOR_JOIN)
{
    if (event == EV_JOINED)
    {
        ttn_event = TTN_EVNT_JOIN_COMPLETED;
    }
    else if (event == EV_REJOIN_FAILED || event == EV_RESET || event == EV_JOIN_TXCOMPLETE)
    {
        ttn_event = TTN_EVENT_JOIN_FAILED;
    }
}

This solution worked for me, but I’m not sure if it’s the most appropriate approach. I’d appreciate any feedback or suggestions for improvement. Thank you!


8 participants