Fix the error callback from not being called sometimes #1683
IMO this isn't really a downside but clearer behavior. I like auto-reconnect, but only when a connection was established at all in the beginning. Observing that a connection can be achieved at least once rules out a bunch of potential networking problems. Automatic algorithms can then keep on massaging the network layer as required.
I mentioned this elsewhere, but I thought to bring it up here as well. Feel free to yell at me if I'm talking nonsense :)
@devyte I noticed your post elsewhere after posting here and think that the case you describe should be considered. Picking up your API proposal (not fully backwards compatible though): Does this sound meaningful?
That looks about right! Thank you for taking my feedback into account :)
I'm thinking slightly differently -- how about this?
The error callback gets an additional argument that says whether this is a 'final' error or not. A final error means that no further activity is expected. Also, make sure that retries are done with an exponential backoff (with a max of 1000 seconds). I really don't want to have a mode where, if the server IP is wrong (or the password is wrong), there is no error indication given. Of course, the programmer can ignore the error, or pass nil as the error callback.
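As a rough illustration of the capped exponential backoff suggested above, here is a minimal Python sketch. It is not NodeMCU code: the function name `backoff_delays`, the 1-second starting interval, and the doubling factor are assumptions made for illustration; only the 1000-second ceiling comes from the comment.

```python
from itertools import islice


def backoff_delays(base=1.0, factor=2.0, cap=1000.0):
    """Yield retry delays in seconds, growing by `factor` but never above `cap`."""
    delay = base
    while True:
        yield min(delay, cap)
        # Clamp the stored value too, so it cannot grow without bound.
        delay = min(delay * factor, cap)


# First twelve delays with the assumed 1-second starting interval.
delays = list(islice(backoff_delays(), 12))
print(delays)
```

With these assumed defaults the delay reaches the 1000-second ceiling on the eleventh attempt and stays there.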
Thanks for picking this up Arnim.
It may be a perfectly valid approach in the C/embedded world, I have little to compare it with, but I don't like that. I prefer explicit APIs. English is not my mother tongue but to me a parameter named
@marcelstoer an explicit API is OK too. I don't really have a preference for how this is offered. The name "autoreconnect" does not necessarily imply true|false. It can be the name of a feature, which takes many parameters. But like I said, I don't really have a preference for the API, so I'd rather leave that up to you :) The n-retries approach is widely used in pretty much everything, including the Linux OS itself. It pretty much says: "retry n times, after that give up". Therefore, I would argue that it would be used quite often, especially in view of what @pjsg said. Also, like I explained in the previous post, all ESPs, the broker, and all Wi-Fi routers, among other devices, are on the same power circuit. If the power circuit goes down, when it comes back up all devices will power on at the same time, but will come online at very different times. I suspect that this is rather common among users, as is the case where the ESP is faster to come online than the broker. Therefore, if the first connection attempt by the ESP MQTT to the broker fails, it makes sense to keep retrying, at least for a while. This is also covered by the same n-retries approach, so it makes for simple usage for the programmer, covering both cases in the same way.
Yes, I believe that's important, too. I can't judge though whether 1k seconds is long enough.
@pjsg
Exponential backoffs are usually done to cover two cases at rather opposite ends of the spectrum: In this MQTT reconnect case, I don't think it's necessary to cover both ends of the spectrum. Personally, I feel that adding exponential backoff here would overcomplicate things, but that's just my own thought.
You said:
I assume that you want to force an error if the first connection attempt fails, and only try to auto-reconnect if the first try succeeds. This is a Bad Idea, like I explained previously.
You said:
This is a very good idea; it makes sense to know inside the callback whether more attempts will be made or not.
You said:
The callback serves the purpose of error handling. If the callback receives the "final" argument, or an "i-th of n" attempts argument, then an error indication can be implemented, either for each auto-reconnect attempt, or only for the final one (i.e., I tried n times and I'm now giving up), or both.
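To make the "i-th of n" idea concrete, here is a small Python simulation. Everything in it is hypothetical: `connect_with_retries`, the callback signature, and the string reasons are made up for illustration and are not the NodeMCU mqtt API. Delays between attempts are left out to keep the sketch short.

```python
def connect_with_retries(try_connect, on_error, max_attempts=3):
    """Try to connect up to max_attempts times.

    try_connect() returns None on success or an error reason on failure.
    on_error(reason, attempt, max_attempts, final) is called for every
    failure; `final` is True only when no further attempt will be made.
    Returns True if a connection was established.
    """
    for attempt in range(1, max_attempts + 1):
        reason = try_connect()
        if reason is None:
            return True
        on_error(reason, attempt, max_attempts, attempt == max_attempts)
    return False


# Power-failure scenario: the broker only comes up by the third attempt.
outcomes = iter(["refused", "refused", None])
seen = []
ok = connect_with_retries(lambda: next(outcomes),
                          lambda *args: seen.append(args),
                          max_attempts=5)
print(ok, seen)
```

The callback can then signal every failed attempt, only the final one, or both, as described above.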
I removed the 2.0.0 milestone as we're past the two-weeks-before-master-drop milestone and there are some unresolved issues (as per this discussion).
It seems like auto-reconnection after the error without handling the error doesn't do anything towards fixing the error. Aside from the connection time-out in @devyte's power example, everything else needs the Lua programmer or user to take action. If the callback worked correctly then the Lua programmer could decide to reconnect or stop and notify the user. As it stands now the error handler doesn't always get called and if you try to stop the auto-reconnect by calling It seems like the first steps should be to fix the bugs in the system; then improve the system with preference changes (if still necessary).
After those two items are fixed the Lua programmer can choose how to correctly deal with each error. Currently, they feel tied down because the system isn't working consistently.
The current approach is that the error callback is supposed to be called at most once after a connect call is made. This is a final callback; no further action will be taken. If autoreconnect means "always retry errors" then the programmer will get no error callbacks at all (there is no final error). I'm thinking that autoreconnect true means that errors are retried (no matter what the error is). I'm going to stick a reasonable timeout there -- say 10 seconds. However, all errors will be signalled to the error callback. An argument will be passed to say if there is a retry pending. Calling close would stop the retry. Does this meet people's needs?
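The behaviour proposed here (every error reported, a flag saying whether a retry is pending, and close cancelling the retry) can be modelled with a toy Python class. This is only a sketch under assumptions: `RetryingClient`, `tick()`, and the reason strings are invented for illustration, and `tick()` stands in for the retry timer firing rather than using a real timer.

```python
class RetryingClient:
    """Toy model: every failure is reported; close() cancels a pending retry."""

    def __init__(self, try_connect, on_error, autoreconnect=True):
        self.try_connect = try_connect  # returns None on success, else a reason
        self.on_error = on_error
        self.autoreconnect = autoreconnect
        self.retry_pending = False
        self.connected = False

    def connect(self):
        reason = self.try_connect()
        if reason is None:
            self.connected = True
            self.retry_pending = False
            return
        self.retry_pending = self.autoreconnect
        # All errors reach the callback, with the retry-pending flag.
        self.on_error(self, reason, self.retry_pending)

    def tick(self):
        """Stand-in for the retry timer firing (e.g. after 10 seconds)."""
        if self.retry_pending:
            self.connect()

    def close(self):
        self.retry_pending = False
        self.connected = False


log = []


def on_error(client, reason, retry_pending):
    log.append((reason, retry_pending))
    if reason == "bad credentials":
        client.close()  # retrying cannot fix this, so stop


outcomes = iter(["timeout", "bad credentials", None])
client = RetryingClient(lambda: next(outcomes), on_error)
client.connect()  # fails with "timeout", retry stays pending
client.tick()     # retry fails with "bad credentials", callback closes
client.tick()     # nothing happens: the retry was cancelled
print(log, client.connected)
```

In this model the callback decides per error whether retrying makes sense, which is the case the thread keeps returning to (wrong password versus a broker that is still booting).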
Again, consider the case where the ESP and the broker power on at the same
time. The ESP will take a few seconds to come online and try to connect to
the broker. However, the broker could be a large system like a pc, or at
least a mini-version of one, generally linux-based. How long would such a
system take, from power on to the broker coming online? If you really must
put a timeout on retries, this must be taken into account in a robust
manner. Otherwise, after every power failure, all ESPs will need to be
restarted manually, because they will have timed out.
The programmer has two choices:
I suspect that most people will choose the second option (infinite retries).
Actually, now that I stare at the mqtt code some more, I'm not sure that I can make it do anything consistent around reconnects. I'm now feeling that the auto-reconnect option is a very bad idea. The programmer should deal with the reconnects by doing these from the error callback. The error callback should be guaranteed to be called at most once per connection. It should be called whenever the system gives up trying to get a good connection, or an existing connection breaks. Also, given that the default for I'm now in favor of the following:
Philip, your proposal sounds reasonable, I'm ok with that.
Had we somehow resolved #1538 we could make sure programmers don't miss that.
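The approach Philip describes above, with the programmer reconnecting from the error callback, might be sketched as follows. This is a Python simulation, not firmware or Lua code: `FakeClient`, `make_handlers`, and the reason strings are hypothetical, and a real device would presumably schedule the reconnect with a timer instead of calling connect directly from the callback.

```python
class FakeClient:
    """Hypothetical client: exactly one callback fires per connect() call."""

    def __init__(self, transport):
        self.transport = transport  # returns None on success, else a reason

    def connect(self, on_connect, on_error):
        reason = self.transport()
        if reason is None:
            on_connect(self)
        else:
            on_error(self, reason)


def make_handlers(max_attempts, log):
    """Build callbacks that keep the retry policy in user code."""
    state = {"attempt": 1}

    def on_connect(client):
        log.append("connected")

    def on_error(client, reason):
        log.append(reason)
        if state["attempt"] < max_attempts:
            state["attempt"] += 1
            client.connect(on_connect, on_error)  # programmer-driven reconnect
        else:
            log.append("giving up")

    return on_connect, on_error


log = []
outcomes = iter(["refused", "refused", None])
client = FakeClient(lambda: next(outcomes))
client.connect(*make_handlers(max_attempts=5, log=log))
print(log)
```

Keeping the policy in the callback means the retry count, delay, and give-up behaviour are all the programmer's choice, and no error goes unreported.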
I've updated the documentation to make it clear that auto-reconnect should not be used. In particular, it interacts badly with I think that the underlying problem (of the missing callback) is also resolved here.
@pjsg Way back when I started reporting about the mqtt becoming "stale" or unresponsive in #1406, the reconnect functionality was part of the situation. The following would have been nice: If auto-reconnect is set, when the mqtt connection becomes "stale" or unresponsive, auto-reconnect is tried as many times as I have told it to try to reconnect. If it does not succeed after those tries, the "offline" event is triggered with an auto-reconnect "true" flag, letting me "know" that reconnect failed and allowing for specific code in the callback. Now, different scenarios I experienced complicate the suggestion above. The scenarios consist of a combination of one or more of the following: ESP connection to AP failed. Thus as a final thought regarding the auto-reconnect: It should only be attempted if none of the 3 conditions above are true or present. If this can be incorporated in the back-end functionality and returned as flags in the callback it will be great.
Is this waiting for a particular milestone?
No, not really. But maybe for another approval (besides mine) from someone more familiar with the C-side of things 😉
👍 from me then, and +1 on adding a deprecation warning to the code
I just tried to add the deprecation warning to the code so we could finally merge this...and I failed 😞 I added this
to https://github.com/nodemcu/nodemcu-firmware/pull/1683/files#diff-904c0a57714312fb890c87ccf9beb2a4R1076 but the compiler complained about
Why is this? After all,
Yes, but the feature branch in this PR diverged before the deprecation function in ba9d3af was merged.
Oh sorry, silly me...why didn't I check. I didn't realize this PR is that old. I'll merge it then and add the snippet to
There you go: b645100.
* Fix the error callback from not being called sometimes
* Moved the setting of the reconnect status to after the connack is received
* Increase the irom0_seg size
* Updated the documentation
* Make it clearer that autoreconnect is deprecated
Fixes #1680
This PR is for the dev branch rather than for master.
The documentation in docs/en/* is updated.
There were a number of issues (now fixed).
The downside is that there is a behavior change -- if the first connection fails, then reconnect is not performed.