MQTT doesn't detect broker outage and overflows message queue #1772
Comments
Update: if I disable the reconnect option and modify the logic like this:
m = mqtt.Client(MQTT_ClientID, 10, MQTT_Client_user, MQTT_Client_password)
m:on("offline", function(client)
publishMqtt:stop()
reconnMqtt:start()
print("MQTT: offline")
end)
--MQTT reconnect logic timer
reconnMqtt = tmr.create()
reconnMqtt:register(10, tmr.ALARM_SEMI, function (t)
reconnMqtt:interval(500);
print("MQTT: trying to connect to "..MQTT_BrokerIP..":"..MQTT_BrokerPort);
m:close()
m:connect(MQTT_BrokerIP, MQTT_BrokerPort, 0, 0, function(client)
print("MQTT: connected")
publishMqtt:start()
print("PUB: started")
end, function(client, reason)
publishMqtt:stop()
print("MQTT: connection failed with reason "..reason)
reconnMqtt:start()
end)
end)
i.e. explicitly implement the reconnect logic in my code, the script simply stops because no failure callback is triggered, while the device still responds to Wi-Fi events and the console.
|
Please try with dev branch, there have been significant changes there that
are not present in the master branch yet.
|
Same on
|
#1683 hasn't landed yet. |
@marcelstoer, @devyte second test (with callbacks only) confirms that. Is #1683 planned to go into 2.0.0 release?
|
It will be merged to See https://github.com/nodemcu/nodemcu-firmware/#releases for a brief description of our release "process". |
Ok, thank you! What can you say about methods to determine publish queue state I proposed in the first post? How one can manage that queue to prevent memleak? Should I create a separate feature request for that proposal? |
There is currently no way to figure out what is going on inside the mqtt implementation. It is all a bit of a mess.... Every time I dig into it, I find something more.... What do you think the behavior ought to be when the connection to the server is lost (even if that fact hasn't been discovered yet)? My inclination is that we ought to have a max backlog of messages (settable when the client is created), and publishing a message beyond that limit would fail (either with an error code or by throwing an error). There needs to be a design doc for how the mqtt client is supposed to work. Then the implementation can probably be adjusted to match the specification. |
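For illustration, the "max backlog" idea could be approximated in user code today. This is only a rough sketch, not a proposal for the real API: `bounded_publisher`, `max_backlog`, `pending` and `reset` are hypothetical names, and it assumes that the callback passed to `client:publish` runs once per acknowledged QoS 1 message.

```lua
-- Hypothetical helper, not part of the NodeMCU mqtt module.
-- It wraps an mqtt client and refuses to hand over more than
-- max_backlog QoS 1 messages that are still unacknowledged.
local function bounded_publisher(client, max_backlog)
  local pending = 0  -- published but not yet acknowledged
  local self = {}

  -- Returns true if the message was handed to the client,
  -- or false plus an error string if the backlog is full.
  function self.publish(topic, payload)
    if pending >= max_backlog then
      return false, "backlog full"
    end
    pending = pending + 1
    -- QoS 1, no retain. Assumption: the callback fires once per PUBACK.
    client:publish(topic, payload, 1, 0, function()
      if pending > 0 then
        pending = pending - 1
      end
    end)
    return true
  end

  -- Number of messages currently counted as unacknowledged.
  function self.pending()
    return pending
  end

  -- Call this after closing the connection: acknowledgements for
  -- messages sent on the old connection may never arrive.
  function self.reset()
    pending = 0
  end

  return self
end
```

With something like this, a publish timer could call `publish` and stop itself (or start a reconnect) as soon as it gets `false, "backlog full"` back, instead of queuing messages until the heap runs out. It does not fix the missing offline event; it only caps the damage.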
@pjsg The first thing that should be clarified is the ambiguous callback system. What we have now is two sets of them: 1st inside Then
BTW, I've checked the following script:
m:on("offline", function(client)
publishMqtt:stop()
print("MQTT: offline") -- "reason" is not passed to the offline callback, so it cannot be printed here
reconnMqtt:start()
end)
m:on("connect", function(client)
print("MQTT: connected")
publishMqtt:start()
print("PUB: started")
end)
--MQTT reconnect logic timer
reconnMqtt = tmr.create()
reconnMqtt:register(10, tmr.ALARM_SEMI, function (t)
reconnMqtt:interval(500);
print("MQTT: trying to connect to "..MQTT_BrokerIP..":"..MQTT_BrokerPort);
m:close()
m:connect(MQTT_BrokerIP, MQTT_BrokerPort, 0) -- also tried with the optional fourth argument (autoreconnect) set to 1
end)
reconnMqtt:start()
and found out that
p.s. @marcelstoer @pjsg, where can we discuss possible API/implementation changes for MQTT in more detail? |
The MQTT api is weird (as you note). My feeling is that we aren't going to make progress on this until there is a concrete proposal for the "fixed" API -- documented somewhere. This could be as simple as forking the repo, and then committing a new version of My two current nodemcu projects are not using mqtt, so I'm not highly motivated to drive this. |
Hi, I found this conversation while googling a similar problem: my node was not able to recognize that the broker was unavailable, and after running for some time without the broker it crashed and rebooted, again and again, until the broker came back online. So I used your test code (with a few modifications) to check whether it is really because
is never triggered or there is another bug in my code.
In my case it works with FW 1.5.4.1-final. With 2.0.0 master / dev it took around 2.5 minutes for the node to find out that the broker is offline. m:on("offline")... was never fired, but the "callback function for when the connection could not be established" (the last argument of "m:connect") was fired and reconnection started. See my notes for more details:
I'm rewriting my code to do the following with every published message:
|
Hi, I also experienced that the "on offline" event does not fire when the connection is lost. I believe the 2.5 minutes might be the point at which the broker stops receiving the keep-alive heartbeat from the mqtt client, publishes the LWT and closes the connection. This then fires the offline event. I explored it further in #1406 and found that the control reports that it is connected but becomes unresponsive after a period of inactivity, resulting in a publish failure that can only be corrected by restarting the ESP and establishing a new MQTT connection. While the connection is "stale", reconnecting is also not possible, as the mqtt client still reports that it is connected to the broker. This is before the broker closes the connection. I devised similar workarounds:
|
Here is updated info about my investigation of this bug... In short, I'm going back to 1.5.4.1 because it does not seem to suffer from this bug and uses less heap overall. We'll see if 2.0.0.0 gets fixed in the future. |
Is there any way to know the MQTT message queue size or to limit it? I use the code listed below to connect to Mosquitto, handle reconnects and publish dumb test timestamps, and it works fine under stable conditions. But together with #1731 and possibly #1680, this code fails when I disconnect the broker from the network, and it eats up all heap memory. @pjsg, can you please take a look at that?
Expected behavior
When the MQTT client detects that the broker has gone away for some reason (e.g. a timeout, which is set to 10 seconds in this example), it should fire one of the failure callbacks so that sending can be stopped.
Also, it would be great to have any (or all) of the following:
not only a publish success callback but also a publish failed callback
Actual behavior
this doesn't work at all
The failure callback inside the MQTT constructor wasn't triggered either
Test code
NodeMCU version
NodeMCU custom build by frightanic.com
branch: master
commit: 81ec366
SSL: false
modules: adc,dht,file,gpio,mqtt,net,node,ow,tmr,uart,wifi
build built on: 2017-01-31 20:44
powered by Lua 5.1.4 on SDK 1.5.4.1(39cb9a32)
Hardware
NodeMCU LUA Amica R2. I guess, it can be reproduced on any ESP8266 module.