Wifi connection lost..... No reconnection #3208
Comments
I've noticed the same issue in some cases (I have not found exactly which ones, so I have not reported it yet). The Wi-Fi monitor will report |
There is definitely something wrong with the wifi sta module. Wifi gets disconnected after some time, reports 201, and never reconnects until a power cycle. |
@chathurangawijetunge Do you have any steps to reproduce this consistently? On the dev version, |
It only happens in long runs of over 24 hours. It happens with both master and dev (Lua 5.3), giving wifi.eventmon.reason.AUTH_EXPIRE (2). The device will not reconnect automatically, even with wifi.sta.disconnect() followed by wifi.sta.connect(). 3.0-master_20190907 works fine. |
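For anyone logging these disconnects, it can help to print the reason name next to the number. A minimal sketch, assuming the firmware exposes the `wifi.eventmon.reason` table of named codes; the helper names `reason_names` and `describe_reason` are made up for illustration, and the stand-in table only carries the two codes quoted in this thread so the snippet also runs off-device:

```lua
-- Build a code -> name lookup from a table shaped like wifi.eventmon.reason.
local function reason_names(reason_table)
  local names = {}
  for name, code in pairs(reason_table) do
    names[code] = name
  end
  return names
end

-- Return a printable label such as "NO_AP_FOUND (201)".
local function describe_reason(names, code)
  return (names[code] or "UNKNOWN") .. " (" .. tostring(code) .. ")"
end

-- On a device this would be: local names = reason_names(wifi.eventmon.reason)
-- Stand-in subset (assumption: only the codes mentioned in this thread).
local names = reason_names({ AUTH_EXPIRE = 2, NO_AP_FOUND = 201 })
print(describe_reason(names, 201))
```

Inside a STA_DISCONNECTED callback, `describe_reason(names, T.reason)` could replace the bare `T.reason` in the print.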
I've tried wifi connection is a little stable..... not sure why this is happening |
Any updates regarding the above situation? |
Yes |
Is anyone else having this issue? |
It may well be that nobody but you is experiencing this problem (yet); perhaps all our long-running esp8266es are still back on |
I don't know exactly how to do a git bisect. I get NO_AP_FOUND (201) after about 12-24 hours, with no way to recover until a power cycle. |
I will leave 2 devices running #995114b with Lua 5.3 (I've obsoleted 5.1 in my head already) on a weak wifi connection (<-85 dBm) and will report the results after a few days. As I mentioned before, I've also had the same issue with a few boards, but somehow I could not identify the problem, and now it works stably for me (at first glance).
station_cfg = {}
wifi.sta.sethostname("WiFitest")
wifi.sta.autoconnect(1)
station_cfg.ssid = "ssid"
station_cfg.pwd = "password"
station_cfg.save = true
wifi.sta.config(station_cfg)
With two
wifi.eventmon.register(wifi.eventmon.STA_CONNECTED, function(T)
  print("\n\tSTA - CONNECTED" .. "\n\tSSID: " .. T.SSID .. "\n\tBSSID: " ..
    T.BSSID .. "\n\tChannel: " .. T.channel)
end)
wifi.eventmon.register(wifi.eventmon.STA_DISCONNECTED, function(T)
  print("\n\tSTA - DISCONNECTED" .. "\n\tSSID: " .. T.SSID .. "\n\tBSSID: " ..
    T.BSSID .. "\n\treason: " .. T.reason)
  connectedToMqtt = false
end)
Because it will use MQTT, I will add an additional check:
function MyMqtt.watch_mqtt()
  tmr.create():alarm(10000, tmr.ALARM_AUTO, function()
    if not connectedToMqtt and wifi.sta.getip() ~= nil and wifi.eventmon.STA_CONNECTED == 0 then
      m:close()
      print('Reconnecting to Mqtt!')
      collectgarbage()
      tmr.create():alarm(1000, tmr.ALARM_SINGLE, function()
        MyMqtt.Connect()
      end)
    elseif not connectedToMqtt and wifi.sta.getip() == nil then
      wifi.sta.config(station_cfg)
    end
  end)
end
MQTT will also serve as an indicator of a lost and unrestored connection, if that happens. |
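The decision inside that watchdog can be pulled out into a small pure function, which makes it easy to check off-device. This is only a sketch, not the poster's code: `watchdog_action` and its return strings are hypothetical names, and it deliberately leaves out the `wifi.eventmon.STA_CONNECTED == 0` term, since that compares an event identifier constant rather than any live connection state.

```lua
-- Decide what a periodic watchdog tick should do.
-- mqtt_up: boolean flag maintained by the MQTT connect/offline callbacks
-- ip:      value of wifi.sta.getip() (nil while the station has no address)
local function watchdog_action(mqtt_up, ip)
  if mqtt_up then
    return "none"              -- everything is up, nothing to do
  elseif ip ~= nil then
    return "reconnect_mqtt"    -- wifi is up but the broker link is down
  else
    return "reconfigure_wifi"  -- no address: re-apply the station config
  end
end

-- Hypothetical on-device wiring (not run here):
-- tmr.create():alarm(10000, tmr.ALARM_AUTO, function()
--   local action = watchdog_action(connectedToMqtt, wifi.sta.getip())
--   if action == "reconnect_mqtt" then MyMqtt.Connect()
--   elseif action == "reconfigure_wifi" then wifi.sta.config(station_cfg) end
-- end)
print(watchdog_action(false, nil))
```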
@KT819GM: |
At what point is MyMqtt.watch_mqtt() called? |
@chathurangawijetunge Please either edit your earlier comments or merely refrain from making duplicate comments like that. They are, like the duplicate issues, not conducive to conversation. |
Yeah, it was a bit of a brain fart; I got stuck experimenting with
p.s. both units online, one at constant -86 / -92 dBm |
There is definitely something weird going on on the dev branch. Today, the node that I'm working on got into a state where it wouldn't connect to the AP. It kept giving eventmon reason 23 (a type of auth fail) and the AP also reported
I tried redoing the I tried disabling the AP that it was trying to connect to so that it would switch to a different AP. No help. I switched to I switched to The eventmon data showed the correct ssid. This started after I loaded a new LFS image. I have no idea whether this is related -- I include it for completeness. |
This happens not just with dev but also in 3.0-master_20200610. |
I'm adding some code so that I can read out the last 12k of flash (where the wifi setup information is stored) and see when/if it changes unexpectedly. |
Any luck in finding the bug? |
I managed to reproduce it today. It turns out that (I think) SPIFFS writes into the flash area at the end of the flash chip and overwrites the wifi settings. This is ugly. Normally the last 12k doesn't change -- even on a reboot. However, sometimes it does -- it could be to do with reloading the LFS region -- that was when it happened. However, the LFS partition also got corrupted at that time, so I don't know whether I can really blame it. It was SPIFFS data that was found in the last 12k. I suppose that I ought to check that the SPIFFS partition doesn't overlap the end of the flash. |
@pjsg, This shouldn't make any difference, because the SDK is supposed to use the PT now. See my comments in #3260. If there are some bits of the code in our current SDK that are still writing to the old locations then we have wider issues that we need to scope and understand. We are currently running an old 3.0 SDK version. My first instinct would be to rebaseline to a current version and see if that fixes the problem before abandoning use of the Partition Table. |
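The overlap check mentioned above is plain address arithmetic. A rough sketch, assuming a 4 MB chip (the size given elsewhere in this thread) and treating "the last 12k" as 0x3000 bytes; the function names and the example partition offsets are invented for illustration and are not taken from any real partition table:

```lua
local FLASH_SIZE    = 0x400000  -- 4 MB chip (assumption taken from this thread)
local RESERVED_SIZE = 0x3000    -- "the last 12k" where the wifi settings live

-- First address of the reserved region at the end of the flash.
local function reserved_start(flash_size)
  return flash_size - RESERVED_SIZE
end

-- True if a partition [part_start, part_start + part_size) reaches into
-- the reserved region at the end of the flash.
local function overlaps_reserved(part_start, part_size, flash_size)
  return part_start + part_size > reserved_start(flash_size)
end

print(string.format("reserved region starts at 0x%X", reserved_start(FLASH_SIZE)))
-- Hypothetical partition: starts at 1 MB and runs to the very end of the chip.
print(overlaps_reserved(0x100000, 0x300000, FLASH_SIZE))
```

A partition that stops exactly at the start of the reserved region does not count as overlapping, which is why the comparison is a strict `>`.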
We currently use SDK 3.0.1. From what you say saving the default |
This is my partition table:
I just iterated through all the partition types and these were the values that were returned. This looks plausible, but nevertheless, when you do wifi.sta.config, it does overwrite the last 12k of flash (I have a 4MB flash chip). |
This is as below and this looks pretty typical. It is worth moving SPIFFS to 1M for 1M, say, so you can see exactly what is writing to the forbidden region. Let me have a play.
|
I've just tried various combinations of |
On this unit |
We are currently running on 3.0.1 and the current is 3.0.4. We'll rebaseline the SDK immediately after the next master drop. This might help. |
I have the following code running on 3 modules
Overnight all 3 went offline with error '201 AP not found'. This is in my init.lua, not in LFS (now). |
I've bumped SDK to |
It seems that the device had a WDT restart... |
Yeah, because "somebody" has done an mqtt publish without checking whether mqtt is available at all 😄. I've left it on battery and sadly can't fix that now. Still, if |
True, but in my experience this error happens only after more than about 24 hours, so if the device reboots in between it might not show up. |
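Guarding the publish is a one-liner in principle. A small sketch, assuming the usual `client:publish(topic, payload, qos, retain)` call shape of the NodeMCU mqtt module; `safe_publish` and the `is_up` callback are hypothetical names, and the stub client below exists only so the snippet runs off-device:

```lua
-- Publish only while the broker connection is known to be up.
-- client: object with a publish(topic, payload, qos, retain) method
-- is_up:  function returning true while the MQTT connection is established
local function safe_publish(client, is_up, topic, payload)
  if not is_up() then
    return false  -- skipped: no connection, so do not touch the client
  end
  client:publish(topic, payload, 0, 0)
  return true
end

-- Off-device stand-in for an mqtt client that records what was sent.
local sent = {}
local stub = { publish = function(self, topic, payload) sent[#sent + 1] = topic .. "=" .. payload end }

print(safe_publish(stub, function() return false end, "test/rssi", "-86"))
print(safe_publish(stub, function() return true end, "test/rssi", "-86"))
```

On a device, `is_up` could simply return the flag that the connect and offline callbacks maintain.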
Not sure if this is related to this issue, but I have noticed that
will do nothing with ssid = "" (empty string). |
Your device is restarting, not running continuously. |
Even with |
@chathurangawijetunge, are you saying that 3.0-master_20190907 doesn't manifest this issue but 3.0-master_20200610 does? If so this piece of data will help to work out any underlying failure. |
To be honest, this is the first time I have seen a watchdog restart, and I think it came from the part of the code I added from your example, @TerryE. Please consider checking WiFi when |
Yes @TerryE, I have had devices running on 3.0-master_20190907 for over 6 months without any issue. |
I'm going to try the example above and see if it does anything strange. I'm using a regular nodemcu board with nothing attached. However, I've been running a node off the dev branch for a while and after fixing the issue with spiffs overwriting the config, it has been rock solid. |
Thanks Philip 😊 |
After 24 hours, it is still running fine. Note that this code doesn't do anything except check for the status of the wifi. Some of the comments in this thread associated the failures with writing to spiffs. Another aspect that is different is the radio environment. I have a number of Ubiquiti APs and I have pretty strong signal and I'm running WPA2. When this fails, what environment does it fail in? @chathurangawijetunge |
My wifi settings are as follows. I am also running the following code to check switch status with a 50 ms timer.
I will remove the above code, check whether this problem is related to it, and update. |
I'm wondering if this is due to your 50ms timer -- maybe the wifi stack is
not getting enough CPU to actually maintain things. If this is the case,
then maybe the lua firmware could detect this case, and warn about it.
…On Wed, Sep 16, 2020 at 10:36 PM chathurangawijetunge < ***@***.***> wrote:
My wifi settings are as follows:
RTS/CTS Threshold = 2347
Wireless Mode = 802.11b+g+n
Channel Bandwidth = 20/40 MHz
Authentication Type = WPA2-PSK
Encryption = AES
I am also running the following code to check switch status with a 50 ms timer:
if #(Out_Pin or {})~=3 then
  Out_Pin={}
  Out_Pin[1] = 5 --GPIO-14
  Out_Pin[2] = 6 --GPIO-12
  Out_Pin[3] = 7 --GPIO-13
else print("'user Define out put pin set") end
--if table.getn(Sw_Pin or {})~=3 then
if #(Sw_Pin or {})~=3 then
  Sw_Pin={}
  Sw_Pin[1] = 0 --GPIO-4
  Sw_Pin[2] = 1 --GPIO-5
  Sw_Pin[3] = 2 --GPIO-16
else print("'user Define Switch pin set") end
Timer_status = {}
local Sw_Master = 3 -- GPIO0 and (+3.3v)
local prese_ctn=0
----------------------------------------------------------------------
gpio.mode(Sw_Master,gpio.INPUT)
gpio.write(Sw_Master,0)
--~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
if #(file.getcontents("led") or "")~=6 then file.remove("led") end
if file.open("led", "r") then
  for i=1, 3, 1 do
    gpio.write(Out_Pin[i],string.gsub(file.readline(),"\n",""))
  end
  file.close()
end
--~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
function write_led_Status()
  file.open("led", "w")
  for i=1,3, 1 do
    file.writeline(Timer_status[i]==nil and gpio.read(Out_Pin[i]) or 0)
  end
  file.close()
end
--~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
local function manual_on_off(pin)
  gpio.write(Out_Pin[pin],gpio.read(Out_Pin[pin]) == 1 and 0 or 1)
  write_led_Status()
  pcall(LED_ON_OFF,pin,gpio.read(Out_Pin[pin]) == 1 and "on" or "off",1)
end
--~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
local Sw_Clicks=0
local mytimer = tmr.create()
--~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
local mytimer1 = tmr.create()
local sw_sta=gpio.read(Sw_Master)
local debounce=0
--~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
local function check_master()
  local function Process_switch_Press()
    if prese_ctn>=10 then
      pcall(system_reboot,1)
    elseif prese_ctn>=1 and prese_ctn<=3 then
      manual_on_off(prese_ctn)
    elseif prese_ctn~=0 then
      pcall(Beep,3)
    end
    prese_ctn=0
  end
  if sw_sta==0 and gpio.read(Sw_Master)==1 and math.abs(tmr.now()-debounce)>250000 then
    debounce=tmr.now()
    prese_ctn=prese_ctn+1
    pcall(Beep,1)
    mytimer1:alarm(750, tmr.ALARM_SINGLE, function ()
      if gpio.read(Sw_Master)==1 then
        local long_press=tmr.time()
        mytimer1:alarm(500,1,function(t)
          if math.abs(tmr.time()-long_press)==4 then
            t:stop() prese_ctn=0
            pcall(Go_AP_Mode)
          elseif gpio.read(Sw_Master)==0 then
            t:stop()
            Process_switch_Press()
          end
        end)
      else
        Process_switch_Press()
      end
    end)
  end
  sw_sta=gpio.read(Sw_Master)
end
--~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
local blink_ctn=0
tmr.create():alarm(50, 1, function()
  check_master()
  blink_ctn=blink_ctn>8 and 0 or blink_ctn+1
  for i=1,3, 1 do
    if Timer_status[i]=="timer" then
      gpio.write(Sw_Pin[i], blink_ctn<=4 and 1 or 0)
    else
      gpio.write(Sw_Pin[i],gpio.read(Out_Pin[i])==1 and 0 or 1)
    end
  end
end)
--~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
I will remove the above code, check whether this problem is related to it, and
update.
|
You are calling a pretty complicated function with various nested function calls every 50 ms. What happens if its execution time is near or over 50 ms? You will always have a task ready to run and so are breaking SDK scheduling rules, as you are starving the WiFi stack of the ability to run low-priority housekeeping. My FAQ and the SDK API guides warn that this might happen. I am really tempted to close this unless you can do what the issue template asks for, and that is to provide a minimal complete example that shows the failure mode. @pjsg, the task scheduling rules are what they are. IMO, it would be impractical to try to detect when a Lua developer isn't following them. |
I understand, but if this code has worked continuously on 3.0-master_20190907 for many months, why not with the new firmware? |
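One way for a Lua developer to see how close a callback gets to its period is to time it. The following is only a sketch under stated assumptions: `with_budget` is a hypothetical helper, the 25000 us budget (half of the 50 ms period) is an arbitrary choice, and the clock is passed in so the snippet runs off-device with a fake counter instead of `tmr.now`.

```lua
-- Wrap fn so that each call is timed against a budget in microseconds.
-- now:  function returning a microsecond counter (tmr.now on a device)
-- warn: function called with the measured duration when over budget
local function with_budget(fn, budget_us, now, warn)
  return function(...)
    local started = now()
    fn(...)
    local elapsed = now() - started  -- ignores counter wrap-around for brevity
    if elapsed > budget_us then
      warn(elapsed)
    end
    return elapsed
  end
end

-- Hypothetical on-device wiring (not run here). tmr.ALARM_SEMI only rearms
-- when the callback calls start(), so a slow callback cannot pile up:
-- local poll = with_budget(check_master, 25000, tmr.now, function(us)
--   print("poll took " .. us .. " us")
-- end)
-- tmr.create():alarm(50, tmr.ALARM_SEMI, function(t) poll() t:start() end)

-- Off-device demo with a fake clock that jumps 60 ms between the two reads.
local readings, idx = { 0, 60000 }, 0
local fake_now = function() idx = idx + 1 return readings[idx] end
local slow = with_budget(function() end, 25000, fake_now, function(us)
  print("over budget: " .. us .. " us")
end)
slow()
```

Measuring does not fix starvation by itself; it only shows whether the callback is anywhere near the timer period.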
👍
That's a totally different question - a valid one, but kind of OT here. Nothing is ever going to be infinitely backwards compatible. Either our code or the Espressif SDK may change the behavior of your code. For our code we strive to mention breaking changes in the release notes. |
Feel free to ask the Q, and even try to answer it yourself. However if you want one of the maintainers to answer it for you and to fix the issue, then the first step is (as we ask) to supply a minimal, complete, and verifiable example that we can use to examine the core issue and determine a fix. |
First, with all respect to the devs: don't take this as some "cry to developers / hammer developers to find non-existing bug" thread. I've spent the last two days checking the commit history, so on the bright side I learned to use
Leaving the literary part aside: after Philip engaged in this thread I've removed most of the
wifi.setmode(wifi.STATION)
wifi.sta.autoconnect(1)
wifi.sta.sethostname("TLStest")
wifi.setcountry({
  country = "LT",
  start_ch = 1,
  end_ch = 13,
  policy = wifi.COUNTRY_MANUAL
})
wifi.eventmon.register(wifi.eventmon.STA_CONNECTED, function(T)
  print("\n\tSTA - CONNECTED" .. "\n\tSSID: " .. T.SSID .. "\n\tBSSID: " ..
    T.BSSID .. "\n\tChannel: " .. T.channel)
end)
wifi.eventmon.register(wifi.eventmon.STA_DISCONNECTED, function(T)
  print("\n\tSTA - DISCONNECTED" .. "\n\tSSID: " .. T.SSID .. "\n\tBSSID: " ..
    T.BSSID .. "\n\treason: " .. T.reason)
  _G.connectedToMqtt = false
end)
Compiled firmware with Moving further
commit 98e428f12edb7869993b5fa3d0eda3976f52a8f4
Author: Terry Ellison <Terry@ellisons.org.uk>
Date: Fri May 15 12:45:54 2020 +0100
Update wifi..c to fix #3106
Which led me to read about:
and
Though I'm not using light sleep, still for me So from all this, most likely fail observations, could any Thank you to whoever will take a look at this "essay". |
@KT819GM Modestas, I really appreciate this type of constructive feedback. A couple of comments:
Incidentally my time on the project is pro-bono as and when available; I am currently having some yard-work done by some contractors and doing some of the associated tasks myself, so my NodeMCU work is itself being starved out a bit until this work is concluded. I will post further when I have time 😄 |
Thank you, will disable
It seems it would be better for some of us to come and do some of your yard-work, so you would have more time for NodeMCU things. I'm pretty sure I would be better at digging than I currently am at programming 😄 |
I have no idea what you guys have done to fix this issue... or with other fixes but I'm so happy to tell that new |
@chathurangawijetunge We have, I think, done nothing to address this issue, which lends credence to the theory that your code is treading dangerously close to instability occupying so much CPU time and denying the Espressif SDK stack the opportunity to run its tasks. If you have done nothing to correct your code, you should expect it to break again in the future, and I would ask you to please not file a similar issue with us until you can persuasively argue that your code is not starving the Espressif stack. |
I think that all our tasks run below the Espressif tasks. I suspect that the root cause was the missing IRAM_CACHE_ATTR on one of the functions being called at interrupt level. |
NodeMCU 3.0.0.0 built on nodemcu-build.com provided by frightanic.com
branch: dev
commit: 2fa63a1
release:
release DTS: 202007071335
SSL: false
build type: integer
LFS: 0x40000 bytes total capacity
modules: file,gpio,mqtt,net,node,rtctime,sntp,tmr,uart,wifi
build 2020-07-08 00:54 powered by Lua 5.1.4 on SDK 3.0.1-dev(fce080e)
I am using the above build.
In long runs, wifi gets disconnected and does not reconnect automatically, even if I do a soft reset using node.restart(). If I toggle the power supply, it reconnects.
I have had this issue with 4 ESP-07 devices.
I'm not sure whether it is a bug or not.