[TW#12288] Cannot connect to Wifi AP with latest code and firmware #207

Closed
MartyMacGyver opened this issue Jan 9, 2017 · 33 comments
Labels: Resolution: Done, Status: Done, Type: Bug

Comments

@MartyMacGyver

MartyMacGyver commented Jan 9, 2017

I downloaded and installed the latest code from master a few hours ago. Example 03_http_request will not work - the ESP32 wifi stack will not connect to my local router:

....
I (249769) wifi: n:11 0, o:11 0, ap:255 255, sta:11 0, prof:1
I (249769) wifi: state: init -> auth (b0)
I (250769) wifi: state: auth -> init (2)
.... repeats ....

My AP doesn't have an unusual SSID or password that should affect this. It is using the usual WPA2 authentication.

I also tested this against a hotspot with an even simpler configuration - the output above would pause when the hotspot was offline, and would loop as shown when the hotspot was active, showing it was seeing the hotspot even though it wouldn't connect to it.
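The looping pattern described above can be spotted mechanically in a captured serial log. The following is a minimal sketch, assuming the log lines have the same shape as the excerpt in this post; the function name `count_auth_failures` and the regular expression are illustrative and not part of ESP-IDF.

```python
import re

# Matches lines such as "I (250769) wifi: state: auth -> init (2)".
STATE_RE = re.compile(r"wifi: state: (\w+) -> (\w+) \((\w+)\)")


def count_auth_failures(log_text):
    """Count 'init -> auth' transitions that are directly followed by 'auth -> init'."""
    transitions = [m.group(1, 2) for m in STATE_RE.finditer(log_text)]
    failures = 0
    for prev, cur in zip(transitions, transitions[1:]):
        if prev == ("init", "auth") and cur == ("auth", "init"):
            failures += 1
    return failures


# Log excerpt copied from the report above.
sample = """\
I (249769) wifi: n:11 0, o:11 0, ap:255 255, sta:11 0, prof:1
I (249769) wifi: state: init -> auth (b0)
I (250769) wifi: state: auth -> init (2)
"""

print(count_auth_failures(sample))
```

A count that keeps growing while the access point is in range matches the symptom reported here; it does not by itself say whether the cause is software or hardware.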

In researching the problem I notice others have recently encountered the same issue in the last couple of days:
http://www.esp32.com/viewtopic.php?f=13&t=904

Given that, and as I'm using a completely fresh install, I think it's likely there is a bug in recently committed code or blob(s) that is leading to this problem.

@andrei-ivanov

One thing you could do is go back some revisions and see if those work.
When I got my board, people on the forum were doing that and were using the November 26/27th version.

@MartyMacGyver
Author

After everything else, I did try checking out from a few days ago, but it didn't help matters. November is quite a distance backwards... and that leads to its own problems as that brings the main project out of sync with its submodules.

Obviously this worked at some point but wifi shouldn't be something that would stay broken for very long, especially given all the commits around wifi lately. However, there's not much to introspect on when it's mainly binaries being committed so it's hard to tell what broke it from an outside perspective.

@andrei-ivanov

I know, but I was just suggesting this as a way to identify the commit that broke it.

@MartyMacGyver
Author

MartyMacGyver commented Jan 9, 2017

I agree... There's been a ton of commits between then and now and clean builds are lengthy (dirty builds don't seem to notice the changed blobs so clean builds are the only sure way). It would take quite some time to narrow it down if it's possible to do so accurately given submodules.

@andrei-ivanov

Since more people are complaining, maybe some ESP developer will notice this and find the problem without this.
My board has similar issues too, but then again I assume it's some issue with the board, not software.

@projectgus
Contributor

projectgus commented Jan 9, 2017

Hi @MartyMacGyver,

Would it be possible for you to email me the full contents of your flash along with your SSID/password? You can get it with esptool.py --port PORT --baud 460800 read_flash 0 0x200000 flash_contents.bin. If you could send it to angus at espressif dot com, it should help us figure out what's going on.

Once you've saved the "broken" flash contents, can you please try "make erase_flash flash" to clear the entire flash and then re-flash it.

Please also check the output of git submodule status. If any of the lines in the output start with - or + then this indicates a submodule mismatch.
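The check on the `git submodule status` output can be automated. This is a rough sketch, assuming the usual output layout of one status character, a commit hash, and a path per line; the function name `mismatched_submodules` and the sample hashes are made up for illustration. It also reports `U`, which git uses for submodules with merge conflicts.

```python
def mismatched_submodules(status_output):
    """Return (flag, path) for each submodule that is not cleanly checked out.

    '-' means the submodule is not initialized, '+' means the checked-out
    commit differs from the one recorded in the superproject, and 'U'
    means the submodule has merge conflicts.
    """
    problems = []
    for line in status_output.splitlines():
        if not line.strip():
            continue
        # Do not strip the line first: a leading space is the "clean" flag.
        flag, fields = line[0], line[1:].split()
        if flag in "-+U":
            problems.append((flag, fields[1]))
    return problems


# Illustrative output only; the hashes below are placeholders.
sample = (
    " 1111111111111111111111111111111111111111 components/esptool_py/esptool (v1.0)\n"
    "+2222222222222222222222222222222222222222 components/esp32/lib (heads/master)\n"
    "-3333333333333333333333333333333333333333 components/bt/lib\n"
)

print(mismatched_submodules(sample))
```

On a real checkout, the input could come from `subprocess.check_output(["git", "submodule", "status"], text=True)` run in the esp-idf directory.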

For the record, we generally don't recommend rolling back the WiFi library submodules without also rolling back esp-idf to match. If you're looking for a regression, I'd recommend rolling back esp-idf and then updating submodules to match that esp-idf revision. You can use "git bisect" to do this fairly quickly, although unfortunately you still need to "git submodule update" on each step.

> (dirty builds don't seem to notice the changed blobs so clean builds are the only sure way)

If you're seeing this then it's a bug, but I couldn't reproduce it. If I modify any of the binary library files (ie touch components/esp32/lib/libphy.a) and run make then the linking step re-runs ("LD http-request.elf" is printed on the console). This is all that's required to pick changes in these libraries up.
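The relink behaviour described here follows from the timestamp comparison that make performs. The sketch below is a simplified model of that rule, not the ESP-IDF build system itself; `needs_relink` is a hypothetical helper, and the two file names are borrowed from the comment above and created as empty files in a temporary directory.

```python
import os
import tempfile


def needs_relink(target, dependencies):
    """True if the target is missing or older than any of its dependencies."""
    if not os.path.exists(target):
        return True
    target_mtime = os.path.getmtime(target)
    return any(os.path.getmtime(dep) > target_mtime for dep in dependencies)


with tempfile.TemporaryDirectory() as tmp:
    elf = os.path.join(tmp, "http-request.elf")
    lib = os.path.join(tmp, "libphy.a")
    for path in (lib, elf):
        with open(path, "w"):
            pass  # create empty placeholder files

    os.utime(lib, (1000, 1000))  # library built first
    os.utime(elf, (2000, 2000))  # ELF linked afterwards
    fresh = needs_relink(elf, [lib])

    os.utime(lib, (3000, 3000))  # simulates `touch libphy.a`
    stale = needs_relink(elf, [lib])

print(fresh, stale)
```

If an updated library arrives with a modification time older than the existing ELF, a rule like this would not trigger a relink; whether that happened in the reported case is not established in this thread.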

Angus

@MartyMacGyver
Author

Thanks! For the sake of this test, I'll create a simple test hotspot, verify it works with a normal device, then test it against the ESP32. Then I'll dump flash and send you the requested details. Finally, I'll try the erasure (haven't done that yet, at least not manually).

I'll also check the git details you mentioned to ensure it's all consistent. I may do all this from a fresh install to be safe (I can back up my current install).

@MartyMacGyver
Author

@projectgus - I forwarded the before and after logs and flash contents: make erase_flash flash made all the difference and it works now, but it raises some questions too.

I subsequently turned off my test hotspot and switched the settings to my normal Wifi and rebuilt and flashed (without erasing) - it still works fine... which makes me wonder, what did the erase_flash do exactly, that not only fixed the problem but allowed it to work even with a changed SSID and password?

I'd prefer the flash step actually do whatever needs to be done to avoid this problem, or to have a solid idea of when using erase_flash is appropriate (because that's probably not something one wants to do routinely). I still think there's a bug, but it's more of a "flash may not always be sufficient" problem and characterizing that would be very useful in the long run.

(FWIW, I'm using the ESP-WROOM-32 "development kit" board from Adafruit, which is basically just a breakout board for this device.)

@projectgus
Contributor

Thanks, got your email.

There's an NVS (non-volatile storage) data partition in the flash, where the wifi stores some data. Erasing the entire flash included erasing this partition, and seems to have made the difference.
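To see whether a flash dump carries NVS data, the NVS region can be sliced out of the dump and compared against the erased state. This is a minimal sketch under stated assumptions: the offset 0x9000 and size 0x4000 are taken from the partition table quoted later in this thread and may not match every project, and the helper names `nvs_region` and `is_erased` are hypothetical. It relies only on the fact that erased NOR flash reads back as 0xFF.

```python
NVS_OFFSET = 0x9000  # assumption: from the partition table quoted later in the thread
NVS_SIZE = 0x4000    # assumption: same source


def nvs_region(flash_dump, offset=NVS_OFFSET, size=NVS_SIZE):
    """Return the bytes of the NVS partition from a raw flash dump."""
    if len(flash_dump) < offset + size:
        raise ValueError("dump too short to contain the NVS partition")
    return flash_dump[offset:offset + size]


def is_erased(region):
    """Erased NOR flash reads back 0xFF in every byte."""
    return region.count(0xFF) == len(region)


# Synthetic 2 MiB dumps standing in for a real read_flash output.
erased_dump = bytes([0xFF]) * 0x200000
used_dump = bytearray(erased_dump)
used_dump[NVS_OFFSET:NVS_OFFSET + 4] = b"\x01\x02\x03\x04"

print(is_erased(nvs_region(erased_dump)))
print(is_erased(nvs_region(bytes(used_dump))))
```

With a real dump, the input would be the contents of flash_contents.bin opened in binary mode. Under the assumed layout the NVS partition sits well inside the first 2 MB, so a dump of that size already includes it.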

This is definitely a bug, there's no way that any amount of invalid data in NVS should have this effect (especially from normal operation & updates). We'll get this fixed ASAP.

@MartyMacGyver
Author

I'll watch for the fix - not quite sure how I'd test the fix, though maybe flashing the problematic flash dump back to the device might get it back to the original polluted state that erasing fixed.

@MartyMacGyver
Author

Oh, and shall I rename this bug to make tracking the issue easier? Recommended title?

@projectgus
Contributor

Thanks for the offer. I think the description is apt, people experiencing the issue will find it this way.

@MartyMacGyver
Author

I just confirmed that I can write the "bad" flash back to the device, then flash over it with what worked and see that it fails (as expected), then re-do the erasure + flash and see that it works. So, testing any fix won't be too difficult. (I'm not sure why the dump needed only 2MB out of the normal 4MB flash size, but it seems to have been sufficient).

@seopyoon

I have experienced the same with one of the DevKitC boards. @MartyMacGyver could you please elaborate on how you fixed it? What should I flash to the board? Even the simplest http_get example code behaves the same way you reported in your first post.

@projectgus
Contributor

@seopyoon , run "make erase_flash flash" to clean off the entire flash and re-flash it with the project.

@seopyoon

@projectgus the following is what I get after having done what you have suggested.

D (16558) event: SYSTEM_EVENT_STA_DISCONNECTED, ssid:[email protected], ssid_len:15, bssid:98:de:d0:c4:a4:37, reason:2
I (16978) wifi: n:11 0, o:11 0, ap:255 255, sta:11 0, prof:1
I (16978) wifi: state: init -> auth (b0)
I (17978) wifi: state: auth -> init (2)
---- repeats ----

Is this bug different from the one in the original post? The original post does not have the first line, where the DISCONNECTED event is triggered. When I ran the same test on 3 other DevKitC boards, I did not see this behaviour. Thanks in advance.

@projectgus
Contributor

I think you are seeing some other behaviour there. Is it possible the AP is deauthing you or otherwise rejecting your connection?

@seopyoon

@projectgus The very same code is working fine on all the other DevKits, so I cannot understand why the router would be deauthing this particular kit. Why are you suggesting that my kit's behaviour is different from @MartyMacGyver's? Is it because erase_flash flash is not doing the trick?

@dpgeorge

It may not be the same underlying problem, but I did have the same symptoms as described here (repeating init->auth, auth->init). My problem ended up being an issue with the antenna hardware.

@seopyoon

@dpgeorge Ah-ha, it could be that the devKit is a faulty one then? That makes more sense, if that is the case.

@baycom

baycom commented Jan 10, 2017

Try erasing the whole flash before flashing your app: make erase_flash flash

@andrei-ivanov

I did a test too with the WiFiClientEvents Arduino sample but I can't get it to work, even with erase_flash :-(
How can I tell if it's a software issue or a hardware issue so I can try to get it exchanged? :-/
I only have one SparkFun version.

@seopyoon

@baycom Yes, I have done that, and it still behaves the same way, repeating those very lines. I would also like to know if there are ways to find out whether it is a hardware fault or something that can be fixed.

@baycom

baycom commented Jan 10, 2017

I have not tried the Arduino stuff yet. Well, maybe it works when calling esptool directly with something like this:
~/esp/esp-idf/components/esptool_py/esptool/esptool.py --port /dev/ttyUSB0 erase_flash

@andrei-ivanov

@baycom This is very similar to how I erased the flash, indeed.

@baycom

baycom commented Jan 10, 2017

With my samples (devkit C and some of doit.am) it worked that way - that's all I can say. For sure this is a bug of the SDK.

@andrei-ivanov

The version that works for me (but not stable enough anyway) is this: espressif/arduino-esp32@b82d0e1
I remember it because it was the latest when I bought the board and I used it to make the initial tests.

@projectgus
Contributor

@seopyoon if you have two modules and the same "make erase_flash flash" works on one and doesn't work on the other (assuming no errors in the erasing & flashing output logs), then unfortunately you have faulty hardware.

@seopyoon

@projectgus Thank you for the clear reply. Performing the same commands yields different behaviours. I will have to assume the board is faulty.

@igrr igrr added the Type: Bug bugs in IDF label Jan 20, 2017
@FayeY FayeY changed the title Cannot connect to Wifi AP with latest code and firmware [TW#12288] Cannot connect to Wifi AP with latest code and firmware May 4, 2017
@igrr
Member

igrr commented Aug 17, 2017

Closing as this seems to be fixed now. We have also implemented a compatibility test for NVS data in our CI environment.

@igrr igrr closed this as completed Aug 17, 2017
@Pratikhyadav

Hello, I am facing an issue while sending data to an ESP32 (acting as an HTTP server) and writing the received data into flash.

After some time, the ESP32's SoftAP disappears.

Please help me with this issue. More detail is in the following thread:
#1372

Please let me know if more detail is required. I am using IDF version 2.1.

@Pratikhyadav

Hello,
I created a simple REST server using the netconn library.
The simple REST server works fine on its own.
The problem appears when I introduce a flash write operation.
I am writing to flash starting at location 0x300000.
The size of the data written to flash is 900 KB.
My AP disappears after 3 hours.

My partition table is as follows:

Name, Type, SubType, Offset, Size, Flags
nvs, data, nvs, 0x9000, 0x4000,
otadata, data, ota, 0xd000, 0x2000,
phy_init, data, phy, 0xf000, 0x1000,
factory, app, factory, 0x10000, 0x100000,
ota_0, app, 0x10, 0x110000, 0x100000,
storage, data, 0x82, 0x210000, 0xC0000,

When I replaced the flash write operation with a 100 ms delay, it worked fine for more than 12 hours and the AP did not disappear.

Can anyone please suggest where to look?
Let me know if anyone requires more detail.
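One quick check on the figures above is to compare the write range against the partition table. The following is a rough sketch: the table rows are copied from this comment (with the header line turned into a comment), 900 KB is taken as 900 × 1024 bytes (an assumption), and the helper names `parse_partitions` and `overlapping` are illustrative only.

```python
import csv
import io

# Partition table as quoted in the comment above.
PARTITION_CSV = """\
# Name, Type, SubType, Offset, Size, Flags
nvs, data, nvs, 0x9000, 0x4000,
otadata, data, ota, 0xd000, 0x2000,
phy_init, data, phy, 0xf000, 0x1000,
factory, app, factory, 0x10000, 0x100000,
ota_0, app, 0x10, 0x110000, 0x100000,
storage, data, 0x82, 0x210000, 0xC0000,
"""


def parse_partitions(text):
    """Return a list of (name, start, end) tuples; end is exclusive."""
    parts = []
    for row in csv.reader(io.StringIO(text), skipinitialspace=True):
        if not row or row[0].startswith("#"):
            continue
        offset = int(row[3].strip(), 16)
        size = int(row[4].strip(), 16)
        parts.append((row[0].strip(), offset, offset + size))
    return parts


def overlapping(parts, start, length):
    """Names of partitions that intersect the range [start, start + length)."""
    end = start + length
    return [name for name, lo, hi in parts if start < hi and lo < end]


parts = parse_partitions(PARTITION_CSV)
write_start = 0x300000
write_len = 900 * 1024  # assumption: 900 KB means 900 * 1024 bytes

print(overlapping(parts, write_start, write_len))
print(hex(max(hi for _, _, hi in parts)))
print(hex(write_start + write_len))
```

With these numbers the last listed partition ends at 0x2D0000 and the write covers 0x300000 to 0x3E1000, so the write does not overlap any listed partition. It also means the written region is not described by any partition in this table.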

@MartyMacGyver
Author

I suggest you open a new issue for this.

@espressif-bot espressif-bot added the Status: Opened Issue is new label Sep 28, 2021
@espressif-bot espressif-bot added Resolution: Done Issue is done internally Status: Done Issue is done internally and removed Status: Opened Issue is new labels Oct 7, 2021