OTA Updating ESP32-S2 from 0.15.0-b4 fails #4241

Open
1 task done
espilioto opened this issue Nov 1, 2024 · 46 comments
Labels
bug confirmed The bug is reproducable and confirmed

Comments

@espilioto

What happened?

I have 2 instances of WLED, each running on an ESP32-S2, that updated fine up to 0.15.0-b4.
Updated them through Home Assistant actually, everything rosy.

The problem is that they can't be updated to any version after that through OTA, as both the web interface and HA fail.
I have a third WLED running on an 8266, and it updated fine to 0.15.0-b7 just now, both manually and through HA.

Any ideas?
Is plugging 'em in and reflashing my only choice?

Thanks in advance <3

To Reproduce Bug

Upload binary, WLED_0.15.0-b7_ESP32-S2.bin in my case and press the update button.
Alternatively through the update dialog in HA.

Expected Behavior

The update should complete. Instead, the web interface times out.
HA fails with this message: Failed to perform the action update/install. Error communicating with WLED API

Install Method

Binary from WLED.me

What version of WLED?

0.15.0-b4

Which microcontroller/board are you seeing the problem on?

ESP32-S2

Relevant log/trace output

No response

Anything else?

No response

Code of Conduct

  • I agree to follow this project's Code of Conduct
@espilioto espilioto added the bug label Nov 1, 2024
@softhack007 softhack007 changed the title Update from 0.15.0-b4 fails OTA Updating ESP32-S2 from 0.15.0-b4 fails Nov 1, 2024
@DedeHai
Collaborator

DedeHai commented Nov 1, 2024

I just uploaded the 0.15.0-b7 OTA to an S2, works fine.
can you backup your config, your presets, then do a factory reset and try to OTA again through web UI?

@DedeHai DedeHai added the cannot reproduce Developers are not able reproduce. Might be fixed already, or report is missing important details label Nov 1, 2024
@espilioto
Author

espilioto commented Nov 1, 2024

Hey, thanks for the reply.

Nope, the reset happened but still can't OTA update.

edit: tried from both firefox and chrome, just to be sure.
edit 2: also tried 0.14.4, nothing.

@DedeHai
Collaborator

DedeHai commented Nov 2, 2024

Did you try a fresh 0.14 install and then update it to 0.15 B7?

@blazoncek
Collaborator

S2, if used with MQTT or other functions may be low on free heap. This may prevent OTA.

@espilioto
Author

Did you try a fresh 0.14 install and then update it to 0.15 B7?

Tried downgrading to 0.14.4 via OTA but nothing. Obviously I'm trying to avoid manual flashing.

S2, if used with MQTT or other functions may be low on free heap. This may prevent OTA.

Hmm, can I do something about that? Maybe disable something in the settings?

@DedeHai
Collaborator

DedeHai commented Nov 2, 2024

So when you did the factory reset test: did you also try to OTA directly in AP mode before bringing it back into your wifi?

@espilioto
Author

espilioto commented Nov 2, 2024

So when you did the factory reset test: did you also try to OTA directly in AP mode before bringing it back into your wifi?

Didn't even think about trying it tbh.
Just did it though, same result.

S2, if used with MQTT or other functions may be low on free heap. This may prevent OTA.

No MQTT, just web interface and HA.
This gave me an idea though, deleted the HA integration (I figured it might use resources since it polls) and set my led length to one.
The free heap did gain a few kilobytes, but, of course, the update timed out once again.
(also I just remembered that the best case of this scenario was when I tried it right after the reset. Meh.)

@blazoncek
Collaborator

Which board are you using? Lolin S2?

@espilioto
Author

ESP32 WEMOS S2 mini V1.0
IMG_20241102_202545

@blazoncek
Collaborator

That's a Lolin all right. I have a few of them running on the edge of usable heap.
May require restart and immediate OTA. Show your Info screen.

@espilioto
Author

image

@willmmiles
Collaborator

I've got a few of those boards and I can replicate this issue. The root problem is heap exhaustion -- it appears to be that something in the wifi stack permanently locks up if it ever runs out of dma-capable heap, which overlaps partially with the regular heap used for other purposes. Unfortunately the S2s don't have much more available SRAM than the old 8266es, but a lot of the code shared with other ESP32s does larger allocations by default...

I've tried a few things here, manual curl commands, rate limiting, etc. to try to keep the dynamic heap usage down, but for me the OTA upload always stalls at about 60-80k uploaded. I'll take more of a look tomorrow. Unfortunately I don't think the situation will improve with -b7.

@DedeHai DedeHai added confirmed The bug is reproducable and confirmed and removed cannot reproduce Developers are not able reproduce. Might be fixed already, or report is missing important details labels Nov 3, 2024
@espilioto
Author

Well that sucks.
Appreciate the insight Will.

So I guess either it was luck or something changed with 0.15.0-b4, since OTA did work before. I'm not sure since which version, though.

@blazoncek
Collaborator

FYI my S2s have between 15kB and 53kB free heap and I can update all of them OTA (so far). Sometimes I need to restart devices with low heap immediately prior to update.

If all else fails, try curl or ArduinoOTA (espota). There is a shell script in the tools folder that may help you update.
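
A note on the low-heap advice above: WLED's JSON API reports free heap in the `freeheap` field of `/json/info`, and a reboot can be requested by posting `{"rb": true}` to `/json/state`. A minimal Python sketch of "check heap, reboot if it looks low, then update right away" might look like this. The 20,000-byte threshold and the helper names are assumptions for illustration, not values from WLED or from this thread.

```python
import json
import urllib.request

# Hypothetical threshold in bytes; not a documented WLED limit, tune per device.
MIN_FREE_HEAP = 20_000


def free_heap(info: dict) -> int:
    """Return the free heap in bytes from a parsed /json/info response."""
    return int(info["freeheap"])


def ready_for_ota(info: dict, minimum: int = MIN_FREE_HEAP) -> bool:
    """True when the reported free heap is at or above the chosen minimum."""
    return free_heap(info) >= minimum


def fetch_info(host: str, timeout: float = 5.0) -> dict:
    """Fetch and parse /json/info (requires network access to the device)."""
    with urllib.request.urlopen(f"http://{host}/json/info", timeout=timeout) as resp:
        return json.load(resp)


def reboot(host: str, timeout: float = 5.0) -> None:
    """Request a reboot through the JSON state API using {"rb": true}."""
    req = urllib.request.Request(
        f"http://{host}/json/state",
        data=json.dumps({"rb": True}).encode(),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req, timeout=timeout).close()
```

Calling `ready_for_ota(fetch_info("192.168.1.175"))` before an upload gives a quick go/no-go. Free heap alone is no guarantee, though, so treat it as a hint rather than a diagnosis.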

@espilioto
Author

python espota.py -i 192.168.1.175 -f "C:\Users\SouLSLayeR\Desktop\WLED_0.15.0-b7_ESP32-S2.bin" -p 3232
Sending invitation to 192.168.1.175 ..........
11:22:49 [ERROR]: No response from the ESP

(also tried port 80)

Just to eliminate possible stupid mistakes, is there an option that should be enabled for espota to work?

@espilioto
Author

After numerous reboots I got this:
Sending invitation to 192.168.1.175 .
Uploading.....
11:29:04 [ERROR]: Error Uploading: timed out

I'll keep hammering it till it uploads lol

@espilioto
Author

espilioto commented Nov 3, 2024

Checked out the scripts, tried curl and also timed out, after a reboot too.

@blazoncek
Collaborator

Then try espota; it uses a different protocol.

@espilioto
Author

(comments got back to the future somehow 🤯)

@willmmiles
Collaborator

Ugh, so my "reproducing case" turned out to be a broken version of curl -- apparently rate limiting was bugged in some older versions. I never would've expected that!

Anyhow, my recommendation would be to try, on a recent reboot, with curl -V >= 8.6.0:
curl -v -v --limit-rate 50K http://<your_device>/update -F upload=@./WLED_0.15.0-b7_ESP32-S2.bin | cat
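
For anyone without a recent curl, the rate-limited upload above can be approximated in Python with only the standard library. This is a sketch under assumptions: the form field name `upload` and the `/update` path are copied from the curl command above, while the boundary string, chunk size, and function names are made up here. It sleeps between chunks to stay near the requested rate, which is cruder than curl's own limiter.

```python
import http.client
import time


def build_multipart(field: str, filename: str, payload: bytes, boundary: str) -> bytes:
    """Wrap one file in a multipart/form-data body."""
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    return head + payload + tail


def chunks(body: bytes, size: int):
    """Yield consecutive slices of body, each at most size bytes long."""
    for start in range(0, len(body), size):
        yield body[start:start + size]


def upload_throttled(host: str, firmware_path: str, rate: int = 50 * 1024,
                     chunk: int = 1024, boundary: str = "wled-ota-sketch") -> int:
    """POST the firmware to /update at roughly `rate` bytes/s; return the HTTP status."""
    with open(firmware_path, "rb") as fh:
        body = build_multipart("upload", "firmware.bin", fh.read(), boundary)
    conn = http.client.HTTPConnection(host, 80, timeout=30)
    conn.putrequest("POST", "/update")
    conn.putheader("Content-Type", f"multipart/form-data; boundary={boundary}")
    conn.putheader("Content-Length", str(len(body)))
    conn.endheaders()
    for piece in chunks(body, chunk):
        conn.send(piece)
        time.sleep(len(piece) / rate)  # crude pacing: one pause per chunk
    status = conn.getresponse().status
    conn.close()
    return status
```

Something like `upload_throttled("192.168.1.175", "WLED_0.15.0-b7_ESP32-S2.bin")` would then mirror the curl call. If the device resets the connection mid-upload, expect an exception from `http.client` rather than a status code.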

@espilioto
Author

Anyhow, my recommendation would be to try, on a recent reboot, with curl -V >= 8.6.0:
curl -v -v --limit-rate 50K http://<your_device>/update -F upload=@./WLED_0.15.0-b7_ESP32-S2.bin | cat

Nothing 😭

@willmmiles
Collaborator

Er, sorry, could you please post the output? Does it consistently stall at a particular place?

@espilioto
Author

espilioto commented Nov 3, 2024

Exactly, always fails like this:

curl -v -v --limit-rate 50K http://192.168.1.175/update -F upload=@WLED_0.15.0-b7_ESP32-S2.bin
*   Trying 192.168.1.175:80...
* Connected to 192.168.1.175 (192.168.1.175) port 80
> POST /update HTTP/1.1
> Host: 192.168.1.175
> User-Agent: curl/8.9.1
> Accept: */*
> Content-Length: 1459129
> Content-Type: multipart/form-data; boundary=------------------------cidRy1ssfzitiqphhciDdD
> Expect: 100-continue
>
< HTTP/1.1 100 Continue
<
* Recv failure: Connection was reset
* closing connection #0
curl: (56) Recv failure: Connection was reset
curl 8.9.1 (Windows) libcurl/8.9.1 Schannel zlib/1.3 WinIDN

@DedeHai
Collaborator

DedeHai commented Nov 4, 2024

what I find a bit peculiar about this issue: it works just fine on a fresh install (which I tested) but a factory reset does not seem to make it work for @espilioto. Also why would an update to 0.15 B4 be any different than an update to 0.15 B7?

@janchrillesen

I see the same issue trying to upgrade a Lolin board from b6 to b7. The device shows 70.6 kb free heap right after reboot. I have tried upgrading from WLED itself as well as using rate limited curl. WLED restarts during the upload.

@kamkilt

kamkilt commented Nov 18, 2024

I have the same issue with two ESP32-S2, but in my case OTA never worked, even before 0.15.0-b4.
I also had problems flashing over USB: the WLED web installer didn't work (with esptool I also had some problems, but I don't remember exactly). I flashed with the Tasmota web installer (this worked flawlessly) and then did an OTA to WLED. Maybe that's the reason?

@espilioto
Author

Ohhh, nice catch, if I remember correctly, that's the case for my boards too!
The web installer didn't support the S2 (at the time?) so I flashed Tasmota first.

@softhack007
Collaborator

It would be good to understand the root cause of this problem before releasing 0.15.0-RC1

@willmmiles
Collaborator

Modern versions (>=v12) of Tasmota use a customized system for OTA updates: a small "safeboot" partition with a minimal OTA loader, and a single partition for holding the main application. While you can safely OTA install WLED into Tasmota's application partition, and it will work fine from there, you will be unable to OTA update WLED past that, as we don't (yet) support Tasmota's safeboot system. If you try it now, the device immediately crashes.

I'm going to take a quick look at why it crashes -- as far as I can tell, what should happen is simply returning failure to the OTA upload web request. It might shed some insight as to the other unexpected crashes OTA'ing WLED from an original WLED install.

(Long term: we might want to give some thought to supporting or adopting Tasmota's safeboot system for WLED; it'd buy us another MB or more of code storage on ESP32s without compromising on safety.)
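
To tell which layout a given board actually has, one option is to dump the partition table (for example with `esptool.py read_flash 0x8000 0xC00 table.bin`, assuming the table sits at 0x8000 as in the flashing commands elsewhere in this thread) and count the OTA app slots. The sketch below parses the standard ESP-IDF binary table format of 32-byte entries. It only counts slots; it does not reproduce WLED's or Tasmota's update logic, and the function names are made up for illustration.

```python
import struct

# One ESP-IDF partition entry: magic, type, subtype, offset, size, label, flags.
ENTRY = struct.Struct("<2sBBII16sI")
ENTRY_MAGIC = b"\xaa\x50"
TYPE_APP = 0x00
OTA_SUBTYPES = range(0x10, 0x20)  # ota_0 .. ota_15


def parse_table(blob: bytes) -> list:
    """Decode partition entries until the first record that is not an entry."""
    entries = []
    for pos in range(0, len(blob) - ENTRY.size + 1, ENTRY.size):
        magic, ptype, subtype, offset, size, label, _flags = ENTRY.unpack_from(blob, pos)
        if magic != ENTRY_MAGIC:
            break  # MD5 record (0xEB 0xEB) or erased flash (0xFF) ends the table
        entries.append({
            "label": label.split(b"\0", 1)[0].decode("ascii", "replace"),
            "type": ptype,
            "subtype": subtype,
            "offset": offset,
            "size": size,
        })
    return entries


def ota_slot_count(entries: list) -> int:
    """Count app partitions whose subtype is one of ota_0 .. ota_15."""
    return sum(1 for e in entries
               if e["type"] == TYPE_APP and e["subtype"] in OTA_SUBTYPES)
```

A table with two OTA slots is the conventional dual-bank arrangement. A result of one would be consistent with the safeboot-style layout described above, where a single application partition sits next to a small loader.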

@netmindz
Collaborator

The challenge in adopting that approach, surely, is that we need a different partition table, if I'm understanding your description correctly @willmmiles. That might be much better in the long term, but it would require everyone to do a full reinstall rather than an OTA upgrade. That same constraint is what is holding us back from just shrinking the filesystem to make more space for the firmware so we can swap to v4.

@netmindz
Collaborator

It's not clear from the comments on this thread if the issue is only present when upgrading from an earlier 0.15 beta to a newer or if the same issue is seen going from 0.14

If you can go from 0.14.4 to 0.15.0.b7 that's the first hurdle cleared.
If you can install 0.15.0.b7 over the top and see the success message, then that's the second hurdle resolved.

For anything else, I think we might just need to accept that heap usage in those earlier builds prevents upgrade. We can't go back and make those previous builds use less memory, and nothing about the file we are uploading is going to affect how the device behaves during upload.

What we do want to be sure about, however, is that people who install the 0.15.0 final release will be able to upgrade to 0.15.1+.

@netmindz
Collaborator

I've just gone from a fresh install of 0.14 -> 0.15.b7 -> 0.15.0.rc1 without issue, so it's definitely not a 100% failure. If it's working for me and @blazoncek, that suggests it might be an issue for users who have builds with extra usermods installed rather than the vanilla release, and/or dependent on things like the number and type of LEDs defined.

Can anyone having issues updating please share a config backup so we can replicate your setup please? Just edit the json to remove your wifi name (the password is not included)

@netmindz netmindz added this to the 0.15.0-final candidate milestone Nov 24, 2024
@DedeHai
Collaborator

DedeHai commented Nov 24, 2024

@netmindz the partition table can be rewritten in an OTA update, see: https://www.esp32.com/viewtopic.php?t=12004
I did not fully research this; there may be some pitfalls. If it is possible with current WLED partitions, I imagine it would need to go something like this:

  • users need to OTA to an intermediate "upgrade version" that loads the "safeboot" to the first OTA slot, i.e. 0x10000
  • upon reboot, the system then boots to the "safeboot", which can check whether the partition table is updated and write the new table if it is still the old one (it may even be possible to reverse this if that is ever needed)
  • now the user can update to any version they want

FS data should be preserved if that partition size is not changed as it is the last partition.
the update can brick devices if anything goes wrong with updating the partition table though.

there are implementation examples in that forum link.
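
The claim that FS data survives as long as the filesystem partition is untouched can be checked on paper before anything is flashed. The sketch below compares an old and a new layout and looks for overlaps. Every offset and size in it is an illustrative assumption chosen to fit a 4MB flash, not a copy of WLED's or Tasmota's real partition tables, and the function names are hypothetical.

```python
# All offsets and sizes are illustrative assumptions, not real WLED tables.
OLD_LAYOUT = {
    "app0":   (0x010000, 0x180000),
    "app1":   (0x190000, 0x180000),
    "spiffs": (0x310000, 0x0F0000),
}
NEW_LAYOUT = {
    "safeboot": (0x010000, 0x040000),
    "app0":     (0x050000, 0x2C0000),
    "spiffs":   (0x310000, 0x0F0000),
}


def filesystem_preserved(old: dict, new: dict, label: str = "spiffs") -> bool:
    """True when the filesystem partition keeps the same offset and size."""
    return label in old and old[label] == new.get(label)


def first_overlap(layout: dict):
    """Return the first pair of overlapping partition names, or None."""
    spans = sorted((off, off + size, name) for name, (off, size) in layout.items())
    for (_, end_a, name_a), (start_b, _, name_b) in zip(spans, spans[1:]):
        if start_b < end_a:
            return (name_a, name_b)
    return None
```

With the layouts above, `filesystem_preserved(OLD_LAYOUT, NEW_LAYOUT)` holds and `first_overlap(NEW_LAYOUT)` finds nothing. A check like this says nothing about the riskier part, which is rewriting the table on a live device.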

@blazoncek
Collaborator

To affirm @netmindz's find, I never had issues upgrading ESP32-S2 using OTA update (low heap situations needed reboot but otherwise they were fine). My first installation was using PIO with platform 5.3.0 (bootloader!). Since then I also tried platform 6.3.2 (current) without issues.

My gut feeling is that users are having a file system issue, as this was the main culprit I observed when I encountered OTA problems (other than low-memory conditions). WLED supports 2 flavors of OTA update: via HTTP using the /update endpoint, and ArduinoOTA (also used by the espota.py tool). Both of them work for me, and my S2s are really low on free heap (I regularly see below 20kB, as I have several usermods loaded: temperature, PIR, multirelay, audio, etc.).

Adopting @willmmiles 's suggestion to use an intermediate application/2-step OTA would be the best solution if it can be integrated into WLED (possibly with help from @Jason2866).

Other than that, IMO the main issue lies in an inadequate web flasher procedure that prevents first-time users from correctly flashing their S2/S3 devices (possibly C3 as well). The firmware itself is OK once the correct bootloader, app0 and partition map binaries are uploaded.

@Jason2866
Contributor

  1. https://github.com/mathieucarbou/MycilaSafeBoot has implemented the safeboot approach
  2. Issues with the WebInstaller are caused by using the original NabuCasa version, which uses the faulty esptool.js from Espressif. That's the reason Tasmota uses a fork, which has an enhanced JavaScript version of the Adafruit esptool under the hood

@blazoncek
Collaborator

Thank you @Jason2866

@espilioto
Author

Can anyone having issues updating please share a config backup so we can replicate your setup please? Just edit the json to remove your wifi name (the password is not included)

wled_cfg_WLED-TV+.json

@janchrillesen

I also installed using the Tasmota webflasher and then did an OTA update to WLED. Just flashed one of my S2's with beta6 using esptool and then OTA upgrading to beta 7. This works, so I can confirm that starting with tasmota seems like the root cause. For those wanting to flash the S2 using esptool I did it like this:

esptool.py erase_flash
esptool.py write_flash 0x01000 S2_bootloader.bin
esptool.py write_flash 0x08000 S2_partitions_4M.bin
esptool.py write_flash 0x10000 WLED_0.15.0-b6_ESP32-S2.bin

@Jason2866
Contributor

Jason2866 commented Nov 25, 2024

@janchrillesen As expected and explained earlier: Tasmota uses a different partition scheme. It is not possible to OTA from current Tasmota to any other firmware that does not use safeboot.
As long as WLED is using the original NabuCasa WebInstaller, flashing an S2 over the CDC port will not work. This is a known issue of the Espressif esptool.js it uses:
espressif/esptool-js#38

@blazoncek
Collaborator

@janchrillesen can you explain where did you get bootloader and partition map from? Was it my Dropbox or did you get them elsewhere (PIO, etc)? You are missing app0 binary, though.

@Jason2866
Contributor

Jason2866 commented Nov 25, 2024

@blazoncek It does work without the app0 binary as long as the main firmware is at address 0x10000. That's a fail-safe fallback mechanism. But yes, it is incorrect to do without it!
Imho it would be a good idea to release additional factory images (for initial flash) which includes everything needed to avoid issues like this at all.
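
As a rough picture of what such a factory image is: the separate binaries are laid into one file at their flash offsets, with the gaps filled by 0xFF (the erased-flash value), so a single write at address 0 installs everything. esptool offers this as its `merge_bin` command; the Python below is only a sketch of the idea, not esptool's implementation. The 0x1000, 0x8000 and 0x10000 offsets come from the S2 flashing commands earlier in this thread, while 0xe000 for the app0 (boot_app0) binary is an assumption based on the usual Arduino-ESP32 layout.

```python
def merge_image(parts, fill: bytes = b"\xff") -> bytes:
    """Combine (offset, data) pairs into one image that starts at address 0.

    Gaps between parts are padded with `fill`; overlapping parts are rejected.
    """
    image = bytearray()
    for offset, data in sorted(parts, key=lambda part: part[0]):
        if offset < len(image):
            raise ValueError(f"part at {offset:#x} overlaps previous data")
        image.extend(fill * (offset - len(image)))  # pad up to this part's offset
        image.extend(data)
    return bytes(image)


def read_parts(spec):
    """Load (offset, path) pairs from disk into (offset, data) pairs."""
    loaded = []
    for offset, path in spec:
        with open(path, "rb") as fh:
            loaded.append((offset, fh.read()))
    return loaded
```

A hypothetical S2 call would be `merge_image(read_parts([(0x1000, "S2_bootloader.bin"), (0x8000, "S2_partitions_4M.bin"), (0xe000, "boot_app0.bin"), (0x10000, "WLED_0.15.0-b6_ESP32-S2.bin")]))`, with the result written to flash at offset 0. The bootloader still has to match the board's flash type and speed.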

@janchrillesen

janchrillesen commented Nov 25, 2024

@janchrillesen can you explain where did you get bootloader and partition map from? Was it my Dropbox or did you get them elsewhere (PIO, etc)? You are missing app0 binary, though.

Yes, I got it from https://github.com/Aircoookie/WLED/releases - go to the "WLED Beta Release 0.15.0-b2" release, under assets. I was not even aware there was an app0 file as well. Maybe you need to build from scratch with PIO to get it

@blazoncek
Collaborator

Imho it would be a good idea to release additional factory images (for initial flash) which includes everything needed to avoid issues like this at all.

@softhack007 @netmindz @willmmiles @lost-hope hope this may be the best approach if you want to avoid future problems when flashing various versions. Unfortunately, that would mean a separate binary for every possible combination of flash/PSRAM configurations as well as WLED options. The other option is to prepare "bootloader" image(s) for those, as the firmware may end up being the same. By "bootloader" I mean a combined image of the proper bootloader, partition map and app0 images.

I think @Aircoookie prepared such image for classic ESP32 long time ago (and many may still be flashing that today).

@Jason2866
Contributor

Jason2866 commented Nov 25, 2024

@blazoncek Looking in release section of WLED. There are not many variants. Building the factory images can be fully integrated in Platformio build process. So when building a variant the factory image is generated too. See as example the Platformio script post_esp32.py from EspEasy or Tasmota

@blazoncek
Collaborator

There are not many variants.

AFAIK S3 needs 3 different binaries: plain (no PSRAM), QSPI PSRAM and OPI PSRAM. ESP32 needs 2 binaries: 1 for rev.3 or newer (autodetect and use PSRAM if present), 1 for rev.1 for units with PSRAM. Others need one binary (compat and 160MHz versions are experimental).
We are not talking about different features within WLED which may require separate binary.

I am not a PIO expert, nor do I know how the build process works. I only know what my own limited experience has shown me. I hope someone more proficient will be able to streamline the build process and release generation.

@Jason2866
Contributor

Jason2866 commented Nov 25, 2024

Yes, and only providing the firmware does not change any of this. An S3 firmware built for QIO flash and PSRAM will only work for this setup. Providing or not providing the matching "helper" files does not change that. No offense meant, but the release section here on GitHub is just incomplete and irritating.

Edit: Not using the exactly matching bootloader (flash type / speed) can end in write errors. These look like they happen "out of nowhere" and are first noticed with the filesystem.

9 participants