Potential PR: over-the-air upgrade support #806

Closed
jmattsson opened this issue Dec 1, 2015 · 27 comments

@jmattsson
Member

So over at DiUS we're currently working on getting some over-the-air upgrade support into our NodeMCU. It's something that could be contributed back upstream if there's a real interest in it (and thus get it tested even more), so I'm wanting to gauge said interest. Before you get all excited, let me tell you the trade-offs that come with it.

First of all, we're not basing it around the official SDK's FOTA support. After researching it, we decided against using it because:

  • it is not well documented
  • there is no evidence of any support for automatic rollback to the previous firmware in case of a bad upgrade
  • it requires using different linker scripts
  • you have* to build two different firmwares, one for slot 0 and another for slot 1, then pick the correct firmware when you're upgrading
  • there is no access to source code for the boot loader
  • the FOTA boot loader appears to have a lot of legacy/backwards-compatibility support in it, which in our eyes increases the risk of bugs in it

Because of this, we have decided to implement something which:

  • uses a minimal boot-loader
  • has the boot-loader set up the hardware watchdog before booting NodeMCU, to catch really broken upgrades
  • has a concept of "in-test" for a new firmware, and a maximum number of boots allowed before the system rolls back to the previous firmware version
  • requires active acknowledgment after booting a newly upgraded firmware to have it become permanent (so loading a firmware which breaks WiFi won't necessarily brick the unit)
  • uses only a single firmware image, built using the normal linker script
  • can repack a NodeMCU firmware into an OTA image (though said firmware still needs to be built with the appropriate module enabled in order to boot properly)
  • allows us to pack a NodeMCU firmware and (read-only) filesystem into each image, or have a shared filesystem, depending on the use-case
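
The "in-test" and acknowledgment rules above can be modelled in a few lines. This is a minimal Python sketch of the decision logic only, not the DiUS boot loader: the names `OtaState`, `choose_boot_slot` and `acknowledge` and the limit `MAX_TEST_BOOTS = 3` are hypothetical, since the thread does not give the actual data layout or boot limit.

```python
from dataclasses import dataclass
from typing import Optional

MAX_TEST_BOOTS = 3  # hypothetical limit; the thread does not state a number


@dataclass
class OtaState:
    active_slot: int = 0              # slot holding the known-good firmware
    test_slot: Optional[int] = None   # slot holding an unacknowledged upgrade
    test_boots: int = 0               # boots already spent on the in-test firmware


def choose_boot_slot(state: OtaState) -> int:
    """Modelled boot loader decision, run on every reset."""
    if state.test_slot is None:
        return state.active_slot
    if state.test_boots >= MAX_TEST_BOOTS:
        # The new firmware never acknowledged itself: roll back.
        state.test_slot = None
        state.test_boots = 0
        return state.active_slot
    state.test_boots += 1
    return state.test_slot


def acknowledge(state: OtaState) -> None:
    """Called by the new firmware once it has checked itself (e.g. WiFi works)."""
    if state.test_slot is not None:
        state.active_slot = state.test_slot
        state.test_slot = None
        state.test_boots = 0
```

In this model a firmware that boots but never calls `acknowledge()` gets three attempts and is then abandoned in favour of the previous slot; a firmware that hangs outright would rely on the watchdog reset to reach the next attempt.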

Specifically, our approach is not compatible/interchangeable with the SDK's FOTA, since:

  • we have the concept of "in-test" firmwares
  • we use a different header format (to prevent any accidental confusion)
  • despite poking around in the disassembly of the official FOTA stuff, we could find no reliable way to inter-operate with its idea of which meg of flash should be mapped (we were hoping we could set things up so the SDK got the right idea about flash mapping, but no such luck)

However, the major drawbacks are:

  • it will only ever support ESP modules with >= 2meg flash chip, since it relies on the ESP's hardware mapping of the flash to switch between firmwares, and that is only possible at megabyte boundaries
  • we override the ROM function Cache_Read_Enable() to force the correct firmware to be mapped. Conceivably this could break some future SDK functionality (though we consider it unlikely considering the approach they took with their Cache_Read_Enable_New() function)
  • it does require some explicit configuration of the SPIFFS location and size, which can be described as "somewhat advanced"
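
The 2 MB minimum follows directly from the megabyte-granular mapping. A small Python sketch of that arithmetic, assuming one firmware image per 1 MB block; the helper names are hypothetical and not part of any SDK:

```python
MB = 1024 * 1024


def slot_offset(slot: int) -> int:
    """Flash offset of a firmware slot when slots start on megabyte boundaries."""
    return slot * MB


def max_slots(flash_size: int) -> int:
    """Number of whole 1 MB blocks, and so of mappable firmware slots."""
    return flash_size // MB


def supports_mapped_ota(flash_size: int) -> bool:
    """Two slots are needed: one known-good image and one upgrade."""
    return max_slots(flash_size) >= 2
```

With these definitions a 512 KB or 1 MB chip yields fewer than two slots, while a 2 MB chip gives slots at offsets 0x000000 and 0x100000.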

I think that just about covers it. If there's a real interest I'll probably have to spend a chunk of the xmas break to curate it for upstream, so I'll only do that if it's really wanted.


*) This actually does depend on flash size, but that is not immediately obvious when you first start working with it.

@jmattsson jmattsson self-assigned this Dec 1, 2015
@eliabieri

Sounds very interesting. I'd love to have this feature for NodeMCU and the
Arduino ESP8266 port as well.

@nickandrew
Contributor

Interesting. BTW, @raburton has an ESP bootloader with OTA support.

I don't know if the advantage of having a single firmware outweighs the disadvantage of never supporting the ESP-01, which I expect will have several times the install base of the larger modules. End users will demand ESP-01 support.

@raburton

raburton commented Dec 1, 2015

rBoot would satisfy many of the requirements stated in the PR, and those that it doesn't probably could be added without much effort. Certainly taking rBoot as a starting point would be a lot less effort than doing it from scratch, although I suspect from the description of the intended features and techniques the OP may have already looked at rBoot. I'd be very happy to help you integrate this and add any extra features you might need, either upstream or in a custom version, if you would like (I did this for Sming who now use it as their standard OTA mechanism).

To answer the problems listed with the original sdk boot loader:

  • "it is not well documented": rBoot function and config is documented, and open source so easy to figure out anything you can't get from the documentation
  • "there is no evidence of any support for automatic rollback to the previous firmware in case of a bad upgrade": rBoot has better retry options for failed booting (even the option to boot a specific rom based on a GPIO, to have a "recovery mode" type switch), it also has the option to checksum the entire rom not just the small ram sections. It does not enable the watch dog so roms that simply hang on boot might still be a problem, but I've worked on the assumption they would get tested a bit better than that before deployment!
  • "it requires using different linker scripts": rBoot can do it this way if you want to support smaller devices, or it can do it as a single image in 1mb blocks
  • "you have* to build two different firmwares, one for slot 0 and another for slot 1, then pick the correct firmware when you're upgrading": that's the same point as the previous one, although where rboot has been integrated into Sming the build process takes care of this for you so it isn't a big deal.
  • "there is no access to source code for the boot loader": rBoot is fully open source, written in C mainly with a tiny bit of assembler and MIT licensed
  • "the FOTA boot loader appears to have a lot of legacy/backwards-compatibility support in it, which in our eyes increases the risk of bugs in it": not really sure what you mean there, but I suspect it doesn't apply to rBoot anyway.

As for the plans for the new bootloader:

  • "uses a minimal boot-loader": this is what rBoot is
  • "has the boot-loader set up the hardware watchdog before booting NodeMCU, to catch really broken upgrades": rBoot doesn't do this but it could be added
  • "has a concept of "in-test" for a new firmware, and a maximum number of boots allowed before the system rolls back to the previous firmware version" + "requires active acknowledgment after booting a newly upgraded firmware to have it become permanent (so loading a firmware which breaks WiFi won't necessarily brick the unit)": rBoot doesn't do this right now, but adding this has been discussed previously and it would be very easy to add a "temporary boot" option using the rtc data area to communicate from the rom to the bootloader on restart, sdk rtc data access functions have already been reversed and made available for rBoot
  • "uses only a single firmware image, built using the normal linker script": rBoot can do this or you have the option of multiple roms per 1mb flash chunk
  • "can repack a NodeMCU firmware into an OTA image (though said firmware still needs to be built with the appropriate module enabled in order to boot properly)" + "allows us to pack a NodeMCU firmware and (read-only) filesystem into each image, or have a shared filesystem, depending on the use-case": I don't know enough about nodemcu to know how this is done, but for Sming it handles spiffs filesystem images that contain resource files, so I'm sure this could be easily adapted for your needs.

The drawbacks:

  • "it will only ever support ESP modules with >= 2meg flash chip, since it relies on the ESP's hardware mapping of the flash to switch between firmwares, and that is only possible at megabyte boundaries": if you only have a single linked image then yes, there is no way around that, but having the option to use multiple linked images does give you the flexibility to support smaller devices. However, fitting two roms on a 512k device is always going to be pretty tight, so you may never be able to support them well.
  • "we override the ROM function Cache_Read_Enable() to force the correct firmware to be mapped. Conceivably this could break some future SDK functionality (though we consider it unlikely considering the approach they took with their Cache_Read_Enable_New() function)": rBoot overrides Cache_Read_Enable_New() because it is more efficient (less rom and ram usage), and so takes the same risk, but has been fine through at least 3 versions of the SDK. There is currently no way around this, as there is no API to do this "properly".
  • "it does require some explicit configuration of the SPIFFS location and size, which can be described as 'somewhat advanced'": if you are only using 1mb images per rom you don't really need to do anything manual. Just dedicate half the space to the rom and half to the filesystem (or some other reasonable default) and most users will never need to fiddle with it; the mounting code can easily detect which rom slot is in use and add the appropriate flash offset to the spiffs_mount call.
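
The "half rom, half filesystem" default in the last point can be sketched as follows. This Python model only computes the offset that the mounting code would add for the running rom slot; the function name and the even split are assumptions for illustration, and the actual spiffs_mount arguments are not shown in the thread.

```python
MB = 1024 * 1024


def spiffs_region(slot: int, block_size: int = MB):
    """Return (offset, size) of the filesystem for a rom slot.

    Assumes each slot owns one block, with the first half used by
    the rom and the second half by the filesystem.
    """
    rom_size = block_size // 2
    offset = slot * block_size + rom_size
    size = block_size - rom_size
    return offset, size
```

For slot 0 this gives a filesystem at 0x080000 and for slot 1 at 0x180000, each 512 KB long.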

@TerryE
Collaborator

TerryE commented Dec 1, 2015

I don't know if the advantage of having a single firmware outweighs the disadvantage of never supporting the ESP-01, which I expect will have several times the install base of the larger modules. End users will demand ESP-01 support.

Given that a single standard minimal firmware image barely fits into an ESP-01, I don't see this as a real issue for Lua runtimes. There are two main usecases, IMO:

  • For firmware development. For various reasons, I use an Ubuntu 32bit VM as my build host but connect to my ESPs from my host Ubuntu 64bit laptop. Making the make - reblow firmware loop a fully scripted OTA step would halve the rebuild latency. I have no problems using a larger Flash module or even replacing the flash on an ESP-01 to do this.
  • OTA upgrade of "production" modules. A nice to have IMO, but only if you have a truly secure signature system to ensure that only correctly signed firmware is applied.
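
As an illustration of the "only correctly signed firmware is applied" requirement, here is a minimal Python sketch. It uses an HMAC-SHA256 tag with a shared secret as a stand-in for a real signature scheme; nothing in this thread specifies an algorithm, and a production system would more likely use an asymmetric signature so that devices hold no signing secret. The function names are hypothetical.

```python
import hashlib
import hmac


def tag_image(key: bytes, image: bytes) -> bytes:
    """Build-side step: compute an authentication tag over the whole image."""
    return hmac.new(key, image, hashlib.sha256).digest()


def verify_image(key: bytes, image: bytes, tag: bytes) -> bool:
    """Device-side step: accept the image only if the tag matches."""
    expected = hmac.new(key, image, hashlib.sha256).digest()
    # compare_digest avoids leaking the mismatch position through timing.
    return hmac.compare_digest(expected, tag)
```

The check would run before the new image is marked bootable, so a tampered or truncated download is rejected rather than booted.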

@nickandrew
Contributor

All my ESP-01 modules have 1 mbyte flash, so I should be able to squeeze two in there, I think.

@nodemcu
Collaborator

nodemcu commented Dec 2, 2015

+1024, This will be awesome.
Either a new bootloader that supports NodeMCU or a port of rBoot to NodeMCU would be great.
I very much agree with TerryE that a truly secure signature system is needed.
One more thing: I hope the server side will also be open source,
with a brief guide so that anyone can build their own upgrade server on a Raspberry Pi using an open-source project.

As for the 512k modules like the ESP-01:
is there any workaround? Maybe a two-stage bootloader?
The main idea is to swap the file system with an OTA bootloader, because there is a file system anyway.
(I don't know how big an OTA bootloader is right now.)

  • OTA bootloader: the bigger one, which can download the NodeMCU firmware from a server and stores the WiFi config, server IP and auth_key.
  • mini bootloader: the small one, which always resides at the beginning of the flash, inspects the flash and boots either the NodeMCU firmware or the OTA bootloader.

flash map changes procedure:
normal flash map
mini bootloader------nodemcu firmware-----file system

---> stage 1, the nodemcu firmware downloads the ota bootloader from the server

after stage 1 flash map
mini bootloader------nodemcu firmware-----ota bootloader(bigger one, ota enabled and config stored)

---> reboot 1, the mini bootloader detects that the ota bootloader exists and copies it to the right place.

after reboot 1 flash map
mini bootloader------ota bootloader---------

---> stage 2, the ota bootloader downloads the nodemcu firmware from the server.

after stage 2 flash map
mini bootloader------ota bootloader---------nodemcu firmware

---> reboot 2, the mini bootloader detects that the nodemcu firmware exists and copies it to the right place.

after reboot 2 flash map
mini bootloader------nodemcu firmware-----

---> then, on its first run, nodemcu formats the file system

after file system format flash map
mini bootloader------nodemcu firmware-----file system
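
The flash map sequence above can be replayed as a small simulation. This Python sketch models the flash as three named regions and is only a walk-through of the proposal; the region labels and helper names are hypothetical.

```python
BOOTABLE = ("nodemcu firmware", "ota bootloader")
EMPTY = "(empty)"


def download(flash, image):
    """The running image writes a downloaded image into the last region."""
    return [flash[0], flash[1], image]


def reboot(flash):
    """The mini bootloader moves a pending image into the boot region."""
    if flash[2] in BOOTABLE:
        return [flash[0], flash[2], EMPTY]
    return list(flash)


def format_fs(flash):
    """First run of the new firmware re-creates the file system."""
    return [flash[0], flash[1], "file system"]


def bootable_copies(flash):
    """Count the regions that currently hold something the device can boot."""
    return sum(1 for region in flash if region in BOOTABLE)


def upgrade_sequence():
    flash = ["mini bootloader", "nodemcu firmware", "file system"]
    history = [flash]
    steps = [
        lambda f: download(f, "ota bootloader"),    # stage 1
        reboot,                                     # reboot 1
        lambda f: download(f, "nodemcu firmware"),  # stage 2
        reboot,                                     # reboot 2
        format_fs,                                  # first run
    ]
    for step in steps:
        flash = step(flash)
        history.append(flash)
    return history
```

In this model only one bootable image remains after each reboot step, because the copy replaces the previous image in place; that is the window in which an interrupted copy would leave nothing to fall back to.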

Is this failsafe, and can it roll back to a safe mode?
Maybe the ota bootloader needs a failsafe way to reconfigure the wifi/ip/auth_key manually.

@raburton

raburton commented Dec 2, 2015

@nodemcu rBoot is a fraction over 2k and sits in a single sector because it is a 'bare-metal' app, using only rom functions to interact with the device. The downside of this is no wifi, etc. If you want wifi you need to make a full normal app. This is going to weigh in at 200k at least, so you won't fit that on a 512k device alongside a copy of nodemcu.

As for open source server side, you don't need anything special for the server - just a web server. No point reinventing the wheel when you just want to deliver a file across a network.

@marcelstoer
Member

I'm wanting to gauge said interest.

I personally never missed that feature. I'm sure OTA-update support is handy in some more involved use cases (as DiUS' interest proves 😉) but for most users it would simply add a bit of convenience.

I feel we have lots of more important issues we should spend our limited time with.

@nodemcu
Collaborator

nodemcu commented Dec 2, 2015

@raburton so rBoot is like the sdk's FOTA, but open source.
A bootloader which can pull firmware down from the network would be at least 200K,
so a flash of at least 1M is needed... the two-stage approach is not going to work...

@marcelstoer, I think that as a maker/hobbyist's toy, this adds a bit of convenience,
but as the core of an iot product, ota seems very helpful :)

@raburton

raburton commented Dec 2, 2015

@nodemcu that's right, it was designed as an open source replacement for the sdk bootloader, with the same basic aims but with extra features and more flexibility. And of course being open source means you can extend it with whatever features you want, within reason. To remain as a bootloader it needs to be bare-metal and then load the real rom. If it's linked to the sdk libraries (as you'd need for network support) then it stops being a bootloader and just becomes a special purpose rom (at standard rom size), and incapable of loading the real target rom itself (it would need to swap things around and reboot to get the device to do the actual loading again of the target rom, a bit like in your example earlier).

@marcelstoer I guess if you are mostly just flashing new lua scripts and usually keeping the core the same then serial flashing isn't a big deal, but for apps built with the sdk/Sming/etc. where the whole rom is reflashed the convenience of flashing ota in a couple of seconds vs getting back into the rom loader and serial flashing in a couple of minutes is a big improvement. However when you actually deploy something ota is much more important, especially if you were to deploy something on a large scale or commercially.

I think that adding rBoot to nodemcu in its simplest use case (like the original suggestion, using a pair of 1mb images on the flash from a single linked rom) would be trivial. If I get a chance I'll have a look at it this evening, but I'm not really that familiar with nodemcu use or internals so there will be a bit of a learning curve before I can get started.

@TerryE
Collaborator

TerryE commented Dec 2, 2015

I'm not really that familiar with nodemcu use or internals so there will be a bit of a learning curve before I can get started.

The firmware is just a standard SDK-based application.

@marcelstoer
Member

@raburton

ota is much more important, especially if you were to deploy something on a large scale or commercially.

Definitely, but is NodeMCU ready for that? I don't want to hijack this conversation but @jmattsson can probably better assess this question than myself.

My point is one of priority. I feel that NodeMCU has ways to go until "large scale" or "commercial" become relevant. I suggest we look into OTA once we get there, or shortly before that, rather than now. It's not just the ESP8266 which runs with constrained resources...it's the same for the NodeMCU team.

@raburton

raburton commented Dec 2, 2015

My point is one of priority. I feel that NodeMCU has ways to go until "large scale" or "commercial" become relevant. I suggest we look into OTA once we get there

Hard to imagine it would get to that point without something as basic as OTA support.

It's not just the ESP8266 which runs with constrained resources...it's the same for the NodeMCU team.

I'm not part of the nodemcu team and wouldn't be doing something else for the project instead. I personally have no interest in nodemcu, I just offered to help with this feature because someone pulled me into the conversation.

@raburton

raburton commented Dec 2, 2015

The firmware is just a standard SDK-based application.

But the build system isn't very friendly, so it's taken some figuring out how to get my bits compiled. Also, I really have no idea what to do with it now I have it up and running. I have a couple of nodemcu v0.9 boards which I bought for the handy hardware. When I got them I did briefly try to play with them before reflashing them, but got literally nowhere and quickly gave up. So now I need to try to figure out what I'm supposed to do with them in order to see how far I've actually got!

@raburton

raburton commented Dec 2, 2015

Ok, progress was slower than expected but I have nodemcu firmware building for use with rBoot. My nodemcu board now has two copies of the nodemcu firmware on it, and you can query which of the two you are running and switch between them. Now I just need to add the actual ota code and we should have a working prototype.

@raburton

raburton commented Dec 2, 2015

Added the ota code and it works. I can now ota update and switch between two different versions of nodemcu firmware. There is no error checking or feedback, it's a bit cobbled together at the moment, but it proves the point.

@marcelstoer
Member

I just offered to help with this feature

Thanks! Sorry if I missed that in the earlier comments. I didn't realize you offered to actually implement this for us, but I have since found your #806 (comment).

@jmattsson
Member Author

It sounds like people would prefer to go with an rBoot based OTA solution, and would really like to be able to do upgrades on sub-2meg chips as well.

@raburton thank you for your work on this! Please feel free to glance at what we're using over in the DiUS OTA branch. It's fully functional and feature complete^ at this point. Actual firmware fetching is left up to the Lua code in our case.

^) For our needs. Though we might add a GPIO check to force switch at boot time.

@nodemcu
Collaborator

nodemcu commented Dec 3, 2015

It sounds like people would prefer to go with an rBoot based OTA solution, and would really like to be able to do upgrades on sub-2meg chips as well.

From the post I don't see any preferences.
I have some questions here: will both solutions have two nodemcu firmwares in flash?
And are these two firmwares different (built with different linker files)?
So they cannot be moved to another place in flash, right?
Is there any risk of bricking the board if we use one part of the file system to store an identical firmware
and move it to overwrite the old one? Of course we should check the consistency before doing this.
This way there will be more space for the file system,
and only one version of the firmware needs to be compiled.

@jmattsson
Member Author

I have some questions here: will both solutions have two nodemcu firmwares in flash?

Yes. With rBoot you might even have more than two, I believe.

And are these two firmwares different (built with different linker files)?

With the approach we've taken at DiUS, no, we use the same linker file for both. According to Richard's comment above, rBoot is capable of that too. In our case, we use the same linker file for regular NodeMCU builds as well as OTA-enabled builds.

So they cannot be moved to another place in flash, right?

The images have to start at a known offset on a megabyte boundary, in order for Cache_Read_Enable() to be able to map it correctly.

Is there any risk of bricking the board if we use one part of the file system to store an identical firmware
and move it to overwrite the old one? Of course we should check the consistency before doing this.

Yes. If you're overwriting your "good" image, you'll always risk this. That is why we picked the approach of using the SPI flash mapping to switch between images. No over-writing needed.

This way there will be more space for the file system,

Yes, but at the cost of not having a fallback/rollback option.

and only one version of the firmware needs to be compiled.

This advantage we already get with either the DiUS OTA or rBoot (depending on its use).

@nodemcu
Collaborator

nodemcu commented Dec 3, 2015

With the approach we've taken at DiUS, no, we use the same linker file for both. According to Richard's comment above, rBoot is capable of that too. In our case, we use the same linker file for regular NodeMCU builds as well as OTA-enabled builds.

oops, it's right up there. that's good.

The images have to start at a known offset on a megabyte boundary, in order for Cache_Read_Enable() to be able to map it correctly.

So 2M is the minimum. How is the file system mapped for slot 0?

Yes. If you're overwriting your "good" image, you'll always risk this. That is why we picked the approach of using the SPI flash mapping to switch between images. No over-writing needed.

Even after a consistency check? If it refers to spi flash copy errors, we can retry many times.

@jmattsson
Member Author

So 2M is the minimum. How is the file system mapped for slot 0?

The file system doesn't go through the mapped SPI, but uses the raw flash API instead.

As for the location, for a shared filesystem (same filesystem regardless of image 0 or 1 booted) you can set SPIFFS_FIXED_LOCATION to e.g. 0x200000 to have the filesystem at 2meg+. I'm working on support for having a per-image filesystem, where each image would have its own embedded filesystem.

Even after a consistency check? If it refers to spi flash copy errors, we can retry many times.

I wasn't thinking so much about flash copy errors, but rather logical errors such as the new image not managing to connect over WiFi, or failing to talk to the upgrade server for some other reason. This is why we're so keen on the roll-back support: unless the new image actively tells the OTA subsystem that everything is A-OK, it will automatically revert back to a known-good firmware.

@raburton

raburton commented Dec 3, 2015

@jmattsson Sorry, I didn't realise you'd already done all this, your description sounds very much like a plan of what you intended to do rather than having already done it. I thought I was offering a base that might give you a big head start but now it looks more like I've just hijacked your thread.

@nodemcu I agree with @jmattsson: there is really no safe way to update the running rom; the only safe way is a two-rom system. In Sming this makes 512k flash support pretty much impossible, and since nodemcu is a much larger rom than the average Sming one, for nodemcu it definitely is. At 440k+, one copy barely fits on there by the time you've allowed some space for a filesystem, 2 sectors for the bootloader and 4 for the sdk config. As a result, fitting 2 on a 1mb flash will be just as tight, but it is just about doable. My suggestion would be to implement the feature for devices 2mb or larger, because it's a lot simpler for the user and for the build system, but you do have the option to try to squeeze in support for 1mb flash (512k will never work though).
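
The space budget behind that paragraph can be worked through with the figures quoted in it. This Python sketch assumes 4 KB flash sectors and treats 440 KB as the rom size (the comment says "440k+", so real numbers would be tighter); it ignores any alignment constraints on where images may start.

```python
KB = 1024
SECTOR = 4 * KB            # assumed flash sector size
ROM_SIZE = 440 * KB        # lower bound quoted for one nodemcu rom
BOOTLOADER = 2 * SECTOR    # "2 sectors for the bootloader"
SDK_CONFIG = 4 * SECTOR    # "4 for the sdk config"


def spare_bytes(flash_size: int, rom_copies: int) -> int:
    """Space left for a filesystem; negative means the layout does not fit."""
    return flash_size - BOOTLOADER - SDK_CONFIG - rom_copies * ROM_SIZE


def spare_kb(flash_size: int, rom_copies: int) -> int:
    return spare_bytes(flash_size, rom_copies) // KB
```

Under these assumptions a 512 KB chip has about 48 KB left with one rom and cannot hold two, a 1 MB chip has about 120 KB left with two roms, and a 2 MB chip has more than 1 MB to spare.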

To answer a few other random points/questions raised:

  • rBoot can have more than two roms, on a 4mb nodemcu dev board you could have 4x1mb roms, 8x512k roms, etc, or a combination, although I have yet to find anyone using rBoot with more than two roms
  • filesystem - this is entirely up to you, you can have a single fs that is shared, you can have one per rom or you can allow this to be a user option (with appropriate build system options for the user), as @jmattsson says this doesn't use memory mapping so you can have them anywhere, in the same 1mb as the rom or entirely separate if you have spare flash space. The Sming users flash layout of choice is (1mb rom + 1mb fs) + (1mb rom + 1mb fs) on a 4mb flash.
  • the bootloader could be made to copy a second firmware from a dedicated location on the flash to the boot position, making ota support on 1mb devices slightly easier (single linked image, only one fs needed, though with a single shared fs that would already be the case). It has some disadvantages, however: there is no fallback, and it requires leaving dedicated space for the ota store, but how big? If the image grows too large for your slot, updates will no longer be possible, and you can't repartition after deployment.
  • I've seen flash copy errors several times: writing over serial works fine, then people switch to ota and suddenly get bad writes. The write claims to have succeeded, so no error is thrown, but on switching you find you have a bad rom; if you read back the rom there are blocks of 0x00 or 0xff. The problem is the device being underpowered: wifi is enabled when doing ota, unlike when flashing over serial, so the power requirement goes up. One guy only had a 100mA power supply! Swap to a proper power source and the problem is solved. Remember this one; it might save you a few hours of pointless debugging when someone reports bad ota updates!
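
Since the failed writes described in the last point report success, a read-back check after writing is one way to catch them. The following Python sketch is an illustration only: it compares digests of the intended and read-back images and flags whole sectors that are entirely 0x00 or 0xff. The 4 KB sector size and the function names are assumptions, and a legitimate image may contain blank padding, so the sector scan is a hint rather than proof.

```python
import hashlib
from typing import List

SECTOR = 4096  # assumed flash sector size


def verify_readback(expected: bytes, readback: bytes) -> bool:
    """True if the data read back from flash matches what was meant to be written."""
    return hashlib.sha256(expected).digest() == hashlib.sha256(readback).digest()


def suspicious_sectors(image: bytes) -> List[int]:
    """Indices of sectors that consist only of 0x00 or only of 0xff bytes."""
    found = []
    for start in range(0, len(image), SECTOR):
        chunk = image[start:start + SECTOR]
        if chunk in (b"\x00" * len(chunk), b"\xff" * len(chunk)):
            found.append(start // SECTOR)
    return found
```

A mismatch from `verify_readback` is the signal to refuse switching roms; the sector list helps tell a power-related bad write from an ordinary download error.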

@jmattsson
Member Author

@raburton No need to apologise! The reason I worded this the way I did was that I'm not sure whether the DiUS OTA solution is the right one for NodeMCU overall, and wanted to see what people thought. It's quite possible that an rBoot solution is a better fit, and besides, it's always nice to have a bit of choice!

Now, what we actually tend to do here at work is a 4-stage chain-load: ROM -> OTA boot loader -> DiUS sensor-sampler + loader -> NodeMCU
As such we already had the boot loader knowledge before we started doing OTA, and the OTA mechanism uses the same approach as we have for other embedded platforms. We just need to be a bit extra careful in the hand-over between the loaders so they don't overwrite each other :)
(As for why? Power saving. Lots & lots of power saving...)

Module docs for the DiUS OTA upgrade stuff are now available for those wishing to have a peek. The module is completely agnostic in terms of actually downloading the firmware (we have yet to agree on the best protocol & security to use).

@devyte

devyte commented Jun 30, 2016

Hello, will this OTA method be eventually merged? Is there anything preventing that?

Assuming that transfer to the spiffs is already covered somehow, would there be any issues with reading from the spiffs and flashing with this method?

@jmattsson
Member Author

I'm honestly not sure whether this will be merged. Currently this is competing with the rBoot implementation, and before we go either way (or try to merge those two approaches) I need to understand how we would support OTA on the ESP32, and that's quite up in the air due to lack of hardware (and the substantial changes between the ESP31B engineering samples and the eventual ESP32 design).

And no, there should be no issues using a file on SPIFFS as the source.

@stale

stale bot commented Jun 7, 2019

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Jun 7, 2019
@stale stale bot closed this as completed Jun 21, 2019