-
Notifications
You must be signed in to change notification settings - Fork 3.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GPIO interrupts crash the device #874
Comments
Nick, I've got the family here getting ready for Christmas, so I'll keep this short. The extra line enables HW interrupts for the given pin. Level changes on that pin are now captures by the gpio ISR and this can cause side effects. It's complicated and it needs to be robust. Sorry can't go into more details now. Maybe tonight after everyone has gone to bed, because that's when I usually end up clearing the backlog :( |
Tested with 1.5.0 and stressing the GPIO interrupt still crashes the device. Or maybe not even stressing it. If I had a signal generator, I could get a better idea. |
Nick, let me make the changes first and raise the PR before you test this. Thanks |
@TerryE Are you going to use the eLua model for interrupts or some other model? I ask because the eLua model uses the lua_sethook mechanisms which render them useless for other purposes (AFAIK you can only have one hook registered at a time) |
It's a standard Lua mechanism rather than eLua, AFAIK. Yes you can only have one hook registered, What this means is that any library seeking to register a second hook my save the previous hook and daisy chain to it. But given that we don't use any by default, even if we don't do this it should be a major issue now. Sorry Philip, I picked up the comment and though you were referring to your PR. The sethoook mechanism is currently used to hook debugging and diagnostics into the VM, not for interrupt processing -- which is broadly what you are wanting to enhance. |
Nick, IMO the root causes are still the same as #845 so I will answer there. |
I've been going through the implementation for all of the GPIO devices and its a mess on a number of levels, and really needs a full rework. At the nub of this are two factors which really mismatch:
So we have task driven SDK with stringent timeout constraints vs. a Lua implementation really intended for a procedural execution over bare metal. Luckily as Lua is typically used as a scripting language for applications such as gaming kernels and database engines, it facilitates an implementation using the SDK tasking and callback framework. It's just that this is very different to that assumed in the eLua architectural overview: OK, the picture looks OK but the lack of commonality of the codebase at the platform interface level really underlines how different nodemcu is from a more typical eLua implementation. I think that Zeroday and his colleagues at NodeMCU Inc did a fantastic job in their initial especially given that the SDK is closed source and given this, the documentation is very inadequate, but IMO the whole paradigm for GPIO handling and the related drivers layered on this needs revisiting to see if they will ever work properly, and if they aren't going to then we need to actively deprecate them.
Ultimately, if you want to understand the shortcomings of the ESP8266, you only need to compare it to the spec of the ESP32. IMO, we should be honest about these short-comings and deprecate drivers that we know to be unreliable. I have more to say, but accidentally posted this comment in partial form, so I will now pause for feedback. |
Thanks for this. I think that there are levels of unreliability in drivers. What is acceptable for a hobbyist would not be acceptable for a production system. I have a humidity controller based on the dht module and using gpio to control a relay that drives a misting pump for my carnivorous plant tank. Yes, the reads from the dht module sometimes fail, but I get an indication that this has happened, and it typically works when I retry a couple of seconds later. I don't know (and to some degree I don't care) whether this is due to interrupts happing at a bad time, or something else. If the driver causes system failure (e.g. the current gpio in interrupt mode), then we should clearly deprecate it (or fix it). If the driver sometimes signals failure, then that is OK -- provided that this is documented and it is easy/possible to recover from this failure. I don't know whether this code base is supposed to be production ready or hobbyist ready.... |
Failure is detectable with retry is a perfectly good strategy. Enet used it happily for years. Relay and I/O pin logic is fine; using PWM I2C chip is fine (you can pick these up on eBay for few $); ditto any chip using @Alkorin Thomas's UART technique. But using the RT module will trash the Wifi, as can the serout function. What I am trying to work out are the incremental steps that we can take where each has a tangible improvement. My current plan on this commit is to leave the current low level code which bypasses the ROM/SDK GPIO API in place, and fix the architect stuff around timing. |
If you commit the new task code, then I'll undertake to improve the GPIO code so that it doesn't invoke LUA from the interrupt callback, and doesn't spend too long in the interrupt callback. The only decision is whether to try and timestamp events and pass that information to the lua callback, or just do the simplest thing. I suspect that the timestamping approach is more useful, though more work and more prone to overflow. |
Per @TerryE's comment in #845, adding ICACHE_RAM_ATTR to relevant GPIO interrupt handlers seems to solve the crash. I was unable to reproduce the crash over 5 minutes of frenetic button pushing. I tested this on both 1.5.0 and 1.5.1. The diff against 1.5.1 is: diff --git a/app/modules/gpio.c b/app/modules/gpio.c
index d4c189e..7e5ceb5 100644
--- a/app/modules/gpio.c
+++ b/app/modules/gpio.c
@@ -29,7 +29,7 @@ void lua_gpio_unref(unsigned pin){
gpio_cb_ref[pin] = LUA_NOREF;
}
-void gpio_intr_callback( unsigned pin, unsigned level )
+void ICACHE_RAM_ATTR gpio_intr_callback( unsigned pin, unsigned level )
{
NODE_DBG("pin:%d, level:%d \n", pin, level);
if(gpio_cb_ref[pin] == LUA_NOREF)
diff --git a/app/platform/platform.c b/app/platform/platform.c
index 1b06f7d..be5ed76 100755
--- a/app/platform/platform.c
+++ b/app/platform/platform.c
@@ -135,7 +135,7 @@ int platform_gpio_read( unsigned pin )
}
#ifdef GPIO_INTERRUPT_ENABLE
-static void platform_gpio_intr_dispatcher( void *arg) {
+static void ICACHE_RAM_ATTR platform_gpio_intr_dispatcher( void *arg) {
platform_gpio_intr_handler_fn_t cb = arg;
uint8 i, level;
uint32 gpio_status = GPIO_REG_READ(GPIO_STATUS_ADDRESS); This doesn't deal with the wider issue of Lua code being invoked from an interrupt handler (that's #845) and there's no point me submitting a PR for the above diff as Terry's rewrite of GPIO interrupt handling will override it soon enough. So closing. |
Thanks Nick, and sorry for my tardiness in getting this out but I've been spending 8 hrs a day working on the roof of our new build cutting roof window slots in the sarking, mounting roof windows, verge and fascia edging etc. and trying to keep ahead of my slater. I've just been to knackered after this to progress the coding, but at least this give me lots of mull time to sort out details in my head 😅 |
hi. can anybody please insert ICACHE_RAM_ATTR in the proper place/places of this script?
|
@shinji2009 |
Enabling an interrupt handler for GPIO events can cause a crash.
I'm breaking this off from #871 as it is more fundamental than enabling safe posting of Lua tasks after an interrupt is handled.
In my test case, the NodeMCU is doing essentially nothing but it still crashes.
See https://github.com/nickandrew/nodemcu-firmware/commits/debug-gpio-intr and there's a 1-line commit which enables the crash to be demonstrated:
I have an Infrared Receiver attached to GPIO0. Its output is normally high; goes low with a demodulated infrared signal.
To test, I call
node.bootreason()
from lua to enable the gpio interrupt handler, then send a blast of infrared data at it (by mashing the keys on the remote). I can cause a reset within 10 seconds if I try. Note that there is no Lua code being run by this interrupt handler, as no function has been hooked into it.I did the testing on SDK 1.4.0; will try it all again with 1.5.0.
The text was updated successfully, but these errors were encountered: