intermittent watchdog messages on esp32/esp32c3 #2914
All changing in #2913 so should disappear. |
@pljakobs can you check if the latest |
@pljakobs I will close the issue for now. If it happens again in the latest |
By way of explanation, the message appears because the task watchdog is created for the |
Building my current version against the current Sming develop branch fails an assert on the esp32c3:
same on the esp32:
|
@mikee47 I can't reopen this since I can only re-open issues that I have closed myself. I don't know if you get a notification on a closed issue, so: ping |
That assert indicates the queue hasn't been created yet: I'll try Basic_Wifi on the esp32c3, see what I can dig up. |
So I've tried the Basic_Wifi sample on an esp32c3 and it runs without issue. We need to know what called If you rebuild with |
I can't reproduce the issue but I suspect something is trying to post to the Sming event queue before it's been created, probably during network initialisation. Can you try this patch to
|
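For illustration only, here is a minimal host-side sketch of the kind of guard that avoids posting to an event queue before it exists. It is not the Sming or FreeRTOS API and not the patch referred to above: `EventQueue`, `createEventQueue`, `tryPost` and `drainEvents` are hypothetical names invented for this sketch. The idea is simply that an early caller gets `false` back instead of tripping an assert.

```cpp
#include <cstddef>
#include <deque>
#include <functional>
#include <memory>
#include <utility>

// Hypothetical stand-in for an event queue (assumption: NOT the Sming/FreeRTOS API).
class EventQueue
{
public:
    using Callback = std::function<void()>;

    // Store a callback for later execution.
    void post(Callback cb)
    {
        pending.push_back(std::move(cb));
    }

    // Run all pending callbacks in FIFO order; returns how many were run.
    std::size_t drain()
    {
        std::size_t count = 0;
        while(!pending.empty()) {
            Callback cb = std::move(pending.front());
            pending.pop_front();
            cb();
            ++count;
        }
        return count;
    }

private:
    std::deque<Callback> pending;
};

// Stays null until createEventQueue() has been called.
static std::unique_ptr<EventQueue> eventQueue;

// Create the queue once; later calls are no-ops.
void createEventQueue()
{
    if(!eventQueue) {
        eventQueue = std::make_unique<EventQueue>();
    }
}

// Returns false (rather than asserting) if the queue does not exist yet.
bool tryPost(EventQueue::Callback cb)
{
    if(!eventQueue) {
        return false;
    }
    eventQueue->post(std::move(cb));
    return true;
}

// Run pending callbacks, or do nothing if the queue was never created.
std::size_t drainEvents()
{
    return eventQueue ? eventQueue->drain() : 0;
}
```

With a guard like this, a post made during early initialisation is rejected and can be logged, which would at least show who is calling too early.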
will try, I have not looked at this more deeply yet (day job taking excessive cycles) - it happens right after initializing ConfigDB, but I will need to debug it more. |
with this startup.cpp, the issue persists
|
Going to need a reproducible sample or backtrace to diagnose this further. |
so it took me a while to set this up, broke out the jtag interface. Here's what I got:
I'm unsure that this really shows the issue since this is an unconfigured chip, meaning it does not connect to the wifi yet but should open its AP for configuration - which it doesn't seem to do. |
Using GDB on the esp32 requires the
Should be able to break out then run
NB. I'm forcing this exception by patching NOTE: For some reason the esp32c3 isn't responding to GDB over serial (esp32 is fine). I'm not sure why, but it's annoying. |
OK, so at least with my esp32c3 dev. board I suspect that the auto-reset lines (RTS/CTS) on the serial port aren't being set correctly with GDB (both should be low). Can't find any gdb commands to override that behaviour, closest is flow control (which is disabled). Here's a workaround:
Replacing the IP as required. |
not sure how helpful this is, seems that gdb cannot identify the error?
but, here's the resulting backtrace:
|
So this is a different error to the one you reported above. The inconsistency suggests memory corruption of some sort. |
let's pause this for a bit, there's currently too much going on otherwise, I can't fully concentrate. |
@pljakobs when you have more time make sure to get the latest |
I'm currently chasing a bug in my own code that seems to try to configure an empty channels array, which might explain the stack canary error message. |
gdb on the esp32 keeps defeating me. I had no luck using an esp-prog to jtag debug (kept getting errors that core 1 was inaccessible) and I seem to not understand the built-in gdb either. Those results might not be overly helpful, but I could not get a true functional debug session going. I have built with the following GDB related sdk-config options:
which is the only way I could get a stack trace as seen below. I've tried Anyway, the below stacktrace seems to be what is reproducible in this setup.
and I've also seen double exceptions:
the following is with ENABLE_GDB=1 and the GDB on panic handler. The backtrace pointing to somewhere in lfs / flash read is rather consistent.
|
also, I can confirm that checking out |
If you can point me to the code you're trying to debug, I'll see if I can reproduce the issue. |
it is, as usual, my rgbww firmware (https://github.com/pljakobs/esp_rgbww_firmware) - I wish I could provide a demonstrator here, but all I have is a basic hunch where this goes wrong (which is in initializing the network - or thereabout - as said, the stack trace seems to always be somewhere in reading flash for ConfigDB - which might just be coincidental) |
Which branch should I test? |
actually, while trying for a reproducer, I just re-built the Basic_IFS example and got this:
(I have set the sdk to print registers and halt on panic instead of rebooting. Seemed easier) |
otherwise, the current branch I'm actively working on for the rgbww firmware is feature/pinConfig |
okay, after a full rebuild (dist-clean, components-clean, clean, flash) the above error in Basic_IFS is gone. Sigh. |
Is the webapp available anywhere pre-built? |
I can get the firmware built by excising the contents of |
OK, so hacked together some files and debugging throws on use of
to
or, better
|
In |
|
The problem is due to stack overflow. Because RAM is fragmented by FreeRTOS we have to decide how much stack to allocate for each task. The This issue arose because of #2913. The Two ways to fix this:
I've done a full clean and rebuild and the crash has disappeared. Re-build with |
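As a rough illustration of why a fixed per-task stack size matters, here is a small host-side simulation of the fill-pattern technique for measuring stack headroom. It is only a sketch: `SimStack` and the sizes used in the usage note are made up for this example, and nothing here calls ESP-IDF or FreeRTOS. As far as I know, FreeRTOS's `uxTaskGetStackHighWaterMark` works on a similar idea, pre-filling the task stack with a known byte and counting how much of it is still untouched.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Fill pattern written to the whole stack before the task runs (value chosen for this sketch).
constexpr std::uint8_t kFillByte = 0xA5;

// Simulated task stack that grows downward from the end of the buffer.
// Hypothetical type for illustration only.
struct SimStack {
    std::vector<std::uint8_t> bytes;

    explicit SimStack(std::size_t size) : bytes(size, kFillByte)
    {
    }

    // Pretend the task used 'depth' bytes of stack.
    // Returns false if that would overflow the allocation.
    bool use(std::size_t depth)
    {
        if(depth > bytes.size()) {
            return false;
        }
        for(std::size_t i = bytes.size() - depth; i < bytes.size(); ++i) {
            bytes[i] = 0x00; // overwrite the fill pattern
        }
        return true;
    }

    // Count untouched fill bytes from the low end:
    // the smallest amount of free stack seen so far.
    std::size_t highWaterMark() const
    {
        std::size_t n = 0;
        while(n < bytes.size() && bytes[n] == kFillByte) {
            ++n;
        }
        return n;
    }
};
```

For example, `SimStack(16384)` followed by `use(6000)` leaves a high water mark of 10384 bytes, while the same `use(6000)` on a `SimStack(4096)` is rejected. On real hardware there is no such rejection: the write simply runs past the end of the allocation and corrupts whatever is next to it, which would fit the inconsistent crashes seen earlier in this thread.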
increasing the stack size will only impact esp32/esp32c3 builds, right? overall memory is not an issue there, so this should be good enough. also: will changing the sdk-config be reflected in the project? (I can't see a change so far) or is there a way to make it part of a commit? also: I'm impressed with how fast you got to that. |
It took a lot of fiddling with #2913 so couldn't actually remember how it ended up. At one point I had the main task managing the IDF event queue, so inflating the stack for that would be very wasteful. However, turns out that for both networked and non-networked builds the
For non-networked applications step(2) is omitted and the main So the only side effect is heap fragmentation. |
@pljakobs OK, latest develop should fix this, default stack size is now 16K for |
I need to double check it, but I assume it will be fixed since changing the stack value in sdk-menuconfig already fixed it locally. |
I'm getting
intermittently.
Sometimes, that's followed by a restart, but I can't see any reason yet.