Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Hook system debug messages #1613

Merged
merged 2 commits into from
Apr 13, 2019
Merged

Conversation

mikee47
Copy link
Contributor

@mikee47 mikee47 commented Feb 10, 2019

To help track down "E:M" errors output by system functions, #define UART_DEBUG_SNOOP and rebuild. Any lines starting with "E:M" will then receive a full stack dump.

@mikee47 mikee47 changed the title Snoop system debug for "E:M" messages and dump stack if detected Snoop system debug out for "E:M" messages and dump stack if detected Feb 10, 2019
@mikee47
Copy link
Contributor Author

mikee47 commented Feb 10, 2019

Another way to do this is to put the code into your application, then call ets_install_putc1 on that. Saves having to rebuild the framework. Could also trap text other than "E:M". Thoughts?

@mikee47 mikee47 force-pushed the uart_debug_putc_snoop branch from 8c1767e to d0a462c Compare February 10, 2019 14:18
@slaff slaff added this to the 3.7.2 milestone Feb 11, 2019
Copy link
Contributor

@slaff slaff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please, see the comments and change also the Makefile's so that it is easy to set that directive at compile time.


if(c == '\r' || c == '\n') {
if(lineLength >= 2) {
if(memcmp(line, "E:M", 2) == 0) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mikee47 Maybe you can change this to "E:M " ( <-- one space after the capital M) just to avoid false positives.

@slaff
Copy link
Contributor

slaff commented Feb 11, 2019

Another way to do this is to put the code into your application, then call ets_install_putc1 on that. Saves having to rebuild the framework. Could also trap text other than "E:M". Thoughts?

Can you paste here some code to illustrate also the other approach?

@mikee47
Copy link
Contributor Author

mikee47 commented Feb 11, 2019

Another way to do this is to put the code into your application, then call ets_install_putc1 on that. Saves having to rebuild the framework. Could also trap text other than "E:M". Thoughts?
Can you paste here some code to illustrate also the other approach?

OK, so instead of changing uart.cpp we could place a version of uart_debug_putc() into a separate module, maybe just add it to crash_handler.c where we already have an implementation of print_stack(). Something like:

typedef std::function<void(const char* line)> snoop_callback_t;

static snoop_callback_t snoopCallback;


static void debug_snoop_putc(char c)
{
  uart_debug_putc(c);

  static char line[16];
  static unsigned lineLength = 0;

  if(c == '\r' || c == '\n') {
    line[lineLength] = '\0';
    snoopCallback(line);
    lineLength = 0;
  } else if(lineLength < sizeof(line) - 1) {
      line[lineLength++] = c;
  }
}

void setDebugSnoop(snoop_callback_t callback)
{
    snoopCallback = callback;
    
    ets_install_putc1(debug_snoop_putc);
}

And in our debugging application we'd have:

void init()
{
	Serial.begin(SERIAL_BAUD_RATE);
	Serial.systemDebugOutput(true); // Allow debug print to serial

  setDebugSnoop([](const char* line) {
    if(memcmp(line, "E:M", 2) == 0) {
      print_stack();
    }
  });
  
  ...
  
}

@slaff
Copy link
Contributor

slaff commented Feb 11, 2019

OK, so instead of changing uart.cpp we could ... Something like:

I like that!

@mikee47
Copy link
Contributor Author

mikee47 commented Feb 11, 2019

OK, so instead of changing uart.cpp we could ... Something like:

I like that!

How did I know you were going to say that. LOL.

@slaff
Copy link
Contributor

slaff commented Feb 11, 2019

How did I know you were going to say that. LOL.

;-) Some might call it "magic" but I guess it is due to "consistency". A consistent behaviour leads to predictable behaviour same as with programming code. Thus it makes it easy to guess, use and advance.

@mikee47
Copy link
Contributor Author

mikee47 commented Feb 11, 2019

That's deep, man :-)

@mikee47 mikee47 force-pushed the uart_debug_putc_snoop branch from d0a462c to 5f904f6 Compare February 12, 2019 14:25
@mikee47 mikee47 changed the title Snoop system debug out for "E:M" messages and dump stack if detected Debug hook for system debug messages Feb 12, 2019
@mikee47 mikee47 changed the title Debug hook for system debug messages Hook system debug messages Feb 12, 2019
@mikee47
Copy link
Contributor Author

mikee47 commented Feb 12, 2019

OK, this is a more generic hook which, when installed, suppresses actual output so can do stuff like this:

	static char buffer[256];
	sysdbgInstallHook([](const char* line, unsigned length) {
		debug_w("[SYS] %s", line);
		if(memcmp(line, "E:M ", 4) == 0) {
			print_stack();
		}
	}, buffer, sizeof(buffer));

That includes the timestamp on system messages as well as trapping E:Mstuff. I've also blocked recursion in case further system messages get emitted during the callback.

It does require the addition of #include "sysdbg.h" somewhere as I've put it in a separate module.

@slaff slaff removed this from the 3.7.2 milestone Feb 13, 2019
@slaff
Copy link
Contributor

slaff commented Feb 13, 2019

@mikee47 Thanks for your efforts. Your comments are quite helpful but I would like to ask you to either modify an existing sample or create a new one demonstrating the usage of the new feature(s).

@slaff slaff mentioned this pull request Apr 10, 2019
@mikee47
Copy link
Contributor Author

mikee47 commented Apr 10, 2019

... either modify an existing sample or create a new one demonstrating the usage of the new feature(s).

@slaff I'm thinking an addition to the LiveDebug sample, an additional command to force an "E:M" output, etc.

@slaff
Copy link
Contributor

slaff commented Apr 10, 2019

... either modify an existing sample or create a new one demonstrating the usage of the new feature(s).

I'm thinking an addition to the LiveDebug sample, an additional command to force an "E:M" output, etc.

Sounds good to me.

… sample to demonstrate

* SDK debug output is reassembled into lines and passed to a defined callback
* Add `new0` and `freetwice` commands to LiveDebug sample
@mikee47 mikee47 force-pushed the uart_debug_putc_snoop branch from 5f904f6 to f7b6e88 Compare April 10, 2019 20:37
@slaff slaff added this to the 3.8.0 milestone Apr 11, 2019
@slaff
Copy link
Contributor

slaff commented Apr 11, 2019

@mikee47 Is the PR tested and ready to merge?

@mikee47
Copy link
Contributor Author

mikee47 commented Apr 11, 2019

@slaff It works, but taking another look at the code as I'm not completely happy with it yet. It feels clunky and overly complex to use. I'll update later today, but in the meantime if you get a chance to run it would welcome your feedback on how it fits in with debugging in general, and whether it's clear enough what it's supposed to be doing.

@mikee47
Copy link
Contributor Author

mikee47 commented Apr 11, 2019

I'm also not 100% happy with the name - SystemDebug would be better than sysdbg but both could be misleading; maybe OsPrintfHook or something?

@slaff
Copy link
Contributor

slaff commented Apr 11, 2019

It feels clunky and overly complex to use. I'll update later today,

Perfect. Thanks in advance.

maybe OsPrintfHook or something?

How about OsMessageInterceptor or SdkMessageInterceptor?

* Add separate `LineBuffer` class to deal with line reconstruction, including `startsWith()` and `contains()` methods for simple parsing.
* Add `OsMessageInterceptor` class with standard `begin()` and `end()` usage
* Update `LiveDebug` sample to use LineBuffer for editing
@mikee47
Copy link
Contributor Author

mikee47 commented Apr 11, 2019

OsMessageInterceptor it is!

The rewrite is, I hope, an improvement. However, it does highlight an issue I've found using GDB when inspecting the stack. I'll illustrate with an example.

Remote debugging using /dev/ttyUSB0
gdbstub_init () at /media/r/sming-v4/Sming/gdb/gdbstub.cpp:908
908			gdbstub_do_break();
(gdb) c
Continuing.
LiveDebug sample
Explore some capabilities of the GDB debugger.

[OS] mode : softAP(de:4f:22:1c:26:66)
[OS] add if1
[OS] dhcp server start:(ip:192.168.4.1,mask:255.255.255.0,gw:192.168.4.1)
[OS] bcn 100
3083430 GdbAttach(1)
3138229 open log 0
(Attached) malloc0
6037893 gdb_read_console() returned 8
Attempting to allocate a zero-length array results in an OS debug message.
The message starts with 'E:M ...' and can often indicate a more serious memory allocation issue.
[OS] E:M 0
** OS Memory Error **

Program received signal SIGTRAP, Trace/breakpoint trap.
gdb_do_break () at /media/r/sming-v4/Sming/gdb/gdbstub.cpp:937
937		gdbstub_do_break();

All good so far. Now let's see if we can find which piece of code caused the problem.

(gdb) bt
#0  gdb_do_break () at /media/r/sming-v4/Sming/gdb/gdbstub.cpp:937
#1  0x402350f1 in onOsMessage (msg=...) at app/application.cpp:401
#2  0x4023fe5d in OsMessageInterceptor::putc (this=0x3ffee884 <osMessageInterceptor>, c=<optimized out>) at system/OsMessageInterceptor.cpp:23
#3  0x40237da8 in OsMessageInterceptor::__lambda0::operator() (__closure=0x0, c=<optimized out>) at system/OsMessageInterceptor.cpp:43
#4  OsMessageInterceptor::__lambda0::_FUN (c=<optimized out>) at system/OsMessageInterceptor.cpp:43
#5  0x40001db4 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb) 

It stops before we get to anything useful. The 'corrupt stack' warning doesn't help either, and I cannot find any way to get GDB to skip the dodgy stack frames. To get any further I have to skip ahead manually, thus:

(gdb) set $sp=$sp+0x100
(gdb) bt
#0  gdb_do_break () at /media/r/sming-v4/Sming/gdb/gdbstub.cpp:937
#1  0x402350f1 in onOsMessage (msg=...) at app/application.cpp:401
#2  0x40237eac in Print::println (this=0x3ffee884 <osMessageInterceptor>) at Wiring/Print.cpp:123
#3  0x402340ac in handleCommand_malloc0 () at app/application.cpp:415
(gdb) 

That's better.

(gdb) frame 3
#3  0x402340ac in handleCommand_malloc0 () at app/application.cpp:415
415		os_malloc(0);
(gdb) disass/m
Dump of assembler code for function handleCommand_malloc0():
410	{
   0x40234080 <+0>:	movi	a9, 192
   0x40234083 <+3>:	sub	a1, a1, a9
   0x40234086 <+6>:	s32i	a0, a1, 188

411		Serial.println(
412			_F("Attempting to allocate a zero-length array results in an OS debug message.\r\n"
   0x40234089 <+9>:	mov.n	a2, a1
   0x4023408b <+11>:	l32r	a3, 0x40234078
   0x4023408e <+14>:	movi	a4, 176
   0x40234091 <+17>:	call0	0x40236248 <memcpy_aligned(void*, void const*, unsigned int)>

413			   "The message starts with 'E:M ...' and can often indicate a more serious memory allocation issue."));
   0x40234094 <+20>:	l32r	a2, 0x40218cb4
   0x40234097 <+23>:	mov.n	a3, a1
   0x40234099 <+25>:	call0	0x40237ee4 <Print::println(char const*)>

414	
415		os_malloc(0);
   0x4023409c <+28>:	movi.n	a2, 0
   0x4023409e <+30>:	l32r	a3, 0x4023407c
   0x402340a1 <+33>:	movi	a4, 0x19f
   0x402340a4 <+36>:	movi.n	a5, 1
   0x402340a6 <+38>:	l32r	a0, 0x4020279c
   0x402340a9 <+41>:	callx0	a0

416	
417		return true;
418	}
=> 0x402340ac <+44>:	movi.n	a2, 1
   0x402340ae <+46>:	l32i	a0, a1, 188
   0x402340b1 <+49>:	movi	a9, 192
   0x402340b4 <+52>:	add.n	a1, a1, a9
   0x402340b6 <+54>:	ret.n

End of assembler dump.
(gdb) 

@mikee47
Copy link
Contributor Author

mikee47 commented Apr 11, 2019

Also, I note that the Arduino project uses a different java-based stack dumper (repo here), and uses GDB to get the address information (instead of addr2line).

@slaff
Copy link
Contributor

slaff commented Apr 11, 2019

java-based stack dumper

Yes, the python stack dumper that we use is based on that code. I could check later on their progress and try to update the python code.

Backtrace stopped: previous frame identical to this frame (corrupt stack?)

The 'corrupt stack' warning doesn't help either, and I cannot find any way to get GDB to skip the dodgy stack frames.

@igrr @earlephilhower Do you have some ideas? (See the previous comments in the thread)

@igrr
Copy link

igrr commented Apr 12, 2019

0x40001db4 looks like it is ets_printf from ROM, so i assume that 1) GDB session doesn't have the ELF file for the ROM code and 2) ROM wasn't compiled with -fno-omit-frame-pointer.

@mikee47
Copy link
Contributor Author

mikee47 commented Apr 12, 2019

0x40001db4 looks like it is ets_printf from ROM, so i assume that 1) GDB session doesn't have the ELF file for the ROM code and 2) ROM wasn't compiled with -fno-omit-frame-pointer.

@igrr Thanks for the pointer - missing symbols it was

@mikee47
Copy link
Contributor Author

mikee47 commented Apr 12, 2019

@slaff With #1665 we now get:

(gdb) bt
#0  gdb_do_break () at r:/sming-v4/Sming/gdb/gdbstub.cpp:937
#1  0x40234f95 in onOsMessage (msg=...) at app/application.cpp:401
#2  0x4023de95 in OsMessageInterceptor::putc (this=0x3ffee884 <osMessageInterceptor>, c=<optimized out>)
    at system/OsMessageInterceptor.cpp:23
#3  0x40239728 in OsMessageInterceptor::static_putc (c=<optimized out>) at system/include/OsMessageInterceptor.h:93
#4  0x40001db4 in ets_write_char ()
#5  0x40001f46 in ets_vprintf ()
#6  0x40002514 in ets_printf ()
#7  0x40100650 in pvPortMalloc ()
#8  0x40233f64 in handleCommand_malloc0 () at app/application.cpp:415
#9  0x402348ac in handleCommand (cmd=...) at app/application.cpp:479
#10 0x40234b9d in onConsoleReadCompleted (info=...) at app/application.cpp:514
#11 0x4023682e in SystemClass::taskHandler (event=<optimized out>) at SmingCore/Platform/System.cpp:34
#12 0x40000f49 in ets_post ()
(gdb)

@mikee47
Copy link
Contributor Author

mikee47 commented Apr 12, 2019

I tried this again using the 'hang' command in LiveDebug sample:

(Attached) hang
590597360 gdb_read_console() returned 5
Entering infinite loop...


***** Software Watchdog Reset

Program received signal SIGPWR, Power fail/restart.
0x402352c3 in debug_crash_callback (rst_info=rst_info@entry=0x3ffffc10, stack=stack@entry=1073741248,
    stack_end=stack_end@entry=1073741744) at r:/sming-v4/Sming/appspecific/gdb/gdb_hooks.cpp:104
104             gdbstub_break_internal(DBGFLAG_RESTART);
(gdb) bt
#0  0x402352c3 in debug_crash_callback (rst_info=rst_info@entry=0x3ffffc10, stack=stack@entry=1073741248,
    stack_end=stack_end@entry=1073741744) at r:/sming-v4/Sming/appspecific/gdb/gdb_hooks.cpp:104
#1  0x40235bea in __wrap_system_restart_local () at system/crash_handler.c:66
#2  0x40103ed2 in pp_soft_wdt_feed_local ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb)

Looks like pp_soft_wdt_feed_local is defined in libpp.a.

@slaff slaff merged commit 461bd82 into SmingHub:develop Apr 13, 2019
@slaff slaff removed the 0 - Backlog label Apr 13, 2019
@slaff slaff mentioned this pull request Apr 13, 2019
4 tasks
@mikee47 mikee47 deleted the uart_debug_putc_snoop branch April 14, 2019 12:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants