Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ZMS memory alocation error when runStream is called. #1153

Closed
knight-of-ni opened this issue Nov 7, 2015 · 6 comments
Closed

ZMS memory alocation error when runStream is called. #1153

knight-of-ni opened this issue Nov 7, 2015 · 6 comments

Comments

@knight-of-ni
Copy link
Member

Finally getting around to documenting this.
I have been chasing this error for well over a month now, and I'm out of ideas.

I'm testing in several virtual environments:
Development build of ZoneMinder 1.28.108
2GB ram
One Airlink 747W vga, mjpeg ip camera
One Foscam 9821 720p, h264 ip camera
CentOS or Fedora

Problem:
On recent development builds, ZMS started crashing when attempting to live stream for one of the named cameras, but never both at the same time. Event playback works fine. In one test environment the crash appears when using the Airlink, but in another environment the crash appears when using the Foscam camera. This happens in my CentOS 6 and CentOS 7 environments, but I have not been able to duplicate this in a Fedora environment. If I roll back to zoneminder 1.28.1, everything works.

Now get this. If I turn Debug on, under options, both cameras stream fine! Hello, race condition?

This is what appears in the zoneminder log:

2015-11-07 10:38:59.675955  zms 6017    ERR Backtrace 13: /usr/libexec/zoneminder/cgi-bin/nph-zms() [0x4169f9]  zm_signal.cpp   105
2015-11-07 10:38:59.674948  zms 6017    ERR Backtrace 12: /lib64/libc.so.6(__libc_start_main+0xfd) [0x324021ed5d]   zm_signal.cpp   105
2015-11-07 10:38:59.674087  zms 6017    ERR Backtrace 11: /usr/libexec/zoneminder/cgi-bin/nph-zms(main+0xbe3) [0x417753]    zm_signal.cpp   105
2015-11-07 10:38:59.673139  zms 6017    ERR Backtrace 10: /usr/libexec/zoneminder/cgi-bin/nph-zms(_ZN13MonitorStream9runStreamEv+0xf6a) [0x43790a]  zm_signal.cpp   105
2015-11-07 10:38:59.672142  zms 6017    ERR Backtrace 9: /usr/lib64/libstdc++.so.6(_Znam+0x9) [0x324aebd1d9]    zm_signal.cpp   105
2015-11-07 10:38:59.671339  zms 6017    ERR Backtrace 8: /usr/lib64/libstdc++.so.6(_Znwm+0x1d) [0x324aebd0bd]   zm_signal.cpp   105
2015-11-07 10:38:59.670491  zms 6017    ERR Backtrace 7: /lib64/libc.so.6(__libc_malloc+0x5c) [0x324027ab1c]    zm_signal.cpp   105
2015-11-07 10:38:59.669538  zms 6017    ERR Backtrace 6: /lib64/libc.so.6() [0x324027a41a]  zm_signal.cpp   105
2015-11-07 10:38:59.668715  zms 6017    ERR Backtrace 5: /lib64/libc.so.6() [0x3240275f4e]  zm_signal.cpp   105
2015-11-07 10:38:59.661227  zms 6017    ERR Backtrace 4: /lib64/libc.so.6() [0x3240270537]  zm_signal.cpp   105
2015-11-07 10:38:59.660436  zms 6017    ERR Backtrace 3: /lib64/libc.so.6(abort+0x175) [0x3240233e05]   zm_signal.cpp   105
2015-11-07 10:38:59.659382  zms 6017    ERR Backtrace 2: /lib64/libc.so.6(gsignal+0x35) [0x3240232625]  zm_signal.cpp   105
2015-11-07 10:38:59.658599  zms 6017    ERR Backtrace 1: /lib64/libpthread.so.0() [0x324060f790]    zm_signal.cpp   105
2015-11-07 10:38:59.657843  zms 6017    ERR Backtrace 0: /usr/libexec/zoneminder/cgi-bin/nph-zms(_Z14zm_die_handleriP7siginfoPv+0x6c) [0x45e6ec]    zm_signal.cpp   105
2015-11-07 10:38:59.656758  zms 6017    ERR Signal address is 0x3000001781, from 0x3240232625   zm_signal.cpp   84
2015-11-07 10:38:59.642301  zms 6017    ERR Got signal 6 (Aborted), crashing    zm_signal.cpp   54

From the error text, one can see that the crash is happening around when ZMS calls runStream. By strategic placement of Info statements, I have confirmed that the crash happens when the "new" operator is called to create a temporary image buffer:
https://github.com/ZoneMinder/ZoneMinder/blob/master/src/zm_monitor.cpp#L4224

At this point in the code, temp_image_buffer_count has the default value of 1000. Lowering this value does not help. Also, giving the virtual machine more memory does not help either.

Recall this line works fine for one camera, but not the other.

I have tried using valgrind to tickle out more information by changing ZM_PATH_ZMS to point to a script valgrind.sh which looks like this:

#!/bin/bash
/usr/bin/valgrind -v --track-origins=yes --leak-check=full --trace-children=yes --log-file="/var/lib/zoneminder/temp/valgrind.%p.log" /usr/libexec/zoneminder/cgi-bin/nph-zms "$@"

ZoneMinder debug symbols have been loaded.
The full Valgrind output can be found here:
https://gist.github.com/knnniggett/73c571caf626cccada53

The last part of the Valgrind log appears to be the most interesting:

--6612-- Reading syms from /lib64/libnss_files-2.12.so
--6612-- REDIR: 0x3240280bd0 (__GI_strcpy) redirected to 0x66ca0a0 (__GI_strcpy)
--6612-- REDIR: 0x324028aaa0 (rawmemchr) redirected to 0x64c355c (_vgnU_ifunc_wrapper)
--6612-- REDIR: 0x324028aad0 (__GI___rawmemchr) redirected to 0x66cc000 (__GI___rawmemchr)
--6612-- REDIR: 0x3240281380 (strncmp) redirected to 0x64c355c (_vgnU_ifunc_wrapper)
--6612-- REDIR: 0x32402813c0 (__GI_strncmp) redirected to 0x66ca470 (__GI_strncmp)
--6612-- REDIR: 0x3240282c10 (strncpy) redirected to 0x64c355c (_vgnU_ifunc_wrapper)
--6612-- REDIR: 0x324028de10 (__GI_strncpy) redirected to 0x66ca2c0 (__GI_strncpy)
--6612-- REDIR: 0x324aebd1d0 (operator new[](unsigned long)) redirected to 0x66c9110 (operator new[](unsigned long))
--6612-- REDIR: 0x3240280ba0 (strcpy) redirected to 0x64c355c (_vgnU_ifunc_wrapper)
--6612-- REDIR: 0x324aebb2f0 (operator delete(void*)) redirected to 0x66c7f8f (operator delete(void*))
--6612-- REDIR: 0x324027b8b0 (posix_memalign) redirected to 0x66c75fb (posix_memalign)
--6612-- REDIR: 0x324aebb330 (operator delete[](void*)) redirected to 0x66c7adf (operator delete[](void*))
==6612== Invalid write of size 1
==6612==    at 0x32402FFAEC: __vsnprintf_chk (in /lib64/libc-2.12.so)
==6612==    by 0x32402FFA19: __snprintf_chk (in /lib64/libc-2.12.so)
==6612==    by 0x43789A: MonitorStream::runStream() (stdio2.h:66)
==6612==    by 0x417752: main (zms.cpp:302)
==6612==  Address 0x6afd0a8 is 0 bytes after a block of size 24 alloc'd
==6612==    at 0x66C8A2E: malloc (vg_replace_malloc.c:270)
==6612==    by 0x436FEF: MonitorStream::runStream() (zm_monitor.cpp:4207)
==6612==    by 0x417752: main (zms.cpp:302)
==6612== 
--6612-- REDIR: 0xffffffffff600000 (???) redirected to 0x38049533 (vgPlain_amd64_linux_REDIR_FOR_vgettimeofday)
==6612== Syscall param socketcall.sendto(msg) points to uninitialised byte(s)
==6612==    at 0x324060EE13: __sendto_nocancel (in /lib64/libpthread-2.12.so)
==6612==    by 0x4345B3: MonitorStream::processCommand(StreamBase::CmdMsg const*) (zm_monitor.cpp:4010)
==6612==    by 0x46001A: StreamBase::checkCommandQueue() (zm_stream.cpp:96)
==6612==    by 0x437094: MonitorStream::runStream() (zm_monitor.cpp:4244)
==6612==    by 0x417752: main (zms.cpp:302)
==6612==  Address 0x7feff9e6c is on thread 1's stack
==6612==  Uninitialised value was created by a stack allocation
==6612==    at 0x434310: MonitorStream::processCommand(StreamBase::CmdMsg const*) (zm_monitor.cpp:3749)
==6612== 

I'm currently trying to understand this information, and would appreciate the help if anyone has any ideas.

@sogood007
Copy link

        swap_path = (char *)malloc( swap_path_length+15 );
        Debug( 3, "Checking swap image path %s", config.path_swap );
        strncpy( swap_path, config.path_swap, swap_path_length );
        if ( checkSwapPath( swap_path, false ) ) {
            snprintf( &(swap_path[swap_path_length]), sizeof(swap_path)-swap_path_length, "/zmswap-m%d", monitor->Id() );

This code doesn't look right to me. The code appear to use sizeof to return the size of the malloc but it is not how it works. It actually return the size of "type" of swap_path which is the size of the C pointer. In this case, it will be 8 (as running in x86_64) I think it will be much clean to define a "constant" for the 15. and reuse on other part of the code.

so the snprintf will be underflow which and size_t is unsigned. So it will create a huge number. I don't know how big is the your monitorID. I don't know whether it could be more than 6 digits and overflow swap_path and then corrupt the heap. (which could explain the later new abort)

@sogood007
Copy link

if ( checkSwapPath( swap_path, true ) ) {
snprintf( &(swap_path[swap_path_length]), sizeof(swap_path)-swap_path_length, "/zmswap-q%06d", connkey );

The connkey in this one could overflow the buffer so the previous monitorID probably ok.

@knight-of-ni
Copy link
Member Author

Thanks for the feedback. I'll take a look those lines of code. I hadn't considered the previous memory allocation being the actual culprit.

@connortechnology
Copy link
Member

instead of sizeof(swap_path)-swap_path_length, couldn't we just write 15?

@knight-of-ni
Copy link
Member Author

It actually return the size of "type" of swap_path which is the size of the C pointer. In this case, it will be 8 (as running in x86_64) I think it will be much clean to define a "constant" for the 15. and reuse on other part of the code.

so the snprintf will be underflow which and size_t is unsigned. So it will create a huge number.

Boom. That was it. It took me a while to understand this, but once I replaced sizeof(swap_path)-swap_path_length with 15, as you both suggested, and saw the results, I finally understood what was happening. 8 - 15 = -7 which becomes a large number when represented as an unsigned integer.

Will continue to verify the issue is fixed in all my environments, but I'm confident it will.

And to think I've been staring at this and scratching my head literally for 2 months now.

@pliablepixels
Copy link
Member

👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants