Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OpenMAX hangs in OMX_EmptyThisBuffer #449

Closed
carlonluca opened this issue Jul 16, 2015 · 54 comments
Closed

OpenMAX hangs in OMX_EmptyThisBuffer #449

carlonluca opened this issue Jul 16, 2015 · 54 comments

Comments

@carlonluca
Copy link

Not sure if this is a fault of my code or not, but I'm experiencing a hang in OMX_EmptyThisBuffer. It seems very similar to #134.

My project is open: https://github.com/carlonluca/pi. I've been reported hangs to happen very rarely when playing a single video file (including audio), it takes many days of continuous playback to reproduce. I had runs of more then 40 days without issues.
As you'll see in the backtrace, one thread is decoding video to texture using EGLImage, one thread is decoding audio and one thread asks OpenMAX for the current position in the media here. Part of the code is taken from omxplayer.

What seems to be the same issue can be reproduced in seconds when trying to decode multiple video files at the same time on Pi1. Pi2 instead seems to work better. The firmware version on Pi2 is:

pi@raspberrypi ~ $ /opt/vc/bin/vcgencmd version
Apr 21 2015 14:42:19 
Copyright (c) 2012 Broadcom
version 2d5ad04b63af4233440c3f7c8587108223201102 (clean) (release)
pi@raspberrypi ~ $ uname -a
Linux raspberrypi 3.18.11-v7+ #781 SMP PREEMPT Tue Apr 21 18:07:59 BST 2015 armv7l GNU/Linux

The kernel version on Pi1:

pi@raspberrypi ~ $ uname -a
Linux raspberrypi 3.18.11+ #781 PREEMPT Tue Apr 21 18:02:18 BST 2015 armv6l GNU/Linux

Can't determine the GPU firmware version on Pi1 cause I'm keeping the locked "scene" under gdb and vcgencmd is not returning (would take other days to reproduce again), but the firmware should be the same of Pi2 (latest Raspbian image available on the Raspberry web site 2015-05-05).

Do you think this is a fault of my code or something firmware related?
This is a backtrace I got from the locked Pi1.

(gdb) t a a where

Thread 13 (Thread 0xb3ead450 (LWP 2598)):
#0  0xb58b1dcc in ioctl () from /lib/arm-linux-gnueabihf/libc.so.6
#1  0xb4212014 in completion_thread () from /opt/vc/lib/libvchiq_arm.so
#2  0xb423eda0 in vcos_thread_entry () from /opt/vc/lib/libvcos.so
#3  0xb5a8fc00 in start_thread () from /lib/arm-linux-gnueabihf/libpthread.so.0
#4  0xb58b9728 in ?? () from /lib/arm-linux-gnueabihf/libc.so.6
#5  0xb58b9728 in ?? () from /lib/arm-linux-gnueabihf/libc.so.6
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

Thread 12 (Thread 0xb36ad450 (LWP 2599)):
#0  0xb5a96770 in sem_wait@@GLIBC_2.4 () from /lib/arm-linux-gnueabihf/libpthread.so.0
#1  0xb422cbec in dispmanx_notify_func () from /opt/vc/lib/libbcm_host.so
#2  0xb423eda0 in vcos_thread_entry () from /opt/vc/lib/libvcos.so
#3  0xb5a8fc00 in start_thread () from /lib/arm-linux-gnueabihf/libpthread.so.0
#4  0xb58b9728 in ?? () from /lib/arm-linux-gnueabihf/libc.so.6
#5  0xb58b9728 in ?? () from /lib/arm-linux-gnueabihf/libc.so.6
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

Thread 11 (Thread 0xb2ead450 (LWP 2600)):
#0  0xb5a96770 in sem_wait@@GLIBC_2.4 () from /lib/arm-linux-gnueabihf/libpthread.so.0
#1  0xb4228b84 in tvservice_notify_func () from /opt/vc/lib/libbcm_host.so
#2  0xb423eda0 in vcos_thread_entry () from /opt/vc/lib/libvcos.so
#3  0xb5a8fc00 in start_thread () from /lib/arm-linux-gnueabihf/libpthread.so.0
#4  0xb58b9728 in ?? () from /lib/arm-linux-gnueabihf/libc.so.6
#5  0xb58b9728 in ?? () from /lib/arm-linux-gnueabihf/libc.so.6
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

Thread 10 (Thread 0xb26ad450 (LWP 2601)):
#0  0xb5a96770 in sem_wait@@GLIBC_2.4 () from /lib/arm-linux-gnueabihf/libpthread.so.0
#1  0xb4229744 in cecservice_notify_func () from /opt/vc/lib/libbcm_host.so
#2  0xb423eda0 in vcos_thread_entry () from /opt/vc/lib/libvcos.so
#3  0xb5a8fc00 in start_thread () from /lib/arm-linux-gnueabihf/libpthread.so.0
#4  0xb58b9728 in ?? () from /lib/arm-linux-gnueabihf/libc.so.6
#5  0xb58b9728 in ?? () from /lib/arm-linux-gnueabihf/libc.so.6
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

Thread 9 (Thread 0xb1b51450 (LWP 2602)):
#0  0xb58ac830 in poll () from /lib/arm-linux-gnueabihf/libc.so.6
#1  0xb42c25f0 in ?? () from /lib/arm-linux-gnueabihf/libglib-2.0.so.0
#2  0xb42c25f0 in ?? () from /lib/arm-linux-gnueabihf/libglib-2.0.so.0
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

---Type <return> to continue, or q <return> to quit---
Thread 8 (Thread 0xb132b450 (LWP 2603)):
#0  0xb58ac830 in poll () from /lib/arm-linux-gnueabihf/libc.so.6
#1  0xb42c25f0 in ?? () from /lib/arm-linux-gnueabihf/libglib-2.0.so.0
#2  0xb42c25f0 in ?? () from /lib/arm-linux-gnueabihf/libglib-2.0.so.0
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

Thread 7 (Thread 0xaebd9450 (LWP 2604)):
#0  0xb5a94504 in pthread_cond_wait@@GLIBC_2.4 () from /lib/arm-linux-gnueabihf/libpthread.so.0
#1  0xb423ee80 in _timer_thread () from /opt/vc/lib/libvcos.so
#2  0xb5a8fc00 in start_thread () from /lib/arm-linux-gnueabihf/libpthread.so.0
#3  0xb58b9728 in ?? () from /lib/arm-linux-gnueabihf/libc.so.6
#4  0xb58b9728 in ?? () from /lib/arm-linux-gnueabihf/libc.so.6
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

Thread 6 (Thread 0xae3d9450 (LWP 2605)):
#0  0xb5a96770 in sem_wait@@GLIBC_2.4 () from /lib/arm-linux-gnueabihf/libpthread.so.0
#1  0xb4214ab0 in vchiu_queue_pop () from /opt/vc/lib/libvchiq_arm.so
#2  0xaf7bab9c in ilcs_task () from /opt/vc/lib/libopenmaxil.so
#3  0xb423eda0 in vcos_thread_entry () from /opt/vc/lib/libvcos.so
#4  0xb5a8fc00 in start_thread () from /lib/arm-linux-gnueabihf/libpthread.so.0
#5  0xb58b9728 in ?? () from /lib/arm-linux-gnueabihf/libc.so.6
#6  0xb58b9728 in ?? () from /lib/arm-linux-gnueabihf/libc.so.6
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

Thread 5 (Thread 0xadbd9450 (LWP 2606)):
#0  0xb5a96770 in sem_wait@@GLIBC_2.4 () from /lib/arm-linux-gnueabihf/libpthread.so.0
#1  0xaf7ba7e0 in ilcs_execute_function_ex () from /opt/vc/lib/libopenmaxil.so
#2  0xaf7bb3f0 in ilcs_execute_function () from /opt/vc/lib/libopenmaxil.so
#3  0xaf7bbd34 in vcil_out_set () from /opt/vc/lib/libopenmaxil.so
#4  0x00000008 in ?? ()
#5  0x00000008 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

Thread 4 (Thread 0xad3d9450 (LWP 2607)):
#0  0xb5a94504 in pthread_cond_wait@@GLIBC_2.4 () from /lib/arm-linux-gnueabihf/libpthread.so.0
#1  0xb5b6fab4 in QWaitCondition::wait(QMutex*, unsigned long) () from /usr/local/Qt-rasp-5.4.1/lib/libQt5Core.so.5
#2  0xb6dc5240 in ?? () from /usr/local/Qt-rasp-5.4.1/lib/libQt5Quick.so.5
Cannot access memory at address 0x2
#3  0xb6dc5240 in ?? () from /usr/local/Qt-rasp-5.4.1/lib/libQt5Quick.so.5
Cannot access memory at address 0x2
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

Thread 3 (Thread 0xac32b450 (LWP 2610)):
#0  0xb5a96770 in sem_wait@@GLIBC_2.4 () from /lib/arm-linux-gnueabihf/libpthread.so.0
#1  0xaf7ba7e0 in ilcs_execute_function_ex () from /opt/vc/lib/libopenmaxil.so
---Type <return> to continue, or q <return> to quit---
#2  0xaf7bb658 in ilcs_pass_buffer () from /opt/vc/lib/libopenmaxil.so
#3  0xaf8c86a0 in COMXCoreComponent::EmptyThisBuffer(OMX_BUFFERHEADERTYPE*) () at /home/luca/project/pi/pi/piomxtextures_src/omxplayer_lib/OMXCore.cpp:369
#4  0xaf898e20 in COMXVideo::Decode(unsigned char*, int, double) () at /home/luca/project/pi/pi/piomxtextures_src/omxplayer_lib/OMXVideo.cpp:865
#5  0xaf8ac348 in OMXPlayerVideo::Decode(OMXPacket*) () at /home/luca/project/pi/pi/piomxtextures_src/omxplayer_lib/OMXPlayerVideo.cpp:229
#6  0xaf8ac510 in OMXPlayerVideo::Process() () at /home/luca/project/pi/pi/piomxtextures_src/omxplayer_lib/OMXPlayerVideo.cpp:269
#7  0xaf89bac4 in OMXThread::Run(void*) () from /usr/local/Qt-rasp-5.4.1/plugins/mediaservice/libopenmaxilmediaplayer.so
#8  0xb5a8fc00 in start_thread () from /lib/arm-linux-gnueabihf/libpthread.so.0
#9  0xb58b9728 in ?? () from /lib/arm-linux-gnueabihf/libc.so.6
#10 0xb58b9728 in ?? () from /lib/arm-linux-gnueabihf/libc.so.6
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

Thread 2 (Thread 0xacb2b450 (LWP 2611)):
#0  0xb5a96770 in sem_wait@@GLIBC_2.4 () from /lib/arm-linux-gnueabihf/libpthread.so.0
#1  0xaf7ba7e0 in ilcs_execute_function_ex () from /opt/vc/lib/libopenmaxil.so
#2  0xaf7bb658 in ilcs_pass_buffer () from /opt/vc/lib/libopenmaxil.so
#3  0xaf8c86a0 in COMXCoreComponent::EmptyThisBuffer(OMX_BUFFERHEADERTYPE*) () at /home/luca/project/pi/pi/piomxtextures_src/omxplayer_lib/OMXCore.cpp:369
#4  0xaf8d6d24 in COMXAudio::AddPackets(void const*, unsigned int, double, double, unsigned int) ()
    at /home/luca/project/pi/pi/piomxtextures_src/omxplayer_lib/OMXAudio.cpp:953
#5  0xaf8c54b8 in OMXPlayerAudio::Decode(OMXPacket*) () at /home/luca/project/pi/pi/piomxtextures_src/omxplayer_lib/OMXPlayerAudio.cpp:256
#6  0xaf8c5790 in OMXPlayerAudio::Process() () at /home/luca/project/pi/pi/piomxtextures_src/omxplayer_lib/OMXPlayerAudio.cpp:313
#7  0xaf89bac4 in OMXThread::Run(void*) () from /usr/local/Qt-rasp-5.4.1/plugins/mediaservice/libopenmaxilmediaplayer.so
#8  0xb5a8fc00 in start_thread () from /lib/arm-linux-gnueabihf/libpthread.so.0
#9  0xb58b9728 in ?? () from /lib/arm-linux-gnueabihf/libc.so.6
#10 0xb58b9728 in ?? () from /lib/arm-linux-gnueabihf/libc.so.6
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

Thread 1 (Thread 0xb41cc000 (LWP 2597)):
#0  0xb5a96770 in sem_wait@@GLIBC_2.4 () from /lib/arm-linux-gnueabihf/libpthread.so.0
#1  0xaf7ba7e0 in ilcs_execute_function_ex () from /opt/vc/lib/libopenmaxil.so
#2  0xaf7bb3f0 in ilcs_execute_function () from /opt/vc/lib/libopenmaxil.so
#3  0xaf7bbdf8 in vcil_out_get () from /opt/vc/lib/libopenmaxil.so
#4  0xaf8cad04 in COMXCoreComponent::GetConfig(OMX_INDEXTYPE, void*) const () at /home/luca/project/pi/pi/piomxtextures_src/omxplayer_lib/OMXCore.cpp:1199
#5  0xaf8cfec4 in OMXClock::OMXMediaTime(bool) () at /home/luca/project/pi/pi/piomxtextures_src/omxplayer_lib/OMXClock.cpp:337
#6  0xaf87b364 in OMX_MediaProcessor::streamPosition() () at /home/luca/project/pi/pi/piomxtextures_src/omx_mediaprocessor.cpp:558
#7  0xaf8e3b00 in OpenMAXILPlayerControl::position() const () at /home/luca/project/pi/pi/piomxtextures_qt_driver/mediaplayer/openmaxilplayercontrol.cpp:518
#8  0xb026ffe4 in ?? () from /usr/local/Qt-rasp-5.4.1/qml/QtMultimedia/libdeclarative_multimedia.so
#9  0xb026ffe4 in ?? () from /usr/local/Qt-rasp-5.4.1/qml/QtMultimedia/libdeclarative_multimedia.so
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
@dimovnike
Copy link

I experience this issue on many running PI's too, but the only difference is that I can not restart the aplication. Also when trying to run the app again, strace says:

lags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0xb261c4b8, tls=0xb261c910, child_tidptr=0xb261c4b8) = 5699
ioctl(4, 0xc40c, 0x70a003)              = 0
ioctl(4, 0xc40c, 0x70a003)              = 0
ioctl(4, 0x400cc404, 0xbe9eb67c)        = 0
ioctl(4, 0xc40d, 0x70a003)              = 0
ioctl(4, 0xc40c, 0x70a003)              = 0
ioctl(4, 0xc010c408, 0xbe9eb668)        = -1 EAGAIN (Resource temporarily unavailable)
futex(0xb41a71b8, FUTEX_WAIT_PRIVATE, 0, NULL

... and waits. Also even with my application killed, the /opt/vc/bin/vcgencmd can not start, strace says it is waiting for the same FUTEX.

@popcornmix
Copy link
Contributor

Can you describe how to reproduce the problem?
E.g. provide binaries or instructions to build it, a sample file and instructions of how to run it.
I can look on the firmware side to see if there is an obvious problem.

@carlonluca
Copy link
Author

Thank you for answering. Unfortunately it won't be simple... the backtrace I provided was related to a run that lasted some days playing a single video file. As you can see there is one thread for audio and one for video. I wasn't able to increase the frequency of the locks, but I noticed that using multiple players seems to make it very very simple to lock. It just takes a few seconds. This is the resulting backtrace: http://pastebin.com/CmjRJAgm. In this case you can see two video/audio players and one audio player.

Seems pretty similar to me. If you consider this reasonable, I can provide what is needed to reproduce this in seconds/minutes. If we are lucky it is the same issue.

I'm assuming you start from a clean Raspbian (I'm using 2015-05-05) using a Pi1. Pi2 seems to work a little better.

  1. Download the Qt libraries from here: https://goo.gl/MhsBGA. lrunzip and untar.
  2. Untar Qt-rasp-5.5.0.tar and qtdeps.tar. Place the directory Qt-rasp-5.5.0 in /usr/local into the pi and qtdeps wherever you want (in the Pi).
  3. Allocate 256MB of memory for the GPU in /boot/config.txt.
  4. Place this QML file into the Pi: https://github.com/carlonluca/pi/blob/master/piomxtextures_samples/video_audio_multi.qml.
  5. Add the lib dir which is into qtdeps to LD_LIBRARY_PATH.
  6. Use the qmlscene command to start the QML file providing the proper arguments:

/usr/local/Qt-rasp-5.5.0/bin/qmlscene video_audio_multi.qml file:///home/pi/big_buck_bunny_720p_h264.mov file:///home/pi/sintel_trailer_720p.mp4 file:///home/pi/master_of_puppets.mp3

So provide two video files and an audio file (like wave or mp3), in this order. Use a URI, not a path.

I just tried this procedure and it seems to work. I hope I didn't forget anything. Please let me know if something goes wrong.
If after some minutes everything seems to work properly, try to kill and restart the app or restart the Pi and try again. There are cases in which, for some reason, it locks in a few seconds, some others where it keeps working.

I could even reproduce with this QML: https://github.com/carlonluca/pi/blob/master/piomxtextures_samples/video_position.qml. One video file only and a timer frequently asking for the current position. But it really requires days to lock with the baktrace of the first message.

@dimovnike
Copy link

Hello, I catched another lock up and created a coredump from GDB. Is it of any use (i can send it with the QT app)?

@carlonluca
Copy link
Author

@popcornmix I read a comment you wrote some time ago in #377:

"I believe mmal and openmax are thread safe, so you don't need additional locking.
Obviously you need to protect your own structures (e.g. when accessed from a mmal callback)."

And in fact in omxplayer omx calls seem not to be guarded. Can you confirm that OpenMAX libs in raspberry are thread-safe?

@popcornmix
Copy link
Contributor

Yes, openmax calls are thread safe.

@carlonluca
Copy link
Author

By doing a large amount of seeks I can now reproduce a similar situation "quickly". This backtrace is a little clearer:

(gdb) t a a where

Thread 13 (Thread 0xb3c52450 (LWP 2415)):
#0  0xb5890dcc in ioctl () at ../sysdeps/unix/syscall-template.S:82
#1  0xb4068014 in completion_thread () from /opt/vc/lib/libvchiq_arm.so
#2  0xb4094da0 in vcos_thread_entry () from /opt/vc/lib/libvcos.so
#3  0xb5a6ec00 in start_thread (arg=0xb3c52450) at pthread_create.c:306
#4  0xb5898728 in ?? () at ../ports/sysdeps/unix/sysv/linux/arm/nptl/../clone.S:116 from /lib/arm-linux-gnueabihf/libc.so.6
#5  0xb5898728 in ?? () at ../ports/sysdeps/unix/sysv/linux/arm/nptl/../clone.S:116 from /lib/arm-linux-gnueabihf/libc.so.6
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

Thread 12 (Thread 0xb3452450 (LWP 2416)):
#0  0xb5a75770 in __new_sem_wait (sem=0xb4090d70) at ../nptl/sysdeps/unix/sysv/linux/sem_wait.c:60
#1  0xb4082bec in dispmanx_notify_func () from /opt/vc/lib/libbcm_host.so
#2  0xb4094da0 in vcos_thread_entry () from /opt/vc/lib/libvcos.so
#3  0xb5a6ec00 in start_thread (arg=0xb3452450) at pthread_create.c:306
#4  0xb5898728 in ?? () at ../ports/sysdeps/unix/sysv/linux/arm/nptl/../clone.S:116 from /lib/arm-linux-gnueabihf/libc.so.6
#5  0xb5898728 in ?? () at ../ports/sysdeps/unix/sysv/linux/arm/nptl/../clone.S:116 from /lib/arm-linux-gnueabihf/libc.so.6
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

Thread 11 (Thread 0xb2c52450 (LWP 2417)):
#0  0xb5a75770 in __new_sem_wait (sem=0xb408feec) at ../nptl/sysdeps/unix/sysv/linux/sem_wait.c:60
#1  0xb407eb84 in tvservice_notify_func () from /opt/vc/lib/libbcm_host.so
#2  0xb4094da0 in vcos_thread_entry () from /opt/vc/lib/libvcos.so
#3  0xb5a6ec00 in start_thread (arg=0xb2c52450) at pthread_create.c:306
#4  0xb5898728 in ?? () at ../ports/sysdeps/unix/sysv/linux/arm/nptl/../clone.S:116 from /lib/arm-linux-gnueabihf/libc.so.6
#5  0xb5898728 in ?? () at ../ports/sysdeps/unix/sysv/linux/arm/nptl/../clone.S:116 from /lib/arm-linux-gnueabihf/libc.so.6
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

Thread 10 (Thread 0xb2452450 (LWP 2418)):
#0  0xb5a75770 in __new_sem_wait (sem=0xb4090c74) at ../nptl/sysdeps/unix/sysv/linux/sem_wait.c:60
#1  0xb407f744 in cecservice_notify_func () from /opt/vc/lib/libbcm_host.so
#2  0xb4094da0 in vcos_thread_entry () from /opt/vc/lib/libvcos.so
#3  0xb5a6ec00 in start_thread (arg=0xb2452450) at pthread_create.c:306
#4  0xb5898728 in ?? () at ../ports/sysdeps/unix/sysv/linux/arm/nptl/../clone.S:116 from /lib/arm-linux-gnueabihf/libc.so.6
#5  0xb5898728 in ?? () at ../ports/sysdeps/unix/sysv/linux/arm/nptl/../clone.S:116 from /lib/arm-linux-gnueabihf/libc.so.6
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

Thread 9 (Thread 0xb1a48450 (LWP 2419)):
#0  0xb588b830 in __GI___poll (fds=0xa40e70, nfds=1, timeout=-1) at ../sysdeps/unix/sysv/linux/poll.c:87
#1  0xb56d45f0 in ?? () from /lib/arm-linux-gnueabihf/libglib-2.0.so.0
#2  0xb56d45f0 in ?? () from /lib/arm-linux-gnueabihf/libglib-2.0.so.0
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

---Type <return> to continue, or q <return> to quit---
Thread 8 (Thread 0xaedde450 (LWP 2420)):
#0  0xb5a73504 in __pthread_cond_wait (cond=0xae95e8, mutex=0xae95cc) at pthread_cond_wait.c:153
#1  0xb4094e80 in _timer_thread () from /opt/vc/lib/libvcos.so
#2  0xb5a6ec00 in start_thread (arg=0xaedde450) at pthread_create.c:306
#3  0xb5898728 in ?? () at ../ports/sysdeps/unix/sysv/linux/arm/nptl/../clone.S:116 from /lib/arm-linux-gnueabihf/libc.so.6
#4  0xb5898728 in ?? () at ../ports/sysdeps/unix/sysv/linux/arm/nptl/../clone.S:116 from /lib/arm-linux-gnueabihf/libc.so.6
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

Thread 7 (Thread 0xae5de450 (LWP 2421)):
#0  0xb5a75770 in __new_sem_wait (sem=0xae977c) at ../nptl/sysdeps/unix/sysv/linux/sem_wait.c:60
#1  0xb406aab0 in vchiu_queue_pop () from /opt/vc/lib/libvchiq_arm.so
#2  0xaf9b2b9c in ilcs_task () from /opt/vc/lib/libopenmaxil.so
#3  0xb4094da0 in vcos_thread_entry () from /opt/vc/lib/libvcos.so
#4  0xb5a6ec00 in start_thread (arg=0xae5de450) at pthread_create.c:306
#5  0xb5898728 in ?? () at ../ports/sysdeps/unix/sysv/linux/arm/nptl/../clone.S:116 from /lib/arm-linux-gnueabihf/libc.so.6
#6  0xb5898728 in ?? () at ../ports/sysdeps/unix/sysv/linux/arm/nptl/../clone.S:116 from /lib/arm-linux-gnueabihf/libc.so.6
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

Thread 6 (Thread 0xaddde450 (LWP 2422)):
#0  0xb588b830 in __GI___poll (fds=0xaf1728, nfds=1, timeout=-1) at ../sysdeps/unix/sysv/linux/poll.c:87
#1  0xb56d45f0 in ?? () from /lib/arm-linux-gnueabihf/libglib-2.0.so.0
#2  0xb56d45f0 in ?? () from /lib/arm-linux-gnueabihf/libglib-2.0.so.0
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

Thread 5 (Thread 0xad5de450 (LWP 2423)):
#0  0xb5a73504 in __pthread_cond_wait (cond=0xb24d08, mutex=0xb24cf0) at pthread_cond_wait.c:153
#1  0xb5b4b658 in QWaitCondition::wait(QMutex*, unsigned long) () from /usr/local/Qt-rasp-5.5.0/lib/libQt5Core.so.5
#2  0xb6e04ad4 in ?? () from /usr/local/Qt-rasp-5.5.0/lib/libQt5Quick.so.5
Cannot access memory at address 0x2
#3  0xb6e04ad4 in ?? () from /usr/local/Qt-rasp-5.5.0/lib/libQt5Quick.so.5
Cannot access memory at address 0x2
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

Thread 4 (Thread 0xacca4450 (LWP 2424)):
#0  0xb5a75770 in __new_sem_wait (sem=0xae9804) at ../nptl/sysdeps/unix/sysv/linux/sem_wait.c:60
#1  0xaf9b27e0 in ilcs_execute_function_ex () from /opt/vc/lib/libopenmaxil.so
#2  0xaf9b3658 in ilcs_pass_buffer () from /opt/vc/lib/libopenmaxil.so
#3  0xafab1de0 in COMXCoreComponent::EmptyThisBuffer(OMX_BUFFERHEADERTYPE*) () at /home/luca/project/pi/pi/piomxtextures_src/omxplayer_lib/OMXCore.cpp:391
#4  0xafa821d4 in COMXVideo::Decode(unsigned char*, int, double, double) () at /home/luca/project/pi/pi/piomxtextures_src/omxplayer_lib/OMXVideo.cpp:827
#5  0xafa95a3c in OMXPlayerVideo::Decode(OMXPacket*) () at /home/luca/project/pi/pi/piomxtextures_src/omxplayer_lib/OMXPlayerVideo.cpp:217
#6  0xafa95c7c in OMXPlayerVideo::Process() () at /home/luca/project/pi/pi/piomxtextures_src/omxplayer_lib/OMXPlayerVideo.cpp:258
#7  0xafa85298 in OMXThread::Run(void*) () from /usr/local/Qt-rasp-5.5.0/plugins/mediaservice/libopenmaxilmediaplayer.so
#8  0xb5a6ec00 in start_thread (arg=0xacca4450) at pthread_create.c:306
#9  0xb5898728 in ?? () at ../ports/sysdeps/unix/sysv/linux/arm/nptl/../clone.S:116 from /lib/arm-linux-gnueabihf/libc.so.6
#10 0xb5898728 in ?? () at ../ports/sysdeps/unix/sysv/linux/arm/nptl/../clone.S:116 from /lib/arm-linux-gnueabihf/libc.so.6
---Type <return> to continue, or q <return> to quit---
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

Thread 3 (Thread 0xac4a4450 (LWP 2425)):
#0  syscall () at ../ports/sysdeps/unix/sysv/linux/arm/eabi/syscall.S:38
#1  0xb5b4389c in QBasicMutex::lockInternal() () from /usr/local/Qt-rasp-5.5.0/lib/libQt5Core.so.5
#2  0xb5b43c1c in QMutex::lock() () from /usr/local/Qt-rasp-5.5.0/lib/libQt5Core.so.5
#3  0xafab66a8 in lock_mutex(QMutex*) () at /home/luca/project/pi/pi/piomxtextures_src/omxplayer_lib/OMXCore.cpp:52
#4  0xafab1dc0 in COMXCoreComponent::EmptyThisBuffer(OMX_BUFFERHEADERTYPE*) () at /home/luca/project/pi/pi/piomxtextures_src/omxplayer_lib/OMXCore.cpp:389
#5  0xafac06ec in COMXAudio::AddPackets(void const*, unsigned int, double, double, unsigned int) ()
    at /home/luca/project/pi/pi/piomxtextures_src/omxplayer_lib/OMXAudio.cpp:953
#6  0xafaaebc8 in OMXPlayerAudio::Decode(OMXPacket*) () at /home/luca/project/pi/pi/piomxtextures_src/omxplayer_lib/OMXPlayerAudio.cpp:256
#7  0xafaaef14 in OMXPlayerAudio::Process() () at /home/luca/project/pi/pi/piomxtextures_src/omxplayer_lib/OMXPlayerAudio.cpp:314
#8  0xafa85298 in OMXThread::Run(void*) () from /usr/local/Qt-rasp-5.5.0/plugins/mediaservice/libopenmaxilmediaplayer.so
#9  0xb5a6ec00 in start_thread (arg=0xac4a4450) at pthread_create.c:306
#10 0xb5898728 in ?? () at ../ports/sysdeps/unix/sysv/linux/arm/nptl/../clone.S:116 from /lib/arm-linux-gnueabihf/libc.so.6
#11 0xb5898728 in ?? () at ../ports/sysdeps/unix/sysv/linux/arm/nptl/../clone.S:116 from /lib/arm-linux-gnueabihf/libc.so.6
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

Thread 2 (Thread 0xabca4450 (LWP 2426)):
#0  syscall () at ../ports/sysdeps/unix/sysv/linux/arm/eabi/syscall.S:38
#1  0xb5b4389c in QBasicMutex::lockInternal() () from /usr/local/Qt-rasp-5.5.0/lib/libQt5Core.so.5
#2  0xb5b43c1c in QMutex::lock() () from /usr/local/Qt-rasp-5.5.0/lib/libQt5Core.so.5
#3  0xafa5df64 in QMutexLocker::QMutexLocker(QBasicMutex*) () at /opt/rpi/sysroot/usr/local/Qt-rasp-5.5.0/include/QtCore/qmutex.h:128
#4  0xafab44b4 in COMXCoreComponent::GetConfig(OMX_INDEXTYPE, void*) () at /home/luca/project/pi/pi/piomxtextures_src/omxplayer_lib/OMXCore.cpp:1234
#5  0xafab9884 in OMXClock::OMXMediaTime(bool) () at /home/luca/project/pi/pi/piomxtextures_src/omxplayer_lib/OMXClock.cpp:337
#6  0xafa647b8 in OMX_MediaProcessor::mediaDecoding() () at /home/luca/project/pi/pi/piomxtextures_src/omx_mediaprocessor.cpp:718
#7  0xafa695fc in QtConcurrent::VoidStoredMemberFunctionPointerCall0<void, OMX_MediaProcessor>::runFunctor() ()
    at /opt/rpi/sysroot/usr/local/Qt-rasp-5.5.0/include/QtConcurrent/qtconcurrentstoredfunctioncall.h:199
#8  0xafa5ff78 in QtConcurrent::RunFunctionTask<void>::run() () at /opt/rpi/sysroot/usr/local/Qt-rasp-5.5.0/include/QtConcurrent/qtconcurrentrunbase.h:130
#9  0xb5b47554 in ?? () from /usr/local/Qt-rasp-5.5.0/lib/libQt5Core.so.5
#10 0xb5b47554 in ?? () from /usr/local/Qt-rasp-5.5.0/lib/libQt5Core.so.5
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

Thread 1 (Thread 0xb3fd7000 (LWP 2414)):
#0  0xb588b830 in __GI___poll (fds=0xb69d18, nfds=9, timeout=396) at ../sysdeps/unix/sysv/linux/poll.c:87
#1  0xb56d45f0 in ?? () from /lib/arm-linux-gnueabihf/libglib-2.0.so.0
#2  0xb56d45f0 in ?? () from /lib/arm-linux-gnueabihf/libglib-2.0.so.0
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

This situation required around 10,000 seeks. Before calling openmax I acquire a mutex now. What I see here is that Thread 4 is waiting on a semaphore inside libopenmax and is keeping my mutex locked. According to the openmax specification for OMX_EmptyThisBuffer:

"This call is a non-blocking call since the component will queue the buffer and return immediately. The buffer will be emptied later at the proper time. If the parameter nInputPortIndex in the buffer header does not specify a valid input port, the component returns OMX_ErrorBadPortIndex. The component should return from this call within five milliseconds."

So, as I see that OMX_EmptyThisBuffer call is not returning, do you know of any possible reason why OMX_EmptyThisBuffer may behave like this? If I knew a possible reason for this I may be able to somehow fix my implementation.

In this situation the Pi is stuck. I won't be able to run applications like omxplayer or vcgencmd. If I try to debug vcgencmd with parameter "version" I get this:

(gdb) t a a where

Thread 2 (Thread 0xb6e36470 (LWP 2501)):
#0  0xb6efadcc in ioctl () at ../sysdeps/unix/syscall-template.S:82
#1  0xb6fa2014 in completion_thread () from /opt/vc/lib/libvchiq_arm.so
#2  0xb6fb0da0 in vcos_thread_entry () from /opt/vc/lib/libvcos.so
#3  0xb6f87c00 in start_thread (arg=0xb6e36470) at pthread_create.c:306
#4  0xb6f02728 in ?? () at ../ports/sysdeps/unix/sysv/linux/arm/nptl/../clone.S:116 from /lib/arm-linux-gnueabihf/libc.so.6
#5  0xb6f02728 in ?? () at ../ports/sysdeps/unix/sysv/linux/arm/nptl/../clone.S:116 from /lib/arm-linux-gnueabihf/libc.so.6
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

Thread 1 (Thread 0xb6ff9000 (LWP 2498)):
#0  0xb6f8e770 in __new_sem_wait (sem=0x12838) at ../nptl/sysdeps/unix/sysv/linux/sem_wait.c:60
#1  0x000099a4 in vc_gencmd_read_response ()
#2  0x000091bc in main ()

The process never exits. Probably useless but I try any way.
@popcornmix do you have any other advice on how to investigate further and track down what I may be doing wrong?

@6by9
Copy link

6by9 commented Oct 14, 2015

It sounds like you've managed to get the GPU stuck such that it can't process the EmptyThisBuffer call.
There may be some useful logging if you add "start_debug=1" to config.txt, then "sudo vcdbg log assert" should show any asserts that have fired in the firmware (it directly accesses memory rather than using an IPC, so should work even if the GPU has locked).

NB There is a difference between blocking such that buffer queues can be made thread safe, rather than blocking for the entire buffer to be processed. In the Pi case, the buffer queues are on the GPU, so have to go via

@popcornmix
Copy link
Contributor

@carlonluca If you can provide a test app I can run (e.g. a tarball I can unpack and run an executable, I may be able to see how the GPU is stuck). But, get the assert log suggested by @6by9 first, as that may contain some clues.

@carlonluca
Copy link
Author

This is what I had before starting the test:

001018.913: assert( source ) failed; ../../../../../middleware/confzilla/cp_front_fdt.c::cp_front_fdt_load_builtin line 115
004962.490: assert( limit_turbo == last_limit_turbo ) failed; ../../../../../helpers/arm_loader/arm_loader.c::tempcheck_task line 2380
008167.483: assert( limit_turbo == last_limit_turbo ) failed; ../../../../../helpers/arm_loader/arm_loader.c::tempcheck_task line 2380

this is what I got after the lock:

pi@raspberrypi ~ $ sudo vcdbg log assert
001018.913: assert( source ) failed; ../../../../../middleware/confzilla/cp_front_fdt.c::cp_front_fdt_load_builtin line 115
004962.490: assert( limit_turbo == last_limit_turbo ) failed; ../../../../../helpers/arm_loader/arm_loader.c::tempcheck_task line 2380
008167.483: assert( limit_turbo == last_limit_turbo ) failed; ../../../../../helpers/arm_loader/arm_loader.c::tempcheck_task line 2380
1771862.166: assert( elem ) failed; ../../../../../middleware/dispmanx/dispmanx.c::dispmanx_element_change_source line 5551
4112655.155: assert( elem ) failed; ../../../../../middleware/dispmanx/dispmanx.c::dispmanx_element_change_source line 5551
2353721.195: assert( elem ) failed; ../../../../../middleware/dispmanx/dispmanx.c::dispmanx_element_change_source line 5551

I made a couple of tests and the second time I got nothing at all, just the first three lines.

The procedure is similar to what I described above, unfortunately running requires some libs, plugins etc... I can provide the newer version, I can lock even with one player only now, but if you want something quicker I could provide a raspbian image with everything in place. You just run the binary. Would that make things simpler for you?
Thanks you for your answers.

@popcornmix
Copy link
Contributor

I don't think any of the asserts are necessarily relevant.
A raspbian sdcard image is okay.

@carlonluca
Copy link
Author

Hello, I prepared the image (https://goo.gl/cL3fld). It is intended for armv6 as the issue seems to manifest quicker on Pi1. I built the code in debug so you can get a backtrace if you need it. What you have to do is simply flash it, boot it and, from the home of the pi user run the proper command.
I can reproduce similar situations in several ways, always hanging in ilcs_execute_function_ex. I can advice a couple.

  • Running this makes my Pi's hang quickly, even after a few seconds. This almost always locks:
./piomxtextures_pocplayer --multipleanimtest /home/pi/big_buck_bunny_1080p_h264.mov /home/pi/big_buck_bunny_1080p_h264.mov

I suggest you use a 720p video instead of the 1080p I provided in the home, because this player plays both videos concurrently. What you should see is something like this: https://youtu.be/MuPGAWp7vag. The framerate is going to be far worse as the one in the video is a Pi2 and without all those debug logs. It should hang pretty soon always with the same backtraces I already reported.

  • Running this instead takes much more time, but it typically hangs anyway:
/usr/local/Qt-rasp-5.5.0/bin/qmlscene samples/video_seek.qml file:///home/pi/big_buck_bunny_1080p_h264.mov

this uses just a single player, like omxplayer, but still seems to hang in ilcs_execute_function_ex as I reported above. This typically requires from 10,000 to 100,000 seeks to hang (500ms for a seek so it takes hours). I understand it is difficult to reproduce something like this, so the first option may be the best to start with.

Running a single player without further interaction with openmax seems to be ok (I had a run of over 45 days). But trying to get the current position often using OMX_GetConfig or similar may result in these situations I reported after hours or days of playback. These are the best codes I found to let you reproduce quickly.
Any possible advice you can provide may be useful to fix my code or to find a workaround. Please let me know if you need anything else. Thanks!

@carlonluca
Copy link
Author

@popcornmix were you able to reproduce the issue? Is there anything else I can do to help you with this?

@popcornmix
Copy link
Contributor

I've had a look. I don't see anything suspicious on gpu side. No memory corruption. All tasks call stacks are valid with no sign of deadlock. Everything is idle waiting for a command.

The gencmd hanging was suspicious. Quit the piomxtextures_pocplayer process and the gencmd completes.

I wondered if this was due to too many gencmd clients (we only support 3). I tried rebuilding the firmware to support 10 gencmd clients and it still hangs. Checking, I am only getting one VCHI_CALLBACK_SERVICE_OPENED and a few VCHI_CALLBACK_MSG_AVAILABLE when launching piomxtextures_pocplayer.

After piomxtextures_pocplayer has stalled, then gencmd causes a VCHI_CALLBACK_SERVICE_OPENED message to get though but no VCHI_CALLBACK_MSG_AVAILABLE which is why it hangs. When piomxtextures_pocplayer is quit I see the VCHI_CALLBACK_MSG_AVAILABLE and the two VCHI_CALLBACK_SERVICE_CLOSED messages.

So, this feels like a vchiq deadlock issue, rather than an openmax issue. I think I need to call in @pelwell for vchiq help here. Phil is it possible you can look at this? I have an sdcard image that provokes the hang in a small number of seconds.

@pelwell
Copy link
Contributor

pelwell commented Nov 3, 2015

OK. Leave it where I can look at it in the morning.

@pelwell
Copy link
Contributor

pelwell commented Nov 4, 2015

I think I understand this now. In the logging I captured there were a number of places where VCHIQ - the driver, the user library or the queue utility that the ILCS client uses - was stalling because a queue was full, but I now think this was a symptom of the real problem, which is an ILCS limit on fhe number of concurrent outstanding requests coupled with a complication of allowing application callbacks to make further requests.

When a message is sent that expects a response, the calling thread blocks until the response comes back. This is implemented by associating a semaphore with a transaction ID, and signalling the semaphore when a response with a matching transaction ID is received. For reasons of either convenience or resource limiting, there is a fixed number of these semaphores available (4). If a thread finds that all semaphores are in use then it must wait for one to become available due to the completion of an outstanding request.

This mechanism would be simple and safe were it not for the fact that callbacks into applications are allowed to make requests. Such callbacks are made in the context of the server thread, and if the server thread is blocked then no responses are processed, thus the consequence of the server thread failing to get a slot is deadlock.

The function in question (ilcs_execute_function_ex) pays lip service to the problem without actually doing enough to avoid it. My work around is for the server thread to try to claim the event in a non-blocking fashion, and on failure to process some messages before trying again. If the second attempt fails then it sleeps for 1 millisecond before going round the loop again. This logic has been enough to stop the test from stalling, without the need to increase the concurrency above 4.

You can find the patch in this gist, to be applied against the userland repo. Build using the buildme script, and copy the build products from build/lib to /opt/vc/lib .

@pelwell
Copy link
Contributor

pelwell commented Nov 5, 2015

With that patch applied the test has been running for 15 hours without stalling. Since there are going to be performance penalties caused by the server thread waiting and retrying, I'll also double the thread limit to 8 - the additional memory and search time is negligible.

@carlonluca
Copy link
Author

Thank you @pelwell for you work! I'll test your patch asap. This may explain why I was able to prevent a bit the lockup by calling omx only inside a mutex zone.
Do you think the seek lockup (shown above #449 (comment)) is another symptom of the same issue? I'm asking because in that case, using a mutex zone on my side was not enough for preventing lockup. However as you can see from the backtrace, the situation seemed to be exactly the same.

@pelwell
Copy link
Contributor

pelwell commented Nov 5, 2015

I would say that the seek lockup isn't another manifestation of the same problem - I would expect to see at least four threads waiting in ilcs_execute_function_ex, when in the traces above there is only one.

@carlonluca
Copy link
Author

Do you have any advice on how I may be able to investigate that problem further? Are there other possible reasons why that thread may be stuck in ilcs_execute_function_ex? Any other idea off the top of your head?

@pelwell
Copy link
Contributor

pelwell commented Nov 5, 2015

Sorry, no - yesterday was my first real skirmish with the OpenMAX code.

My suggestion would be to fetch the userland repo, modify buildme or the CMake files to build a debug version (this is easily buildable on the Pi), and copy the contents of build/lib into /opt/vc/lib. That way, the backtrace should at least tell you which line of the source it was on.

popcornmix added a commit that referenced this issue Nov 6, 2015
…m edges

See: #463

firmware: di_adv: Add config setting to add nop delays to shader
See: http://forum.kodi.tv/showthread.php?tid=231092&pid=2150605#pid2150605

firmware: vcilcs: Avoid a potential deadlock when very threaded
See: #449

firmware: vrf: Add spinlock around vrf acquire/release calls to avoid restoring an invalid p10 from ISR context

firmware: rpi_display: only ratelimit if the backlight is actually changed
See: raspberrypi/linux#1179

firmware: di_adv: Support multiple instances of qpu deinterlace at SD resolution
See: popcornmix/omxplayer#386

linux: rpi-ft5406: Use interruptible sleep to avoid high load reported
See: https://www.raspberrypi.org/forums/viewtopic.php?f=28&t=125034

linux: dts: Added overlay for Adafruit PiTFT 2.8 capacitive touch screen
See: raspberrypi/linux#1192

linux: config: Add MCP320X
See: raspberrypi/linux#1189

linux: Build i2c_gpio module and add a device tree overlay to configure it
See: raspberrypi/linux#1183
popcornmix added a commit to Hexxeh/rpi-firmware that referenced this issue Nov 6, 2015
…m edges

See: raspberrypi/firmware#463

firmware: di_adv: Add config setting to add nop delays to shader
See: http://forum.kodi.tv/showthread.php?tid=231092&pid=2150605#pid2150605

firmware: vcilcs: Avoid a potential deadlock when very threaded
See: raspberrypi/firmware#449

firmware: vrf: Add spinlock around vrf acquire/release calls to avoid restoring an invalid p10 from ISR context

firmware: rpi_display: only ratelimit if the backlight is actually changed
See: raspberrypi/linux#1179

firmware: di_adv: Support multiple instances of qpu deinterlace at SD resolution
See: popcornmix/omxplayer#386

linux: rpi-ft5406: Use interruptible sleep to avoid high load reported
See: https://www.raspberrypi.org/forums/viewtopic.php?f=28&t=125034

linux: dts: Added overlay for Adafruit PiTFT 2.8 capacitive touch screen
See: raspberrypi/linux#1192

linux: config: Add MCP320X
See: raspberrypi/linux#1189

linux: Build i2c_gpio module and add a device tree overlay to configure it
See: raspberrypi/linux#1183
@popcornmix
Copy link
Contributor

@pelwell's patch is in latest rpi-update libs.

@carlonluca
Copy link
Author

Thanks @popcornmix, I'm already testing it and I see there is a improvement in the second situation, but just like @pelwell said, the "seek test" still seems to manifest issues. But something may have changed, I'm following @pelwell's advice and I'll try to provide some more info assuming you are interested.

@pelwell
Copy link
Contributor

pelwell commented Nov 7, 2015

We're always interested, it's just a matter of time and priorities.

@carlonluca
Copy link
Author

I perfectly understand time is a limited resource and that this issue may not be affecting many users as not exactly a "regular" use. I really appreciate your help.

However I'm running more tests and what I can see is that the first of the two tests is now far better, but I can confirm I can lock the "seek test". This is the backtrace after a lockup that includes more info related to the userland libs: http://pastebin.com/Ek5P4y5q. What I see is that the situation is somehow similar, but we are probably in another case (vcos_thread_current() != &st->thread). Please note that in this case there is no mutex on my side before calling openmax. Not sure if this is relevant, but as you can see the main thread is itself also calling openmax to get the current position, in addition to the video and audio decoders. Also the media decoding thread (thread 13 in the backtrace), which is more or less what omxplayer.cpp does, may be calling openmax concurrently. Maybe I should be synchronising all these calls?

As you probably read above, my code somehow integrates into omxplayer code and is built over it. I therefore enabled the logs from omxplayer code and this is the last portion of 130MB of logs: http://pastebin.com/Mv6idTHA. Not sure if these can be of any use.

Of course this again can be a bug in my code, but it is difficult to say. Do you guys have any other advice?

@pelwell
Copy link
Contributor

pelwell commented Nov 7, 2015

That backtrace looks like the ARM is waiting for responses from the VPU, while the server thread is ready to receive such responses. In other words, I think the problem is likely to be in the VPU.

@carlonluca
Copy link
Author

Thanks for your help. Is there anything I can do to somehow collect more information about this? May be a bug in my code and more information may help finding the bug... Can I debug the issue in any way? Any other advice?

@pelwell
Copy link
Contributor

pelwell commented Apr 12, 2016

Increasing the limit is easy and can probably go in today. Apart from using slightly more memory it should have no negative consequences, and you've already tested it fairly thoroughly.

Adding some sort of diagnostic output when the limit is reached needs more care. If I post a small patch, as before, would you be happy to test it with a reduced limit?

@nenadlazovic
Copy link

ok.
pls release increased vchiu_queue (256) asap.
send us a patch for diagnostic output and I can test it.
thanks

@pelwell
Copy link
Contributor

pelwell commented Apr 12, 2016

Try this patch:

diff --git a/interface/vmcs_host/vcilcs.c b/interface/vmcs_host/vcilcs.c
index 75ba1f5..73b761e 100644
--- a/interface/vmcs_host/vcilcs.c
+++ b/interface/vmcs_host/vcilcs.c
@@ -385,6 +385,22 @@ static int ilcs_callback(VCHIQ_REASON_T reason, VCHIQ_HEADER_T *header, void *se
 #endif

    case VCHIQ_MESSAGE_AVAILABLE:
+#ifndef _VIDEOCORE
+      {
+        static int queue_warn = 0;
+        int queue_len = st->queue.write - st->queue.read;
+        if (!queue_warn)
+           queue_warn = st->queue.size/2;
+        if (queue_len >= queue_warn)
+        {
+           if (queue_len == st->queue.size)
+              VCOS_ALERT("ILCS queue full");
+           else
+              VCOS_ALERT("ILCS queue len = %d", queue_len);
+           queue_warn = queue_warn + (st->queue.size - queue_warn)/2;
+        }
+      }
+#endif
       vchiu_queue_push(&st->queue, header);
       break;

You will get a series of messages to stderr as you reach and exceed each warning level, starting at the halfway mark:

ILCS queue len = 128
ILCS queue len = 192
ILCS queue len = 224
ILCS queue len = 240
ILCS queue len = 248
ILCS queue len = 252
ILCS queue len = 254
ILCS queue len = 255
ILCS queue full
ILCS queue full

The queue len = messages only appear once for each threshold, but the full message is repeated to increase the chance that it gets seen in the event of a hang.

popcornmix added a commit that referenced this issue Apr 12, 2016
See: https://www.raspberrypi.org/forums/viewtopic.php?f=29&t=144087#p950902

bootcode: Ensure LED is switched off after halt on Pi3

firmware: vcilcs: Increase ILCS queue size to prevent(?) deadlock
See: #449
popcornmix added a commit to Hexxeh/rpi-firmware that referenced this issue Apr 12, 2016
See: https://www.raspberrypi.org/forums/viewtopic.php?f=29&t=144087#p950902

bootcode: Ensure LED is switched off after halt on Pi3

firmware: vcilcs: Increase ILCS queue size to prevent(?) deadlock
See: raspberrypi/firmware#449
@nenadlazovic
Copy link

Hi @pelwell,
i tested and I am getting behavior exactly as expected: at the time when log shows "ILCS queue full", my app hangs (see 2 cases below).

So I am ok with both improvements you made

  • vchiu_queue increase to 256
  • logging vchiu_queue usage (maybe to put warning threshold to 75%)

Whenever you can, pls release firmware with both improvements. THANKS.

Case 1
qml: Video, playMedia: mo.id: mediaObject_2_42_56_608_88
qml: Video, stopMedia: mo.id: mediaObject_2_42_53_821_11
ILCS queue len = 32
ILCS queue len = 48
ILCS queue len = 56
ILCS queue len = 60
ILCS queue len = 62
ILCS queue len = 63
ILCS queue full

Case 2
qml: Video, playMedia: mo.id: mediaObject_2_43_3_831_7
06:47:41.851 INFO: Seek to: 00:00:00
06:47:41.868 INFO: Stop command issued.
06:47:41.871 INFO: Seek to: 00:00:00
06:47:41.872 INFO: Stop command issued.
qml: Video, playMedia: mo.id: mediaObject_2_42_56_608_88
06:47:56.901 INFO: Seek to: 00:00:00
qml: Video, stopMedia: mo.id: mediaObject_2_42_53_821_11
ILCS queue len = 32
ILCS queue len = 48
ILCS queue len = 56
ILCS queue len = 60
ILCS queue len = 62
ILCS queue len = 63
ILCS queue full

@nenadlazovic
Copy link

@pelwell, thanks for adding logging to show usage state of vchiu_queue.
Is there a plan to release this improvement ?
Any rough estimate on when ?

THANKS.

@pelwell
Copy link
Contributor

pelwell commented Apr 16, 2016

It will probably go out this weekend.

popcornmix added a commit that referenced this issue Apr 17, 2016
kernel: Add Support for BoomBerry Audio boards
See: raspberrypi/linux#1397

kernel: Add support for the Digital Dreamtime Akkordion music player
See: raspberrypi/linux#1406

kernel: Add support for mcp7940x family of RTC
See: raspberrypi/linux#1397

firmware: vcilcs: Warn as message queue approaches fullness
See: #449

firmware: dtoverlay: Copy overrides before applying
firmware: dtmerge: Pack the merged DTB before writing
firmware: arm_ldconfig: Fix detection of kernel8.img
firmware: arm_loader: Enable DT by default, read addresses back from stub
See: #579

firmware: ldconfig: Add [none] section as a convenience as config.txt filter

firmware: pwm_sdm: Bugfixes
See: https://www.raspberrypi.org/forums/viewtopic.php?f=29&t=136445

firmware: gencmd: Add command to read current and historical throttled state
popcornmix added a commit to Hexxeh/rpi-firmware that referenced this issue Apr 17, 2016
kernel: Add Support for BoomBerry Audio boards
See: raspberrypi/linux#1397

kernel: Add support for the Digital Dreamtime Akkordion music player
See: raspberrypi/linux#1406

kernel: Add support for mcp7940x family of RTC
See: raspberrypi/linux#1397

firmware: vcilcs: Warn as message queue approaches fullness
See: raspberrypi/firmware#449

firmware: dtoverlay: Copy overrides before applying
firmware: dtmerge: Pack the merged DTB before writing
firmware: arm_ldconfig: Fix detection of kernel8.img
firmware: arm_loader: Enable DT by default, read addresses back from stub
See: raspberrypi/firmware#579

firmware: ldconfig: Add [none] section as a convenience as config.txt filter

firmware: pwm_sdm: Bugfixes
See: https://www.raspberrypi.org/forums/viewtopic.php?f=29&t=136445

firmware: gencmd: Add command to read current and historical throttled state
popcornmix pushed a commit to raspberrypi/userland that referenced this issue Apr 17, 2016
popcornmix pushed a commit to raspberrypi/userland that referenced this issue Apr 17, 2016
@popcornmix
Copy link
Contributor

Latest rpi-update optionally includes the warnings.
Set env variable ILCS_WARN=1 to enable.

XECDesign pushed a commit to RPi-Distro/firmware that referenced this issue May 4, 2016
See: https://www.raspberrypi.org/forums/viewtopic.php?f=29&t=144087#p950902

bootcode: Ensure LED is switched off after halt on Pi3

firmware: vcilcs: Increase ILCS queue size to prevent(?) deadlock
See: raspberrypi#449
XECDesign pushed a commit to RPi-Distro/firmware that referenced this issue May 4, 2016
kernel: Add Support for BoomBerry Audio boards
See: raspberrypi/linux#1397

kernel: Add support for the Digital Dreamtime Akkordion music player
See: raspberrypi/linux#1406

kernel: Add support for mcp7940x family of RTC
See: raspberrypi/linux#1397

firmware: vcilcs: Warn as message queue approaches fullness
See: raspberrypi#449

firmware: dtoverlay: Copy overrides before applying
firmware: dtmerge: Pack the merged DTB before writing
firmware: arm_ldconfig: Fix detection of kernel8.img
firmware: arm_loader: Enable DT by default, read addresses back from stub
See: raspberrypi#579

firmware: ldconfig: Add [none] section as a convenience as config.txt filter

firmware: pwm_sdm: Bugfixes
See: https://www.raspberrypi.org/forums/viewtopic.php?f=29&t=136445

firmware: gencmd: Add command to read current and historical throttled state
@popcornmix
Copy link
Contributor

@carlonluca Okay to close this?

@carlonluca
Copy link
Author

Yes, I think you can close this, thanks.

@atic-atac
Copy link

I'm getting exactly these symptoms right now, even with the latest firmware.

@popcornmix
Copy link
Contributor

Many different bugs in the calling code can cause this symptom. Yours is probably not related to this github issue.
If you believe there is a bug in the firmware, then provide a minimal test app that shows the problem (as a new github issue) and we'll try to help.

@atic-atac
Copy link

You mention bugs in the calling code, can you give any hints so I can check ?

neuschaefer pushed a commit to neuschaefer/raspi-binary-firmware that referenced this issue Feb 27, 2017
…m edges

See: raspberrypi#463

firmware: di_adv: Add config setting to add nop delays to shader
See: http://forum.kodi.tv/showthread.php?tid=231092&pid=2150605#pid2150605

firmware: vcilcs: Avoid a potential deadlock when very threaded
See: raspberrypi#449

firmware: vrf: Add spinlock around vrf acquire/release calls to avoid restoring an invalid p10 from ISR context

firmware: rpi_display: only ratelimit if the backlight is actually changed
See: raspberrypi/linux#1179

firmware: di_adv: Support multiple instances of qpu deinterlace at SD resolution
See: popcornmix/omxplayer#386

linux: rpi-ft5406: Use interruptible sleep to avoid high load reported
See: https://www.raspberrypi.org/forums/viewtopic.php?f=28&t=125034

linux: dts: Added overlay for Adafruit PiTFT 2.8 capacitive touch screen
See: raspberrypi/linux#1192

linux: config: Add MCP320X
See: raspberrypi/linux#1189

linux: Build i2c_gpio module and add a device tree overlay to configure it
See: raspberrypi/linux#1183
neuschaefer pushed a commit to neuschaefer/raspi-binary-firmware that referenced this issue Feb 27, 2017
See: https://www.raspberrypi.org/forums/viewtopic.php?f=29&t=144087#p950902

bootcode: Ensure LED is switched off after halt on Pi3

firmware: vcilcs: Increase ILCS queue size to prevent(?) deadlock
See: raspberrypi#449
neuschaefer pushed a commit to neuschaefer/raspi-binary-firmware that referenced this issue Feb 27, 2017
kernel: Add Support for BoomBerry Audio boards
See: raspberrypi/linux#1397

kernel: Add support for the Digital Dreamtime Akkordion music player
See: raspberrypi/linux#1406

kernel: Add support for mcp7940x family of RTC
See: raspberrypi/linux#1397

firmware: vcilcs: Warn as message queue approaches fullness
See: raspberrypi#449

firmware: dtoverlay: Copy overrides before applying
firmware: dtmerge: Pack the merged DTB before writing
firmware: arm_ldconfig: Fix detection of kernel8.img
firmware: arm_loader: Enable DT by default, read addresses back from stub
See: raspberrypi#579

firmware: ldconfig: Add [none] section as a convenience as config.txt filter

firmware: pwm_sdm: Bugfixes
See: https://www.raspberrypi.org/forums/viewtopic.php?f=29&t=136445

firmware: gencmd: Add command to read current and historical throttled state
@dimovnike
Copy link

dimovnike commented Apr 14, 2017

Hello, what configs are necessary to activate thew "ILCS queue full" logs in the latest debian jessie? Is setting env var ILCS_WARN=1 enough?

The firmware version i have is:

raspberrypi-firmware (1.20170303-1) jessie; urgency=medium

  * firmware as of b51046a2b2bb69771579a549d157205d9982f858

 -- Serge Schneider <[email protected]>  Fri, 03 Mar 2017 14:30:23 +0000

@popcornmix
Copy link
Contributor

Yes, ILCS_WARN should be supported with that firmware

@dimovnike
Copy link

Thanks for the answer. We are having a very similar issue (backtrace attached) but its very hard to reproduce. It just happens once in 2-3 days on a random PI (we are running aboout 30 pi's ). However, I enabled ILCS_WARN but there is no sign of queue being full. When this happens. the application can not be restarted and full system restart is needed (the second backtrace is taken after the restart when the app is stuck).

Any idea how to debug this? What extra logs I can activate in order to gather more info about it?

backtrace.zip

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

7 participants