starting omxplayer in a while true loop fails after some time #181

Closed
turingmachine opened this issue Apr 2, 2014 · 33 comments

@turingmachine

Given the following bash script:

while true; do omxplayer -g /media/test15s.mp4; done

After some iterations (between 5 and 100) the screen stays black.

That's the console output on failure:

....
....
V:PortSettingsChanged: 1280x720@25.00 interlace:0 deinterlace:0 par:1.00 layer:0
have a nice day ;)
Video codec omx-h264 width 1280 height 720 profile 100 fps 25.000000
Subtitle count: 0, state: off, index: 1, delay: 0
V:PortSettingsChanged: 1280x720@25.00 interlace:0 deinterlace:0 par:1.00 layer:0
have a nice day ;)
Video codec omx-h264 width 1280 height 720 profile 100 fps 25.000000
Subtitle count: 0, state: off, index: 1, delay: 0
V:PortSettingsChanged: 1280x720@25.00 interlace:0 deinterlace:0 par:1.00 layer:0
have a nice day ;)
Video codec omx-h264 width 1280 height 720 profile 100 fps 25.000000
Subtitle count: 0, state: off, index: 1, delay: 0
V:PortSettingsChanged: 1280x720@25.00 interlace:0 deinterlace:0 par:1.00 layer:0
have a nice day ;)
Video codec omx-h264 width 1280 height 720 profile 100 fps 25.000000
Subtitle count: 0, state: off, index: 1, delay: 0
V:PortSettingsChanged: 1280x720@25.00 interlace:0 deinterlace:0 par:1.00 layer:0

Both omxplayer and omxplayer.bin are still running. omxplayer.log contains only a clean shutdown sequence from the last run:

15:35:12 T:130463960   DEBUG: Normal M:14930169 (A:0 V:14960000) P:0 A:-14.93 V:0.03/T:0.20 (0,0,1,0) A:0% V:0% (0.00,0.00)
15:35:12 T:130488243   DEBUG: Normal M:14953985 (A:0 V:14960000) P:0 A:-14.95 V:0.01/T:0.20 (0,0,1,0) A:0% V:0% (0.00,0.00)
15:35:12 T:130509153   DEBUG: Normal M:14975404 (A:0 V:14960000) P:0 A:-14.98 V:-0.02/T:0.20 (0,1,1,0) A:0% V:0% (0.00,0.00)
15:35:12 T:130509600    INFO: COMXVideo::IsEOS
15:35:12 T:130509767   DEBUG: OMXClock::OMXStop
15:35:12 T:130518024   DEBUG: OMXThread::Run - Exited thread with  id -1311849376
15:35:12 T:130519617   DEBUG: OMXThread::StopThread - Thread stopped
15:35:12 T:130521346   DEBUG: OMXThread::Run - Exited thread with  id -1303460768
15:35:12 T:130522045   DEBUG: OMXThread::StopThread - Thread stopped
15:35:12 T:130541511   DEBUG: COMXCoreComponent::Deinitialize : OMX.broadcom.video_scheduler handle 0xc927d0
15:35:12 T:130566078   DEBUG: COMXCoreComponent::Deinitialize : OMX.broadcom.video_decode handle 0x7a93f0
15:35:12 T:130569519   DEBUG: COMXCoreComponent::Deinitialize : OMX.broadcom.video_render handle 0x7ad8d0
15:35:12 T:130608555   DEBUG: COMXCoreComponent::Deinitialize : OMX.broadcom.text_scheduler handle 0x802980
15:35:12 T:130628912   DEBUG: OMXThread::Run - Exited thread with  id -1244167072
15:35:12 T:130629609   DEBUG: OMXThread::StopThread - Thread stopped
15:35:12 T:130633560   DEBUG: COMXCoreComponent::Deinitialize : OMX.broadcom.clock handle 0x7a4ba0

Attaching to the omxplayer.bin process with strace -f -F -p shows:

# strace -f -F -p 5422
Process 5422 attached with 8 threads - interrupt to quit
[pid  5429] futex(0x1474bbc, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid  5428] futex(0x1474a2c, FUTEX_WAIT_PRIVATE, 3935, NULL <unfinished ...>
[pid  5427] futex(0xb6e72780, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid  5426] futex(0xb6e72684, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid  5425] futex(0xb6e73508, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid  5424] ioctl(5, 0xc014c407 <unfinished ...>
[pid  5423] read(0,  <unfinished ...>
[pid  5422] futex(0xb5d614c8, FUTEX_WAIT, 5423, NULL

@alimov

alimov commented Apr 3, 2014

Hi, I have the same or a similar problem.

I am running this loop script:

while /bin/true
do
    for vid in /media/data/vids/*
    do
        /usr/bin/omxplayer -o "$sound" -p "$vid" >> /var/log/omxplayer.log
    done
done

The videos use the H.264 codec at HD resolution.

Sometimes the TV shows only a black screen, but the omxplayer process is still running:
ps axf
8356 pts/0 Sl+ 0:24 _ /usr/bin/omxplayer.bin -o hdmi -p /media/data/video1.m4v

I run omxplayer inside screen. After the screen session loads, omxplayer says:

/usr/bin/omxplayer: line 67: 30400 Aborted
LD_LIBRARY_PATH="$OMXPLAYER_LIBS${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}" $OMXPLAYER_BIN "$@"

@Avoncliff

Hi, I am also bugged by this problem.
While the loop is running, I see an ever-increasing number of tasks in ps ax. They all look like this:
5274 ? Ss 0:00 dbus-daemon --fork --print-address 5 --print-pid 6 --
I think one of these is left behind for each loop iteration.
I hope this is of some help.

@turingmachine
Author

@Avoncliff are you sure you use a recent version? This looks like a bug in dbus-daemon handling that is resolved by now.

@Avoncliff

I only have the current binary from apt, so if that is fixed by now, sorry to trouble you. I have resolved my problem by reverting to a very old version that loops for days on end with no problem. Again, I am not sure of the version number, as it did not have a -v option, but it dates from January 2013 or earlier.
Thanks for the efforts.

@popcornmix
Owner

@Avoncliff You should try a recent build from http://omxplayer.sconde.net

@Tito1337

Tito1337 commented May 4, 2014

I can confirm this bug, it is very annoying.

omxplayer reads the video till the end but then fails to exit, so the loop can't go on. Killing it works though, and the loop resumes as usual.

I tried everything (last release from http://omxplayer.sconde.net, firmware update, removing USB devices, disabling overclocking...) without results. I had to implement a crude watchdog that kills any ill-behaving omxplayer.bin
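
For anyone who wants a similar safety net, here is a minimal sketch of one way to do it, assuming GNU coreutils `timeout` is installed. It is not the watchdog mentioned above: the function name `play_with_deadline` and the time limits are made up for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of omxplayer): run one playback under a deadline.
# Usage: play_with_deadline LIMIT_SECONDS COMMAND [ARGS...]
play_with_deadline() {
    local limit=$1
    shift
    # timeout sends SIGTERM once LIMIT_SECONDS have passed and SIGKILL
    # 5 seconds later if the command is still alive. It exits with 124
    # when the limit was hit, or 137 if SIGKILL had to be used.
    timeout -k 5 "$limit" "$@"
}
```

A loop such as `while true; do play_with_deadline 30 omxplayer -g /media/test15s.mp4; done` would then move on even if one playback never exits. The 30-second limit is an assumption for a 15-second clip and needs adjusting per video.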

@Tito1337

Tito1337 commented May 4, 2014

So I've tried nearly every build from http://omxplayer.sconde.net

Between these two commits ( 38f05ee...c0dd950 ), there is fd8fe4a : "Add high level locking to OMXVideo". I don't know if it's the right one, but it sure looks like a good place to start digging.

I'm afraid my skills pretty much stop there, I hope this helps and that somebody will figure this out. If necessary, I'm able to compile intermediate versions to narrow it down a bit more.

For everybody else having this issue, revert to 0.3.2git2013102338f05ee or earlier.

@turingmachine
Author

I can confirm that 38f05ee is not affected. Affected versions can run up to 300 iterations of the while true loop before they break.

I've built a recent version (46616c5) with fd8fe4a reverted and the issue still persisted.

ec440e9 looks very obscure to me. @popcornmix, can you shed some light on what changes are introduced in this commit?

@Tito1337

Tito1337 commented May 7, 2014

I compiled many versions between 38f05ee and c0dd950. I'm now pretty sure the bug was introduced by 5419655 !

Note that I conducted my tests without a screen connected to my Raspberries, so it used the composite output. Reading the commit diff, I wonder if the issue will be the same with HDMI.

EDIT : Later versions don't lock up as quickly as 5419655. Maybe I've spoken too fast.

@Tito1337

Tito1337 commented May 7, 2014

I have spoken too fast. See issues #124 and #12 and commit e239f05.

But there still is an issue because 38f05ee is 100% not affected.

e239f05 corrects the most frequent bug introduced by 5419655, but there still is something wrong between 38f05ee and now. Maybe it's another bug of 5419655, maybe of a later commit...

@popcornmix
Owner

Can you apply e239f05 to the tree at 5419655?
I assume that will be better than 5419655, but is it worse than the commit before 5419655?

@Tito1337

Tito1337 commented May 7, 2014

I did exactly what you asked* and I can confirm that it is better than 5419655 but still worse than 38f05ee. The behaviour is consistent with current master branch : it hangs after a little less than 300 iterations (I've seen 38f05ee work for more than 1500 iterations before I stopped it)

So my working theory is that 5419655 is still responsible, and while e239f05 surely helps it doesn't fix everything...

*Procedure for somebody else willing to test :
git diff 22ed64d e239f05 > e239f05.diff
git checkout 5419655
patch < e239f05.diff

If needed, I can upload binaries somewhere.
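
To compare builds by how many iterations they survive, a harness along these lines could count runs until the first hang. This is only a sketch, assuming GNU coreutils `timeout`; the function name `iterations_until_hang` and its arguments are hypothetical and were not used for the numbers above.

```bash
#!/usr/bin/env bash
# Hypothetical harness: run a command repeatedly and report the first
# iteration that does not finish within LIMIT_SECONDS.
# Usage: iterations_until_hang MAX_RUNS LIMIT_SECONDS COMMAND [ARGS...]
iterations_until_hang() {
    local max_runs=$1 limit=$2 i status
    shift 2
    for ((i = 1; i <= max_runs; i++)); do
        status=0
        # 124 means timeout had to stop the command, 137 means SIGKILL was needed.
        timeout -k 2 "$limit" "$@" >/dev/null 2>&1 || status=$?
        if ((status == 124 || status == 137)); then
            echo "hang at iteration $i"
            return 1
        fi
    done
    echo "no hang in $max_runs iterations"
    return 0
}
```

For example, `iterations_until_hang 2000 30 omxplayer -g /media/test15s.mp4` would stop at the first playback that runs longer than 30 seconds. Both numbers are assumptions to adapt to the clip length.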

@popcornmix
Owner

If you are able to produce a patch that applies to head of tree that restores the 38f05ee reliability I'd be interested. I'll try to have a look at some point, but too busy right now.

@Tito1337

Tito1337 commented May 7, 2014

Yeah I understand you have better things to do. I'll continue my investigations, share my findings here, and hopefully I or somebody else will be able to produce a patch ;-)

@Tito1337

Tito1337 commented May 8, 2014

While hunting for the bug I tried some good ol' printf debugging... omxplayer hangs at the very end of the exit procedure, while doing

 m_keyboard->Close();

The simple solution (to avoid that line being executed) is to launch omxplayer with the --no-keys option. I've reached more than 3000 iterations without a hang! (Tested with 0.3.5git2014040946616c5_armhf.deb, but I'm sure it will work with any version later than e239f05.)

I have no clue why this simple action works 300 times but not the 301st. I'm also scratching my head over why the bug didn't affect 38f05ee and what difference 5419655 made to the keyboard handling. Maybe some weird memory leak, or a kernel or firmware bug?

tl;dr : found solution, use --no-keys.
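
Applied to the loop from the original report, the workaround might look like the sketch below. The `--no-keys` option is the one described above. The `play_loop` function, the `PLAYER` variable and the run counter are illustrative names, added only so the loop can be bounded while testing.

```bash
#!/usr/bin/env bash
# PLAYER can be overridden for testing; it defaults to the real player.
PLAYER=${PLAYER:-omxplayer}

# Hypothetical helper: play with keyboard input disabled.
# Usage: play_loop RUNS [PLAYER_ARGS...]   (RUNS=0 means loop forever)
play_loop() {
    local runs=$1 i=0
    shift
    while ((runs == 0 || i < runs)); do
        # --no-keys disables the keyboard listener, so the keyboard
        # shutdown path that hangs is never reached.
        "$PLAYER" --no-keys "$@"
        i=$((i + 1))
    done
}
```

`play_loop 0 -g /media/test15s.mp4` corresponds to the original `while true` loop with the workaround applied.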

@turingmachine
Author

@Tito1337: Awesome!

@popcornmix
Owner

I don't understand the connection of keyboard code and 5419655.
Can you trace it down any finer? (i.e. where in Keyboard::Close() it hangs?)

@Tito1337

Tito1337 commented May 8, 2014

I'm still investigating with printf debugging :D So I've traced it to the Keyboard thread : when everything should be shutting down at the end, it calls StopThread() and the line

pthread_join(m_thread, NULL);

that waits for the thread to finish, hangs, probably because of a lock that never gets released.

I'm no expert but I think we face a race condition :

Race conditions have a reputation of being difficult to reproduce and debug, since the end result is nondeterministic and depends on the relative timing between interfering threads. Problems occurring in production systems can therefore disappear when running in debug mode, when additional logging is added, or when attaching a debugger, often referred to as a "Heisenbug".

Maybe the simple order or timing differences introduced in 5419655 made this bug appear.

I tried to use valgrind (with --tool=helgrind) to check for thread lockups but there is a known, wontfix, incompatibility with Raspbian.

At this point I still don't know how to find where the lock is created but I would put my money on DBUS. I'll keep you updated if I find something

@popcornmix
Owner

I think the valgrind failure can be avoided by bypassing the accelerated memcpy library.
Comment out (with a #)

/usr/lib/arm-linux-gnueabihf/libcofi_rpi.so

in /etc/ld.so.preload
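
As a rough sketch, the same edit can be scripted with `sed`. The helper name `disable_preload_entry` is hypothetical. It keeps a `.bak` copy so the original file can be restored after debugging.

```bash
#!/usr/bin/env bash
# Hypothetical helper: comment out one library line in a preload file.
# Usage: disable_preload_entry [FILE] [LIBRARY_PATH]
disable_preload_entry() {
    local file=${1:-/etc/ld.so.preload}
    local lib=${2:-/usr/lib/arm-linux-gnueabihf/libcofi_rpi.so}
    # Prefix the matching line with '#'. The '^' anchor means a line that
    # is already commented out does not match again. A backup of the
    # original is written to FILE.bak.
    sed -i.bak "s|^[[:space:]]*${lib}[[:space:]]*\$|#&|" "$file"
}
```

On the Pi this would be run as root, for example via `sudo bash` with the function defined, and `/etc/ld.so.preload.bak` can be copied back afterwards to re-enable the accelerated memcpy.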

gdb might be a better tool for debugging this type of hang - you can run "thread apply all bt" to get the backtrace of all threads. You will find keyboard thread is blocked calling something...

@Tito1337

Tito1337 commented May 8, 2014

Thanks for your help. If you hadn't guessed, I had never used valgrind or gdb before!

Your valgrind workaround seems to work, but it is very slow, so I don't know if I will ever reach one of those problematic states.

I also tried gdb as you suggested, but there seems to be a similar issue:

Program received signal SIGILL, Illegal instruction.
0xb5c45600 in ?? () from /usr/lib/arm-linux-gnueabihf/libcrypto.so.1.0.0

Is there a workaround for that one?

EDIT : found one

(gdb) handle SIGILL nostop
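
Both gdb commands from this exchange can also be run non-interactively against a process that is already hung. The sketch below assumes gdb's standard `-p`, `-batch` and `-ex` options. The function name `dump_thread_backtraces` and the `GDB` variable are illustrative.

```bash
#!/usr/bin/env bash
# GDB can be overridden for testing; it defaults to the real debugger.
GDB=${GDB:-gdb}

# Hypothetical helper: attach to a running process, print the backtrace
# of every thread, then detach.
# Usage: dump_thread_backtraces PID
dump_thread_backtraces() {
    local pid=$1
    # 'handle SIGILL nostop noprint' keeps gdb from stopping on the
    # SIGILL raised inside libcrypto; 'thread apply all bt' prints one
    # backtrace per thread.
    "$GDB" -p "$pid" -batch \
        -ex 'handle SIGILL nostop noprint' \
        -ex 'thread apply all bt'
}
```

With a hung player, `dump_thread_backtraces "$(pidof omxplayer.bin)"` should show which call the keyboard thread is blocked in, assuming a single omxplayer.bin process is running.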

magdesign referenced this issue in magdesign/PocketVJ May 21, 2014
updated to work with actual omxplayer, since the keyboard caused to crash omxplayer[[https://github.com/popcornmix/omxplayer/issues/181|issue 181]]

@andyjagoe

@Tito1337 Thanks very much for your work on this. Omxplayer stability has been a persistent issue for me with omxplayer builds after July 2013 (when it was very reliable and could run in a loop for weeks on end)...but I have not been able to isolate and resolve the problem.

Have you learned anything more about the issue?

Recent builds have been more difficult...and using them my application will now frequently hang Videocore. Jamesh commented on my forum post (http://www.raspberrypi.org/forums/viewtopic.php?t=77808) that:

"Videocore has run out of memory, something is allocating and not releasing. Difficult to tell if it's Videocore side or ARM side (The ARM can do stuff that causes Videocore allocations, but in those circumstances the ARM also need to deallocate)."

I think what's happening is that omxplayer does not shut down cleanly (or at all) after playing a video in a number of situations. Sending SIGINT works in some cases to get omxplayer to shutdown/return...but not all...and then the only thing that works is sending a SIGKILL. I suspect the cumulative effect is Videocore hanging.
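
The escalation described here, SIGINT first and SIGKILL only if the process does not exit, can be sketched as follows. The function name `stop_gracefully` and the default 5-second grace period are assumptions for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical helper: ask a process to quit with SIGINT, then force it
# with SIGKILL if it is still there after a grace period.
# Usage: stop_gracefully PID [GRACE_SECONDS]
stop_gracefully() {
    local pid=$1 grace=${2:-5} waited=0
    # If the signal cannot be delivered, the process is already gone.
    kill -INT "$pid" 2>/dev/null || return 0
    while kill -0 "$pid" 2>/dev/null; do
        if ((waited >= grace)); then
            kill -KILL "$pid" 2>/dev/null
            return 0
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}
```

Something like `for pid in $(pidof omxplayer.bin); do stop_gracefully "$pid"; done` would apply it to every running player. Whether a forced kill still leaves VideoCore allocations behind is not something this sketch addresses.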

@Tito1337

Like I said before, the hang is due to a lock in the thread listening to keyboard inputs. I don't know what circumstances lock the thread but there is a simple solution : disable the keyboard listener by using the --no-keys option.

My investigation is totally stuck, as I can't pinpoint the exact origin of the deadlock, but the --no-keys solution works 100% for the bug I was chasing, so maybe the VideoCore issue is another bug.

@Bruddy

Bruddy commented Jun 6, 2014

This is my first post so apologies if I don't understand this fully. I have the same problem of Omxplayer stopping from time to time when called from a bash script. I have tried the command "omxplayer --no-keys file.mp4" and I get an error of "unrecognised option"
Can anyone help?
Bruddy

@Tito1337

Tito1337 commented Jun 6, 2014

Hi Bruddy, what version of omxplayer do you use? All recent versions should
have the --no-keys option.

You can download the latest build (in the form of a .deb package) from
http://omxplayer.sconde.net and then install it on your Raspberry Pi :
$ sudo dpkg -i omxplayer_*.deb
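
To check whether an installed build knows the option before relying on it, something like the following could be used. It is a sketch that assumes the player accepts `--help` and lists `--no-keys` in that output. The function name `supports_no_keys` is made up.

```bash
#!/usr/bin/env bash
# Hypothetical check: does the player's help text mention --no-keys?
# Assumes the binary accepts --help and lists its options there.
# Usage: supports_no_keys [PLAYER]
supports_no_keys() {
    local player=${1:-omxplayer}
    # -e keeps grep from treating the pattern as one of its own options.
    "$player" --help 2>&1 | grep -q -e '--no-keys'
}
```

For example: `supports_no_keys && echo "--no-keys available" || echo "please install a newer build"`.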

(Note that this thread is for talking about this specific bug, not general support, so keep it clean ;-))

@Bruddy

Bruddy commented Jun 6, 2014

That may be my problem. I am using Raspbian and apt-get install and it says that I have the latest version but I presume their repositories are probably not fully up to date?

@popcornmix
Owner

@Bruddy
Raspbian only updates occasionally, so installing a .deb file from sconde is recommended if you need an up-to-date version.

@Tito1337

@lampone1967 This is not the right place.

Your issue is due to omxplayer-sync, not omxplayer itself. You can contact the original developer here: https://github.com/turingmachine/omxplayer-sync/issues?labels=question

@revolunet

Unfortunately, the --no-keys option doesn't work here with Version : 6ee9a0a [master].
I have a simple 25-second AVI video, and it freezes randomly at the end.
Any ideas?

@justinasjaronis

I face a similar problem. I am trying to play an MP4 video and it hangs at the end. The latest working build for me is http://omxplayer.sconde.net/builds/omxplayer_0.3.0~git20130729~efd1049_armhf.deb

@jehutting

@revolunet : what exactly do you mean by "unfortunately the --no-keys option doesnt work here with Version : 6ee9a0a [master]"?
If you mean that it still freezes at the end even with the --no-keys option, I would be interested in the video.

@justinasjaronis: I hope the modification and commit for issue 266 work for you. Otherwise I would be interested in the video, too.

@revolunet

Hey there,
Yes, the --no-keys option doesn't help; it still freezes :/
I tried various file formats, and in the end I had to create a video composed of 50 repetitions of the sequence, which reduces the risk by a factor of 50 :)
Send me an email to [email protected] and i'll send you the faulty video.
Thanks !

@OMID-313

OMID-313 commented May 5, 2016

Dear @popcornmix and @turingmachine and @Tito1337 ,
Did you find a stable solution to this freeze problem?

@jehutting

@OMID-313 Have a look at #437. This thread is newer and still under investigation.
Only if you have this freezing issue with OMXPlayer as a stand-alone program (and NOT when it runs as part of omxd) I am interested. Please post (in #437) your info. (An omxplayer.log or sample file are welcome).
