Python interpreter hangs on exit when using 'AUTO' thread_type on Linux #909

Closed
Breakthrough opened this issue Mar 13, 2022 · 7 comments
@Breakthrough

Breakthrough commented Mar 13, 2022

Overview

Setting thread_type = 'AUTO' can prevent the Python interpreter from exiting gracefully, leaving it hung indefinitely. I have reproduced the issue with PyAV versions 8 and 9 on Ubuntu 16.04, 18.04, and 20.04 (see reproduction below). I also believe the issue affects the threading example in the cookbook, which mentions neither the problem nor the workarounds that exist.

I have not been able to reproduce the issue on Windows, and I am unable to test this on OSX.

Expected behavior

PyAV should not prevent the program from terminating when using AUTO threading mode, or if this is unavoidable, the threading example in the cookbook should better indicate that this can happen and how it can be mitigated.

The caveats section implies this issue only affects subinterpreters, but I'm not sure that's the case. I'm also curious whether it would be possible to disable logging by default when PyAV is shutting down, which would likely resolve most user-facing issues.

Actual behavior

Any application using PyAV on Linux can occasionally hang on termination when using thread_type = 'AUTO' so long as at least one frame has been decoded.

Reproduction

Verified using the following minimum working example:

import av
print('Starting test.')
container = av.open('video.mp4')
container.streams.video[0].thread_type = 'AUTO'
# Have to decode at least one frame to trigger the bug.
next(container.decode(video=0))
print('Test complete.')

# No longer hangs after uncommenting the following line:
#av.logging.restore_default_callback()

Running this in a loop via bash will output something like:

$ while true; do python3 test_av_hang.py; done
Starting test.
Test complete.
Starting test.
Test complete.
Starting test.
Test complete.

The output will then stop suddenly.
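Because the hang is intermittent, it can help to detect it automatically rather than watch for the output to stop. The following is a minimal sketch, not part of the original report: it runs a command repeatedly and counts the runs that fail to exit within a timeout. The names `count_hangs` and `python_cmd` and the default values are hypothetical.

```python
import subprocess
import sys


def python_cmd(script):
    """Build a command that runs `script` with the current interpreter."""
    return [sys.executable, script]


def count_hangs(cmd, runs=20, timeout=30.0):
    """Run `cmd` `runs` times and return how many runs exceeded `timeout` seconds.

    subprocess.run() kills the child process when the timeout expires,
    so a hung run does not block the loop.
    """
    hangs = 0
    for _ in range(runs):
        try:
            subprocess.run(
                cmd,
                timeout=timeout,
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
            )
        except subprocess.TimeoutExpired:
            hangs += 1
    return hangs
```

For example, `count_hangs(python_cmd('test_av_hang.py'))` would run the script above 20 times. The timeout should be comfortably longer than a normal run of the script, otherwise slow runs are counted as hangs.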

I discovered the issue on Travis CI, and was able to reproduce it locally both in a VM and on a native Ubuntu install. I had difficulty reproducing the issue on a single-core VM, but it seems to fail rather quickly on any multicore Linux machine. Adding av.logging.restore_default_callback() at the end of the program resolves the issue.

This issue also occurs with the threading example in the cookbook, which makes no mention of this problem.

The caveats page implies this only affects subinterpreters, so I overlooked it during my initial analysis. Disabling logging before the program terminates seems to resolve the issue (e.g. by calling av.logging.restore_default_callback()). Thus I'd like to ask: is it possible to do this automatically when PyAV is being destroyed, or can a timeout be set in the event things deadlock? Or can the documentation be updated to reflect that this can happen in normal usage, not just with subprocesses/subinterpreters?
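One way an application could apply the logging workaround without remembering to call it on every exit path is to register it with `atexit`. This is only a sketch: `av.logging.restore_default_callback()` is the call named above, but the helper name `install_logging_workaround` is hypothetical, and it is an untested assumption that an `atexit` handler runs early enough during shutdown to avoid the hang.

```python
import atexit


def install_logging_workaround():
    """Arrange for PyAV's default log callback to be restored at interpreter exit.

    Returns True if the handler was registered, False if PyAV is not installed.
    """
    try:
        import av.logging
    except ImportError:
        return False
    # Assumption: restoring the callback from an atexit handler is equivalent
    # to calling it on the last line of the program, as described in the report.
    atexit.register(av.logging.restore_default_callback)
    return True
```

Calling `install_logging_workaround()` once near program start would then cover normal exits, including those caused by an unhandled exception.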

If my understanding of this is incorrect I do apologize.

Versions

  • OS: Ubuntu 16.04, 18.04, 20.04. Unable to reproduce on Windows. Did not test OSX.
  • Python: Reproducible on 3.6, 3.7, 3.8, and 3.9.
  • PyAV runtime: tested both 8 and 9 with binary wheels (default from using pip install av as this is what most end users of the project will do)

P.S. Thank you all for the awesome project!

@Breakthrough
Author

Breakthrough commented Mar 13, 2022

I also saw the caveats section on subinterpreters, but I overlooked that initially because I thought that only applied to subinterpreters. I can also confirm that using a BytesIO object definitely does cause similar lockups. Disabling logging seems to help, thus I'd like to ask, why isn't this the library default?

I also think this behaviour should be documented in the threading example in the cookbook, as there was no mention of this issue there, and I believe it is reproducible using that same code sample.

Thank you.

@jlaine
Member

jlaine commented Mar 27, 2022

Could this be related to #751?

@Breakthrough
Author

I think so. One thing I wasn't able to figure out is why using a file handle makes it more likely to encounter the lockups. I'll close this for now in favor of #751; thanks for the link.

@jlaine
Member

jlaine commented Mar 27, 2022

OK, thanks for reporting back. Before we leave this issue: I was not able to trigger your bug with your minimal reproduction on Debian with an mp4 file I have. I suspect that's because my file doesn't trigger the "right" log output to cause the hang. Do you have a file (ideally in the FATE suite) which I can use?

@Breakthrough
Author

Interesting, that might explain why some users may not encounter this issue. I was using the following clip:
https://github.com/Breakthrough/PySceneDetect/blob/resources/tests/resources/goldeneye.mp4

@jlaine
Member

jlaine commented Mar 27, 2022

Interestingly I'm not seeing a hang if I ensure the container is closed cleanly!!

import av

print('Starting test.')
with av.open('video.mp4') as container:
    container.streams.video[0].thread_type = 'AUTO'
    # Have to decode at least one frame to trigger the bug.
    next(container.decode(video=0))
print('Test complete.')
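Where a `with` block is awkward, for instance when the container is opened in one place and used in another, the same guarantee can be sketched with `try`/`finally`. The helper `first_video_frame` and its `opener` parameter are hypothetical; `opener` exists only so the function can be exercised without a real video file, and it defaults to `av.open`.

```python
def first_video_frame(path, opener=None):
    """Decode and return the first video frame, always closing the container."""
    if opener is None:
        import av  # imported lazily so the helper can be tested without PyAV
        opener = av.open
    container = opener(path)
    try:
        container.streams.video[0].thread_type = 'AUTO'
        return next(container.decode(video=0))
    finally:
        # Runs on success and on error, mirroring what the `with` block does.
        container.close()
```

With PyAV installed, `first_video_frame('video.mp4')` would behave like the snippet above, except that it returns the decoded frame.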

@Breakthrough
Author

Breakthrough commented Mar 28, 2022

Wow interesting - I tried just setting container to None or calling del on it... I could swear I tried calling close manually before but I guess I didn't test that correctly. Thank you so much for that, I'll make sure to do that.

I think container should call close() in its __del__ in addition to __exit__, which will also fix the hang in the cookbook examples.

Edit: Submitted a PR. Thank you so much for looking into this by the way!
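The idea of closing from `__del__` as well as `__exit__` can be illustrated with a small pure-Python wrapper. This is a sketch of the pattern only, not PyAV's implementation or the code in the PR; the class name `ClosingWrapper` is hypothetical, and it wraps any object that has a `close()` method.

```python
class ClosingWrapper:
    """Wrap an object with close() so it is closed on exit or on deletion."""

    def __init__(self, resource):
        self._resource = resource
        self._closed = False

    def close(self):
        # Idempotent: __exit__ and __del__ may both end up calling this.
        if not self._closed:
            self._closed = True
            self._resource.close()

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.close()

    def __del__(self):
        # Use getattr in case __init__ failed before the attributes were set,
        # and never let an exception escape from a finalizer.
        if getattr(self, '_closed', True):
            return
        try:
            self.close()
        except Exception:
            pass
```

Note that Python does not guarantee when, or in some shutdown scenarios whether, `__del__` runs, so an explicit `close()` or a `with` block remains the more dependable option.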

Breakthrough added a commit to Breakthrough/PySceneDetect that referenced this issue Mar 28, 2022
Breakthrough added a commit to Breakthrough/PyAV that referenced this issue Mar 28, 2022
Currently, close() is only called when using a context manager. This can
demonstrably lead to hangs when using AUTO threading mode as
shown in the cookbook example.

As @jlaine discovered in PyAV-Org#909, calling `close()` on the Container
seems to resolve the issue, likely by making sure the threads have
been joined by the time __dealloc__ runs.
vovamushak added a commit to vovamushak/PySceneDetect that referenced this issue Sep 9, 2024
2 participants