
Deadlock when starting sdrplay3 while sdrplay_api_server is in state "ApiVersion Error: sdrplay_api_ServiceNotResponding" #14

Open · kirat68 opened this issue Oct 23, 2021 · 6 comments

kirat68 commented Oct 23, 2021

Hello Franco,

Here is another deadlock. I opened a new issue because it is not related to the first deadlock I reported a few days ago.
To explain how I get into these situations: I am building a server that controls SDRs. As a server it should stay online all the time and absolutely avoid crashes and deadlocks, which is why I am intensively testing start-stop-delete cycles of the SDRs. The first deadlock I mentioned occurred between 1 and 100 cycles; this one occurred after 1317 cycles. Back to the problem.

From time to time the sdrplay_api service stops responding and returns "ApiVersion Error: sdrplay_api_ServiceNotResponding". It seems to be in this state, when sdrplay3 is called, that the lock happens. sdrplay3 logged these lines just before the deadlock, so I suppose it occurs within the sdrplay3 code:
gr::log :ERROR: rsp1a0 - sdrplay_api_Init() Error: sdrplay_api_Fail
gr::log :INFO: rsp1a0 - total samples: [0,0]
gr::log :ERROR: rsp1a0 - sdrplay_api_ReleaseDevice() Error: sdrplay_api_ServiceNotResponding

I pressed Ctrl+C and dumped the following trace, hoping it will help.

Stack trace of thread 483136:
#0 0x00007fd82cb39d8d ___pthread_mutex_trylock (libc.so.6 + 0x9cd8d)
#1 0x00007fd829128918 _Z17sdrplay_MutexLockPvm (libsdrplay_api.so.3.07 + 0x5918)
#2 0x00007fd829128a08 sdrplay_api_LockDeviceApi (libsdrplay_api.so.3.07 + 0x5a08)
#3 0x00007fd82934dabf _ZN2gr8sdrplay38rsp_implC2EhRKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEERKNS0_13stream_args_tESt8functionIFbvEE (libgnuradio-sdrplay3.so.1.0.0git + 0x1babf)
#4 0x00007fd82935fe52 _ZN2gr8sdrplay310rsp1a_implC1ERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEERKNS0_13stream_args_tE (libgnuradio-sdrplay3.so.1.0.0git + 0x2de52)
#5 0x00007fd82935ff70 _ZN2gr8sdrplay35rsp1a4makeERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEERKNS0_13stream_args_tE (libgnuradio-sdrplay3.so.1.0.0git + 0x2df70)
#6 0x00007fd8293a3d80 n/a (sdrplay3_python.cpython-39-x86_64-linux-gnu.so + 0x24d80)
#7 0x00007fd82939bc21 n/a (sdrplay3_python.cpython-39-x86_64-linux-gnu.so + 0x1cc21)
#8 0x000000000053a8eb cfunction_call (python3.9 + 0x13a8eb)
#9 0x000000000051c24b _PyObject_MakeTpCall (python3.9 + 0x11c24b)
#10 0x0000000000537bb0 method_vectorcall.lto_priv.0 (python3.9 + 0x137bb0)
#11 0x000000000053430e slot_tp_init (python3.9 + 0x13430e)
#12 0x000000000051c6e7 type_call (python3.9 + 0x11c6e7)
#13 0x00007fd82c18af27 n/a (gr_python.cpython-39-x86_64-linux-gnu.so + 0x40f27)
#14 0x00000000005380a1 PyObject_Call (python3.9 + 0x1380a1)
#15 0x0000000000512d15 _PyEval_EvalFrameDefault (python3.9 + 0x112d15)
#16 0x000000000050f5e9 _PyEval_EvalCode (python3.9 + 0x10f5e9)
#17 0x0000000000526e6b _PyFunction_Vectorcall (python3.9 + 0x126e6b)
#18 0x0000000000512d15 _PyEval_EvalFrameDefault (python3.9 + 0x112d15)
#19 0x000000000050f5e9 _PyEval_EvalCode (python3.9 + 0x10f5e9)
#20 0x0000000000526e6b _PyFunction_Vectorcall (python3.9 + 0x126e6b)
#21 0x0000000000510da9 _PyEval_EvalFrameDefault (python3.9 + 0x110da9)
#22 0x0000000000526c43 _PyFunction_Vectorcall (python3.9 + 0x126c43)
#23 0x0000000000533e7a slot_tp_init.lto_priv.0 (python3.9 + 0x133e7a)
#24 0x000000000051c187 _PyObject_MakeTpCall (python3.9 + 0x11c187)
#25 0x0000000000516021 _PyEval_EvalFrameDefault (python3.9 + 0x116021)
#26 0x0000000000526c43 _PyFunction_Vectorcall (python3.9 + 0x126c43)
#27 0x000000000053799d method_vectorcall.lto_priv.0 (python3.9 + 0x13799d)
#28 0x0000000000515de0 _PyEval_EvalFrameDefault (python3.9 + 0x115de0)
#29 0x0000000000526c43 _PyFunction_Vectorcall (python3.9 + 0x126c43)
#30 0x0000000000510f92 _PyEval_EvalFrameDefault (python3.9 + 0x110f92)
#31 0x0000000000526c43 _PyFunction_Vectorcall (python3.9 + 0x126c43)
#32 0x0000000000510f92 _PyEval_EvalFrameDefault (python3.9 + 0x110f92)
#33 0x0000000000526c43 _PyFunction_Vectorcall (python3.9 + 0x126c43)
#34 0x0000000000537a44 method_vectorcall.lto_priv.0 (python3.9 + 0x137a44)
#35 0x0000000000634a0a t_bootstrap (python3.9 + 0x234a0a)
#36 0x000000000062a698 pythread_wrapper (python3.9 + 0x22a698)
#37 0x00007fd82cb35927 start_thread (libc.so.6 + 0x98927)
#38 0x00007fd82cbc59e4 __clone (libc.so.6 + 0x1289e4)
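
The trace shows the thread blocked inside sdrplay_api_LockDeviceApi(), called from the rsp_impl constructor. As a rough detection-only sketch (illustrative names, not an existing gr-sdrplay3 API), a supervising server could construct the block on a worker thread with a deadline, turning the silent hang into a reportable error:

import concurrent.futures

def construct_with_deadline(factory, timeout_s=5.0):
    # 'factory' is any callable that creates the sdrplay3 source block.
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(factory)
    try:
        block = future.result(timeout=timeout_s)
        pool.shutdown(wait=True)
        return block
    except concurrent.futures.TimeoutError:
        # The worker is stuck inside the blocked API call and cannot be
        # cancelled from Python; abandon it (wait=False) and report the
        # failure instead of hanging the whole server.
        pool.shutdown(wait=False)
        raise RuntimeError('sdrplay3 block construction timed out; '
                           'the SDRplay API service may be unresponsive')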

fventuri (Owner) commented

@kirat68 - thanks for reporting this serious issue.

In order to help you, it is important that I am able to reproduce and troubleshoot the problem here.

Would you mind sharing the 'minimal' code that triggers the deadlock?
By 'minimal' I mean: if, for example, your original code starts the RSP, changes the frequency, and then stops the RSP, but you find that the problem occurs even when you just start and stop the RSP (without the 'change frequency' step), please send just the simpler code; that way it is easier for me to find the root cause.

Thanks,
Franco

kirat68 commented Oct 23, 2021

I understand, but I think it is going to be difficult because of the number of modules involved; it would waste your time. I thought the trace would be helpful, but no problem, I will try to narrow down the issue. I have to wait for the API to become unresponsive again, and that occurs rarely, so I am launching the test again after having improved my code. If and when the lock appears again, I will try to reproduce the issue in the interpreter with less code. I will let you know ...

kirat68 commented Oct 23, 2021

OK, I tried a few things and found a way to "reproduce" the problem. It is related to a non-responding API service, which occurs more often than I would like.
You can use the attached GRC file; it is just the SDR streaming into a sink.
Start it, close it: everything is fine.
Start it, then stop the API server service to simulate a non-response (do not kill the process, as it will restart automatically), then try to stop the Python GRC app. It hangs ...
block-sdr-head-test.txt
I hope this will help ...
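
For reference, a minimal Python equivalent of the attached flowgraph might look like this sketch (assuming an RSP1A and the gr-sdrplay3 Python bindings; the constructor arguments follow the gr-sdrplay3 README and may need adjusting):

from gnuradio import gr, blocks
from gnuradio import sdrplay3

class sdr_head_test(gr.top_block):
    # Minimal flowgraph: SDRplay RSP1A source streaming into a null sink.
    def __init__(self):
        gr.top_block.__init__(self, 'sdr-head-test')
        self.src = sdrplay3.rsp1a(
            '',   # empty selector: use the first available RSP1A
            stream_args=sdrplay3.stream_args(
                output_type='fc32',
                channels_size=1,
            ),
        )
        self.src.set_sample_rate(2e6)
        self.src.set_center_freq(100e6)
        self.sink = blocks.null_sink(gr.sizeof_gr_complex)
        self.connect(self.src, self.sink)

if __name__ == '__main__':
    tb = sdr_head_test()
    tb.start()
    input('Running; press Enter to stop...')
    tb.stop()   # hangs here if the API service was stopped in the meantime
    tb.wait()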

kirat68 commented Oct 25, 2021

The same problem occurs if you just restart the sdrplay API server.
Start the GRC file, then restart the sdrplay API server, then try to close the running GRC. It hangs.
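
(For anyone scripting this reproduction, the restart step can be automated; a hedged sketch, assuming the systemd unit is named 'sdrplay' — check what your installation uses:)

import subprocess, time

# Restart the SDRplay API service while the flowgraph is running.
# The unit name 'sdrplay' is an assumption; adjust it for your system.
subprocess.run(['systemctl', 'restart', 'sdrplay'], check=True)
time.sleep(1.0)   # give the service a moment, then try tb.stop() -- it hangs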

fventuri (Owner) commented

Tarik, thanks for your patience with this issue.

I added a timeout (currently set to 60ms, since it seems to work in my tests here, but you can change it if you think it is too low or too high) to the work() function to prevent it from waiting forever when the SDRplay API service gets terminated or crashes.

You can find the code with this change in the new branch add_timeout_to_work_function (https://github.com/fventuri/gr-sdrplay3/tree/add_timeout_to_work_function).
You can change the timeout value (in ms) in this line in the source file lib/rsp_impl.cc: https://github.com/fventuri/gr-sdrplay3/blob/add_timeout_to_work_function/lib/rsp_impl.cc#L560
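
The actual change is in the C++ work() path; as a language-neutral sketch of the same pattern (Python here, with purely illustrative names, not the gr-sdrplay3 internals), the idea is to replace an unbounded wait for samples with a bounded one:

import threading

class sample_queue:
    # Illustrative only: hand samples from the stream callback to work(),
    # waiting at most timeout_s for new data instead of waiting forever.
    def __init__(self, timeout_s=0.060):   # 60 ms, like the branch default
        self._cond = threading.Condition()
        self._chunks = []
        self._timeout_s = timeout_s

    def put(self, chunk):
        # Called from the SDRplay stream callback thread.
        with self._cond:
            self._chunks.append(chunk)
            self._cond.notify()

    def get(self):
        # Called from work(): with a bounded wait, a dead API service
        # yields an empty read instead of a permanent hang.
        with self._cond:
            if not self._chunks:
                self._cond.wait(timeout=self._timeout_s)
            return self._chunks.pop(0) if self._chunks else None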

In order to test this scenario, I simplified the GRC workflow you sent me; you can find the Python script I used for my tests attached to this comment.

Franco

test_add_timeout_to_work_function.py.gz

kirat68 commented Nov 2, 2021

Thank you, Franco.
I will put this branch into my tests and let you know if I observe anything.

@fventuri fventuri self-assigned this Nov 3, 2021
fventuri added a commit that referenced this issue Jun 9, 2024