Streaming improvements #14
BTW, no crash by changing sample rate now on my side. It may not be perfect, but definitely a "less code, less bugs" trend here. |
Let me put in the bug fixes to |
@zuckschwerdt No problem, do as you wish, I can always re-merge my changes later. In the meantime, I've measured the What I measure with the current code having: buffer_size = 8 * MTU with samplerate of 10Msps:
Another attempt at 5 Msps:
I've also measured So basically irrespective of our processing power, @zuckschwerdt How high a samplerate could you reach on your side without crackling sound in CubicSDR? |
I see three topics worth addressing individually here:
Tightening the copy loop. iio_channel_read() calls iio_channel_convert() on each sample; this isn't inlined and is expensive. Your changes for well-known cases will vectorize (should confirm that). Wouldn't call it direct copy though, I'm using that for the case which lends itself to DMA.
Default buffer size. I suspect the code is doing something like a log. A comment would be helpful ;)
I might have botched that table, but the code needs a cleanup and documentation anyway.
Dropping the refill_thread. As you mention the current code is just synchronized.
on the bottom of recv() we get concurrency. recv() will then only block if you call it too fast. |
The IIO internals guide has some information on buffers.
But this might all only apply to code running on the Zynq. IIO abstraction through USB or network might always block as I gather? Using audio stutter in Cubic as test I can go up to 5M without refill_thread, 6M starts to stutter. With 32K buffer size and refill_thread at the tail of recv() I can get 6M without stutter. But nothing above is smooth. |
Unless I'm wrong, the only thing to add compared to my
I don't get that part. Which code ?
The code is all wrong there I'm afraid and needs to be rewritten based on condition-variable (is_not_full, is_not_empty) principles and must not mix atomic variables in the process. And you need an additional byte buffer to accumulate the samples, because I was just doing that based on Basically we have to find a buffer_size which maximize For testing buffer sizes, the most practical way would be exposing the buffer length as a SoapySDR setting that can be changed before starting to stream. This would be done implementing The MTU size is not related to performance at all, but is a compromise between reads that are not too big, which would not assure 60fps, and not too small, to prevent too many
I suspect the DMA optimization is only valid on the device's own memory space as well. You'll always have to go through the USB pipe. That would be all for this weekend for me. I think we made good progress already 👍 Regards, Vincent |
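A minimal sketch of the condition-variable scheme mentioned above (is_not_full / is_not_empty), assuming a refill thread produces blocks of samples and recv() consumes them. The class name `BlockQueue` and its methods are hypothetical and are not taken from the PR; it only illustrates the pattern with one mutex and no atomics.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical bounded queue of sample blocks (illustration only, not the PR's code).
class BlockQueue {
public:
    explicit BlockQueue(std::size_t capacity) : capacity_(capacity) {
        assert(capacity > 0);
    }

    // Producer side (e.g. a refill thread): blocks while the queue is full.
    void push(std::vector<int16_t> block) {
        std::unique_lock<std::mutex> lock(mutex_);
        is_not_full_.wait(lock, [this] { return queue_.size() < capacity_; });
        queue_.push_back(std::move(block));
        lock.unlock();
        is_not_empty_.notify_one();
    }

    // Consumer side (e.g. recv()): blocks while the queue is empty.
    std::vector<int16_t> pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        is_not_empty_.wait(lock, [this] { return !queue_.empty(); });
        std::vector<int16_t> block = std::move(queue_.front());
        queue_.pop_front();
        lock.unlock();
        is_not_full_.notify_one();
        return block;
    }

    // Number of blocks currently queued.
    std::size_t size() {
        std::lock_guard<std::mutex> lock(mutex_);
        return queue_.size();
    }

private:
    std::size_t capacity_;
    std::deque<std::vector<int16_t>> queue_;
    std::mutex mutex_;
    std::condition_variable is_not_full_;
    std::condition_variable is_not_empty_;
};
```

With this shape, the consumer only blocks when it calls pop() faster than blocks arrive, and the producer only blocks when the consumer falls behind by `capacity` blocks.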
The table is about the set_buffer_size_by_samplerate selected sizes and resulting fps/latency. |
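For readers without the table, the relation between buffer size, sample rate, and fps/latency can be sketched as below. The function names are hypothetical; the only assumption is that one buffer holds `buffer_samples` complex samples delivered at `sample_rate_sps`.

```cpp
#include <cassert>
#include <cstddef>

// Time span covered by one buffer, in milliseconds.
double buffer_latency_ms(std::size_t buffer_samples, double sample_rate_sps) {
    assert(sample_rate_sps > 0.0);
    return 1000.0 * static_cast<double>(buffer_samples) / sample_rate_sps;
}

// Number of buffer refills needed per second ("fps").
double refills_per_second(std::size_t buffer_samples, double sample_rate_sps) {
    assert(buffer_samples > 0);
    return sample_rate_sps / static_cast<double>(buffer_samples);
}
```

For example, a 131072-sample buffer at 10 Msps covers about 13.1 ms, so it needs roughly 76 refills per second.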
My comment about "Tightening the copy loop." was that it's a good thing to do and should go in as a separate PR. It's not a simple memcpy "direct copy", but a tighter loop that the compiler might vectorize. |
Call it "can_do_raw_buffer_handling" if you wish.
Too late ! I'm already using it. And it works ! :)
Ah OK. My tests show that at least on my machine 131072 for set_buffer size is the fastest setting. The time Now for more reasonable buffer sizes, a well behaved Looks like it is not the case, because the amount of time is almost 10 ms for 2Msps to 10 Msps for the same amount of samples, as if the |
Why did you pick 16384 as RX full scale? The data is 12 bit aligned to LSB. I.e. exclude the sign bit and you get -2048 – +2047. Scale that to -1 – 1 by dividing with 2048. |
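The scaling described above can be sketched as a tight loop, assuming the samples arrive as sign-extended 12-bit values stored in int16_t. The function name `cs16_to_cf32` is hypothetical and not from the PR.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Convert LSB-aligned 12-bit samples (range -2048..+2047) to floats in [-1, 1).
// Plain indexed loop with a constant scale, which a compiler may vectorize.
std::vector<float> cs16_to_cf32(const std::vector<int16_t>& in) {
    constexpr float scale = 1.0f / 2048.0f;
    std::vector<float> out(in.size());
    for (std::size_t n = 0; n < in.size(); ++n) {
        out[n] = static_cast<float>(in[n]) * scale;
    }
    return out;
}
```

With a full scale of 16384 instead of 2048, the same input would only span about -0.125 to +0.125.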
I favor putting distinct topics into distinct PRs. Just for the sake of others following this, the changelog, and perhaps bisecting changes if needed. (This assumes we squash merge. If you rebase interactively into distinct topic commits that would work too.) I guess to really nail a perfect buffer size or a formula depending on sample rate we need to test various IIO abstractions: native, USB, network. My hunch is that native already has triple-buffering and would be best with small buffers. Network might be best aligned around network MTU and USB performs badly with small buffers in my experience. Btw. SDRPlay, HackRF and AirspyHF have fixed 64k buffers, BladeRF has 4k. |
Yes, and we are in no hurry to merge in master. It is just a way to make an experimental branch so that others can add commits on it.
That is just fine apparently when I tested. The only difference is that there will be one |
Using either a refill_thread (with or without the refill at tail of recv) or without the refill_thread or even with triple buffering in the refill thread it always tops out at 6.3 Msps / 25.3 MBps for me (SoapyRateTest). |
Thanks for the extensive testing. I guess we can definitely strip
Well technically we could implement
//the 3 bytes (in being a uint8_t*)
uint16_t part0 = uint16_t(*(in++));
uint16_t part1 = uint16_t(*(in++));
uint16_t part2 = uint16_t(*(in++));
//the 2 resulting I/Q in a int_16 format.
int16_t i = int16_t((part1 << 12) | (part0 << 4));
int16_t q = int16_t((part2 << 8) | (part1 & 0xf0));
in
Apparently the SoapySDRServer can stream (packed) CS12 which could be unpacked on the application side into the more normal CS16/CF32. 6.3Msps x 32 bits is 200Mbit/s alone and I've read somewhere that USB bandwidth is reserved for TX. Out of curiosity I tried SDRSharp with the PlutoSDR plugin, which apparently uses the Net interface. Samplerates up to 10Msps are available, but any setting > 5Msps is marked "(not supported)", go figure. |
@zuckschwerdt Charles (@cjcliffe) and I are in contact with Robin Getz of Analog Devices by e-mail maybe we can ask him the maximum Msps we could get from the device before starting a lost quest. |
Great! I briefly thought about CS12 too, but wasn't sure that's even an option in SoapyRemote. So 8Msps with CS12 and 10Msps at CS8, I'll explore that soon. Are you ok with me splitting out the fast-copy changes and preparing that for merge, then we can focus on buffer size and (non-)threading here. |
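A back-of-the-envelope sketch of where such figures come from, assuming the link throughput in bytes is the only limit and does not depend on the sample format (both are assumptions, not measurements). The function name is hypothetical; the byte sizes are 4 for CS16, 3 for packed CS12, and 2 for CS8 per complex sample.

```cpp
#include <cassert>

// Bytes per complex (I/Q) sample for each wire format.
constexpr double kBytesCS16 = 4.0;
constexpr double kBytesCS12 = 3.0;
constexpr double kBytesCS8 = 2.0;

// Highest sample rate (Msps) that fits into a given throughput (MB/s).
double max_msps(double throughput_mbytes_per_s, double bytes_per_sample) {
    assert(bytes_per_sample > 0.0);
    return throughput_mbytes_per_s / bytes_per_sample;
}
```

Plugging in the roughly 25.3 MB/s measured earlier in the thread gives about 6.3 Msps for CS16, 8.4 Msps for CS12, and 12.6 Msps for CS8, the last being above the 10 Msps mentioned for the device.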
Excuse my French, do you mean just adding the fast-copy enhancements to master for now ? yes, do it, by all means ! The best way is probably opening a new PR containing just that. |
I already have CS12 working (just a hack for now). I'll add that after the fast-copy for testing. |
I just tested at 6.3Msps on CubicSDR, on a 200 kHz FM stereo station. Same as you, the crackling sound starts at 6.4Msps.
So you really plan to run the module on the Pluto itself, together with a SoapySDRServer ? Wow. ! |
The client should support the conversion between CS12 over the network to CS16 or CF32: https://github.com/pothosware/SoapyRemote/blob/master/client/ClientStreamData.hpp#L11 So if you advertise the native format as CS12, SoapyRemote should be ok. Caveat is that CS12 could be packed differently than expected, so that's something to look out for. |
Cross compiling and running stuff on the Pluto isn't hard. It's just a pain to set up the compiler. Will SoapyRemote always use the native format (CS16) for network? If the SoapyPlutoSDR module offers CS12 and CS8, and the client side requests that, will SoapyRemote on the server-side pick that for transport? I'll investigate in a day or so. |
Okay, I can answer this now. If SoapyRemote is running on the Pluto and an application requests CS8 or CS12 then SoapyRemote is smart enough to use that over the network:
for CS8, and for CS12:
CS8 works right now and I also got working CS12 code on top of #16. PR soon. |
Looking at I get the impression that CS12 is |
@zuckschwerdt, looking at https://github.com/pothosware/SoapyRemote/blob/master/client/ClientStreamData.cpp#L83 using pen-and-paper, let's say types are described in [MSBit ... LSBit], I get: Looks like I/Q are MSBit aligned here. |
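The MSBit alignment can be checked in code rather than on paper. The sketch below wraps the bit operations from the unpacking snippet quoted earlier in this thread in a function; `unpack_cs12` and the inverse `pack_cs12` are hypothetical names, and the packer is only an illustration derived from that snippet, not SoapyRemote or PR code.

```cpp
#include <cstdint>
#include <utility>

// Unpack one I/Q pair from 3 packed bytes, using the same shifts as the
// snippet quoted in this thread. The 12 significant bits land in bits 15..4.
std::pair<int16_t, int16_t> unpack_cs12(const uint8_t* in) {
    uint16_t part0 = in[0];
    uint16_t part1 = in[1];
    uint16_t part2 = in[2];
    int16_t i = int16_t((part1 << 12) | (part0 << 4));
    int16_t q = int16_t((part2 << 8) | (part1 & 0xf0));
    return std::make_pair(i, q);
}

// Hypothetical inverse: pack MSBit-aligned I/Q (low 4 bits ignored) into 3 bytes.
void pack_cs12(int16_t i, int16_t q, uint8_t* out) {
    uint16_t ui = uint16_t(uint16_t(i) >> 4);  // 12 significant bits of I
    uint16_t uq = uint16_t(uint16_t(q) >> 4);  // 12 significant bits of Q
    out[0] = uint8_t(ui & 0xff);                                // I bits 7..0
    out[1] = uint8_t(((ui >> 8) & 0x0f) | ((uq & 0x0f) << 4));  // I bits 11..8, Q bits 3..0
    out[2] = uint8_t((uq >> 4) & 0xff);                         // Q bits 11..4
}
```

The low 4 bits of both unpacked values are always zero, so a module holding LSB-aligned 12-bit samples would need to shift them left by 4 before packing in this layout.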
e576421
to
0c3c0d1
- Simplified getting Stream* using the same tricks as SoapySDRPlay, - Revised some scoped locks BEWARE: TX has not been modified accordingly yet.
0c3c0d1
to
327a511
@zuckschwerdt I have rebased the |
I'll soon test. First notes: I'd expect the rx/tx streams to be independent. I.e. if I need two |
…ndent closeStream
3c8f126
to
ccce098
@zuckschwerdt You are right. I fixed this, plus replaced |
The Pluto is advertised as full-duplex. I don't know how much of that is true, perhaps independent control of both streams would be possible. E.g. running different rx/tx threads, with start/stop at different times. |
…d code to use them properly
Indeed. So I made a change to return 2 distinct SoapySDR::Stream* for RX and TX. |
…otect everything including streaming with this spn-lock
If you follow that path, you'll fall into the rabbit hole and soon realize that everything has to be locked by the same I've tried to put The coward solution is to let Unfortunately they are not standard issue in C++11, but strangely it provides the So in the last commit, I made my own |
Part of the headache I had with the threading was concurrent access to stop() and recv(). There might still be a thread draining the buffer when stop() is called, and killing the buffer is a good way to crash while recv() is underway. Also something to look out for. Perhaps removing buf can be deferred to the destructor. But that also means once buf is set up it can't safely be changed. |
Yes that is the main issue. With the "lock everything" strategy, a Basically Settings and Streaming ops get interleaved properly, so I think we may change any buffer sizing at runtime and it would still work. |
Sorry, I meant that with a lockless or atomic variant. With the mutex lock it guards nicely, but avoiding a mutex would probably yield better performance. |
Well it is called a mutex but here it is just a CAS loop, which will almost always succeed the first time anyway because the actual contention is rare. I don't think we'll see any running performance impact on streaming itself compared to no lock at all. The only visible effect would be that a changed setting will wait for the current streaming operation to complete before taking effect. At that particular time the spin-lock is likely to be slower than anything else because it wastes CPU cycles. |
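For readers unfamiliar with the idea, a CAS-loop spin lock in C++11 can look roughly like the sketch below. The class name `SpinLock` is hypothetical and this is not the lock from the PR's commit; it only assumes std::atomic<bool> and exposes lock()/unlock() so that it works with std::lock_guard.

```cpp
#include <atomic>
#include <mutex>

// Hypothetical spin lock built on a compare-and-swap loop (illustration only).
class SpinLock {
public:
    void lock() {
        bool expected = false;
        // Try to flip false -> true; on failure 'expected' is overwritten
        // with the current value, so reset it before retrying.
        while (!locked_.compare_exchange_weak(expected, true,
                                              std::memory_order_acquire,
                                              std::memory_order_relaxed)) {
            expected = false;
        }
    }

    bool try_lock() {
        bool expected = false;
        return locked_.compare_exchange_strong(expected, true,
                                               std::memory_order_acquire,
                                               std::memory_order_relaxed);
    }

    void unlock() {
        locked_.store(false, std::memory_order_release);
    }

private:
    std::atomic<bool> locked_{false};
};
```

When there is no contention, lock() succeeds on the first compare-and-swap, which matches the argument above that the streaming path should see little overhead; under contention the waiting thread burns CPU until the holder calls unlock().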
RX and TX are indeed quite independent, so I've made changes to have different RX and TX locks, plus some other cleanups. |
Works very well and fixed spurious crash on start/stop for me.
and
which I gather are equivalent, just not so nice looking. |
My bad, |
Tested and works well for me. Can be merged (squash merge please) when you see fit, in my opinion. |
Alea jacta est ! |
Hello Charles, @zuckschwerdt and @guruofquality.
Thanks to suggestions and tests by @zuckschwerdt, I've built on them by opening this PR, which adds improvements of my own together with the others.
@guruofquality I don't know if @zuckschwerdt can push on this PR ?
Anyway, here are the improvements :
iio_buffers
has their own buffer management.
int16_t
into the destination format and don't need iio_channel_convert
at all.
Of course, this is a work in progress:
refill_buffer
op turns out to be too slow. I hope not.