Right now `retina::codec::VideoFrame` implements `Buf` and supports only H.264. It always has two chunks: an AVC length prefix and the header+body of the (single) picture NAL (note: I'll need to extend it for multiple NALs). The idea was to support zero-copy, but I think right now it's kind of the worst of all worlds:

- if the NAL is fragmented (most of the bytes, and maybe even most of the NALs, on a typical stream), it copies it all into a `BytesMut` with a guesstimated size. It might copy again to grow partway through, and then probably (I haven't profiled) copies again on `BytesMut::freeze` due to tokio-rs/bytes#401.
- if the caller wants a single contiguous buffer, they'll end up copying it themselves (maybe with `bytes::Buf::copy_to_bytes`, maybe not).
- you can only iterate through it once, which might be a problem for some usages.

`AudioFrame` and `MessageFrame` provide a `Bytes` directly; there's no good reason all three frame types shouldn't present their data in the same way.
I'd prefer to follow one of two paths; I haven't made up my mind which:

- Truly support zero-copy via `data(&self) -> impl Buf<'self>`. `VideoFrame` needs a `Vec` of chunks. If you want to iterate multiple times, you can just call `data` as many times as you want. You can use `Buf::chunks_vectored` + `std::io::Write::write_vectored` / `tokio::io::AsyncWriteExt::write_vectored`.
- Put everything into a single buffer: accumulate `Bytes` in a reused `Vec` in `Depacketizer::push`, then concatenate them during `Depacketizer::pull` when the total size is known, avoiding the extra copies mentioned above.
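A minimal sketch of the second path, using plain `Vec<u8>` in place of `bytes::Bytes` to stay self-contained (the `Depacketizer` shape here is illustrative, not the real API): summing the chunk lengths in `pull` means the output buffer is allocated exactly once, with no mid-growth copy and no `freeze` copy.

```rust
// Illustrative sketch of the single-buffer path. `pieces` is reused
// across frames: `drain` empties it but keeps its capacity.
struct Depacketizer {
    pieces: Vec<Vec<u8>>,
}

impl Depacketizer {
    fn new() -> Self {
        Depacketizer { pieces: Vec::new() }
    }

    /// Stashes one packet's payload; no copying happens here.
    fn push(&mut self, payload: Vec<u8>) {
        self.pieces.push(payload);
    }

    /// Concatenates all stashed payloads into a buffer sized exactly
    /// once, now that the total length is known.
    fn pull(&mut self) -> Vec<u8> {
        let total: usize = self.pieces.iter().map(|p| p.len()).sum();
        let mut out = Vec::with_capacity(total);
        for p in self.pieces.drain(..) {
            out.extend_from_slice(&p);
        }
        out
    }
}
```

Because `pieces` keeps its capacity between frames, the steady-state cost per frame is one allocation for the output buffer plus one memcpy per fragment.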
Arguments in favor of zero-copy (a custom `Buf` implementation with multiple chunks):

- People like zero-copy; it's generally assumed to be more efficient (but see below).
- One neat API trick this would allow (for H.264) is selecting the Annex B encoding (`00 00 00 01` between NALs) or the AVC encoding (length prefix between NALs) via something like `h264(&self) -> Option<&H264Frame>`, then `data_as_annexb(&self) -> impl Buf` or `data_as_avc(&self, len_prefix_size: usize) -> impl Buf`. With the single-buffer approach I'd probably make folks choose when setting up the depacketizer instead. (I guess it'd also be fairly efficient to have a `&mut` accessor on the `VideoFrame` which does a one-way conversion from 4-byte AVC to Annex B, but that's a weird API.) I can imagine someone doing some fan-out thing where they actually want both encodings.
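For reference, the one-way `&mut` conversion is cheap precisely because a 4-byte AVC length prefix and the Annex B start code are the same size, so each prefix can be overwritten in place. A sketch (the function name and error handling are mine, not an existing API):

```rust
/// Rewrites a frame from AVC form (4-byte big-endian length before each
/// NAL) to Annex B form (00 00 00 01 before each NAL), in place. This
/// works only because the prefix and the start code are both 4 bytes.
fn avc_to_annex_b(data: &mut [u8]) -> Result<(), &'static str> {
    let mut i = 0;
    while i < data.len() {
        if i + 4 > data.len() {
            return Err("truncated length prefix");
        }
        let len =
            u32::from_be_bytes([data[i], data[i + 1], data[i + 2], data[i + 3]]) as usize;
        // Overwrite the length prefix with the Annex B start code.
        data[i..i + 4].copy_from_slice(&[0, 0, 0, 1]);
        i += 4 + len;
    }
    // If a NAL length ran past the end of the buffer, i overshot it.
    if i == data.len() { Ok(()) } else { Err("NAL length overruns buffer") }
}
```

Note the conversion destroys the lengths, so it can't be undone without re-scanning for start codes; that's the "one-way" part that makes it a weird API.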
Arguments in favor of a single `Bytes` or `Vec<u8>`:

- More convenient and simpler to get right, IMHO. The zero-copy APIs seem half-baked/finicky; e.g., tokio's `tokio::io::AsyncWriteExt::write_buf` just writes the first chunk, and there's no `write_all_vectored` in either `std::io::Write` or `tokio::io::AsyncWriteExt`.
- It might actually be more efficient; the answer isn't obvious to me. With my IP cameras, the IDR frames can be half a megabyte, fragmented across hundreds of RTP packets of 1,400-ish bytes. Just the `&[Bytes]` is then tens of kilobytes (four pointers per element). Far too big for the no-alloc path of `SmallVec<[Bytes; N]>`. And if someone's doing a `writev` call later, they have to set up and iterate through hundreds of `IoSlice`s to write the whole thing at once. (And you can't even reuse a `Vec<IoSlice>` between iterations without trickery, because it expects to operate on a mutable slice of initialized `IoSlice` objects.) It wouldn't surprise me if zero-copy is actually slower.
- I'm not sure, but I think `Bytes` is mostly a tokio-ecosystem thing; async-std folks might use plain `Vec<u8>` or something instead. I'll likely keep using `Bytes` internally anyway, though, to keep the individual packets around between `push` and `pull`.
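To make that `IoSlice` setup cost concrete, here's roughly the loop a caller of a multi-chunk API ends up writing (a sketch over plain byte slices rather than `bytes::Bytes`; `write_all_chunks` is a hypothetical helper, not part of any existing crate): every chunk gets wrapped in an `IoSlice`, and since there's no stable `write_all_vectored`, a short write means rebuilding the whole slice list for the next pass.

```rust
use std::io::{Error, ErrorKind, IoSlice, Result, Write};

// Hypothetical helper showing the writev-style loop the multi-chunk
// design pushes onto callers: wrap every chunk in an IoSlice, call
// write_vectored once, then handle short writes by trimming/removing
// chunks and rebuilding the IoSlice list on the next pass.
fn write_all_chunks<W: Write>(w: &mut W, chunks: &[&[u8]]) -> Result<()> {
    let mut remaining: Vec<&[u8]> =
        chunks.iter().copied().filter(|c| !c.is_empty()).collect();
    while !remaining.is_empty() {
        let mut n = {
            // Rebuilt every pass: the IoSlices borrow the chunk data.
            let slices: Vec<IoSlice> = remaining.iter().map(|c| IoSlice::new(c)).collect();
            w.write_vectored(&slices)?
        };
        if n == 0 {
            return Err(Error::new(ErrorKind::WriteZero, "failed to write chunks"));
        }
        // Consume fully written chunks; trim a partially written one.
        while n > 0 {
            if n >= remaining[0].len() {
                n -= remaining[0].len();
                remaining.remove(0);
            } else {
                remaining[0] = &remaining[0][n..];
                n = 0;
            }
        }
    }
    Ok(())
}
```

For a frame fragmented across hundreds of packets, that's hundreds of `IoSlice` constructions per write attempt, which is the bookkeeping the single-buffer design avoids entirely.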
Right now I'm leaning toward a single `Bytes`. I might try benchmarking both, but if the performance is close, I think simplicity should win.
scottlamb changed the title from "API design: should depacketizers provide data as `impl Buf` or `Bytes`?" to "API design: should depacketizers provide data as `impl Buf` or `Bytes`/`Vec<u8>`?" on Jun 9, 2021.