Avoiding MMAL excessive buffering #496
Ping @deborah-c as I've discussed this requirement before. I think the answer is currently no, and it's not simple to achieve. I have a similar issue with kodi's use of mmal. I handle it by comparing the timestamps being submitted to mmal with the timestamps of pictures returned. If the difference gets large (e.g. 1 second) I stop submitting new frames until it has reduced.
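A minimal sketch of that timestamp-gap throttling (illustrative only - the helper names are mine, and it assumes MMAL's usual microsecond timestamps and a single decoder instance):

```c
#include "interface/mmal/mmal.h"

#define MAX_PTS_LEAD 1000000  /* ~1 second in microseconds */

static int64_t last_submitted_pts = MMAL_TIME_UNKNOWN;
static int64_t last_returned_pts  = MMAL_TIME_UNKNOWN;

/* Check before copying another encoded frame into an input buffer. */
static int can_submit_more(void)
{
    if (last_submitted_pts == MMAL_TIME_UNKNOWN ||
        last_returned_pts == MMAL_TIME_UNKNOWN)
        return 1;
    /* Stop feeding once ~1s of undecoded input sits inside the codec. */
    return last_submitted_pts - last_returned_pts < MAX_PTS_LEAD;
}

/* Call when an input buffer is sent to the decoder's input port... */
static void note_submitted(const MMAL_BUFFER_HEADER_T *in)
{
    if (in->pts != MMAL_TIME_UNKNOWN)
        last_submitted_pts = in->pts;
}

/* ...and from the output port callback when a decoded picture arrives. */
static void note_returned(const MMAL_BUFFER_HEADER_T *out)
{
    if (out->pts != MMAL_TIME_UNKNOWN)
        last_returned_pts = out->pts;
}
```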
Yes, I think the point is that the MMAL API user can't really make a good guess when it should stop or start sending input. My thinking is that this should be much easier for the decoder itself, as it knows whether it's actually processing something, or just waiting for new data. I'm hoping it's possible. This would probably require new API. Alternatively, the input buffer sizes could reflect the size of the internal decoder's buffer (which would allow the user to estimate the buffering), but I bet this would be a complicated solution.
We'll see what @deborah-c says. We did discuss a new api to limit the number of "userdatas" (metadata attached to the encoded video packet) and so block when too many packets were inside codec, but there are complications. The main one is deadlock if you request fewer userdatas than is required to output a picture (which with hierarchical B frames can be quite a large number).
I've implemented something that could help. The new parameter MMAL_PARAMETER_VIDEO_MAX_NUM_CALLBACKS can limit the number of frames held inside the decoder. Note: if set too low (e.g. less than the number of pictures in the DPB) video_decode will stall. Because you may not know the size of the DPB, you can set it to -1, which uses the DPB size.
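For anyone wanting to try it, a minimal sketch of setting the parameter (assuming it is a plain uint32 applied to the decoder's input port):

```c
#include <stdio.h>
#include "interface/mmal/mmal.h"
#include "interface/mmal/util/mmal_util_params.h"

/* Bound the number of frames video_decode may hold internally.
 * Passing (uint32_t)-1 asks the codec to use its DPB size as the limit. */
static MMAL_STATUS_T limit_decoder_frames(MMAL_PORT_T *dec_input, uint32_t max_frames)
{
    MMAL_STATUS_T st = mmal_port_parameter_set_uint32(
        dec_input, MMAL_PARAMETER_VIDEO_MAX_NUM_CALLBACKS, max_frames);
    if (st != MMAL_SUCCESS)
        fprintf(stderr, "MAX_NUM_CALLBACKS not supported (older firmware?)\n");
    return st;
}
```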
Sounds very interesting! Why did you change the buffer parameters away from "recommended"?
There is an issue in kodi for live streams (e.g. PVR) where it attempts to estimate clock skew between back end and front end by monitoring its internal queue to the hw decoder. The recommended values are still recommended for most use cases. You want to keep the hw codec's buffers as full as possible to avoid ever underrunning. But having 20 x 80K buffers does mean that during a low-bitrate part of the stream you may quickly swallow 20 frames into the queue. Fewer, larger buffers don't allow that to happen (but may not utilise memory as efficiently). In reality none of this should be required, and I'm sure it could be fixed elsewhere in kodi, but sometimes just changing your behaviour to be more like everyone else's is the simpler option. Still not sure if this will fix the issue (debugging remote users' PVR issues is not easy), or if the performance impact will be problematic (possibly I'll have to make this change live-stream only in the future), but it's another option to play with if you don't like the existing behaviour.
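To illustrate the two choices being compared, a rough sketch of configuring the input port's buffer geometry (the numbers in the "few large buffers" case are made up for illustration; buffer_num/buffer_size are normally set after the format commit and before enabling the port):

```c
#include "interface/mmal/mmal.h"
#include "interface/mmal/util/mmal_util.h"

/* Pick the input buffer geometry and create a matching pool. */
static MMAL_POOL_T *create_input_pool(MMAL_PORT_T *in, int few_large_buffers)
{
    if (few_large_buffers) {
        /* Fewer, larger buffers: the port can't swallow dozens of small
         * low-bitrate frames at once, at the cost of memory efficiency. */
        in->buffer_num  = 2;
        in->buffer_size = 1024 * 1024;
    } else {
        /* The usual choice: the component's recommended values. */
        in->buffer_num  = in->buffer_num_recommended;
        in->buffer_size = in->buffer_size_recommended;
    }
    return mmal_port_pool_create(in, in->buffer_num, in->buffer_size);
}
```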
Hm, then why not use buffer_size_min/buffer_num_min? Or would that be too extreme on the other end, and not buffer enough data? To be honest, I think a media player which requires buffering of input data should actually buffer it somewhere else, and not in the decoder itself. The decoder should only buffer as much as is required to keep its internal pipeline busy. But maybe others would disagree.
I think buffer_size_min=2048 and buffer_num_min=1, so for large video frames there would be many round trips to the gpu, which would hurt performance. How much is needed to keep its internal pipeline busy, though? On a Pi1, when arm cpu usage is high there could be hundreds of milliseconds before new data appears. All of this can be done from the arm side. It makes little difference whether mmal_queue_timedwait blocks because the codec is refusing to consume data, or you sleep explicitly when the timestamps of submitted frames exceed the timestamps of returned frames by some margin. Ideally you may need to know the number of pictures in the dpb and the framerate of the video to ensure the codec has enough data, but it should be possible.
I wanted to use this to achieve fixed buffering in the ffmpeg MMAL decoder. But it doesn't work: I can't find a way to distinguish a discarded frame from a frame that takes unexpectedly long to decode. I can't just use a large timeout either, because 1) it can stall decoding (without getting new input), 2) even if it gets input, it leads to more buffering, even if the buffering is strictly speaking outside of the GPU, and 3) it's not truly reliable. I looked at the kodi code, and even that doesn't seem to get away without timeouts and heuristics that second-guess what's going on inside of the GPU.

I suppose I could somehow try decoding a few frames more than needed to achieve the fixed delay, and then consider it a dropped frame when it "underruns". But even then it's hard to tell exactly when it "underruns". It could be that the input is still a bit "in flight". The decoder could still be working, but there's no way to tell. Would it be enough to check the input queue buffer to know whether the decoder is busy? This all sounds fragile and racy. Timeouts will just interrupt playback if they block, or expire and cause glitches with high-bitrate video. To make it even worse, discarded frames are not all that uncommon. (Although the decoder seems to repeat frames for some packets that contain no frames.)

I could modify the libavcodec API to output 0 to N frames per decode call (instead of the forced "lock step" packet->frame data flow). Then I could just copy all the kodi hacks. But even then I'd have to live with that timeout, and variable buffering. (While I don't mind the variable buffering, it prevents me from determining AVI timestamps - which either requires a fixed delay so that a fixed-size PTS buffer can be used, or exact information about which frames were dropped.) Maybe I'm just missing something? Any tips?
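For clarity, the fixed-size PTS buffer idea mentioned at the end would look roughly like this (a sketch with made-up names; it only works if every submitted packet eventually produces exactly one output frame, which is exactly what silently dropped frames break):

```c
#include <stdint.h>

#define PTS_RING_SIZE 32  /* must exceed the fixed decode delay in frames */

/* Fixed-size FIFO of container timestamps, re-attached to decoded frames
 * in submission order.  One push per packet sent, one pop per frame
 * received - a silently discarded packet shifts every later timestamp. */
static int64_t pts_ring[PTS_RING_SIZE];
static unsigned pts_write, pts_read;

static void pts_push(int64_t pts) { pts_ring[pts_write++ % PTS_RING_SIZE] = pts; }
static int64_t pts_pop(void)      { return pts_ring[pts_read++ % PTS_RING_SIZE]; }
```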
About discarded frames - what do you set MMAL_PARAMETER_VIDEO_DECODE_ERROR_CONCEALMENT to?
I don't set it at all. It's possibly what I'm looking for - what exactly does it do?
When error concealment is enabled corrupted pictures are not returned. Disabling error concealment means you see every picture and can choose whether to display them or not (using the MMAL_BUFFER_HEADER_FLAG_CORRUPTED flag).
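A small sketch of what that looks like on the API user side (assuming the boolean parameter is applied to the decoder's input port; the callback fragment is illustrative):

```c
#include "interface/mmal/mmal.h"
#include "interface/mmal/util/mmal_util_params.h"

/* Ask video_decode to hand back corrupted pictures rather than hide them. */
static MMAL_STATUS_T disable_error_concealment(MMAL_PORT_T *dec_input)
{
    return mmal_port_parameter_set_boolean(
        dec_input, MMAL_PARAMETER_VIDEO_DECODE_ERROR_CONCEALMENT, MMAL_FALSE);
}

/* Then, in the output port callback, the application decides what to do. */
static void handle_picture(MMAL_BUFFER_HEADER_T *buf)
{
    if (buf->flags & MMAL_BUFFER_HEADER_FLAG_CORRUPTED) {
        /* decoded with errors: display it anyway, or drop it */
    }
}
```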
What happens with packets that do not contain a frame at all (e.g. vop_coded=0)?
Not certain. @deborah-c or @6by9 may be able to say.
Sorry, too far down the stack for my knowledge.
MMAL actually does repeat the previous frame for packets with no frames. But sometimes it still drops output. I didn't verify what's actually in the packets in this case, though, or what's different about packets which skip output. Here is a sample file: https://0x0.st/X18.avi
Also, on the subject of excessive buffering and avoiding timeouts, I think it would be ideal if input buffers were returned only after the corresponding packet was decoded. (Apparently this is not the case.) Then an API user could actually know whether a frame just takes longer to decode, or if it was dropped. I think currently there is no way of detecting this situation.
And in addition, silently discarding packets makes it impossible to re-associate timestamps for AVI with B-frames (AVI has no proper PTS).
You've now increased the minimum input buffer count to be some variable value dependent on the stream and reference frames. How do you set that ahead of time without making it arbitrarily high? OpenMAX IL and MMAL are both designed around streaming data down pipelines, not breaking it down into constituent elements. Sorry, not going on the list of things I'll look at (video decode isn't my area of knowledge anyway). Up to @popcornmix if he wants to look at it.
I guess the input buffers work more like a unix pipe, rather than a packet queue? (The difference is that the former is just a FIFO that shuffles the data across, rather than having some inherent logic about its contents.) It doesn't have to work like I suggested. I just want a way to know when a packet was discarded, instead of resulting in an output frame. In any case, something is needed to get bounded buffering without running into deadlocks in corner cases. I also need something to determine AVI timestamps.
That's completely orthogonal. The fixed-size output ring buffer doesn't really match with what you describe, by the way. Output buffers are discrete elements, and also the user has to take care to recycle them, so they're not just "streamed down" a pipeline.
Sounds like a reasonable comparison, though with hints available as to where frame boundaries may be.
Others would be able to correct me on this, but I believe internally there is a callback when the codec has finished with a frame. If so, then it may be possible to produce an empty IL/MMAL buffer if the codec has discarded a frame. As long as you pay attention to nFilledLen / length before processing a frame, that would give you your signalling for AVI timestamps without upsetting contents. The encoder certainly works in that manner if rate control decides not to use any bits on a frame.
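If that mechanism existed, the user side would be trivial - a sketch of the hypothetical handling (this is not current video_decode behaviour, and the callback body is illustrative):

```c
#include "interface/mmal/mmal.h"

/* Hypothetical: video_decode emits a zero-length buffer for every frame it
 * discards, so the PTS FIFO from earlier can stay in sync. */
static void decoder_output_cb(MMAL_PORT_T *port, MMAL_BUFFER_HEADER_T *buf)
{
    (void)port;
    if (buf->length == 0 && !(buf->flags & MMAL_BUFFER_HEADER_FLAG_EOS)) {
        /* frame was discarded: consume one queued timestamp and move on */
    } else {
        /* a real picture: attach the next queued timestamp and display it */
    }
    mmal_buffer_header_release(buf);  /* or return it to the output pool */
}
```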
That would be reasonable. The idea is similar to how MMAL_PARAMETER_VIDEO_DECODE_ERROR_CONCEALMENT works.
Not sure if I understand. What is nFilledLen?
Video_decode is not my area, but reading back I see popcornmix commented earlier (#496 (comment)) about error concealment, so we already have that mechanism in place, delivering a buffer with the flags set.
https://github.com/raspberrypi/userland/blob/master/interface/vmcs_host/khronos/IL/OMX_Core.h#L421
Yes, but it might be different because a corrupted frame is still a frame, which might contain a partially decoded picture, and which users might even want to see. In this case, the frame would just be empty. Not a problem in my opinion, though.
Too far into codec detail for me again.
Too involved for a quick solution from me.
Thanks for taking a look anyway. I still hope some mechanism can be provided in the future for determining whether a packet was discarded. (Or something similar.)
Have you tried using MMAL_PARAMETER_VIDEO_MAX_NUM_CALLBACKS? It's working well in latest kodi builds and ensures minimal frames are buffered in codec.
Yes. It works, but doesn't quite achieve what I want. The subtle difference is that while it limits the maximum, I still can't ensure that the buffering stays close to the desired level (and without enough buffering, performance would suffer). My problem with silently discarded packets/frames is that they will decrease the buffering each time. While it's easy to estimate the number of frames buffered in the decoder, you can't get a definite number due to the asynchronous behavior. In the end, distinguishing a dropped frame from a frame that takes longer to decode seems impossible (at least if you want to be certain).
I wasn't aware of such behavior. Currently I'm not setting MMAL_BUFFER_HEADER_FLAG_USER0 on any buffers. What exactly would it change? As far as I'm aware, I do get callbacks for all buffers sent to the ports.
The userdata callback structures are used internally by video_decode for returning metadata with the decoded pictures. MMAL_PARAMETER_VIDEO_MAX_NUM_CALLBACKS limits the number of userdata callback structures in the pool, which causes blocking (and hence bounds the number of frames inside the decoder), which is desirable in some cases. However, under some circumstances the metadata is not required.
So, if you have a timestamp, or a non-useless flag, then all is good. If not, then the limiting won't apply. Always setting MMAL_BUFFER_HEADER_FLAG_USER0 on input buffers would make the limit apply in all cases.
On what will it block? If the number of output callbacks reaches the configured max limit? (I don't think this will help me in the situation I've described.) I'm not sure if I fully grasp the consequences of setting the user flag. I'll play with it a bit next week or so.
The excessive buffering issue reported is because the video decode hardware has a large buffer (1-2MB) of input data. That means it can swallow many dozens of frames when bitrate is low. The decoded frames returned may be a few seconds older than the current encoded frames being submitted. MMAL_PARAMETER_VIDEO_MAX_NUM_CALLBACKS provides a means of limiting the number of frames inside the decoder to avoid this issue. However it only applies to frames that require user metadata to be returned. Setting a timestamp, or the MMAL_BUFFER_HEADER_FLAG_USER0 flag, on the input buffer means user metadata has to be returned for that frame, so the limit then applies.
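If I'm reading that right, something along these lines on every input buffer would make each frame count toward the limit (a sketch under that assumption; the helper name is made up):

```c
#include "interface/mmal/mmal.h"

/* Ensure the frame carried by this input buffer requires user metadata,
 * so MMAL_PARAMETER_VIDEO_MAX_NUM_CALLBACKS actually bounds it. */
static void mark_frame_for_callback(MMAL_BUFFER_HEADER_T *in, int64_t pts)
{
    in->pts = pts;                                   /* real timestamp, if known */
    if (pts == MMAL_TIME_UNKNOWN)
        in->flags |= MMAL_BUFFER_HEADER_FLAG_USER0;  /* otherwise tag it */
}
```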
OK, I suppose I was observing mostly correct buffering because I usually set a timestamp on the buffer.
Completely unrelated, but couldn't the decoder skip decoding of flagged frames if they're non-ref frames?
Yes. If I wasn't fitting in with Kodi's expected behaviour
Tried it - this didn't help in this specific case. It still skips some frames in the sample file I have. (But maybe it's not meant to fix this specific case.)
It looks like you can just keep feeding input buffers to MMAL, and it will immediately return most of the input buffers, all while not outputting any frames. I managed to send 127 buffers to the h264 decoder, even though the input and output pools both have only 20 buffers, and I didn't feed it new output buffers (i.e. the decoder only got 20 output buffers in total). It must be buffering a lot of data internally.
I'd like to avoid feeding it that much, and would rather block a while until I get a new frame.
Is there a way to determine whether MMAL will output a new frame with the input buffers sent so far? MMAL would have to answer with either "yes" or "potentially needs more input data".
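For reference, the feeding loop used to observe this looks roughly like the following (heavily simplified; `send_next_packet()` and the two queues filled by the port callbacks are placeholders, not real MMAL API):

```c
#include <stdio.h>
#include "interface/mmal/mmal.h"

/* Filled by the input/output port callbacks; placeholders for this sketch. */
extern MMAL_QUEUE_T *returned_input_buffers, *decoded_frames;
/* Placeholder: copy one demuxed packet into a free input buffer and
 * mmal_port_send_buffer() it; returns nonzero if a packet was sent. */
extern int send_next_packet(void);

static void feed_until_frame(void)
{
    int packets_sent = 0;
    MMAL_BUFFER_HEADER_T *frame;

    /* Poll for a decoded picture, feeding more input in the meantime. */
    while (!(frame = mmal_queue_timedwait(decoded_frames, 10))) {
        if (send_next_packet())
            packets_sent++;

        /* Input buffers come back almost immediately, long before any
         * picture appears - the data is buffered inside the codec. */
        MMAL_BUFFER_HEADER_T *in;
        while ((in = mmal_queue_get(returned_input_buffers)))
            mmal_buffer_header_release(in);

        printf("packets submitted without an output frame: %d\n", packets_sent);
    }
    mmal_buffer_header_release(frame);  /* hand to the renderer in real code */
}
```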