-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
mmal/h264: corrupt buffers returned by encoder #443
Comments
Do you have to change the connection on the input side to use I420, and return the buffers to your app? Or is just setting buffer_num on the output to >1 sufficient? I'm a little surprised that fragmenting frames is an issue as that used to be the way the codec was always driven. Under Android we had to avoid it due to issues in the Android framework if you did so. The output data is just saved into a circular buffer and read out as appropriate to fill the buffers - the codec itself doesn't know any different. The buffer at the wrap point in the circular buffer will always get fragmented - IIRC that is at about 2MB. You can avoid that by setting the MMAL parameter MMAL_PARAMETER_MINIMISE_FRAGMENTATION (added due to the Android limitation above). You may be able to provoke this more easily if you drop the buffer_size on the output port. As long as it is >=buffer_size_min then the system will run with it. |
I've found that the buffer_num change alone is sufficient, but without the other changes instances of corruption become less frequent: only 19 cases out of 1000, in one test. I may have misunderstood, but setting the fragmentation parameter on encoder_output at the same time as MMAL_PARAMETER_VIDEO_ENCODE_INLINE_HEADER et al. are set resulted in the encoder jamming up, after some varying number of clips has been emitted (4 -- 20, typically). Nothing useful logged, even when using VC_LOGLEVEL=mmal:trace, regarding why it gets stuck though. buffer_size is already equal to buffer_size_min and buffer_size_recommended (65536). Increasing buffer_size, arbitrarily to 262144, leads to zero corrupt clips (out of 1000), but of course there's probably no fragmentation going on either, except perhaps on I-frames. |
OK on the mods. The only thing I can think of is that the IPC mechanism will be significantly more stressed with you pulling all the images back to the ARM before encoding. Inline headers was never used with minimise fragmentation, so I guess may have some nasty lurking. I'm a little surprised as the mechanisms should be independent, but I haven't looked at the code. It will be an issue in the encode component rather than MMAL, hence the lack of any useful logging. Thanks for checking on buffer_size - I couldn't remember what the settings are, and it was possibly a useful one. Are you on a Pi 1 or Pi 2? And what firmware version and kernel version are you using? |
Understood; the below may make this irrelevant, but is there any way to get debugging passed back from the encoder, or is it mostly invisible from the ARM side (
This is on a Pi 2, with most testing done with kernel/firmware from raspberrypi/firmware/8aca5762f984f6decbeda294cee8418966c3d8d3, but also reproduces on an up-to-date Raspbian image, and with an rpi-update done last week. Interestingly, I've now tried a Pi 1, running both 8aca576 and last week's firmware, and cannot reproduce the issue, even with I420 buffers bouncing through the ARM side. The issue is still present on a Pi 2 booted with nosmp.
Seems like you might be on to something here, so far I haven't been able to hit it when using zero copy! |
@pelwell You know more about caching differences on Pi2. logging 64 or 192 should give out more useful information from the codec, but the camera is also on that flag. 4 or 8 will dump out logging from the lower level codec, but that is unlikely to be of much use without source. |
I have been looking at this today. By putting cache flushes into vchiq (which it has largely escaped up to now) I can make it slower, but my simple test is failing just as often with and without them. There must be some other factor at work. |
Hmm. VPU side should be treating all of those buffers as direct alias addresses (mmal_server.c mmal_do_allocate_buffers() has MEM_FLAG_DIRECT). Zero copy should remove the use of bulk transfers completely which does move the goalposts somewhat. Can you remind me (via email if necessary) of the Pi2 cache structure. Pi1 the ARM had a dedicated L1, and then went to SDRAM via the VPU L2. |
Pi1: ARM->L1->GertMMU->0x40000000 alias (so allocating/coherent with GPU L2) |
Update: I can see anomalous cache behaviour using a test programme. The ARM is writing stale data back over bulk data transmitted from the VPU, and this despite attempts to invalidate the destination. |
I've just pushed changes to the active kernel branches that make allowances for the increased cache line size on 2836, and include an extra cache invalidate after a bulk transmit from VC. There is a corresponding firmware patch, and when both are present it passes my tests. The extra invalidate should mean that the additional flush in the MMAL code is unnecessary, but I've left it in place. |
kernel: BCM270X_DT: I2S needs function Alt2 See: raspberrypi/linux#1046 kernel: vchiq_arm: Two cacheing fixes See: #443 kernel: BCM270X_DT: Overlay for the Fen Logic VGA666 board firmware: arm_loader: Increase stack and ensure icache flush is done before threads in execute multi firmware: arm_loader: Switch to vpu queues and more profile logging firmware: clocks: Allow arm to be overclocked to 1.6GHz firmware: gpioman: Don't force all pin pulls to their defaults firmware: arm_loader: Fix length on get palette mailbox call See: raspberrypi/linux#1026 firmware: vchiq: Better error handling firmware: vchiq: Make fragment size vary with cache line size See: #443
kernel: BCM270X_DT: I2S needs function Alt2 See: raspberrypi/linux#1046 kernel: vchiq_arm: Two cacheing fixes See: raspberrypi/firmware#443 kernel: BCM270X_DT: Overlay for the Fen Logic VGA666 board firmware: arm_loader: Increase stack and ensure icache flush is done before threads in execute multi firmware: arm_loader: Switch to vpu queues and more profile logging firmware: clocks: Allow arm to be overclocked to 1.6GHz firmware: gpioman: Don't force all pin pulls to their defaults firmware: arm_loader: Fix length on get palette mailbox call See: raspberrypi/linux#1026 firmware: vchiq: Better error handling firmware: vchiq: Make fragment size vary with cache line size See: raspberrypi/firmware#443
Firmware is rebuild with @pelwell fix. Not sure if it is the problem reported here, but worth running rpi-update and retesting. |
I'm delighted to say the issue no longer reproduces for me when using the latest firmware. Thank you very much for the work in producing a fix. |
No problem, and thanks for the feedback. |
kernel: BCM270X_DT: I2S needs function Alt2 See: raspberrypi/linux#1046 kernel: vchiq_arm: Two cacheing fixes See: raspberrypi#443 kernel: BCM270X_DT: Overlay for the Fen Logic VGA666 board firmware: arm_loader: Increase stack and ensure icache flush is done before threads in execute multi firmware: arm_loader: Switch to vpu queues and more profile logging firmware: clocks: Allow arm to be overclocked to 1.6GHz firmware: gpioman: Don't force all pin pulls to their defaults firmware: arm_loader: Fix length on get palette mailbox call See: raspberrypi/linux#1026 firmware: vchiq: Better error handling firmware: vchiq: Make fragment size vary with cache line size See: raspberrypi#443
Under particular circumstances, files saved from the h264 encoder's output do not decode cleanly using FFmpeg. A typical FFmpeg message for these frames would be
though the "error while decoding" line may or may not be present.
This error seems almost exclusively to occur when multiple output buffers are allocated (though I have seen one instance of this error while using a single buffer) and it only seems to happen when an output frame is split across multiple buffers. A guess might be some locking bug.
I've made a minimal patch against RaspiVid.c that reproduces the issue (using a pool of 3 buffers), uploaded to https://gist.github.com/malc0/d7d3b0d90af0e39d335c . When running the resultant binary with
-ss 2000 -t 0 -sp -o %d.h264 -td 1000,1000 -n
within 100 files I get one or more with corruption (-ss 2000
I suspect simply makes the frames noisier and hence bigger: appropriate clips tend to be >1.5Mb,-n
is necessary due to the change inconnect_ports()
). I'm detecting corruption using:Do let me know if there's any more tests I can do, or pertinent information I can provide.
The text was updated successfully, but these errors were encountered: