-
Notifications
You must be signed in to change notification settings - Fork 516
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support 5.1 (side) AAC encoded by FFmpeg (actual channel layout is stored in PCE) #598
Comments
@joeyparrish Thanks for filing the issue. We do not support more than one data block in an ADTS frames right now. It is a nice feature to support. Move to backlog for now until we have resources to work on it. |
Thought I'd take a quick look at this issue... If I download your .ts file, I can reproduce the problem. Interestingly, if I open your file in VLC, I hear some odd bit of music that is very stuttery (about 0.25 seconds of silence, 0.25 seconds of music, repeated through all 30 seconds of the clip). VLC claims the audio is ADTS stereo... so I'm wondering if I'm missing a codec or something. I downloaded the 720p mkv file from here: https://download.blender.org/durian/movies/ as I'm currently taking a look at this at work, and the internet here is apparently worthless (it was going to take 22+ hrs to download the 4k version). I use the same command you list:
and I get the following towards the end of the transcode session:
I can play the resulting audio just fine in VLC. Running the packager with:
I get:
The resulting mp4 plays just fine in VLC as well, and it does detect that it is AAC with channels "3F2R/LFE". Is there something special about the 4k version of the video that I need to use it? Am I missing a codec or command-line parameter somewhere along the line? Is the "frame size is not set" message indicative of the problem, being that this problem occurs when there is more than one block per frame? |
In my experience, the blender servers are very slow. Here's an alternate link for the 4k version I'm using: https://storage.googleapis.com/shaka-streamer-assets/sample-inputs/Sintel.2010.4k.mkv Let me see if I can reproduce your success with the 720p version, or if I can see the differences between them. |
Just checking all the details to make sure we're doing the same thing:
|
Oh, interesting on the MD5! I also tried with a lower quality mp4 with 5.1 from some mirror earlier yesterday and couldn't reproduce the issue with it. I've built the packager from source. As I am able to repro the issue with your .ts file, I'm presuming it has not been fixed. But perhaps I should rolling the source back to a specific release. I'm on CentOS, and it looks like I'm running ffmpeg 3.4.6 (CentOS is always so far behind on software versions). I can try getting a newer ffmpeg or building from source. Perhaps the older version of ffmpeg is packaging the audio differently. I'm somewhat of a novice when it comes to the inner workings of codecs, so forgive me if I ask dumb questions (and feel free to ignore me if I'm wasting your time. I am a professional programmer with many years of experience, and I have a bit of a personal interest in this issue being fixed, so I wanted to give it a go. But I may be foolhardy in hoping that this is an issue I can jump in and learn enough about to tackle effectively). That said, I'm curious: if the audio can be packaged in within a single ADTS frame, I'm curious what's gained by changing it to use multiple frames. In fact, https://wiki.multimedia.cx/index.php/ADTS suggests
Makes me wonder if it's actually a bug in a newer version ffmpeg. Going to see if I can debug and see just how many frames are being listed in the header. |
Not at all! I'm so glad you started looking at this and comparing notes with me! It could well be a bug in ffmpeg. (Or an "accidental feature" as Larry Wall once called a bug in Perl.) If we can change ffmpeg to use 1 AAC frame per ADTS frame, then I don't need this fixed in the packager at all. I hunted quite a lot through ffmpeg's docs for a flag to change this behavior, but I never found one. Perhaps we can find the change in the code by narrowing down affected versions and diffing the source. |
If it ends up being an ffmpeg bug, well... I guess I'll have to complain less about CentOS always being so far behind in software versions in the future, as that may have revealed the source of the issue! :D
Hopefully so! Going to try debugging just to see what value I get for the frame count. Also, the 4k vid just downloaded, so I'll give that a whirl as well. |
Looks like on the failing run, the first four ADTS packets are fine. The fifth one has a "num_blocks_minus_1" of 1. If I comment-out the NOTIMPLEMENTED() macro and just let the function return false to skip the packet: There doesn't seem to be a pattern. Actually, scratch those frame numbers. I also see multiple failure logs on startup: These errors are from the first 17 frames, which have a channel_configuration_ of 0. After that, the channel_configuration_ is 2, which explains why when I try to load the file in VLC, it claims the audio is in stereo. I'm really beginning to think this might be an ffmpeg issue. |
Running ffmpeg on the 4k video you linked me to produces a .ts file I can play in VLC and run through the packager just fine. So, I'm tempted to go look at ffmpeg, or one of its codecs. Not sure how to track down which codec I'd need to look at. Edit: according to their docs (https://www.ffmpeg.org/ffmpeg-codecs.html#aac) it's native to ffmpeg. So I'll start looking there. I'm curious if you can play your .ts file in VLC without issue? I believe VLC uses ffmpeg for decoding, no? So if a newer version plays your file fine, then at least their encoder and decoder both handle whatever changes they've made. And I suppose that could mean there's a newer spec of the ADTS packet (or one of the outer containers; that page lists a PES container and a TS packetizer). Since shaka reads these files natively, perhaps shaka's parser might need to be updated to handle any new specs. |
|
Just built ffmpeg directly from source, ran the command, and got a .ts file that fails to load in VLC! Looking at differences in the logs from the two ffmpeg versions:
|
I think it's the PCE log. There's already a bug in for it: https://trac.ffmpeg.org/ticket/6974 (also https://trac.ffmpeg.org/ticket/6983) From the comments, it's been a problem since Nov 9, 2017. A suggested workaround there is to add There is discussion there about the fact that mediainfo (with "-f" option) claims the satellite speaker positions are "Side" instead of "Back". I confirmed this is the case with the output file. However, it is also the case with the .ts file produced by my old copy of ffmpeg. Since we are explicitly specifying 5.1 (not 5.1(side)) to ffmpeg, this makes me think it's just an issue with mediainfo. Also confused why the audio is defaulting to 5.1(side). Will do more research. But seems like, for the moment, you may have a workaround! |
|
This is amazing. Thank you so much for helping track this down! |
For sure! Looking at the final comment on: https://trac.ffmpeg.org/ticket/6965 The fact that shaka cannot decode PCE elements may also be considered an issue here. But the fact that VLC is also choking on the content leads me to believe it really is an issue with ffmpeg. After some further testing, I may submit a bug. |
@PhilBax Thanks a lot for looking into this issue! Also FFmpeg never generates ADTS frame header with number_of_raw_data_blocks_in_frame > 0. See adts_write_frame_header implementation. So it does look like there is a bug in shaka packager: It appears that shaka packager may have mistaken PCE as an ADTS frame and trying to parse it. I checked our logic in LooksForSyncWord, which is indeed problematic.
The code assumes that it is an ADTS frame when it sees the sync word, and then tries to parse it. There is a following check to verify that the data afterwards starts with sync word. That check should be done before parsing the frame, i.e. it should be something like this:
This is actually the original logic before the refactoring in this commit: 2a2493e#diff-6ae092cf89f3ec6efdd42778e77a49b5. I think we should get that back. @PhilBax Would you be interested in trying the above fix and see if it works? |
Okay, looking into it more: In ac3_parser.c, line 144 we have: This runs the channel mode through a table to convert it to a channel layout. It converts the 5.0 channel mode to On the AAC side, aacenc.c line 977 checks if the incoming channel layout is supported, but the change referenced in the previous bug report made the AAC encoder only recognize So, I guess a question for the ffmpeg guys is: should AC3 by default be outputing "5.1(side)" or should it be updated to output "5.1"? I think regardless, the AAC encoder is correctly handling 5.1(side) audio as something that needs PCE encoding, and shaka-packager is not properly handling that kind of audio. So, forcing the AAC encoder to write the audio track as 5.1 instead of 5.1(side) is probably perfectly legitimate. And in addition, the shaka-packager probably should to be fixed to handle AACs with PCE encoding. |
Sure thing! I'll try it out tomorrow morning and let you know what happens!
…On Wed, Sep 25, 2019, 5:03 PM Kongqun Yang ***@***.***> wrote:
@PhilBax <https://github.com/PhilBax> Thanks a lot for looking into this
issue!
Also FFmpeg never generates ADTS frame header with
number_of_raw_data_blocks_in_frame > 0. See adts_write_frame_header
<https://github.com/FFmpeg/FFmpeg/blob/a0ac49e38ee1d1011c394d7be67d0f08b2281526/libavformat/adtsenc.c#L123>
implementation.
So it does look like there is a bug in shaka packager: It appears that
shaka packager may have mistaken PCE as an ADTS frame and trying to parse
it.
I checked our logic in LooksForSyncWord
<https://github.com/google/shaka-packager/blob/6b4a75b6bacdac6b0b9f598c365c8f84f0a7fd70/packager/media/formats/mp2t/es_parser_audio.cc#L35>,
which is indeed problematic.
if (!audio_header->IsSyncWord(cur_buf))
continue;
if (!audio_header->Parse(cur_buf, raw_es_size - offset))
continue;
// Check whether there is another frame |size| apart from the current one.
const size_t remaining_size = static_cast<size_t>(raw_es_size - offset);
const int kSyncWordSize = 2;
if (remaining_size >= audio_header->GetFrameSize() + kSyncWordSize &&
!audio_header->IsSyncWord(&cur_buf[audio_header->GetFrameSize()])) {
continue;
}
The code assumes that it is an ADTS frame when it sees the sync word, and
then tries to parse it. There is a following check to verify that the data
afterwards starts with sync word. That check should be done before parsing
the frame, i.e. it should be something like this:
if (!audio_header->IsSyncWord(cur_buf))
continue;
// Check whether there is another frame |size| apart from the current one.
const size_t remaining_size = static_cast<size_t>(raw_es_size - offset);
const int kSyncWordSize = 2;
const int frame_size = /* Get Frame size without parsing the frame */
if (frame_size < audio_header->GetMinFrameSize())
continue;
if (remaining_size < frame_size)
return false;
if (remaining_size >= frame_size + kSyncWordSize &&
!audio_header->IsSyncWord(&cur_buf[frame_size])) {
continue;
}
if (!audio_header->Parse(cur_buf, frame_size))
continue;
This is actually the original logic before the refactoring in this commit:
2a2493e#diff-6ae092cf89f3ec6efdd42778e77a49b5
<2a2493e#diff-6ae092cf89f3ec6efdd42778e77a49b5>.
I think we should get that back.
@PhilBax <https://github.com/PhilBax> Would you be interested in trying
the above fix and see if it works?
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#598>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/ABZODHYKNU4RLOLGFEOFIGDQLPN2TANCNFSM4HZEHMVQ>
.
|
@kqyang How should I get the frame size without parsing the frame? Especially since, as I understand it, AACs with PCE encoding have extra data in the frame, no? |
A change in ffmpeg v4 led to a change in the ADTS output structure for 5.1 surround sound. It seems that the channel layout changed from 5.1 to 5.1(side). This change corresponds to its triggering of a Packager issue in which the ADTS structure is not supported. To work around this and packager 6-channel audio, explicitly set the channel format in ffmpeg. See also shaka-project/shaka-packager#598, where this was discussed. The solution was found on https://trac.ffmpeg.org/ticket/6974 . Closes #6 Change-Id: Ief88bdcae9756d0c264ec36ee1160940193cfead
@PhilBax What I meant was not parsing the other irrelevant fields, e.g. something like this: size_t GetAdtsFrameSize(const uint8_t* data, size_t num_bytes) {
Yes, the frame size returned above includes PCE data. |
oh! Sorry, I see that in the refactor commit you pointed me toward. Will hook it up! |
Ok, so I did the following:
It handles my normal ts files fine, but when I run it on the surround.ts that joey submitted higher up on this thread (with PCE encoding), I get
:( |
@PhilBax Thanks for trying! Can you submit a pull request for the changes you made so I can take a look? |
Done! |
Thanks. Were you running the debug build or release build? If you were running release build, can you try debug build? It may give more debug information.
|
I was running Release. Debug gives me a ton of these: Presumably because, when PCE encoding is enabled, channel_configuration is meant to be 0, as the channel configuration is defined in the PCE data. |
Yes, I think you are right, so we actually have two different but related problems, both caused by the PCE data. For Can you update the code to allow it to be 0? There may be a few places you need to update. |
Updated the pull request with that change. The resulting AAC in the mp4 file now matches the one in the .ts file. That is to say, mediainfo now shows the following for the AAC track in both files:
For documentation purposes, mediainfo reports:
The latest VLC (3.0.8) plays the 5.1 version of both the .ts and .mp4 fine. However, the 5.1(side) version of both the .ts and .mp4 alternates between sound and silence every ~0.25 sec. It reports the audio as being in stereo. The default video player on CentOS (Totem) plays the 5.1 version of both the .ts and .mp4 fine. However, it cannot play the 5.1(side) .ts file. Additionally, the 5.1(side) .mp4 file plays clearly, but the everything is shifted into the left speaker (I only have stereo headphones attached to this computer). It reports the audio as being 6-channel. Amarok plays both sets of mp4 and ts files no problem. With my stereo-only headphones, if I'm being picky, it seems like the violin is slightly quieter and to-the-left in the 5.1(side) version. Probably worth testing on a 5.1 setup. Rythmbox behaves exactly like Totem (5.1 ts/mp4 play fine, 5.1(side) mp4 plays in left speakers, 5.1(side) ts doesn't load). I figure it must be relying on the gstreamer plugins, which is what Totem uses as well. I'm not sure exactly what all of this means. I don't know if ffmpeg is writing out poorly-formed AAC+PCE files, if no player is reading them in properly (except, perhaps, Amarok), or if 5.1(side) is simply not a "valid" AAC channel layout. When it comes to shaka, specifically, the fact that VLC loads but incorrectly plays both the .ts and .mp4 versions of the 5.1(side) audio make me think shaka is behaving correctly, and simply "passing on" any issues to output file. This is backed up by the fact that Amarok also treats both the .ts and .mp4 the same. However, the fact that the gstreamer codecs won't touch the 5.1(side) .ts file, but will load the .mp4 file but shift all the audio to the left makes me think maybe shaka isn't behaving 100% as-expected. Up to y'all whether you want to consider the issue "fixed" for now. |
@PhilBax Excellent! Thanks for the detailed investigation and experiments. Do you mind doing one additional check? Can you see what happens with the mp4 muxed by FFmpeg? $ ffmpeg -i surround.ts -c:a copy -c:v copy surround.mp4 It worth checking whether it behaves the same way as the mp4 generated by shaka with your change. |
Good call! Just tried, and the ffmpeg-generated mp4 behaves the same as the shaka-generated mp4.
|
@PhilBax Thanks for the confirmation. I think we can merge your pull request and consider the issue "fixed". Before merging your pull request, can you clean it up? I also left a few review comments in the pull request. |
One last issue. We do not parse PCE to get the actual number of channels, so it will show a channel count of 0 in mp4 metadata and DASH / HLS manifests. @joeyparrish Do you see any problems with that during playback (in Shaka Player)? |
Yes, that could break channel-count-related logic in the player. Or at least cause surprising track choices that didn't honor channel-related configs. |
I can also confirm that Shaka Player does not play my streams which contain 5.1(side) AAC as encoded by ffmpeg |
System info
Operating System: gLinux
Shaka Packager Version: v2.3.0
Issue and steps to reproduce the problem
Packager Command:
surround.zip
What is the expected result?
I would expect to see the output file generated.
What happens instead?
Instead, I get lots of this:
Followed by this:
And no output.mp4 is generated.
If Instead, I convert the 5.1 surround into stereo, everything works:
# Use -ac 2 this time to convert down to stereo (attached below) ffmpeg -i Sintel.2010.4k.mkv -f mpegts -map 0:a -c:a aac -b:a 192k -t 30 -ac 2 stereo.ts packager input=stereo.ts,stream=audio,output=output.mp4 --mpd_output output.mpd
stereo.zip
The text was updated successfully, but these errors were encountered: