VP8 performance tweaks

Brion Vibber edited this page Feb 26, 2017 · 6 revisions

The biggest single performance hog in VP8 decoding is the in-loop deblocking filter -- it takes at least 1/3 of CPU time in Safari at 480p with existing transcodes, and is hard to optimize without SIMD instructions:

  • works with signed chars, which is SIMD-friendly...
  • requires clamping at multiple stages
  • requires multiplication and shifting in a couple places
  • does addition that needs clamping

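Because of the clamping and multiplication, you can't just pack 4 adjacent bytes into a 32-bit word and process them together: intermediate results overflow 8 bits and would spill into the neighboring byte.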
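To make the points above concrete, here is a minimal scalar sketch of the per-edge arithmetic of the "simple" filter, modeled on my reading of the reference code in RFC 6386 section 15.2. It is not taken from ogv.js or libvpx; the function names and the `edge_limit` parameter are illustrative assumptions. Note how every step either clamps to the signed 8-bit range or produces an intermediate that does not fit in 8 bits.

```c
#include <stdint.h>
#include <stdlib.h>

/* Clamp to the signed 8-bit range [-128, 127]. */
static int clamp_s8(int v) {
    return v < -128 ? -128 : (v > 127 ? 127 : v);
}

/* Unsigned pixel -> signed value centered on zero. */
static int u2s(uint8_t v) { return (int)v - 128; }

/* Signed value -> unsigned pixel, clamping first. */
static uint8_t s2u(int v) { return (uint8_t)(clamp_s8(v) + 128); }

/* Filter one position across an edge. p1, p0 sit before the edge and
 * q0, q1 after it. edge_limit is a hypothetical name for the per-edge
 * threshold derived from the frame's loop filter level. */
static void simple_filter_edge(uint8_t p1, uint8_t *p0,
                               uint8_t *q0, uint8_t q1,
                               int edge_limit) {
    /* Skip edges whose difference is too large to be a blocking artifact. */
    if (abs(*p0 - *q0) * 2 + (abs(p1 - q1) >> 1) > edge_limit)
        return;

    int sp1 = u2s(p1), sp0 = u2s(*p0), sq0 = u2s(*q0), sq1 = u2s(q1);

    /* Multiplication plus two clamps: 3 * (q0 - p0) can reach +/-765. */
    int a = clamp_s8(clamp_s8(sp1 - sq1) + 3 * (sq0 - sp0));

    /* Two differently rounded versions of a/8. This assumes >> on a
     * negative int is an arithmetic shift, as on common compilers. */
    int b = clamp_s8(a + 3) >> 3;
    a = clamp_s8(a + 4) >> 3;

    /* Addition that needs clamping again on the way back to pixels. */
    *q0 = s2u(sq0 - a);
    *p0 = s2u(sp0 + b);
}
```

For example, with p1 = p0 = 100, q0 = q1 = 110 and an edge limit of 40, the edge pixels move toward each other: p0 becomes 102 and q0 becomes 107.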

It might be friendly to GPU usage, but that's a large undertaking.

Simple loop filter

The spec's recommendation for "low-power devices" is to encode with the "simple" version of the loop filter forced, which can be done by passing --profile=1 to vpxenc, or -profile:v 1 to ffmpeg. This requires re-encoding files and has a quality trade-off, but makes a HUGE difference in decode speed.
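One way to check that a re-encode actually picked up the profile is to read the version field from the 3-byte frame tag at the start of each VP8 frame. The sketch below follows the frame tag layout in RFC 6386 section 9.1, where version 1 corresponds to the simple loop filter. The function name is hypothetical, and getting the raw frame bytes out of the WebM container is assumed to happen elsewhere.

```c
#include <stddef.h>
#include <stdint.h>

/* Return the VP8 version (0-7) from a raw frame's 3-byte frame tag,
 * or -1 if the buffer is too short. Hypothetical helper; the caller
 * is assumed to have demuxed the frame from its container already. */
static int vp8_frame_version(const uint8_t *frame, size_t len) {
    if (len < 3)
        return -1;

    /* The tag is a 24-bit little-endian value:
     *   bit 0     : 0 for a key frame, 1 for an interframe
     *   bits 1-3  : version (profile)
     *   bit 4     : show_frame
     *   bits 5-23 : size of the first data partition */
    uint32_t tag = (uint32_t)frame[0]
                 | ((uint32_t)frame[1] << 8)
                 | ((uint32_t)frame[2] << 16);

    return (int)((tag >> 1) & 0x7);
}
```

A file encoded as described above should report version 1 on its frames, while existing "normal" transcodes should report version 0.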

Performance

Safari:

  • early 2015 MacBook Pro 13: 720p (comfy) / 1080p (barely)
  • iPad Pro 9.7: 720p (comfy) / 1080p (barely)
  • iPad Air: 480p (comfy)
  • iPhone 5c: 240p (moderate)

Edge:

  • early 2015 MacBook Pro 13: 720p (comfy) / 1080p (barely)
  • Atom laptop: 480p

IE:

  • Atom laptop: 240p

This performance sits between the current VP8 performance (with the "normal" loop filter) and the current Theora performance (with Theora's lower complexity), and in my opinion closes the gap enough that I'd be willing to use VP8 alone, with no Theora version, for adaptive streaming in the future. (Theora requires reapplying header packets when switching resolutions, and Ogg has weird page/packet properties that could make switching streams difficult. VP8 should work with off-the-shelf DASH players, with only slight modifications.)

Quality trade-off

The simple deblocking filter is not as effective and can leave slightly more visible blocking in high-motion scenes and in smooth gradients such as clear blue skies. However, it still usually looks better than Theora in my informal testing. The bitrate could be increased moderately to compensate, but I'm not convinced it's necessary.
