H.264 Decoder Overview

H.264 Advanced Video Coding is an ITU-T standard for encoding and decoding video, with a target coding efficiency twice that of H.262 (MPEG-2). For example, it enables PAL-resolution video to be transmitted at 1 Mbit/s. Like other video coding standards, H.264 specifies how to reconstruct (decode) video from coded bits but does not specify how to encode it. H.264 shares many techniques with other video codecs and adds new variations to improve coding efficiency. Coding efficiency is defined in terms of the log of the ratio of the number of bits required to encode a video to the number of bits in the original video.
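
As a rough illustration of the PAL example, the following sketch computes the uncompressed bit rate and the log of the coded-to-original bit ratio. The frame size (720x576), frame rate (25 fps), 4:2:0 chroma sampling, 8-bit samples, and the base-2 logarithm are all assumptions made for this example; the text above does not fix them.

```python
import math

def raw_bits_per_second(width, height, fps, bits_per_sample=8, samples_per_pixel=1.5):
    """Uncompressed bit rate. samples_per_pixel=1.5 models 4:2:0 sampling
    (one luma sample per pixel plus two chroma planes at quarter resolution)."""
    return width * height * samples_per_pixel * bits_per_sample * fps

def log_bit_ratio(coded_bits, raw_bits, base=2):
    """Log of (bits in coded video / bits in original video).
    The base is an assumption; more negative means better compression."""
    return math.log(coded_bits / raw_bits, base)

# Assumed PAL parameters: 720x576 pixels at 25 frames per second.
raw = raw_bits_per_second(720, 576, 25)
coded = 1_000_000  # 1 Mbit/s target from the text

print("raw bit rate (bit/s):", raw)
print("compression ratio   :", raw / coded)
print("log2 bit ratio      :", log_bit_ratio(coded, raw))
```

Under these assumptions the original video runs at roughly 124 Mbit/s, so a 1 Mbit/s stream corresponds to a compression ratio of roughly 124:1.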

H.264 uses a variety of techniques to reduce the number of bits necessary to encode video. It uses intra-prediction to predict a video block from other video blocks within the same frame, and inter-prediction to predict video blocks from blocks in previous frames.
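
The two kinds of prediction can be sketched as follows. This is a deliberately minimal illustration, not the standard's full procedure: it shows only one intra mode (vertical) out of the several that H.264 defines, and it uses whole-pixel motion vectors, whereas H.264 also allows sub-pixel displacements with interpolation. All function names are hypothetical.

```python
def predict_intra_vertical(above, size=4):
    """Intra-prediction (vertical mode): copy the row of already-decoded
    pixels directly above the block down into every row of the block."""
    return [list(above[:size]) for _ in range(size)]

def predict_inter(ref_frame, x, y, mv, size=4):
    """Inter-prediction: copy a size x size block from a previous frame,
    displaced from position (x, y) by the integer motion vector mv=(dx, dy).
    No bounds handling; the displaced block must lie inside ref_frame."""
    dx, dy = mv
    return [[ref_frame[y + dy + r][x + dx + c] for c in range(size)]
            for r in range(size)]

def reconstruct(pred, residual):
    """Decoded block = prediction + decoded residual, clipped to 8 bits."""
    return [[max(0, min(255, p + r)) for p, r in zip(prow, rrow)]
            for prow, rrow in zip(pred, residual)]

# Example: an 8x8 reference frame whose pixel value encodes its position.
ref = [[row * 8 + col for col in range(8)] for row in range(8)]
pred = predict_inter(ref, x=2, y=2, mv=(1, -1))
residual = [[1] * 4 for _ in range(4)]
print(reconstruct(pred, residual))
```

In both cases only the residual (the difference between the prediction and the actual block) has to be coded, which is where the bit savings come from.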

H.264 operates on 4x4 as well as 8x8 pixel blocks, unlike previous standards, which operated only on 8x8 blocks.

Network Abstraction Layer

Variable Length Coding

Discrete Cosine Transformation and Quantization

Intra Prediction: Spatial Correlation

Inter Prediction: Motion estimation and motion compensation

H.264, like many other video compression standards, relies on

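To reduce computational complexity and to promote uniformity of implementations, H.264 uses an integer approximation of the discrete cosine transform (DCT). Because the transform needs only integer additions and shifts, every conforming decoder computes exactly the same result.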
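
A minimal sketch of the 4x4 inverse integer transform, written from the butterfly structure used for 4x4 residual blocks, is shown below. It assumes the input coefficients have already been dequantized (scaling is omitted here), and the function names are hypothetical.

```python
def _butterfly(d0, d1, d2, d3):
    """One 1-D pass of the 4-point inverse transform: adds and shifts only."""
    e0 = d0 + d2
    e1 = d0 - d2
    e2 = (d1 >> 1) - d3
    e3 = d1 + (d3 >> 1)
    return (e0 + e3, e1 + e2, e1 - e2, e0 - e3)

def inverse_transform_4x4(coeffs):
    """Apply the 1-D pass to each row, then to each column, then round
    and scale down by 64. coeffs is a 4x4 list of dequantized integers."""
    rows = [_butterfly(*row) for row in coeffs]
    cols = [_butterfly(*(rows[r][c] for r in range(4))) for c in range(4)]
    # cols[c][r] holds the value at row r, column c; (x + 32) >> 6 rounds.
    return [[(cols[c][r] + 32) >> 6 for c in range(4)] for r in range(4)]

# A block with only a DC coefficient decodes to a flat residual.
dc_only = [[64, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
print(inverse_transform_4x4(dc_only))
```

Python's `>>` on negative integers is an arithmetic (flooring) shift, which is what this sketch relies on; a C version would need to make sure its right shifts behave the same way.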

Deblocking Filter

H.264 incorporates a "deblocking" filter to smooth the artifacts caused by operating on square blocks of pixels. The filter is in-loop: it runs inside the coding loop, so the filtered frames, not the unfiltered ones, serve as references for inter-prediction.
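
The idea behind the filter can be sketched for a single line of pixels crossing a block edge. This is a simplified version of the normal-strength case: it adjusts only the two pixels adjacent to the edge, and it takes the thresholds `alpha`, `beta`, and the clipping bound `tc` as plain parameters. In the standard these values are derived from the quantization parameter and a boundary-strength decision, which this sketch does not model.

```python
def clip3(lo, hi, v):
    """Clamp v to the range [lo, hi]."""
    return max(lo, min(hi, v))

def filter_edge(p1, p0, q0, q1, alpha, beta, tc):
    """Filter across one block edge. p1, p0 are the pixels on one side
    (p0 nearest the edge); q0, q1 are the pixels on the other side.
    Returns the new (p0, q0)."""
    # Leave large steps alone: they are likely real image edges,
    # not blocking artifacts.
    is_artifact = (abs(p0 - q0) < alpha and
                   abs(p1 - p0) < beta and
                   abs(q1 - q0) < beta)
    if not is_artifact:
        return p0, q0
    delta = clip3(-tc, tc, (((q0 - p0) << 2) + (p1 - q1) + 4) >> 3)
    return clip3(0, 255, p0 + delta), clip3(0, 255, q0 - delta)

# A small step across the edge is smoothed; a large one is preserved.
print(filter_edge(100, 100, 108, 108, alpha=20, beta=5, tc=2))
print(filter_edge(50, 50, 200, 200, alpha=20, beta=5, tc=2))
```

The threshold test is what lets the filter smooth block boundaries without blurring genuine edges in the picture.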

H.264 in Software

We started this work by examining the ITU reference source code for H.264 and the ffmpeg open source implementation.

In the course of the project, we have learned that the standard reference implementations are of no use in studying the computational complexity. The reference implementations perform poorly on general purpose processors and are unusable as a starting point for the design of hardware accelerators. Standard software implementations obscure data dependences and concurrency. We switched to ffmpeg, which has been optimized for a variety of CPUs including x86 and ARM.

We found that the software implementations of H.264 were unsuitable as a starting point for hardware acceleration because they keep all state in global memory and their control flow is explicitly sequential. Because a small number of large structures are passed throughout the source code, it is very difficult to find and expose any concurrency or parallelism.

H.264 in Bluespec

Our approach was to re-implement the H.264 decoder in Bluespec, keeping the modularity of the algorithm and exposing its parallelism. Each of the components in the H.264 block diagram is implemented as a transactor. We define a transactor as ... The transactors are connected by FIFOs, decoupling the execution of each component.


   // Instantiate the modules

   INalUnwrap     nalunwrap     <- mkNalUnwrap();
   IEntropyDec    entropydec    <- mkEntropyDec();
   IInverseTrans  inversetrans  <- mkInverseTrans();
   IPrediction    prediction    <- mkPrediction();
   IDeblockFilter deblockfilter <- mkDeblockFilter();

   // Internal connections
   
   mkConnection( nalunwrap.ioout, entropydec.ioin );
   mkConnection( entropydec.ioout, inversetrans.ioin );
   mkConnection( inversetrans.ioout, prediction.ioin );
   mkConnection( prediction.ioout, deblockfilter.ioin );
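
For readers unfamiliar with Bluespec, the decoupling that the FIFOs provide can be modelled in software. The sketch below is only an analogy, not a translation of the design above: each stage is a thread, each connection is a bounded queue, and the stage bodies are stand-ins that tag the data with the stage name rather than decode anything. The helper names (`stage`, `run_pipeline`, `DONE`) are hypothetical.

```python
import queue
import threading

DONE = object()  # sentinel marking the end of the stream

def stage(fn, fifo_in, fifo_out):
    """Start a thread that applies fn to each item from fifo_in and
    pushes the result to fifo_out. It blocks when fifo_in is empty or
    fifo_out is full, so stages run independently of one another."""
    def run():
        while True:
            item = fifo_in.get()
            if item is DONE:
                fifo_out.put(DONE)
                return
            fifo_out.put(fn(item))
    thread = threading.Thread(target=run)
    thread.start()
    return thread

def run_pipeline(stage_fns, inputs, depth=2):
    """Chain the stages with bounded FIFOs and collect the outputs."""
    fifos = [queue.Queue(maxsize=depth) for _ in range(len(stage_fns) + 1)]
    threads = [stage(fn, fifos[i], fifos[i + 1])
               for i, fn in enumerate(stage_fns)]

    def feed():
        for item in inputs:
            fifos[0].put(item)
        fifos[0].put(DONE)

    feeder = threading.Thread(target=feed)
    feeder.start()
    outputs = []
    while True:
        item = fifos[-1].get()
        if item is DONE:
            break
        outputs.append(item)
    for thread in threads + [feeder]:
        thread.join()
    return outputs

# Stand-in stages named after the modules above; each just records its name.
names = ["nalunwrap", "entropydec", "inversetrans", "prediction", "deblockfilter"]
stages = [lambda trace, n=n: trace + [n] for n in names]
print(run_pipeline(stages, [[], []]))
```

Because each stage touches only its own input and output queue, no stage needs to know how fast its neighbours run, which is the property the FIFO-connected transactors are meant to provide.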


Architectural Exploration

Algorithm Exploration

- Check Wen-mei Hwu's work for references

ARMO/h264-paper (last edited 2006-10-24 22:06:22 by JameyHicks)