H.264 Video Decoder
Goals
- Bluespec implementation of H.264 decoder to support HD (720p) video
- Full implementation of H.264 on FPGA
- Architecture exploration for H.264 decoder blocks using Bluespec
- Low power implementation using sub-threshold logic and dynamic voltage scaling
Source Code
Paper
References
Overview of MPEG-4 Part-9 reference HW description
- The goal is to provide a reference hardware description (in HDL form) of certain H.264 modules, as an alternative to the reference software implementation of those modules. This makes conformance testing of hardware implementations of H.264 modules easier.
A Software/Hardware Platform For Rapid Prototyping of Video and Multimedia Designs ( SwHwPlatRapidProtoVideoVirtualSocket05.pdf )
- This paper describes a platform architecture (called the virtual socket platform) that allows hardware accelerator prototypes (for H.264 modules) to be integrated easily with the H.264 reference software. This facilitates conformance testing of the hardware accelerator prototypes. An example of the platform architecture is given using the FPGA-based WildCard II PC card.
Advances in Hardware Architectures for Image and Video Coding — A Survey ( AdvHwArchImgVideoCodeSurveyProcIeee05.pdf )
- Although the focus is on MPEG-4 Part 2 (and JPEG 2000), it provides good information about the fundamentals of hardware design for video codecs. The information on motion estimation architectures is also quite relevant because later papers use them for H.264.
Algorithms and DSP Implementation of H.264/AVC ( AlgoDspImplH264Y06.pdf )
- This paper gives a good overview of some fast algorithms that can be used to save computation in H.264 intra prediction, motion estimation, and mode decision. It also discusses how to optimize H.264 code for DSP.
A 125uW, Fully Scalable MPEG-2 and H.264/AVC Video Decoder for Mobile Applications ( FullyScaleMpeg2H264DecdIssccY06.pdf )
- Both MPEG-2 and H.264/AVC are integrated in a single chip. Some interesting ideas related to H.264 are the 4x4 block level pipeline, and the prediction circuit and pixel caching (for intra prediction and deblocking filter). There are other potentially interesting ideas (e.g. 1×4 decoding order, context switch buffer, etc.) but no details are given.
On The Way to an H.264 HW/SW Reference Model: A SystemC Modeling Strategy to Integrate Selected IP-Blocks with the H.264 Software Reference Model ( H264HwSwRefModelSystemC05.pdf )
- The motivation here is to be able to integrate H.264 hardware accelerator modules with H.264 reference software so that conformance testing can be easily done. The proposed method is to use SystemC to model H.264 hardware accelerator modules, and to make some changes to H.264 reference software so that it can be integrated into SystemC simulation environment. The paper itself doesn't provide much detail as to how the integration is done, but the author provided some clarification in his email response.
H.264/AVC interpolation optimization ( H264InterpolOptimizeY05.pdf )
- This paper discusses how to efficiently implement sub-pixel interpolation on a processor with subword processing capability (e.g., ARM11's SIMD instructions), taking into account memory access and memory usage.
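As a point of reference for what such an optimization has to compute, here is a minimal scalar sketch of the H.264 luma half-sample filter, which uses the 6-tap kernel (1, -5, 20, 20, -5, 1) with rounding and a shift by 5. It is not the paper's SIMD implementation; the function names `clip_u8` and `half_pel_row` are made up for illustration, and only the horizontal, one-dimensional case is shown.

```python
TAPS = (1, -5, 20, 20, -5, 1)  # H.264 luma half-sample filter taps (sum = 32)

def clip_u8(x):
    """Clamp a value to the 8-bit sample range."""
    return max(0, min(255, x))

def half_pel_row(samples):
    """Return the horizontal half-sample values for one row of integer samples.

    Output i lies halfway between samples[i + 2] and samples[i + 3], so a row
    of n samples yields n - 5 interpolated values (no edge padding here).
    """
    out = []
    for i in range(len(samples) - 5):
        acc = sum(t * s for t, s in zip(TAPS, samples[i:i + 6]))
        out.append(clip_u8((acc + 16) >> 5))  # round, then divide by 32
    return out

if __name__ == "__main__":
    print(half_pel_row([10, 20, 30, 40, 50, 60]))
```

Each output needs six neighbouring input samples, which is why the memory access pattern matters as much as the arithmetic on a subword-parallel processor.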
A 63-mW H.264/MPEG-4 Audio/Visual Codec LSI With Module-Wise Dynamic Voltage/Frequency Scaling ( H264Mpeg4AudioVisualLsiDynVFScaleJnl06.pdf )
- An interesting idea in this paper is the use of fine-grained clock gating policy; i.e., clock gating is applied at the level of submodules inside an individual accelerator. The paper also addresses issues related to changing voltage/frequency on the fly (i.e., the module operation is not required to stop during the voltage/frequency transition).
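To see why voltage/frequency scaling pays off, the usual first-order CMOS model (dynamic power proportional to C * V^2 * f) can be sketched as below. This model is a textbook approximation, not something taken from the paper, and the scale factors in the example are purely illustrative.

```python
def relative_dynamic_power(v_scale, f_scale):
    """Dynamic power relative to nominal, assuming P is proportional to V^2 * f.

    v_scale and f_scale are the supply voltage and clock frequency expressed
    as fractions of their nominal values. Leakage power is ignored.
    """
    return (v_scale ** 2) * f_scale

if __name__ == "__main__":
    # Illustrative only: a module running at half speed on 80% of the voltage.
    print(relative_dynamic_power(0.8, 0.5))
```

Under this model, lowering the frequency alone saves power only linearly, while lowering the voltage along with it adds a quadratic saving, which is the motivation for scaling both per module.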
Hardware Architecture Design of Video Compression for Multimedia Communication Systems ( HwArchVideoCompressCommagAug05.pdf )
- The paper gives an overview of a particular H.264 encoder design. More details can be found in the authors' follow-up paper "Analysis and Architecture Design of an HDTV720p 30 Frames/s H.264/AVC Encoder".
MPEG4 AVC/H.264 decoder with scalable bus architecture and dual memory controller ( Mpeg4AvcDecdScaleBusDualMem04.pdf )
- The main idea seems to be the use of a separate local bus and memory controller for accessing the reference frame buffer (i.e., separate from the system bus that is used for accessing the frame buffer). The paper asserts that the local bus should be 96 bits wide, which seems to be incorrect.
Overview of the H.264/AVC Video Coding Standard ( OverviewH264CodeStdTrans03.pdf )
- A good overview of the H.264 standard. In particular, it provides a good explanation of the concepts of slice groups and adaptive frame-field coding.
A platform-based MPEG-4 advanced video coding (AVC) decoder with block level pipelining ( PlatMpeg4AvcDecdBlkPipe03.pdf )
- A 4-stage macroblock-level pipeline is used. Hardware designs of MC and IQ-IDCT are briefly described.
A 160kGate 4.5kB SRAM H.264 Video Decoder for HDTV Applications ( SRamH264DecdHdtvIssccY06.pdf )
- Some interesting ideas in this paper include optimization techniques for CABAD, hardware sharing for different prediction modes, and data reuse in MC. It also minimizes the internal SRAM size by having a DMA-like unit that transfers unused data to an external memory via a dedicated bus and memory controller.
Analysis and Architecture Design of an HDTV720p 30 Frames/s H.264/AVC Encoder ( AnalArchHdtvH264EncdTrans06.pdf )
- An overview of a hardware design of an H.264 encoder. Some interesting ideas include the 4-stage macroblock pipelining, architecture designs for integer and fractional motion estimation, hardware sharing for different modes of intra prediction, and the CAVLC architecture.
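The 720p, 30 frames/s target translates directly into a per-macroblock cycle budget, which is what a macroblock-level pipeline has to meet. The sketch below works through that arithmetic; the 100 MHz clock is an assumed figure for illustration and does not come from the paper, and the function names are hypothetical.

```python
def macroblocks_per_frame(width, height, mb_size=16):
    """Number of 16x16 macroblocks covering a frame (dimensions rounded up)."""
    cols = -(-width // mb_size)   # ceiling division
    rows = -(-height // mb_size)
    return cols * rows

def cycles_per_macroblock(width, height, fps, clock_hz):
    """Whole clock cycles available per macroblock at the given frame rate."""
    mb_rate = macroblocks_per_frame(width, height) * fps
    return clock_hz // mb_rate

if __name__ == "__main__":
    print(macroblocks_per_frame(1280, 720))                    # 3600
    print(macroblocks_per_frame(1280, 720) * 30)               # 108000 per second
    # Assumed 100 MHz clock, for illustration only.
    print(cycles_per_macroblock(1280, 720, 30, 100_000_000))   # 925
```

With a macroblock-level pipeline, each stage (not the whole datapath) has to finish within this budget, which is the main reason such pipelining is attractive for HD resolutions.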
Architecture design for deblocking filter in H.264/JVT/AVC ( ArchDesignDeblkFilterH264Y03.pdf )
- This is probably one of the first papers on hardware design of the deblocking filter, and it is cited by most of the others. The key idea is the use of an "advanced" processing order of boundaries, which allows more data reuse than the basic processing order used in software implementations of the deblocking filter. Since then, many papers have presented better designs.
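For context, the basic order filters all vertical edges of a luma macroblock from left to right and then all horizontal edges from top to bottom. The sketch below enumerates that order at 4x4-block granularity and reports when a given block is touched; it models only the basic order, not the paper's "advanced" order, and the names `basic_edge_order` and `visits` are invented for illustration.

```python
def basic_edge_order():
    """Edge segments of one 16x16 luma macroblock in the basic order.

    ("V", x, y) is the left edge of 4x4 block (x, y);
    ("H", x, y) is the top edge of 4x4 block (x, y).
    """
    order = []
    for x in range(4):            # vertical edges, left to right
        for y in range(4):
            order.append(("V", x, y))
    for y in range(4):            # then horizontal edges, top to bottom
        for x in range(4):
            order.append(("H", x, y))
    return order

def visits(order, bx, by):
    """Steps at which 4x4 block (bx, by) is read by some edge segment."""
    hits = []
    for step, (direction, x, y) in enumerate(order):
        if direction == "V":
            touched = {(x - 1, y), (x, y)}
        else:
            touched = {(x, y - 1), (x, y)}
        if (bx, by) in touched:
            hits.append(step)
    return hits

if __name__ == "__main__":
    print(visits(basic_edge_order(), 0, 0))  # [0, 4, 16, 20]
```

Block (0, 0) is needed at steps 0, 4, 16 and 20, so in the basic order it must either stay in local storage for most of the macroblock or be fetched again. Reordering the edges to shorten these gaps is what data reuse means here.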
An Efficient Deblocking Filter with Self-Transposing Memory Architecture For H.264/AVC ( EffDeblkFilterSelfTransposeMemArchH264Y06.pdf )
- The main idea is the use of a specialized memory unit that automatically transposes a 4x4 (or 8x8) block after a write and a read. It is not clear what advantage it has over a simple transpose register array.
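The function that either approach has to provide is small: a block written row by row is read back column by column, so the same one-dimensional filter datapath can serve both vertical and horizontal edges. The sketch below models only that behaviour, not the paper's memory design, and `transpose4x4` is a hypothetical name.

```python
def transpose4x4(block):
    """Return the transpose of a 4x4 block given as a list of four rows."""
    return [[block[r][c] for r in range(4)] for c in range(4)]

if __name__ == "__main__":
    block = [[r * 4 + c for c in range(4)] for r in range(4)]
    for row in transpose4x4(block):
        print(row)
```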
An In-Place Architecture for the Deblocking Filter in H.264/AVC ( InPlaceArchDeblkFilterH264Trans06.pdf )
- This paper presents deblocking filter hardware that uses the horizontal-vertical interleaved processing order, which allows maximum data reuse and therefore minimizes the local memory size. Pipelining is not used (or is used minimally), probably because the horizontal-vertical interleaved processing order inherently introduces pipeline hazards.
A near optimal deblocking filter for H.264 advanced video coding ( NearOptimalDeblkFilterH264Y06.pdf )
- This paper describes deblocking filter hardware based on a 5-stage pipeline. It uses a processing order that is similar to the horizontal-vertical interleaved processing order, and it adds extra logic to resolve pipeline hazards.
A Pipelined Hardware Implementation of In-loop Deblocking Filter in H.264/AVC ( PipelineHwImplDeblkFilterH264Trans06.pdf )
- This paper describes deblocking filter hardware based on a 4-stage pipeline. It avoids the horizontal-vertical interleaved processing order because of pipeline hazards; instead, it uses a new processing order (which interleaves every 4 blocks instead of every block) that allows less data reuse but does not cause pipeline hazards. The benefit of pipelining is that the hardware can run at a higher clock rate (due to a shorter critical path).
