Power and Speed-Efficient Code Transformation of Video Compression Algorithms for RISC Processors

Nachtergaele, Lode; Gijbels, Toon; Bormans, Jan; Catthoor, Francky; Bolsens, Ivo

doi:10.1023/A:1008135917341

Published: 01 February 2001

Volume 27, pages 161–169, (2001)

Journal of VLSI signal processing systems for signal, image and video technology

Abstract

Upcoming multimedia compression applications will require high memory bandwidth. In this paper, we estimate that a software reference implementation of an MPEG-4 video decoder typically requires 200 Mtransfers/s to memory to decode one CIF (352×288) Video Object Plane (VOP) at 30 frames/s. This imposes a high penalty on both power and performance.
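To put the 200 Mtransfers/s figure in perspective, the short calculation below divides it by the pixel rate of the stated video format. It is only a back-of-the-envelope sketch: it assumes one sample per pixel position (luminance only, chrominance ignored), and the function and constant names are illustrative rather than taken from the paper.

```python
# Figures quoted in the abstract.
TRANSFERS_PER_SECOND = 200e6   # 200 Mtransfers/s to memory
WIDTH, HEIGHT = 352, 288       # CIF resolution
FRAMES_PER_SECOND = 30


def transfers_per_pixel(transfers_per_s, width, height, fps):
    """Average number of memory transfers spent per decoded pixel.

    Assumption: one sample per pixel position (luminance only).
    """
    pixels_per_second = width * height * fps
    return transfers_per_s / pixels_per_second


pixels_per_frame = WIDTH * HEIGHT
ratio = transfers_per_pixel(
    TRANSFERS_PER_SECOND, WIDTH, HEIGHT, FRAMES_PER_SECOND
)

print(pixels_per_frame)   # pixels in one CIF frame
print(round(ratio, 1))    # average transfers per pixel
```

Under that assumption, the estimate works out to roughly 66 memory transfers for every decoded pixel, which is why the transfer count rather than the arithmetic dominates the cost.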

However, we also show that aggressive code transformations can substantially reduce the number of memory transfers without sacrificing speed; on a DEC Alpha, they even yield a gain of about 10% in cache misses and cycles. For this purpose, we have manually applied an extended version of our data transfer and storage exploration (DTSE) methodology, which was originally developed for custom hardware implementations.
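As a rough illustration of how a code transformation can cut memory transfers, the sketch below merges two loops so that an intermediate array is never written to or read from memory. This is a generic loop-merging example, not a transformation taken from the paper: the `CountedArray` helper, the function names, and the toy per-element operations (multiply by 2, add 1) are all hypothetical stand-ins for real decoder stages.

```python
class CountedArray:
    """List wrapper that counts every element read and write."""

    def __init__(self, size):
        self.data = [0] * size
        self.transfers = 0

    def __getitem__(self, i):
        self.transfers += 1
        return self.data[i]

    def __setitem__(self, i, value):
        self.transfers += 1
        self.data[i] = value


def make_source(n):
    """Build an input array; filled directly so setup is not counted."""
    src = CountedArray(n)
    src.data = list(range(n))
    return src


def two_pass(src, n):
    """Original form: stage 1 stores to tmp, stage 2 reloads tmp."""
    tmp = CountedArray(n)
    out = CountedArray(n)
    for i in range(n):
        tmp[i] = src[i] * 2          # hypothetical stage 1
    for i in range(n):
        out[i] = tmp[i] + 1          # hypothetical stage 2
    return out.data, src.transfers + tmp.transfers + out.transfers


def fused(src, n):
    """Transformed form: both stages in one loop, tmp kept in a scalar."""
    out = CountedArray(n)
    for i in range(n):
        value = src[i] * 2           # stays in a local variable
        out[i] = value + 1
    return out.data, src.transfers + out.transfers


N = 64
result_a, transfers_a = two_pass(make_source(N), N)
result_b, transfers_b = fused(make_source(N), N)
print(transfers_a, transfers_b)
```

For N = 64 the two-pass version counts 256 element transfers and the merged version counts 128, while both produce identical output. The amount of computation is unchanged; only the traffic to the intermediate buffer disappears.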

Author information

Authors and Affiliations

Interuniversity Micro Electronics Centrum (IMEC), Leuven, Belgium
Lode Nachtergaele, Toon Gijbels, Jan Bormans, Francky Catthoor & Ivo Bolsens

About this article

Cite this article

Nachtergaele, L., Gijbels, T., Bormans, J. et al. Power and Speed-Efficient Code Transformation of Video Compression Algorithms for RISC Processors. The Journal of VLSI Signal Processing-Systems for Signal, Image, and Video Technology 27, 161–169 (2001). https://doi.org/10.1023/A:1008135917341

Issue Date: February 2001