A templated programmable architecture for highly constrained embedded HD video processing

Thevenin, Mathieu; Paindavoine, Michel; Schmit, Renaud; Heyrman, Barthelemy; Letellier, Laurent

doi:10.1007/s11554-018-0808-6

Special Issue Paper
Published: 30 July 2018

Volume 16, pages 143–160, (2019)
Journal of Real-Time Image Processing

Mathieu Thevenin¹,
Michel Paindavoine²,
Renaud Schmit¹,
Barthelemy Heyrman² &
…
Laurent Letellier¹

Abstract

Highly constrained devices need a video reconstruction pipeline to improve the quality of the images they deliver. The algorithms in such a pipeline require high computing capacity, on the order of several tens of GOPs for real-time HD 1080p video streams. Today's embedded design constraints limit both the silicon budget and the power consumption, typically to 2 mm\(^2\) and half a watt. This paper presents the eISP architecture that is able to reach 188 MOPs/mW with 94 GOPs/mm\(^2\) and 378 GOPs/mW using TSMC 65-nm integration technology. This fully programmable and modular architecture is based on an analysis of video-processing algorithms. Synthesizable VHDL is generated from a set of parameters, which simplifies the sizing and characterization of the architecture.
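To put the budget quoted in the abstract into perspective, the following is a rough sizing sketch and not a computation taken from the paper. The frame rate (30 frames/s) and the workload (30 GOPs, one point within "several tens of GOPs") are assumptions chosen for illustration, and the function names are hypothetical. Only the 2 mm\(^2\) area and the half-watt power budget come from the abstract.

```python
# Back-of-the-envelope sizing for a 1080p processing pipeline.
# ASSUMPTIONS (not from the paper): FPS and WORKLOAD_GOPS are illustrative.

WIDTH, HEIGHT = 1920, 1080   # HD 1080p frame size
FPS = 30                     # assumed frame rate
WORKLOAD_GOPS = 30.0         # assumed workload, in giga-operations per second
AREA_BUDGET_MM2 = 2.0        # silicon budget quoted in the abstract
POWER_BUDGET_MW = 500.0      # "half a Watt" quoted in the abstract


def pixel_rate(width, height, fps):
    """Pixels that must be processed per second."""
    return width * height * fps


def ops_per_pixel(workload_gops, pixels_per_second):
    """Average number of operations available for each pixel."""
    return workload_gops * 1e9 / pixels_per_second


def required_density(workload_gops, area_mm2, power_mw):
    """Area efficiency (GOPs/mm^2) and power efficiency (MOPs/mW)
    needed to fit the workload inside the given budget."""
    gops_per_mm2 = workload_gops / area_mm2
    mops_per_mw = workload_gops * 1e3 / power_mw
    return gops_per_mm2, mops_per_mw


if __name__ == "__main__":
    rate = pixel_rate(WIDTH, HEIGHT, FPS)
    print("pixel rate (pixels/s):", rate)
    print("operations per pixel:", round(ops_per_pixel(WORKLOAD_GOPS, rate), 1))
    print("required (GOPs/mm^2, MOPs/mW):",
          required_density(WORKLOAD_GOPS, AREA_BUDGET_MM2, POWER_BUDGET_MW))
```

Under these assumptions the budget corresponds to a few hundred operations per pixel, and to an efficiency of 15 GOPs/mm\(^2\) and 60 MOPs/mW. Changing the assumed frame rate or workload scales these figures linearly.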

Notes

An in-depth study of this point could help optimize the results obtained, but it is beyond the scope of this paper.

Acknowledgements

The authors are grateful to Nicola Martin, Dominique Debize, John Rander and Jacques Bouchard for their valuable assistance in proofreading the manuscript and improving the accuracy of its written English.

Author information

Authors and Affiliations

CEA, LIST—CEA Saclay, Saclay, France
Mathieu Thevenin, Renaud Schmit & Laurent Letellier
University of Burgundy, Burgundy, France
Michel Paindavoine & Barthelemy Heyrman

Corresponding author

Correspondence to Mathieu Thevenin.

About this article

Cite this article

Thevenin, M., Paindavoine, M., Schmit, R. et al. A templated programmable architecture for highly constrained embedded HD video processing. J Real-Time Image Proc 16, 143–160 (2019). https://doi.org/10.1007/s11554-018-0808-6

Received: 11 December 2017
Accepted: 18 July 2018
Published: 30 July 2018
Issue Date: 14 February 2019
DOI: https://doi.org/10.1007/s11554-018-0808-6
