Modeling and Efficient Simulation of Complex System-on-a-Chip Architectures

Kupriyanov, Olexiy

Files

812_OlexiyKupriyanovDissertation.pdf (1.56 MB)

Language

en

Document Type

Doctoral Thesis

Issue Date

2009-03-17

Issue Year

2008

Authors

Kupriyanov, Olexiy

Abstract

This monograph discusses important aspects of modeling and cycle-accurate simulation for adaptive processor systems and for specific architectures, at the instruction set and register transfer abstraction levels respectively. The main focus is the generation of efficient simulators (in terms of speed and host processor resource usage). To this end, the modeling of adaptive multi-processor architectures is also discussed. A further simulation problem, namely the integration of simulation cores at different abstraction levels (instruction set and register transfer levels) into a coupled co-simulation environment, is only briefly discussed. The most remarkable result of this thesis is a design framework that integrates novel design paradigms such as model-based design, automated generation of simulators and interactive debugging environments, adaptive multi-processor SoCs, coarse-grained dynamic hardware reconfiguration, power consumption estimation for architecture/compiler co-exploration, and rapid prototyping. The work has thus shown new directions in electronic design automation for embedded SoC architectures.

Contributions

The key contributions of this thesis fall into three fields: modeling, simulation, and design and implementation.

Modeling

To narrow the gap between the hardware model and the corresponding simulation and compiler tools, Chapter 3 proposed the architecture description language MAML (MAchine Markup Language) and a methodology for the automated design of embedded parallel processor arrays for computationally intensive media-oriented algorithms. The language was initially aimed at designing single application-specific instruction set processors (cf. Section 3.1) and was extended here to the modeling and simulation of domain-specific adaptive multi-processor architectures (cf. Section 3.2). The parameters extracted from MAML can be used both for generating a fast, interactive, cycle-accurate simulation and for compiler retargeting. An essential property of the proposed modeling approach is a clear separation between the description of behavior and the description of structure in the architectural parameters. This made it possible to distinguish between a functional and a geometrical parametrization of multi-processor designs.

Since reconfigurability plays an increasingly important role in the design of state-of-the-art embedded systems, the modeling of adaptive interconnection became essential. Section 3.3 therefore introduced a novel concept for modeling adaptive interconnection networks in parallel embedded processor architectures. Two main interconnection concepts were presented and formally compared: (1) distributed interconnection (the interconnect-wrapper concept) and (2) a centralized approach (based on switch-boxes). The equivalence of the distributed and centralized concepts was formally proven, which also established that interconnect-wrappers can efficiently model many different reconfigurable inter-processor networks. Finally, the concept of clustered, domain-based specification of processors and interconnection networks made it possible to optimize the code size of MAML descriptions, as shown in Section 3.4 by the specification of a tightly coupled processor array.

Simulation

The cycle-accurate simulation approaches proposed in this monograph consider the following two abstraction levels of modeling:
• Instruction set level: Based on the modeling concepts presented in Chapter 3 for adaptive multi-processor architectures, Chapter 4 introduced a methodology for the automated generation of efficient (in terms of speed) cycle-accurate instruction set simulators.
• Register transfer level: Based on a given register transfer level description, Chapter 5 proposed techniques for the automatic generation of event-driven simulators, including for unconventional architectures such as peripheral devices and custom computing machines that do not follow a von Neumann structure.

A methodology for the automatic generation of instruction set level simulators was proposed. Simulation speed was optimized by evaluating only those parts of the circuit whose values change from one simulation cycle to the next. The simulation of a given processing element was therefore performed only when a so-called simulation event occurred, that is, when the contents of at least one of the input or internal registers of the processor changed. Moreover, the occurrences of simulation events (the simulation-event pattern) could mostly be described by a periodic behavior (the activity period), which made it possible to predict the exact simulation schedule for each processing element a priori. Instruction set level simulators can be generated automatically for both static and adaptive multi-processor architectures. Section 4.1 proposed an event-driven methodology that led to a much more efficient simulation of static multi-processor architectures, again because only those parts of the parallel processor architecture whose values changed between simulation cycles were evaluated. Section 4.2 provided a new dynamic mixed compiled/interpretive simulation approach for efficient cycle-accurate instruction set simulation of adaptive or reprogrammable multi-processor architectures. Finally, Section 4.3 addressed the automated generation of simulators and other relevant design tools driven by the modeling methodology proposed in Chapter 3.
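The event-driven idea described above can be illustrated with a minimal sketch. It is not the simulator generated by the thesis framework: the linear chain of processing elements, the function `pe_function`, and the names `simulate`, `regs`, and `dirty` are assumptions made only for this illustration. Each hypothetical processing element computes a pure function of its input register, so it needs to be re-evaluated only in a cycle where that register has changed.

```python
def pe_function(x):
    """Hypothetical PE behavior: a pure function of the input register."""
    return 2 * x + 1


def simulate(stimulus, n_pes, event_driven):
    """Simulate a linear chain of PEs, one stimulus value per clock cycle.

    regs[i] is the input register of PE i; regs[n_pes] is the array output.
    Returns the per-cycle output trace and the number of PE evaluations.
    """
    regs = [0] * (n_pes + 1)
    dirty = [True] * n_pes          # force one evaluation of every PE at start
    evaluations = 0
    trace = []
    for value in stimulus:
        nxt = regs[:]               # two-phase update: read old, write new
        for i in range(n_pes):
            if event_driven and not dirty[i]:
                continue            # no simulation event: output keeps its value
            nxt[i + 1] = pe_function(regs[i])
            evaluations += 1
        nxt[0] = value              # external input is latched at the cycle end
        # A simulation event for PE i is a change of its input register.
        dirty = [nxt[i] != regs[i] for i in range(n_pes)]
        regs = nxt
        trace.append(regs[n_pes])
    return trace, evaluations


if __name__ == "__main__":
    # The input changes only every fourth cycle.
    stimulus = [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3]
    full = simulate(stimulus, n_pes=3, event_driven=False)
    fast = simulate(stimulus, n_pes=3, event_driven=True)
    print("same trace:", full[0] == fast[0])
    print("evaluations (full, event-driven):", full[1], fast[1])
```

Both modes produce the same output trace, but the event-driven mode skips every processing element whose input register is unchanged. With a periodic stimulus like the one in this toy run, the events settle into a repeating pattern, which is the property the abstract describes as allowing the schedule to be predicted a priori.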
For register transfer level simulation of specific architectures, Chapter 5 proposed concepts for optimizing simulation speed, namely a new mixed register transfer/instruction set level compiled simulation technique. With it, a compiled cycle-accurate simulator could be generated automatically from a given RTL description. Unconventional processors, such as peripheral devices, building components of FPGAs, and custom computing machines without a von Neumann architecture, could also be handled and simulated. To optimize simulation speed, two new graph decomposition algorithms were introduced in Section 5.2. Section 5.3 proposed a further optimization: a third algorithm that introduced intermediate registers to minimize the number of evaluations of combinational elements in the RTL circuit. Section 5.4 compared the two decomposition algorithms, including the performance and code size of the simulators generated by each, and showed experimentally that the generated simulator outperformed existing commercial simulators. The results indicated a high simulation speed combined with flexibility, cycle accuracy, and bit-true behavior for designs described entirely at the register transfer level.

Design and Implementation

To evaluate the simulation and modeling methodologies, in particular for tightly coupled adaptive processor architectures, a design framework was implemented (cf. Section 4.3). The framework can automate the generation of a simulation model and other related design tools at the instruction set level, either from interactive graphical input or directly from the introduced architecture description model (cf. Chapter 3). The tools developed within the framework are centered on the architecture description language presented in this monograph for modeling adaptive multi-processor architectures.
In addition, the introduced cycle-accurate instruction set simulation approach was used to automatically generate efficient mixed interpretive/compiled simulators for adaptive parallel processor architectures, providing a simulation and debugging environment with interactive visualization. At the back end, a highly parameterizable VHDL template was used to instantiate an implementation of the adaptive multi-processor architecture with the given architectural parameters. Moreover, the framework provided a modeling basis for estimating the energy consumption of the entire processor architecture: from the results of simulation profiling, an initial coarse estimate of the switching activity could be obtained and visualized interactively over time.
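As a rough illustration of how a coarse switching-activity estimate might be derived from simulation profiling, the sketch below counts bit toggles between consecutive values of a register. The abstract does not describe the estimation model used in the thesis; the toggle-count proxy and the names `toggle_count` and `switching_activity` are assumptions made for this example only.

```python
def toggle_count(prev, curr):
    """Number of bit positions that differ between two register values."""
    return bin(prev ^ curr).count("1")


def switching_activity(trace, width):
    """Fraction of bits that toggle in each cycle-to-cycle transition.

    trace: per-cycle values of one register, as recorded by a simulator.
    width: register width in bits (values are masked to this width).
    Returns one activity value in [0, 1] per transition.
    """
    mask = (1 << width) - 1
    return [
        toggle_count(a & mask, b & mask) / width
        for a, b in zip(trace, trace[1:])
    ]


if __name__ == "__main__":
    # Hypothetical 4-bit register trace taken from a simulation run.
    trace = [0b0000, 0b1111, 0b1110, 0b1110]
    print(switching_activity(trace, width=4))  # [1.0, 0.25, 0.0]
```

Plotting such per-cycle values over time gives the kind of activity profile that could feed a first-order energy estimate, with all technology-dependent factors left out of this sketch.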

Abstract

This thesis discusses new aspects of the modeling and cycle-accurate simulation of adaptive processor systems and of special architectures at the instruction set and register transfer levels. The main focus is the generation of efficient simulators (in terms of performance and processor resource usage). To enable the automatic generation of simulators, the problem of modeling processor architectures was discussed. The most notable result of this work is an innovative design framework for model-based design and for the automated generation of interactive simulators and debugging environments for adaptive multi-processor architectures. The work has thus shown new directions in the automated design of embedded systems. A brief summary of its main contributions follows. They can be divided into three areas: modeling, simulation, and design and implementation.

Contribution in the area of modeling

To bridge the gap between the generic architecture model and the corresponding simulation tools, the architecture description language MAML (Machine Markup Language) was extended. MAML was originally used to generate compilers for VLIW-like ASIP architectures (Section 3.1). The extension presented in this work (Section 3.2) enabled the modeling of adaptive, massively parallel processor systems. Such architectures are particularly well suited to computationally intensive applications such as video processing and medical image processing. The parameters extracted from MAML can be used both for generating fast, interactive, cycle-accurate simulators and for compiler generation. An essential property of the proposed approach is a clear separation between the description of behavior and the description of structure. This also makes it possible to distinguish between a functional and a geometrical parametrization of the multi-processor design.

Contributions in the area of simulation

Two different methods for generating cycle-accurate simulators were discussed in this work. The first approach is based on MAML architecture models for adaptive, massively parallel processor architectures (Chapter 4) and enables the automatic generation of efficient (in terms of speed) simulators at the instruction set level. Simulation speed was optimized by evaluating only those parts of the circuit whose register contents changed from one simulation cycle to the next. To this end, a given processing element was simulated only when a so-called simulation event arrived. The instruction set simulators could be generated automatically for both static (Section 4.1) and adaptive (Section 4.2) massively parallel processor systems.

To enable the simulation of peripherals, building components, or user-defined components, a second method for the register transfer level was proposed in Chapter 5. Here, too, the compiled cycle-accurate simulator could be generated automatically from an RTL description. To optimize simulation speed, two new graph decomposition algorithms were introduced in Section 5.2. To minimize the number of evaluations of combinational elements in the RTL circuit, Section 5.3 proposed a further optimization of simulation speed through the introduction of so-called intermediate registers. The two decomposition algorithms were compared in Section 5.4. The results showed a high simulation speed combined with the flexibility and accuracy of cycle-accurate simulation at the register transfer level.

Contributions in the area of design and implementation

A key contribution of this work lies in the combination of modeling and simulation. By using MAML modeling and supporting these models with novel simulation methodologies, an innovative design framework for adaptive multi-processor architectures at the instruction set level was developed (Section 4.3). The framework can automatically create the simulation model and the corresponding design tools directly from a MAML specification or from interactive graphical input. The automatically generated simulators and other tools, such as compilers, are centered on the architecture description language MAML.