Multiversioning hardware transactional memory for fail-operational multithreaded applications

  • Modern safety-critical embedded applications like autonomous driving need to be fail-operational, while high performance and low power consumption are demanded simultaneously. The prevalent fault tolerance mechanisms suffer from disadvantages: Some (e.g. triple modular redundancy) require a substantial amount of duplication, resulting in high hardware costs and power consumption. Others, like lockstep, require supplementary checkpointing mechanisms to recover from errors. Further approaches (e.g. software-based process-level redundancy) cannot handle the indeterminism caused by multithreaded execution. This paper presents a novel approach for fail-operational systems using hardware transactional memory for embedded systems. The hardware transactional memory is extended to support multiple versions, enabling redundant atomic operations and recovery in case of an error. In our FPGA-based evaluation, we executed the PARSEC benchmark suite with fault tolerance on 12 cores. The evaluationModern safety-critical embedded applications like autonomous driving need to be fail-operational, while high performance and low power consumption are demanded simultaneously. The prevalent fault tolerance mechanisms suffer from disadvantages: Some (e.g. triple modular redundancy) require a substantial amount of duplication, resulting in high hardware costs and power consumption. Others, like lockstep, require supplementary checkpointing mechanisms to recover from errors. Further approaches (e.g. software-based process-level redundancy) cannot handle the indeterminism caused by multithreaded execution. This paper presents a novel approach for fail-operational systems using hardware transactional memory for embedded systems. The hardware transactional memory is extended to support multiple versions, enabling redundant atomic operations and recovery in case of an error. In our FPGA-based evaluation, we executed the PARSEC benchmark suite with fault tolerance on 12 cores. The evaluation shows that multiversioning can successfully recover from all transient errors with an overhead comparable to fault tolerance mechanisms without recovery.show moreshow less

Download full text files

Export metadata

Statistics

Number of document requests

Additional Services

Share in Twitter Search Google Scholar
Metadaten
Author:Rico AmslingerGND, Christian PiatkaGND, Florian HaasGND, Sebastian Weis, Theo UngererORCiDGND, Sebastian AltmeyerGND
URN:urn:nbn:de:bvb:384-opus4-951809
Frontdoor URLhttps://opus.bibliothek.uni-augsburg.de/opus4/95180
Series (Serial Number):Reports / Technische Berichte der Fakultät für Angewandte Informatik der Universität Augsburg (2022-01)
Publisher:Institut für Informatik, Universität Augsburg
Place of publication:Augsburg
Type:Report
Language:English
Year of first Publication:2022
Publishing Institution:Universität Augsburg
Release Date:2022/05/10
Pagenumber:30
Institutes:Fakultät für Angewandte Informatik
Fakultät für Angewandte Informatik / Institut für Informatik
Fakultät für Angewandte Informatik / Institut für Informatik / Lehrstuhl für Embedded Systems
Dewey Decimal Classification:0 Informatik, Informationswissenschaft, allgemeine Werke / 00 Informatik, Wissen, Systeme / 004 Datenverarbeitung; Informatik
Licence (German):Deutsches Urheberrecht mit Print on Demand