- Author
- Simon Wunderlich, Technische Universität Dresden
- Title
- Network Coding Strategies for Multi-Core Architectures
- Citable URL:
- https://nbn-resolving.org/urn:nbn:de:bsz:14-qucosa2-765176
- First published
- 2021
- Date of submission
- 04.02.2021
- Date of defense
- 20.07.2021
- Abstract (EN)
- Random Linear Network Coding (RLNC) is a coding technique that can provide higher reliability and efficiency in wireless networks. Applying it to the fifth generation of cellular networks (5G) is now possible due to the softwarization approach of the 5G architecture. However, the complex computations necessary to encode and decode symbols in RLNC limit the achievable throughput and energy efficiency on today's mobile computers. Most computers, phones, TVs, and network equipment nowadays come with multiple, possibly heterogeneous (i.e., slow low-power and fast high-power) processing cores. Previous multi-core research focused on optimizing RLNC for big data chunks, which are useful for storage; network operations, however, tend to use smaller packets (e.g., Ethernet MTUs of 1500 bytes) and code over smaller generations of packets. Latency is also an increasingly important performance aspect in the upcoming Tactile Internet, yet it has received little attention in RLNC optimization so far. The primary research question of my thesis is therefore how to optimize the throughput and delay of RLNC on today's most common computing architectures. By fully leveraging the resources of today's consumer electronics hardware, RLNC can be practically adopted in today's wireless systems with just a software update, improving network efficiency and user experience. I generally follow a constructive approach: I introduce algorithms and methods and then demonstrate their performance by benchmarking actual implementations on common consumer electronics hardware against the state of the art. Inspired by linear algebra parallelization methods used in high-performance computing (HPC), I have developed an RLNC encoder/decoder that schedules matrix block tasks for multiple cores using a directed acyclic graph (DAG) based on the data dependencies between the tasks.
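The dependency-driven scheduling idea can be illustrated with a minimal sketch: tasks on matrix blocks form a DAG, and a task becomes runnable once all of its prerequisites have completed. The toy example below (task names and structure are illustrative assumptions, not the thesis implementation) executes a small blocked-elimination DAG in a dependency-respecting order; a real scheduler would dispatch the ready tasks to worker cores.

```python
# Minimal sketch of dependency-driven DAG task scheduling, in the spirit of
# blocked linear-algebra schedulers. Illustrative only, not the thesis code.
from collections import defaultdict, deque

def run_dag(tasks, deps):
    """tasks: {name: callable}; deps: {name: set of prerequisite names}.
    Runs every task exactly once, each after all its prerequisites."""
    indegree = {t: len(deps.get(t, ())) for t in tasks}
    dependents = defaultdict(list)
    for t, prereqs in deps.items():
        for p in prereqs:
            dependents[p].append(t)
    ready = deque(t for t, d in indegree.items() if d == 0)
    order = []
    while ready:
        t = ready.popleft()
        tasks[t]()          # a real scheduler would run this on a worker core
        order.append(t)
        for d in dependents[t]:
            indegree[d] -= 1
            if indegree[d] == 0:
                ready.append(d)   # all prerequisites done: task is runnable
    return order

# Toy blocked-elimination DAG: factor a pivot block, then update the row and
# column panels (independent, could run on different cores), then the
# trailing block that depends on both updates.
log = []
tasks = {name: (lambda n=name: log.append(n))
         for name in ("factor", "update_row", "update_col", "trailing")}
deps = {"update_row": {"factor"},
        "update_col": {"factor"},
        "trailing": {"update_row", "update_col"}}
order = run_dag(tasks, deps)
```

The key point of the sketch is that the schedule is derived purely from the data dependencies: any task whose prerequisites are satisfied may run, so independent block updates can execute concurrently.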
A non-progressive variant works with pre-computed DAG schedules, which can be re-used to push throughput even higher. I have also developed a progressive variant that can be used to minimize latency. Both variants achieve higher throughput than the fastest currently known RLNC decoder, with up to three times the throughput for small generation sizes and short packets. Unlike previous approaches, they can utilize all cores even on heterogeneous architectures. The progressive decoder greatly reduces latency while maintaining high throughput, cutting latency by up to a factor of ten compared to the non-progressive variant. Progressive decoders need special low-delay codes to release packets early instead of waiting for further dependent packets from the network. I introduce Caterpillar RLNC (CRLNC), a sliding window code that applies a fixed-size window to a stream of packets. CRLNC can be implemented on top of a conventional generation-based RLNC decoder. It combines the resilience against packet loss and the fixed resource bounds (number of computations and memory) of conventional generation-based RLNC decoders with the low delay of an infinite sliding window decoder. Together, the DAG RLNC coders and the Caterpillar RLNC method provide a powerful toolset for practically enabling RLNC in 5G and other wireless systems while achieving the high throughput and low delay required by upcoming immersive and machine control applications.
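The fixed sliding window idea can be sketched with a toy encoder and decoder over GF(2), where mixing packets reduces to XOR. The actual CRLNC scheme uses a larger finite field and a shifting row echelon decoder; the window size, packet length, and all names below are illustrative assumptions.

```python
# Toy fixed-window sliding-window coding over GF(2) (XOR mixing).
# Illustrates the windowed-combination idea only; not the CRLNC implementation.
import random

W = 4          # fixed encoding window: number of most recent packets mixed
PKT_LEN = 8    # payload bytes per source packet

def xor_bytes(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def encode(source, seq, rng):
    """One coded packet: window start, GF(2) coefficients over the window
    ending at sequence number seq, and the XOR-combined payload."""
    lo = max(0, seq - W + 1)
    window = source[lo:seq + 1]
    coeffs = [rng.getrandbits(1) for _ in window]
    coeffs[-1] = 1                      # the newest packet always contributes
    payload = bytes(PKT_LEN)
    for c, pkt in zip(coeffs, window):
        if c:
            payload = xor_bytes(payload, pkt)
    return lo, coeffs, payload

def decode(coded, n):
    """Gauss-Jordan elimination over GF(2): recover n source packets from
    coded packets (lo, coeffs, payload), assuming enough innovative packets."""
    rows = []
    for lo, coeffs, payload in coded:
        vec = [0] * n                   # expand window coeffs to full width
        for i, c in enumerate(coeffs):
            vec[lo + i] = c
        rows.append((vec, payload))
    for col in range(n):
        pivot = next(r for r in range(col, len(rows)) if rows[r][0][col])
        rows[col], rows[pivot] = rows[pivot], rows[col]
        for r in range(len(rows)):      # eliminate col from all other rows
            if r != col and rows[r][0][col]:
                v, p = rows[r]
                pv, pp = rows[col]
                rows[r] = ([a ^ b for a, b in zip(v, pv)], xor_bytes(p, pp))
    return [rows[i][1] for i in range(n)]

rng = random.Random(1)
source = [bytes(rng.randrange(256) for _ in range(PKT_LEN)) for _ in range(6)]
coded = [encode(source, s, rng) for s in range(len(source))]
recovered = decode(coded, len(source))
```

Because each coded packet only mixes the last W packets, the decoder's coefficient matrix stays banded, which is what bounds the computation and memory per packet while still allowing early, low-delay release of decoded packets.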
- Free keywords (DE)
- RLNC, Netzwerk-Codierung, Multi-Core, DAG, Schiebefenster
- Free keywords (EN)
- RLNC, Network Coding, Multi-Core, DAG, Sliding Window
- Classification (DDC)
- 621.3
- Classification (RVK)
- ZN 6045
- ZN 6400
- ST 200
- Reviewers
- Prof. Dr. Frank Fitzek
- Prof. Dr. Patrick Seeling
- Prof. Dr. Matthias Hollick
- Degree-granting / examining institution
- Technische Universität Dresden, Dresden
- Version / review status
- published version / publisher's version
- URN Qucosa
- urn:nbn:de:bsz:14-qucosa2-765176
- Qucosa publication date
- 09.11.2021
- Document type
- Dissertation
- Document language
- English
- License / rights statement
- CC BY 4.0
- Table of contents
1 Introduction
2 Background and Related Work
  2.1 Network Delay
  2.2 Network Coding Basics
  2.3 RLNC Optimization for Throughput
    2.3.1 SIMD Optimization
    2.3.2 Block Operation: Increasing Cache Efficiency with Subblocking
    2.3.3 Optimizing Matrix Computations
  2.4 Progressive RLNC Decoders
  2.5 Sliding Window RLNC
3 Optimized RLNC Parallelization with Scheduling Graphs
  3.1 Offline Directed Acyclic Graph (DAG) Scheduling
    3.1.1 Blocked LU Matrix Inversion
    3.1.2 Scheduling on a DAG
    3.1.3 Phase 1: DAG Recording
    3.1.4 Phase 2: DAG Schedule Execution
    3.1.5 DAG Scheduling vs. Conventional Multithreading
    3.1.6 Task Size Considerations
    3.1.7 Scheduling Strategies
      First Task Strategy
      Task Dependency Strategy
      Data Locality Strategy
      Combined Task Dependency and Data Locality Strategy
  3.2 Online DAG Scheduling
    3.2.1 Online DAG Operation
      Forward Elimination
      Backward Substitution
      Row Swapping
    3.2.2 Scheduling on an Online DAG
      Data Dependency Traversal
      Online DAG Creation and Task Delegation
    3.2.3 Optimizations
      Stripe Optimization
      Full Rows Optimization
  3.3 Evaluation Setup
    3.3.1 Multicore Boards
      ODROID-XU3
      ODROID-XU4
      ODROID-XU+E
      Cubieboard 4
      Raspberry Pi 2 Model B
    3.3.2 Evaluation Parameters
      Parameter Settings
      Matrix Types
    3.3.3 Performance Metrics
      Throughput
      Delay
      Energy
    3.3.4 Evaluation Methodology
  3.4 Evaluation Results
    3.4.1 Block Size b
    3.4.2 Comparison of Scheduling Strategies
    3.4.3 Single Thread Throughput
    3.4.4 Multi Thread Throughput
    3.4.5 Comparison of Multicore Boards
    3.4.6 Energy Consumption
    3.4.7 Online DAG vs. Offline DAG Throughput
    3.4.8 DAG vs. Progressive CD
    3.4.9 Delay
    3.4.10 Trading Throughput with Delay
    3.4.11 Sparse Coefficient Matrices in Online DAG
4 Sliding Window - Caterpillar RLNC (CRLNC)
  4.1 CRLNC Overview
  4.2 CRLNC Packet Format and Encoding
  4.3 CRLNC Decoding
    4.3.1 Shifting the Row Echelon Form
      Same sequence number: s_p = s_d
      New packet: s_p > s_d
      Old packet: s_p < s_d
    4.3.2 Larger Decoding Windows: w_d > w_e
    4.3.3 CRLNC Decoding Storage and Computing Requirements
  4.4 CRLNC Evaluation
    4.4.1 Performance Metrics
      Packet Loss Probability
      In-Order Packet Delay
    4.4.2 Evaluation Methodology
  4.5 Evaluation Results
    4.5.1 Packet Loss Probability
    4.5.2 In-Order Packet Delay
    4.5.3 Tradeoffs for Larger Decoding Windows
    4.5.4 Computation Complexity
5 Summary and Conclusion
List of Publications
Bibliography