Document: Low-Latency Data Access in a Java-based Distributed In-Memory Key-Value Storage

Title: Low-Latency Data Access in a Java-based Distributed In-Memory Key-Value Storage
Bookmark URL: https://docserv.uni-duesseldorf.de/servlets/DocumentServlet?id=52187
URN (NBN): urn:nbn:de:hbz:061-20200225-095556-5
Collection: Dissertations
Language: English
Document type: Academic theses » Dissertation
Media type: Text
Author: Nothaas, Stefan [Author]
Files: Adobe PDF, 66.52 MB in one file
Files dated 04.02.2020 / modified 04.02.2020
Contributors: Prof. Dr. Schöttner, Michael [Reviewer]
Prof. Dr. Conrad, Stefan [Reviewer]
Dewey Decimal Classification: 000 Computer science, information and general works » 004 Data processing; computer science
Description: Large-scale graph applications, whether highly interactive online applications or offline batch processing, require either low latency or high throughput to process huge graphs with trillions of edges and billions of vertices. To keep data-access times low, systems designed for this type of big data application typically keep all data in memory and aggregate hundreds or thousands of servers in cluster or cloud environments to create an extensive storage backend. However, highly parallel graph applications typically store and process large graphs consisting mostly of small objects of less than 128 bytes. These requirements are challenging for the backend storage, the distributed processing platform, the local memory management, and the network subsystem.

This thesis addresses three primary research questions in the context of a Java-based distributed in-memory key-value storage: (1) highly concurrent and distributed (graph) processing on a Java-based in-memory key-value storage; (2) memory management in Java that provides low-latency data access and low-overhead synchronization for large graph datasets consisting of many small objects; (3) a network subsystem for highly concurrent sending and receiving of messages that leverages low-latency, high-throughput network interconnects in Java applications.

First, this thesis proposes a general compute platform and a graph processing framework for a Java-based distributed in-memory key-value storage. The compute platform builds on top of the key-value storage and executes concurrent and distributed computations on storage nodes to benefit from data locality. The platform offers services to dispatch either lightweight SIMD-based computations or heavyweight, coordination-based computations to multiple servers. The framework was evaluated with the breadth-first search algorithm (part of the Graph500 benchmark) to compare the proposed concepts to other state-of-the-art graph processing systems.

The second contribution addresses low-latency local data access in an in-memory key-value storage in Java. It proposes a memory management with low memory and access overhead, designed for an in-memory key-value storage but also applicable to any highly parallel Java application. A custom key-value translation mechanism was extended to support low-overhead concurrent data access using a custom per-object read-write lock without considerably increasing the per-object memory overhead. This overhead is kept low by a custom fixed-block allocator optimized for the small objects found in typical graph datasets. The evaluation shows that the proposed solution has at least five times lower memory overhead than the memory managers of two other state-of-the-art Java-based key-value systems and outperforms them by up to 28-fold with 128 threads on read-heavy workloads.

With InfiniBand interconnects available in HPC and cloud environments, distributed applications can benefit greatly from single-digit microsecond latency and throughput of gigabytes per second. The third and last contribution addresses the network with a focus on InfiniBand hardware and proposes a Java-based, transport-agnostic network subsystem for highly concurrent synchronous and asynchronous messaging in Java applications. This subsystem is complemented by an InfiniBand transport implementation to leverage the performance of such high-speed hardware. The evaluation compares the solution to two state-of-the-art InfiniBand-based MPI implementations and shows that it provides high throughput and scales with local and distributed concurrency, even on worst-case all-to-all communication patterns.
License: In Copyright (protected by copyright)
Department / Institution: Mathematisch-Naturwissenschaftliche Fakultät » WE Informatik
Document created on: 25.02.2020
Files modified on: 25.02.2020
Doctoral application submitted on: 12.11.2019
Date of doctorate: 30.01.2020