KIT | KIT-Bibliothek | Impressum | Datenschutz

Communication Efficient Algorithms for Distributed OLAP Query Execution

Hespe, Demian ORCID iD icon

Abstract:

As a result of the growing amounts of Data in todays Databases, one machine is often not sufficient to store and process these. The proper solution to this problem is to scale the system out on a cluster. However, the distribution of the data throughout the machines of the cluster results in a high percentage of communication time in the overall execution time of a query, especially for complex analytical queries. For this reason, we try to minimize the volume of communicated data to allow faster runtimes when a query cannot be executed on a single node of the cluster without any communication. We analyze techniques from previous work and propose improvements to them backed by a complexity analysis of the communication volume for both, our algorithms and the algorithms from the previous work. For the evaluation of our algorithms we implement them for chosen queries of the TPC-H benchmark and run them on a cluster of up to 128 nodes with a database of up to 30 terabytes of uncompressed data (128 TB if only a small proportion of the database is used). We provide both, scaling experiments and runtime comparisons to previous work and the current TPC-H record holder. ... mehr


Volltext §
DOI: 10.5445/IR/1000058831
Cover der Publikation
Zugehörige Institution(en) am KIT Institut für Theoretische Informatik (ITI)
Publikationstyp Hochschulschrift
Publikationsjahr 2014
Sprache Englisch
Identifikator urn:nbn:de:swb:90-588317
KITopen-ID: 1000058831
Verlag Karlsruher Institut für Technologie (KIT)
Umfang 52 S.
Art der Arbeit Abschlussarbeit - Bachelor
KIT – Die Forschungsuniversität in der Helmholtz-Gemeinschaft
KITopen Landing Page