Dissertation / PhD Thesis FZJ-2014-02813

Automated Optimization Methods for Scientific Workflows in e-Science Infrastructures



2014
Forschungszentrum Jülich GmbH Zentralbibliothek, Verlag Jülich
ISBN: 978-3-89336-949-2

Jülich : Forschungszentrum Jülich GmbH Zentralbibliothek, Verlag, Schriften des Forschungszentrums Jülich. IAS Series 24, xvi, 182 pp. = Universität Bonn, Diss., 2014

Please use a persistent id in citations:  

Abstract: Scientific workflows have emerged as a key technology that assists scientists with the design, management, execution, sharing and reuse of in silico experiments. Workflow management systems simplify the handling of scientific workflows by providing graphical interfaces for their development, monitoring and analysis. Today, e-Science combines such workflow management systems with large-scale data and computing resources into complex research infrastructures. For instance, e-Science supports the spread of best-practice research within collaborations by providing workflow repositories, which facilitate the sharing and reuse of scientific workflows. However, scientists still face limitations when reusing workflows. One of the most common challenges is the need to select appropriate applications and their individual execution parameters. If scientists do not want to rely on default or experience-based parameters, the best-effort option is to test different workflow set-ups using either trial-and-error approaches or parameter sweeps. Trial and error may be inefficient and parameter sweeps time-consuming, especially when tuning a large number of parameters. Scientists therefore require an effective and efficient mechanism that automatically and intelligently tests different workflow set-ups and helps them improve their scientific results. This thesis addresses this limitation by defining and implementing an approach for the optimization of scientific workflows. In the course of this work, scientists’ needs are investigated and requirements are formulated, resulting in an appropriate optimization concept. In a subsequent step, this concept is prototypically implemented by extending a workflow management system with an optimization framework that includes the general mechanisms required to conduct workflow optimization.
As optimization is an ongoing research topic, different algorithms are provided as pluggable extensions (plugins) that are loosely coupled with the framework, resulting in a generic and easily extensible system. In this thesis, an exemplary plugin is introduced which applies a Genetic Algorithm for parameter optimization. To accelerate workflow optimization, and thereby make it feasible at all, e-Science infrastructures are utilized for the parallel execution of scientific workflows. This is enabled by additional extensions that allow applications and workflows to be executed on distributed computing resources. The implementation, and with it the general approach of workflow optimization, is experimentally verified by four use cases in the life science domain. All workflows were significantly improved, which demonstrates the advantage of the proposed workflow optimization. Finally, a new collaboration-based approach is introduced that harnesses optimization provenance to make optimization faster and more robust in the future.

Keyword(s): Dissertation


Note: Universität Bonn, Diss., 2014

Contributing Institute(s):
  1. Jülich Supercomputing Center (JSC)
Research Program(s):
  1. 412 - Grid Technologies and Infrastructures (POF2-412)

Appears in the scientific report 2014
Database coverage:
OpenAccess

The record appears in these collections:
Document types > Theses > Ph.D. Theses
Workflow collections > Public records
Institute Collections > JSC
JuOSC (Juelich Open Science Collection)
Publications database
Open Access

 Record created 2014-04-14, last modified 2023-07-11