D6.2 Demonstrator and report on workflow compilation and parallel execution

In this deliverable, we investigate two possible solutions for the scalable execution of preservation workflows. First, we report on the automatic compilation of Taverna workflows to MapReduce jobs for parallel execution, and demonstrate the compilation of a simple example workflow. Second, we report on the use of a high-level dataflow language, namely Apache Pig, to define preservation workflows, and discuss a complex workflow that is being developed for preservation watch on large-scale data. We compare the advantages and disadvantages of both approaches, especially with regard to which workflow elements can be effectively parallelized and optimized. Based on this analysis, we outline a best-of-both-worlds approach that combines the graphical interface of Taverna with the optimized workflow execution and maintainability of Apache Pig.

SCAPE_D6.2_TUB_V1.0
