Big data workflows: A reference architecture and the DATAVIEW system

Document Type

Article

Publication Date

2017

Department/School

Computer Science

Publication Title

Services Transactions on Big Data

Abstract

The big data era is here, a natural result of the digital revolution of the last few decades. The emergence of big data in virtually all areas of life raises a fundamental question: how can we turn large volumes of bits and bytes into insights and possibly value? Answering this question is often hindered by three big data challenges: volume, velocity, and variety. While scientific workflows have been used extensively to structure complex scientific data analysis processes, they fall short both in meeting the three big data challenges and in leveraging the dynamic resource provisioning capability of cloud computing. To address these limitations, we propose and develop the concept of big data workflow as the next generation of data-centric workflow technologies. In this paper we: 1) identify the key challenges for running big data workflows in the cloud; 2) propose a reference architecture for big data workflow management systems (BDWFMSs) that addresses these challenges; 3) develop DATAVIEW, a big data workflow management system, to validate our proposed reference architecture; and 4) design and run two big data workflows in the automotive and astronomy domains to showcase applications of our DATAVIEW system.

Comments

A. Kashley is a faculty member in EMU's Department of Computer Science.
