Heterogeneity-incorporating Workflow ApplicationMaster for YARN

marcbux/Hi-WAY

Hi-WAY

The Hi-WAY Workflow ApplicationMaster for YARN executes arbitrary scientific workflows on top of Apache Hadoop 2 (YARN). In this context, scientific workflows are directed acyclic graphs (DAGs) in which nodes are black-box tasks (e.g., Bash scripts, Java programs, Python scripts, or compiled C++ executables) that process unstructured data (arbitrary files). Edges in the graph represent data dependencies between tasks. Hi-WAY stores the workflow's input, output, and intermediate data in HDFS, Hadoop's distributed file system.
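
To make the DAG model concrete, here is a minimal Python sketch of a workflow whose edges are derived from shared files. It is not Hi-WAY's API or any of its supported workflow languages: the `Task` class, the helper functions, and the example commands and file names are all hypothetical and serve only to illustrate the idea of tasks connected by data dependencies.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter  # standard library, Python 3.9+


@dataclass(frozen=True)
class Task:
    """A black-box task (hypothetical model): a command plus its files."""
    name: str
    command: str
    inputs: tuple = ()
    outputs: tuple = ()


def build_dependencies(tasks):
    """Map each task name to the set of tasks that produce its inputs."""
    producer = {}
    for task in tasks:
        for path in task.outputs:
            producer[path] = task.name
    # Files without a producer (e.g. reads.fq) are workflow inputs.
    return {
        task.name: {producer[p] for p in task.inputs if p in producer}
        for task in tasks
    }


def execution_order(tasks):
    """Return one valid order; raises graphlib.CycleError if not a DAG."""
    return list(TopologicalSorter(build_dependencies(tasks)).static_order())


# Illustrative four-task workflow; commands and file names are made up.
workflow = [
    Task("align", "align.sh reads.fq", ("reads.fq",), ("aligned.sam",)),
    Task("sort", "sort.py aligned.sam", ("aligned.sam",), ("sorted.bam",)),
    Task("stats", "stats.py aligned.sam", ("aligned.sam",), ("stats.txt",)),
    Task("report", "report.sh sorted.bam stats.txt",
         ("sorted.bam", "stats.txt"), ("report.html",)),
]
order = execution_order(workflow)
```

In this sketch, `sort` and `stats` both depend only on `align`, so a workflow engine could run them concurrently before `report` starts.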

Hi-WAY currently supports the workflow languages Cuneiform, Galaxy, and Pegasus DAX, and can be easily extended to support others. It provides a number of general-purpose and more specialized schedulers, which can take data locality into account when assigning tasks to machines in order to reduce data transfer times during workflow execution. When running workflows, Hi-WAY captures comprehensive provenance information, which can be stored as files in HDFS as well as in a MySQL or Couchbase database. The scheduler evaluates these provenance traces for performance estimation, and they can also be used to re-execute previous workflow runs. The ApplicationMaster has been tested to scale to more than 600 concurrent tasks and is fault-tolerant in that it can restart failed tasks.
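
The idea behind data-locality-aware scheduling can be illustrated with a small greedy sketch in Python. This is not one of Hi-WAY's schedulers: the function names, the node names, and the simplification that whole files (rather than HDFS blocks) are located on nodes are assumptions made for illustration only.

```python
def pick_node(task_inputs, file_locations, free_nodes):
    """Choose the free node that already stores the most input bytes.

    task_inputs:    {file path: size in bytes}
    file_locations: {file path: set of node names holding a replica}
    free_nodes:     list of node names with spare capacity
    """
    def local_bytes(node):
        return sum(size for path, size in task_inputs.items()
                   if node in file_locations.get(path, set()))
    # max() returns the first maximal element, so ties go to the earlier node.
    return max(free_nodes, key=local_bytes)


def remote_bytes(task_inputs, file_locations, node):
    """Bytes that would need to be transferred if the task ran on `node`."""
    return sum(size for path, size in task_inputs.items()
               if node not in file_locations.get(path, set()))


# Hypothetical cluster state: sizes and replica placement are made up.
inputs = {"sorted.bam": 800, "stats.txt": 10}
locations = {"sorted.bam": {"node2", "node3"}, "stats.txt": {"node1"}}
nodes = ["node1", "node2", "node3"]

chosen = pick_node(inputs, locations, nodes)
transfer = remote_bytes(inputs, locations, chosen)
```

With this made-up placement, running the task on `node2` requires fetching only the small `stats.txt` file, whereas `node1` would have to fetch the much larger `sorted.bam`.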

The installation instructions can be found here.
