
Workflow description format

Bartosz Balis edited this page Feb 19, 2021 · 6 revisions

Workflow graph structure

HyperFlow describes workflows in a simple JSON format. A workflow file has the following top-level structure:

{
  "name": "Hello",         // name of the workflow
  "processes": [ ... ],    // array of vertices of the workflow graph (called "processes")
  "signals": [ ... ]       // array of edges of the workflow graph (called "signals")
}

For example, this structure:

{
    "name": "Hello",
    "processes": [ {
        "name": "Node_0",   // name of the "process" (should be unique)
        "ins": [ 0 ],       // input edges ("signals") (array of indexes in the "signals" array)
        "outs": [ 1 ]       // output edges
    }, {
        "name": "Node_1",
        "ins": [ 1 ],
        "outs": [ 2 ]
    } ],
    "signals": [ {
        "name": "sig_0"     // name of the signal (should be unique)
    }, {
        "name": "sig_1"
    }, {
        "name": "sig_2"
    } ]
}

describes a linear graph: signal sig_0 enters Node_0, sig_1 connects Node_0 to Node_1, and sig_2 leaves Node_1.

Note that in HyperFlow the workflow graph is a multigraph, which means that a given pair of vertices may be connected by multiple edges. For example, each edge may denote a file that is produced by one task and consumed by another.
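To make the wiring between processes and signals concrete, the sketch below resolves each "ins"/"outs" entry to a signal name and lists the producers and consumers of every signal. It is an illustration only, not part of HyperFlow: the function name describeEdges is hypothetical, and the sketch assumes that every reference points at an existing signal.

```javascript
// Resolve the "ins"/"outs" entries of each process to signal names and
// list, for every signal, which processes produce and consume it.
// describeEdges is a hypothetical helper, not a HyperFlow API.
function describeEdges(workflow) {
  // An entry may be an index into the "signals" array or a signal name.
  const signalName = (ref) =>
    typeof ref === "number" ? workflow.signals[ref].name : ref;

  const edges = new Map(
    workflow.signals.map((s) => [s.name, { producers: [], consumers: [] }])
  );
  for (const proc of workflow.processes) {
    for (const ref of proc.ins || []) {
      edges.get(signalName(ref)).consumers.push(proc.name);
    }
    for (const ref of proc.outs || []) {
      edges.get(signalName(ref)).producers.push(proc.name);
    }
  }
  return edges;
}

// The "Hello" workflow from the example above, without comments.
const hello = {
  name: "Hello",
  processes: [
    { name: "Node_0", ins: [0], outs: [1] },
    { name: "Node_1", ins: [1], outs: [2] },
  ],
  signals: [{ name: "sig_0" }, { name: "sig_1" }, { name: "sig_2" }],
};

for (const [name, { producers, consumers }] of describeEdges(hello)) {
  console.log(`${name}: [${producers.join(", ")}] -> [${consumers.join(", ")}]`);
}
```

Because the result is keyed by signal rather than by pair of processes, two processes connected by several signals simply appear in several entries, which matches the multigraph model.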

Workflow vertices: processes

{
  "name": "Sqr",
  "type": "dataflow",
  "function": "sqr",
  "ins": [ "number" ],      // instead of array indexes, signal names can also be used
  "outs": [ "square" ]
}

name (string, mandatory) - unique name of the process

type (string, optional) - type of the process (default value: dataflow)

function (string, mandatory) - name of the JavaScript function that will be invoked when the process is activated

parlevel (integer, optional) - a number denoting how many activations of the process can be executed concurrently (default 1, 0 means infinite). See Sqrsum example.

ordering (string true or false, optional) - a flag denoting whether outputs of concurrent activations of a process should be ordered or not. See Sqrsum example.

firingLimit (integer, optional) - a number denoting the maximum number of activations of this process (unbounded if undefined)

firingInterval (integer, optional) - time interval (in milliseconds) at which the process should be activated (only relevant for processes with no input signals; see Streaming Map/Reduce example).
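The mandatory fields and defaults listed above can be summarized in a small checker. This is a minimal sketch and not how HyperFlow itself validates a workflow: the function name normalizeProcess is hypothetical, and only the defaults stated in the list (type and parlevel) are applied.

```javascript
// Check the mandatory fields of a process description and fill in the
// documented defaults. normalizeProcess is a hypothetical helper.
function normalizeProcess(proc) {
  for (const field of ["name", "function"]) {
    if (typeof proc[field] !== "string" || proc[field] === "") {
      throw new Error(`process field "${field}" is mandatory and must be a string`);
    }
  }
  return {
    type: "dataflow", // documented default
    parlevel: 1,      // documented default; 0 means unlimited concurrency
    ...proc,          // explicitly provided values override the defaults
  };
}

const sqr = normalizeProcess({
  name: "Sqr",
  function: "sqr",
  ins: ["number"],
  outs: ["square"],
});
console.log(sqr.type, sqr.parlevel);
```

Fields such as firingLimit are deliberately left undefined when absent, since the list above treats an undefined limit as unbounded.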

Workflow edges: signals

{
  "name": "number",
  "data": [ 1, 2, 3, 4, 5, 6 ]
}

If data is present, it contains a sequence of data elements (instances of that signal) that will be sent to the workflow after it has started.

Support for templates

HyperFlow workflow descriptions support variable interpolation. You can put {{var_name}} placeholders in workflow.json and provide values for them in one of the following ways:

  • Through hflow command line parameter --var, e.g.
hflow run <wf_dir> --var="function=command_print" --var="workdir=/home/workdir"
  • Through environment variables starting with HF_VAR_, e.g.
export HF_VAR_function=command_print

In both cases, all occurrences of {{function}} in workflow.json are replaced with command_print when the workflow is run with the hflow run command.
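The substitution described above can be approximated as follows. This is a sketch of the idea, not HyperFlow's implementation: the helpers collectVars and interpolate are hypothetical, and two behaviors are assumptions rather than documented facts: explicit values (as --var would supply) take precedence over HF_VAR_ environment variables, and placeholders without a value are left unchanged.

```javascript
// Collect variables from HF_VAR_* environment variables and from
// explicit name/value pairs (such as those given with --var).
function collectVars(env, cliVars = {}) {
  const prefix = "HF_VAR_";
  const vars = {};
  for (const [key, value] of Object.entries(env)) {
    if (key.startsWith(prefix)) vars[key.slice(prefix.length)] = value;
  }
  // Assumption: explicit values win over the environment.
  return { ...vars, ...cliVars };
}

// Replace every {{name}} that has a value; leave unknown names untouched
// (also an assumption of this sketch).
function interpolate(text, vars) {
  return text.replace(/\{\{(\w+)\}\}/g, (match, name) =>
    Object.prototype.hasOwnProperty.call(vars, name) ? vars[name] : match
  );
}

const template = '{ "name": "Print", "function": "{{function}}" }';
const vars = collectVars(
  { HF_VAR_function: "command_print" }, // stands in for process.env
  { workdir: "/home/workdir" }
);
console.log(interpolate(template, vars));
// prints: { "name": "Print", "function": "command_print" }
```

In a real script, process.env would be passed as the first argument of collectVars instead of the literal object used here.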