Workflow description format
HyperFlow describes workflows in a simple JSON format. The workflow file has the following top-level structure:
{
"name": "Hello", // name of the workflow
"processes": [ ... ], // array of vertices of the workflow graph (called "processes")
"signals": [ ... ] // array of edges of the workflow graph (called "signals")
}
For example, this structure:
{
"name": "Hello",
"processes": [ {
"name": "Node_0", // name of the "process" (should be unique)
"ins": [ 0 ], // input edges ("signals") (array of indexes in the "signals" array)
"outs": [ 1 ] // output edges
}, {
"name": "Node_1",
"ins": [ 1 ],
"outs": [ 2 ]
} ],
"signals": [ {
"name": "sig_0" // name of the signal (should be unique)
}, {
"name": "sig_1"
}, {
"name": "sig_2"
} ]
}
describes the following graph: sig_0 -> Node_0 -> sig_1 -> Node_1 -> sig_2
Note that in HyperFlow the workflow graph is a multigraph, which means that a given pair of vertices may be connected by multiple edges. For example, each edge may denote a file that is produced by one task and consumed by another.
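To make the index-based wiring concrete, the following sketch walks a workflow object shaped like the "Hello" example and lists, for every signal, which process produces it and which processes consume it. The helper listEdges is hypothetical and not part of HyperFlow; it relies only on the "processes", "signals", "ins", and "outs" fields shown in this section.

```javascript
// Hypothetical helper (not part of HyperFlow): derive graph edges from a
// workflow object whose "ins" and "outs" are indexes into "signals".
const workflow = {
  name: "Hello",
  processes: [
    { name: "Node_0", ins: [0], outs: [1] },
    { name: "Node_1", ins: [1], outs: [2] },
  ],
  signals: [{ name: "sig_0" }, { name: "sig_1" }, { name: "sig_2" }],
};

function listEdges(wf) {
  // One record per signal: which processes write it and which read it.
  const edges = wf.signals.map((sig) => ({
    signal: sig.name,
    producers: [],
    consumers: [],
  }));
  for (const proc of wf.processes) {
    for (const i of proc.outs) edges[i].producers.push(proc.name);
    for (const i of proc.ins) edges[i].consumers.push(proc.name);
  }
  return edges;
}

for (const e of listEdges(workflow)) {
  const from = e.producers.join(", ") || "(workflow input)";
  const to = e.consumers.join(", ") || "(workflow output)";
  console.log(`${from} --${e.signal}--> ${to}`);
}
```

Because every signal gets its own record, two processes connected by several signals simply show up in several records, which matches the multigraph reading above.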
Each process is described by a JSON object, for example:
{
"name": "Sqr",
"type": "dataflow",
"function": "sqr",
"ins": [ "number" ], // instead of array indexes, signal names can also be used
"outs": [ "square" ]
}
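Since "ins" and "outs" may hold either array indexes or signal names, any tool that reads the format has to accept both forms. A minimal sketch of such a lookup follows, assuming signal names are unique as the format requires; resolveSignal is a hypothetical name, not a HyperFlow function.

```javascript
// Hypothetical helper: accept either an index into the "signals" array
// or a signal name, and return the matching signal object.
function resolveSignal(signals, ref) {
  const index =
    typeof ref === "number" ? ref : signals.findIndex((s) => s.name === ref);
  if (!Number.isInteger(index) || index < 0 || index >= signals.length) {
    throw new Error(`Unknown signal reference: ${JSON.stringify(ref)}`);
  }
  return signals[index];
}

const signals = [{ name: "number" }, { name: "square" }];
console.log(resolveSignal(signals, "square").name); // square
console.log(resolveSignal(signals, 0).name); // number
```

Failing loudly on an unknown reference is a design choice of this sketch; it surfaces a misspelled signal name before the workflow runs.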
- name (string, mandatory) - unique name of the process.
- type (string, optional) - type of the process (default value: dataflow).
- function (string, mandatory) - name of the JavaScript function that will be invoked when the process is activated.
- parlevel (integer, optional) - number of activations of the process that can be executed concurrently (default 1; 0 means infinite). See the Sqrsum example.
- ordering (string "true" or "false", optional) - flag denoting whether the outputs of concurrent activations of the process should be ordered. See the Sqrsum example.
- firingLimit (integer, optional) - maximum number of activations of this process (unbounded if undefined).
- firingInterval (integer, optional) - time interval (in milliseconds) at which the process should be activated. Only relevant for processes with no input signals; see the Streaming Map/Reduce example.
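The defaults listed above can be summarized in code. The sketch below checks the two mandatory attributes and fills in the documented defaults. normalizeProcess is a hypothetical helper written for illustration, and using Infinity for "infinite" and "unbounded" is an assumption of this sketch, not necessarily how HyperFlow stores these values.

```javascript
// Hypothetical helper: apply the documented defaults to a process description.
function normalizeProcess(proc) {
  // "name" and "function" are the mandatory attributes.
  for (const key of ["name", "function"]) {
    if (typeof proc[key] !== "string" || proc[key] === "") {
      throw new Error(`Process attribute "${key}" is mandatory`);
    }
  }
  const parlevel = proc.parlevel === undefined ? 1 : proc.parlevel;
  return {
    ...proc,
    // Default process type.
    type: proc.type === undefined ? "dataflow" : proc.type,
    // 0 means "infinite"; represented here as Infinity (an assumption).
    parlevel: parlevel === 0 ? Infinity : parlevel,
    // Unbounded if undefined; also represented as Infinity (an assumption).
    firingLimit: proc.firingLimit === undefined ? Infinity : proc.firingLimit,
  };
}

const sqr = normalizeProcess({
  name: "Sqr",
  function: "sqr",
  ins: ["number"],
  outs: ["square"],
});
console.log(sqr.type, sqr.parlevel, sqr.firingLimit); // dataflow 1 Infinity
```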
Each signal is described by a JSON object, for example:
{
"name": "number",
"data": [ 1, 2, 3, 4, 5, 6 ]
}
If "data" is present, it contains a sequence of data elements (instances of that signal) that will be sent to the workflow after it has started.
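As a rough illustration of how "data" could be read, the sketch below turns every signal that carries a "data" array into a flat, ordered list of instances. The function initialInstances and the { signal, value } record shape are assumptions made for this example; how HyperFlow actually delivers the instances is not described in this section.

```javascript
// Hypothetical helper: list the initial signal instances declared via "data".
function initialInstances(signals) {
  return signals.flatMap((sig) =>
    Array.isArray(sig.data)
      ? sig.data.map((value) => ({ signal: sig.name, value }))
      : [] // signals without "data" contribute no initial instances
  );
}

const instances = initialInstances([
  { name: "number", data: [1, 2, 3, 4, 5, 6] },
  { name: "square" },
]);
console.log(instances.length); // 6
console.log(instances[0]); // { signal: 'number', value: 1 }
```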
HyperFlow workflow description supports variable interpolation. You can put {{var_name}} variables in workflow.json and provide values for these variables in one of the following ways:
- Through the hflow command-line parameter --var, e.g.
hflow run <wf_dir> --var="function=command_print" --var="workdir=/home/workdir"
- Through environment variables starting with HF_VAR_, e.g.
export HF_VAR_function=command_print
This will result in replacing all occurrences of {{function}} in workflow.json with command_print when the workflow is run with the hflow run command.
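The substitution itself can be sketched in a few lines. This is not HyperFlow's own implementation, and all function names below are hypothetical. Three behaviors are assumptions of the sketch: whitespace inside the braces is tolerated, unknown variables are left untouched, and command-line values override environment values.

```javascript
// Hypothetical sketch of {{var_name}} interpolation; not HyperFlow's own code.
const ENV_PREFIX = "HF_VAR_";

// Collect variables from environment entries such as HF_VAR_function=...
function varsFromEnv(env) {
  const vars = {};
  for (const [key, value] of Object.entries(env)) {
    if (key.startsWith(ENV_PREFIX) && value !== undefined) {
      vars[key.slice(ENV_PREFIX.length)] = value;
    }
  }
  return vars;
}

// Parse values shaped like the --var arguments: "name=value".
function varsFromArgs(pairs) {
  const vars = {};
  for (const pair of pairs) {
    const eq = pair.indexOf("=");
    if (eq <= 0) throw new Error(`Expected name=value, got: ${pair}`);
    vars[pair.slice(0, eq)] = pair.slice(eq + 1);
  }
  return vars;
}

// Replace every {{name}} that has a value; leave unknown names as they are.
function interpolate(text, vars) {
  return text.replace(/\{\{\s*(\w+)\s*\}\}/g, (match, name) =>
    Object.prototype.hasOwnProperty.call(vars, name) ? vars[name] : match
  );
}

const vars = {
  ...varsFromEnv({ HF_VAR_function: "command_print" }),
  ...varsFromArgs(["workdir=/home/workdir"]), // later spread wins (assumption)
};
const template = '{ "function": "{{function}}", "workdir": "{{workdir}}" }';
console.log(interpolate(template, vars));
// { "function": "command_print", "workdir": "/home/workdir" }
```

In a real run, varsFromEnv would receive process.env; a literal object is used here so the example does not depend on the shell environment.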