GitHub - flowers9/pypeFLOW: a simple lightweight workflow engine for data analysis scripting

What is pypeFLOW

pypeFLOW is a lightweight, reusable make/flow data-processing library written in Python.

Most bioinformatics analyses, and data analyses in general, involve many steps: combining data files, converting files between formats, and calculating statistics with a variety of tools. Ian Holmes has a great summary of, and opinions about, bioinformatics workflows at http://biowiki.org/BioinformaticsWorkflows. Interestingly, such an analysis workflow closely resembles building software without an IDE. Using a "makefile" to manage a bioinformatics analysis workflow is a great way to produce reproducible and reusable analysis procedures. Combined with a proper version control tool, it lets one manage a diverse set of data and tools over the lifetime of a project, especially when there are complicated dependencies among the data, the tools, and the customized code for the analysis tasks.

However, using "make" and a "makefile" implies that every data analysis step is performed by a command-line tool. If you have customized analysis tasks, you have to write scripts and turn them into command-line tools. In my personal experience, it is convenient to bypass that burden and combine those quick and simple steps in a single script. The only caveat is that if an analyst does not save the results of intermediate steps, he or she has to repeat the computation of every step from the beginning, which wastes a lot of computation cycles and personal time. The solution is simple: just as in the traditional software build process, one has to track the dependencies, analyze them, and reprocess only the parts that are necessary to get the most up-to-date final results.
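The make-style idea described above, rerunning a step only when its outputs are missing or older than its inputs, can be sketched in a few lines of plain Python. This is a minimal illustration and not pypeFLOW's API: the function names `needs_update` and `run_if_stale` are hypothetical, and the check assumes file modification times are a good enough signal of staleness.

```python
import os


def needs_update(inputs, outputs):
    """Return True if the step must run (hypothetical helper, not pypeFLOW API).

    A step is stale when it declares no outputs, when any output file is
    missing, or when the newest input is newer than the oldest output.
    """
    if not outputs or not all(os.path.exists(p) for p in outputs):
        return True
    if not inputs:
        return False
    newest_input = max(os.path.getmtime(p) for p in inputs)
    oldest_output = min(os.path.getmtime(p) for p in outputs)
    return newest_input > oldest_output


def run_if_stale(inputs, outputs, action):
    """Call action() only when the outputs are out of date.

    Returns True if the action ran, False if the step was skipped.
    """
    if needs_update(inputs, outputs):
        action()
        return True
    return False
```

With a helper like this, an ordinary Python function can serve as the build step, so a quick analysis task does not have to be wrapped as a command-line tool before it can take part in dependency tracking.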

General Design Principles

Explicitly model data and task dependencies

Support a declarative programming style within Python while keeping what imperative programming does best

Utilize the RDF metadata framework

Keep it simple if possible

Features

Multiple concurrent task scheduling and running

Support tasks as simple shell scripts (with cluster job submission in mind)

Reasonably simple interface for declarative programming
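To make the first feature concrete, here is a rough sketch of how independent tasks in a dependency graph can be scheduled concurrently. It uses only the Python standard library (`graphlib` requires Python 3.9 or later) and is not taken from pypeFLOW; `run_graph`, the task names, and the `deps` mapping are all hypothetical.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from graphlib import TopologicalSorter


def run_graph(tasks, deps, max_workers=4):
    """Run callables in dependency order, in parallel where possible.

    tasks: dict mapping task name -> zero-argument callable
    deps:  dict mapping task name -> set of names that must finish first
    Returns the task names in the order they completed.
    (Hypothetical sketch; pypeFLOW's own scheduler may differ.)
    """
    sorter = TopologicalSorter({name: deps.get(name, ()) for name in tasks})
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    finished = []
    running = {}  # future -> task name
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while sorter.is_active():
            # Submit every task whose prerequisites are all done.
            for name in sorter.get_ready():
                running[pool.submit(tasks[name])] = name
            # Block until at least one running task completes.
            done, _ = wait(running, return_when=FIRST_COMPLETED)
            for future in done:
                name = running.pop(future)
                future.result()  # re-raise any exception from the task
                sorter.done(name)  # may unlock downstream tasks
                finished.append(name)
    return finished
```

For example, with `deps = {"count": {"align"}, "stats": {"align"}, "report": {"count", "stats"}}`, the `count` and `stats` tasks may run at the same time once `align` has finished, and `report` runs last. A task body could just as well launch a shell script through `subprocess.run`, which is one way to picture the shell-script tasks mentioned above.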

General Installation

pypeFLOW uses the standard Python setup.py for installation:

python setup.py install

Once installed, brief documentation can be generated with:

cd doc
make html

The generated Sphinx HTML documentation can be viewed by pointing your web browser to _build/html/index.html in the doc directory.

