A Dataflow Programming Library.
Masamune provides a dataflow programming framework on top of Thor. In this framework, dataflows are constructed as Thor tasks that transform source data into target data. Source and target data descriptions are encoded as annotations on each Thor command. From these annotations, Masamune constructs a data dependency tree that describes how to automatically construct a target data set.
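The idea of deriving dependencies from source and target annotations can be pictured with a small model. This is not Masamune's API: `Edge` and `dependency_tree` are hypothetical names. The sketch assumes a target and its source are linked by formatting the same date through both strftime patterns, which is what the `'%Y-%m-%d'` and `'%Y%m%d*.log'` patterns in the `extract_logs` example suggest.

```ruby
require 'date'

# Hypothetical model of one dataflow edge: a target path pattern and the
# source glob pattern it depends on, both written with strftime directives.
Edge = Struct.new(:target_pattern, :source_pattern)

# Map each day in the range to the target path it produces and the
# source glob it reads from.
def dependency_tree(edge, start_date, end_date)
  (start_date..end_date).each_with_object({}) do |day, tree|
    tree[day.strftime(edge.target_pattern)] = day.strftime(edge.source_pattern)
  end
end

edge = Edge.new('target_dir/%Y-%m-%d', 'source_dir/%Y%m%d*.log')
tree = dependency_tree(edge, Date.new(2014, 1, 1), Date.new(2014, 1, 2))
tree.each { |target, source| puts "#{target} <- #{source}" }
# prints:
# target_dir/2014-01-01 <- source_dir/20140101*.log
# target_dir/2014-01-02 <- source_dir/20140102*.log
```

Because each target is computed from a date, the set of targets needed for a goal is known before any source file is read.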
Describe your dataflow as source-to-target data transformations:
    require 'thor'
    require 'masamune'

    class ExampleThor < Thor
      # Mix in Masamune-specific dataflow behavior
      include Masamune::Thor
      include Masamune::Actions::DataFlow

      # Mix in Masamune actions for data processing
      include Masamune::Actions::Streaming
      include Masamune::Actions::Hive

      # Describe a data processing job: each daily target directory is built
      # from the log files whose names start with the same date
      desc 'extract_logs', 'Organize log files by YYYY-MM-DD'
      target fs.path(:target_dir, '%Y-%m-%d')
      source fs.path(:source_dir, '%Y%m%d*.log')
      def extract_logs
        # Only build targets that do not exist yet
        targets.missing.each do |target|
          target.sources.each do |source|
            # Transform source into target
            fs.copy(source.path, target.path)
          end
        end
      end
    end
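To make the roles of `targets.missing` and `target.sources` concrete, here is a plain-Ruby sketch with no Masamune dependency. It assumes a target counts as missing when its directory does not exist and that a target's sources are the files matching its source glob; Masamune's actual rules may differ. `missing_targets` and `extract_logs` here are hypothetical helpers, and `FileUtils.cp` stands in for `fs.copy`.

```ruby
require 'fileutils'
require 'tmpdir'

# Hypothetical stand-in for `targets.missing`: keep only targets whose
# directory does not exist yet. `tree` maps target dir => source glob.
def missing_targets(tree)
  tree.reject { |target, _glob| Dir.exist?(target) }
end

# Hypothetical stand-in for the body of `extract_logs`: for each missing
# target, copy every file matching its source glob into the target dir.
def extract_logs(tree)
  missing_targets(tree).each do |target, glob|
    sources = Dir.glob(glob)
    next if sources.empty? # no source data yet, so leave the target missing
    FileUtils.mkdir_p(target)
    sources.each { |source| FileUtils.cp(source, target) }
  end
end

results = Dir.mktmpdir do |root|
  src = File.join(root, 'source_dir')
  FileUtils.mkdir_p(src)
  File.write(File.join(src, '20140101-a.log'), "hello\n")
  day1 = File.join(root, 'target_dir', '2014-01-01')
  day2 = File.join(root, 'target_dir', '2014-01-02')
  tree = {
    day1 => File.join(src, '20140101*.log'),
    day2 => File.join(src, '20140102*.log')
  }
  extract_logs(tree)
  [Dir.exist?(day1), Dir.exist?(day2), File.read(File.join(day1, '20140101-a.log'))]
end
p results # [true, false, "hello\n"]
```

Skipping targets that already exist is what lets the same task be re-run safely: only the gaps are filled.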
Execute your dataflow with the goal of processing all data from the past year:

    thor extract_logs --start '1 year ago'
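The `--start` goal bounds which targets are considered. The sketch below assumes that, for a daily `'%Y-%m-%d'` target, the goal expands into one period per day from the start date through the end date. How Masamune actually parses `'1 year ago'` is not modeled, and `daily_periods` is a hypothetical helper.

```ruby
require 'date'

# Hypothetical helper: expand a start date into the daily periods a
# '%Y-%m-%d' target pattern would need, up to and including end_date.
def daily_periods(start_date, end_date = Date.today)
  (start_date..end_date).map { |day| day.strftime('%Y-%m-%d') }
end

as_of = Date.new(2014, 3, 1)
periods = daily_periods(as_of.prev_year, as_of)
puts periods.first # 2013-03-01
puts periods.last  # 2014-03-01
puts periods.size  # 366
```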
Run the test suite with the following Rake tasks:

    rake spec             # Run RSpec unit code examples
    rake spec:acceptance  # Run RSpec acceptance code examples
    rake spec:all         # Run all RSpec code examples
    rake spec:unit        # Run RSpec unit code examples
To contribute:

- Fork the project
- Fix the issue
- Add unit tests
- Submit a pull request on GitHub