This assignment for the Web Services and Cloud-Based Systems course at UvA uses the Brane framework to implement a data processing pipeline.

nightmare224/brane-programming-project

This project uses the Brane framework to implement a data processing pipeline, as an assignment for the Web Services and Cloud-Based Systems course at UvA.

We aim to identify the key factors of heart disease using a dataset from Kaggle. We implemented a compute pipeline that performs data preprocessing, model training, and visualization, and a report pipeline that gathers the results of the compute pipeline into an HTML report. The report lets us identify the key factors of heart disease.

pipeline_overview

Getting Started

Build

To build the required packages and dataset in the Brane environment, run:

bash brane-programming-project/brane-heart-disease/build.sh

If the packages and dataset build successfully, you should see:

build

Run

There are two pipelines: the compute pipeline and the report pipeline. All computation happens in the compute pipeline. The report pipeline collects all the figures generated by the compute pipeline into a single HTML report.

To trigger the compute pipeline, run:

brane run brane-programming-project/brane-heart-disease/scripts/pipeline.bs

If the compute pipeline runs successfully, you should see:

compute_pipeline

To trigger the report pipeline, run:

brane run brane-programming-project/brane-heart-disease/scripts/report.bs

If the report pipeline runs successfully, you should see:

report_pipeline
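
Since the report pipeline collects figures produced by the compute pipeline, the two scripts are meant to run in that order. The following is a minimal sketch of a wrapper that does so, assuming the repository is checked out in the current directory. The `run_pipelines` function and the `BRANE` variable are hypothetical names introduced here for illustration; they are not part of the project.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: run the compute pipeline, then the report pipeline.
# BRANE can be overridden (for example with `echo`) to do a dry run.
BRANE="${BRANE:-brane}"
SCRIPTS="brane-programming-project/brane-heart-disease/scripts"

run_pipelines() {
    # Stop at the first failure so the report step never runs
    # when the compute step did not finish.
    "$BRANE" run "$SCRIPTS/pipeline.bs" || return 1
    "$BRANE" run "$SCRIPTS/report.bs" || return 1
}
```

After sourcing the file, calling `run_pipelines` runs both steps, and `BRANE=echo run_pipelines` only prints the two commands that would be executed.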

Usage

If the report pipeline ran successfully, you can find the path of the final report by running:

brane data path heart-disease-report

This command returns the path of the directory, as shown in the figure below.

report_pipeline

The directory should contain several HTML files. Download the whole directory and open report.html. You should see a menu like this:

report_menu
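
To check that the report is actually present before downloading the directory, a small helper along these lines may be useful. This is only a sketch: `report_file` is a hypothetical name, and the usage line below assumes that `brane data path` prints nothing but the directory path, which may not hold for every Brane version.

```shell
# Hypothetical helper: print the path of report.html inside a directory,
# or fail with a message on stderr if the file is missing.
report_file() {
    if [ -f "$1/report.html" ]; then
        printf '%s\n' "$1/report.html"
    else
        echo "report.html not found in $1" >&2
        return 1
    fi
}
```

A possible invocation, under the assumption stated above, is `report_file "$(brane data path heart-disease-report)"`.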

The report contains all the figures generated in the compute pipeline. To see the demo report, check here.

Discussion

As you can see, we have two pipelines: the compute pipeline and the report pipeline. In principle there is no reason to split the work in two; it should be a single pipeline. We use two because Brane does not seem to be able to use a committed dataset immediately within the same package in the same pipeline.

For example:

import visualization;

let fig := feature_importance();
commit_result("heart-disease-report", fig);
// The next line fails because heart-disease-report is not found
let data := new Data { name := "heart-disease-report" };
let report := generate_report(fig);

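One workaround is to pass IntermediateType data directly, but then the output files of multiple functions cannot be collected into a single directory. In the example below, each assignment replaces fig, so generate_report only receives the result of the last call.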

import visualization;

let fig := feature_importance();
fig := model_report();
fig := heart_disease_positive_ratio();
fig := heart_disease_positive_ratio();
fig := heart_disease_positive_ratio();
fig := heart_disease_positive_ratio();
fig := heart_disease_positive_ratio();

let report := generate_report(fig);

