RyanBusby / productionFailures Public

Notifications You must be signed in to change notification settings
Fork 1
Star 5

Binary classification of products passage or failure of quality control

5 stars 1 fork Branches Tags Activity

Notifications

Name		Name	Last commit message	Last commit date
Latest commit History 44 Commits
img		img
src		src
README.md		README.md
toyExample.ipynb		toyExample.ipynb

Repository files navigation

Binary Classification with Apache Spark / HDFS

↖data source

The goal of the competition is to predict which parts will fail quality control

My goal is to utilize the hadoop ecosystem to handle a large dataset and establish a pipeline for machine learning

`munge` :

Aggregate columns using RDD transformations
Create a column that indicates which of those column aggregations are outliers.

`fit_predict` :

Model data with Spark Machine Learning package
Predict on test data

`munge_fit_predict` :

Run this as is to use the toy data set example

About

Binary classification of products passage or failure of quality control

Report repository

Releases

No releases published

Packages

No packages published

Languages