CS626 - Large Scale Data Science project
This repository contains code and files in partial fulfilment of the requirements for CS626 by Daniel Cotter, Steve Roggenkamp, and Nima Seyedtalebi.
The code directory contains the code for this project.
Within code, the scripts directory contains many of the shell, Python, and Scala scripts we used for this project.
The code/sec-wc/src/main/python subdirectory contains the Python code used to process the data.
The rdbms directory contains artifacts from our early experimentation with loading data into a traditional database.