The DS 3001 project for CJ Barcelos, Benjamin Sarkis, and Drew Ciccarelli
Project Overview
For the project, you will work in teams of three or four on a problem of your choosing that is interesting, significant, and relevant to Data Science. You will have great latitude in what you choose to work on, so take advantage of this opportunity to make a big impact!
The primary requirements of the project are:
Your project code must live on GitHub. We prefer it to be public. However, if you're too scared to share your code with future employers, you can claim a private GitHub account (with a .edu email address).
Your project must use some non-trivial data. Here are good starting points: (1) http://www.kdnuggets.com/datasets/index.html, (2) https://github.com/caesar0301/awesome-public-datasets, (3) https://snap.stanford.edu/data/index.html, and (4) https://www.kaggle.com/
Your project must apply some mining or analytics algorithm.
Your project must use the "data science loop", ultimately leading to a data product or data visualization that can help guide decision making.
Grading Criteria
The course project officially counts for 30% of your final grade.
[25%] Project proposal: Due April 8 by 11:59pm
[25%] Checkpoint: Due April 19 by 11:59pm
[50%] Project workshop: April 30 and May 1 in-class
Project proposal (April 8) [1 to 2 pages (PDF); Post on Canvas]
Each group should post a 1-2 page project proposal in PDF to Canvas by April 8 at 11:59pm.
You should include:
The name of your team and the team members
What is the need?
Who wants or benefits?
What data (or datasets)?
What is your "data science" toolkit? You should list specific tools / packages you will use.
Preliminary sketch of what you hope to build
Checkpoint: Exploratory Data Analysis and Data Visualization (April 19) [4 page MAX (PDF); Post on Canvas]
For the project checkpoint, you must have collected a significant portion of the data that your project will ultimately use. You will post a brief summary of your exploratory data analysis and your prototype visualization. Post your MAX 4-page PDF to Canvas.
You should include:
Summary and descriptive statistics of your data
Data cleaning steps taken
Insights
Initial screenshots
Sketch of interaction in your final data product
Next steps
Project Workshop (April 30 and May 1 in-class; Post on Canvas)
On April 30 and May 1 we will hold the Data Science Project Workshop in-class. Each team will give a project overview and a demo. All students are required to participate on both days.