As part of the Data Analytics course at Virginia Tech, General Dynamics organized a data analytics competition.
This repository contains our source code and the project report, which ultimately earned our team the Silver Medal :)
GD is also a sponsor of the Discovery Analytics Center at Virginia Tech.
The records in the http_info.csv file have been reduced from 28 million to 1 million; all other files remain intact.
The records were reduced so that the dataset could be compressed down to ~500 MB.
The reduced dataset for the project can be downloaded here.
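This README does not say how the http_info.csv records were selected for the reduced dataset. As a minimal sketch of one way to do it, assuming a uniform random sample is acceptable, the function below uses reservoir sampling with only the Python standard library, so the full file never has to fit in memory. The function name `reservoir_sample_csv`, the `seed` parameter, and the output path are hypothetical and not part of this repository.

```python
import csv
import random


def reservoir_sample_csv(src_path, dst_path, k, seed=0):
    """Copy the header plus a uniform random sample of up to k data rows.

    Hypothetical helper: this is not necessarily how the authors reduced
    http_info.csv, only one plausible approach (reservoir sampling).
    """
    rng = random.Random(seed)
    reservoir = []
    with open(src_path, newline="") as src:
        reader = csv.reader(src)
        header = next(reader, None)  # None if the file is empty
        for i, row in enumerate(reader):
            if i < k:
                # Fill the reservoir with the first k rows.
                reservoir.append(row)
            else:
                # Keep the new row with probability k / (i + 1).
                j = rng.randint(0, i)  # inclusive on both ends
                if j < k:
                    reservoir[j] = row
    with open(dst_path, "w", newline="") as dst:
        writer = csv.writer(dst)
        if header is not None:
            writer.writerow(header)
        writer.writerows(reservoir)
    return len(reservoir)
```

A call such as `reservoir_sample_csv("http_info.csv", "http_info_reduced.csv", k=1_000_000)` would keep one million rows; the sampled rows are held in memory, and they are not written back in their original file order.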
Dataset Readme
The project report presents some interesting discoveries made while analyzing the dataset.
The report can be viewed/downloaded here.
The important directories in this repository include:
- jupyter-notebooks : contains main source code of our project
- email-topic-modelling : LDA topic modelling of email contents
- url-sentiment-analysis : Sentiment analysis using Google Cloud Natural Language API
- url-topic-modelling : URL content classifier using Google Cloud Natural Language API
- miscellaneous, utility
- Google Cloud Natural Language API
- Pandas: data analytics
- Jupyter Notebook
- Seaborn: data visualization
- Latent Dirichlet Allocation (LDA)
- Vader Sentiment Analysis Library
This project is licensed under the Apache License 2.0 - check the LICENSE.md file for details.
Special thanks to Dr. Leman, our DA course instructor, and to General Dynamics for organizing this competition.