Our team won the Silver medal (second place) in the General Dynamics Data Analytics competition. Source code of our project has been added here.

hkuadithya/general-dynamics-data-analytics

GD Data Analytics competition - Virginia Tech

As part of the Data Analytics course at Virginia Tech, General Dynamics (GD) organized a data analytics competition.
This repository contains our source code and the project report, which earned our team the Silver medal :)
GD is also a sponsor of the Discovery Analytics Center at Virginia Tech.

Getting Started

Dataset

The original dataset for analysis consisted of 16GB of data.
However, the records in the http_info.csv file have been reduced from 28 million to 1 million. All other files remain intact.
The records were reduced so that the dataset could be compressed to ~500MB.
The reduced dataset for the project can be downloaded here.
Dataset Readme
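
For readers who want to produce a similarly reduced file from a large CSV, here is a minimal sketch. It assumes the reduction keeps the header plus the first N data rows; the README does not say how the records were actually selected, so this may differ from what we did. The function name `truncate_csv` and the file paths are hypothetical.

```python
import csv
from itertools import islice


def truncate_csv(src_path, dst_path, max_rows):
    """Copy the header and at most max_rows data rows from src_path to dst_path.

    Rows are streamed one at a time, so the source file never has to fit
    in memory. Returns the number of data rows written.
    """
    with open(src_path, newline="") as src, open(dst_path, "w", newline="") as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst)

        header = next(reader, None)
        if header is None:
            # Empty source file: nothing to copy.
            return 0
        writer.writerow(header)

        written = 0
        for row in islice(reader, max_rows):
            writer.writerow(row)
            written += 1
        return written
```

A call such as `truncate_csv("http_info.csv", "http_info_reduced.csv", 1_000_000)` would keep the first million records. Taking the first N rows is the simplest option; if the file is ordered by time, random sampling may give a more representative subset.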

Project report

The project report consists of some interesting discoveries made while analyzing the dataset.
The report can be viewed/downloaded here.

Project structure

The important directories in this repository include:

  • jupyter-notebooks : contains main source code of our project
  • email-topic-modelling : LDA topic modelling of email contents
  • url-sentiment-analysis : Sentiment analysis using Google Cloud Natural Language API
  • url-topic-modelling : URL content classifier using Google Cloud Natural Language API
  • miscellaneous, utility

Technologies explored

  1. Google Cloud Natural Language API
  2. Pandas data analytics
  3. Jupyter Notebook
  4. Seaborn: data visualization
  5. Latent Dirichlet Allocation (LDA)
  6. VADER Sentiment Analysis library

Team Members

License

This project is licensed under the Apache License 2.0 - check the LICENSE.md file for details.

Acknowledgments

Special thanks to Dr. Leman, our DA course instructor, and to General Dynamics for organizing this competition.