
KDD Cup 2015

In this project, we address MOOC attrition (student dropout) on the KDD Cup 2015 dataset using machine learning techniques. We engineer categorical, time-based and course-completion-based features from the dataset, with the aim of avoiding overfitting. We explored sampling methods to address the class imbalance in the original data. We applied KNN, Logistic Regression, a Neural Network, Random Forest, Gradient Boosting and XGBoost to obtain predictions, and used the best-performing of these models as the estimators of a weighted voting classifier. The final model achieves an accuracy of 87.74% and an AUC of 87.97%.
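The two balancing and ensembling steps can be sketched in plain Python. This is an illustrative sketch under stated assumptions, not the code in the notebooks: it assumes random oversampling as the sampling method, soft (probability-averaging) voting as the voting rule, and label 1 as "dropout". The helper names `random_oversample` and `weighted_soft_vote` are hypothetical. In practice, scikit-learn's `VotingClassifier(voting="soft", weights=...)` implements the same weighted averaging over fitted estimators.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Hypothetical helper: duplicate randomly chosen rows of each smaller
    class until every class has as many rows as the largest class.
    Apply to the training split only, so duplicates never leak into evaluation."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        idx = [i for i, lab in enumerate(y) if lab == label]
        for _ in range(target - n):
            i = rng.choice(idx)  # sample an existing row with replacement
            X_out.append(X[i])
            y_out.append(label)
    return X_out, y_out


def weighted_soft_vote(probas, weights, threshold=0.5):
    """Hypothetical helper: combine per-model P(label == 1) lists with a
    weighted average, then threshold the average to get 0/1 predictions.

    probas:  one list of probabilities per model, all the same length.
    weights: one non-negative weight per model (assumed, e.g. tuned on validation data).
    """
    if len(probas) != len(weights):
        raise ValueError("need exactly one weight per model")
    total = sum(weights)
    n = len(probas[0])
    avg = [sum(w * p[i] for w, p in zip(weights, probas)) / total for i in range(n)]
    preds = [1 if p >= threshold else 0 for p in avg]
    return avg, preds


if __name__ == "__main__":
    # Toy imbalanced data: three rows of class 1, one row of class 0.
    Xb, yb = random_oversample([[0.1], [0.2], [0.3], [0.9]], [1, 1, 1, 0])
    print(sorted(Counter(yb).items()))  # [(0, 3), (1, 3)]

    # Three models' probabilities for two students; the first model counts double.
    _, preds = weighted_soft_vote([[0.9, 0.2], [0.6, 0.4], [0.3, 0.7]], [2, 1, 1])
    print(preds)  # [1, 0]
```

Soft voting lets a confident, well-calibrated model outweigh two uncertain ones, and the weights give the stronger estimators more say. Hard (majority) voting over predicted labels is the other common choice.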

The data folder contains the individual feature-engineered CSV files, and the notebooks folder contains the Jupyter notebooks for the different experiments we conducted.

The Submissions folder contains the final submission for the course.