KDD-CUP -2015

In this project, we address MOOC attrition (student dropout) by applying machine learning techniques to the KDD Cup 2015 dataset. We perform feature engineering on this dataset to generate categorical, time-based and course-completion-based features while avoiding overfitting. We explored sampling methods to counter the class imbalance in the original data. KNN, Logistic Regression, Neural Network, Random Forest, Gradient Boosting and XGBoost algorithms are applied to obtain the predictions. We picked the best of these models as estimators for a weighted voting classifier. Our model reports an accuracy of 87.74% and an AUC of 87.97%.
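To illustrate the weighted voting step described above, here is a minimal sketch in plain Python. It assumes soft voting (a weighted average of each model's predicted dropout probability), which the description does not confirm; the function names, weights and probability values are hypothetical and are not taken from the notebooks.

```python
from typing import List, Sequence


def weighted_soft_vote(probas: Sequence[Sequence[float]],
                       weights: Sequence[float]) -> List[float]:
    """Average per-model positive-class probabilities using per-model weights.

    probas[m][i] is model m's predicted probability for sample i.
    """
    if len(probas) != len(weights):
        raise ValueError("need exactly one weight per model")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    n_samples = len(probas[0])
    if any(len(p) != n_samples for p in probas):
        raise ValueError("every model must score the same samples")
    return [
        sum(w * p[i] for w, p in zip(weights, probas)) / total
        for i in range(n_samples)
    ]


def predict_labels(avg_proba: Sequence[float],
                   threshold: float = 0.5) -> List[int]:
    """Turn averaged probabilities into 0/1 labels (1 = predicted dropout)."""
    return [1 if p >= threshold else 0 for p in avg_proba]


# Hypothetical scores from three models for four enrollments.
model_probas = [
    [0.9, 0.2, 0.6, 0.4],
    [0.8, 0.3, 0.4, 0.7],
    [0.7, 0.1, 0.5, 0.3],
]
# Hypothetical weights: the first model counts twice as much as the others.
weights = [2.0, 1.0, 1.0]

averaged = weighted_soft_vote(model_probas, weights)
print(averaged)
print(predict_labels(averaged))
```

In practice the same idea is available as a ready-made ensemble in common machine learning libraries; the sketch only shows the arithmetic behind combining the selected estimators.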

The data folder contains the individual feature-engineered CSV files, and the notebooks folder contains the Jupyter notebooks for the different experiments we conducted.

The Submissions folder contains the final submission for the course.
