This repository contains the code for project 1 of the EPFL Machine Learning course 2020 (CS-433), the Higgs Boson challenge: https://www.aicrowd.com/challenges/epfl-machine-learning-higgs/leaderboards. More information about the challenge can be found in the documents folder.
The project was completed by team INteam, with members:
- Riccardo Cadei: @RiccardoCadei
- Raphael Attias: @raphaelattias
- Shasha Jiang: @dust629
With a test accuracy of 0.841, we placed 7th out of 277 teams.
The data files train.csv and test.csv are available at https://github.com/epfml/ML_course/tree/master/projects/project1/data. To run the code, please download them and place them in the data folder.
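As a rough sketch of how the downloaded files might be read with numpy: the column layout assumed here (an Id column, a Prediction column holding s or b, then the numeric features) and the helper name load_csv_data are assumptions made for illustration, not taken from this repository's code.

```python
import io

import numpy as np


def load_csv_data(source):
    """Read a CSV with the assumed layout: Id, Prediction ('s' or 'b'), features...

    `source` may be a file path or a file-like object.
    Returns labels in {-1, 1}, the feature matrix, and the ids.
    """
    # Read everything as text first, because the label column is not numeric.
    raw = np.genfromtxt(source, delimiter=",", skip_header=1, dtype=str)
    raw = np.atleast_2d(raw)  # keep a 2-D shape even for a single data row
    ids = raw[:, 0].astype(int)
    y = np.where(raw[:, 1] == "s", 1, -1)  # hypothetical encoding: s -> 1, b -> -1
    x = raw[:, 2:].astype(float)
    return y, x, ids


# Usage on a tiny in-memory sample standing in for data/train.csv.
sample = io.StringIO("Id,Prediction,f0,f1\n100000,s,1.5,2.0\n100001,b,0.2,3.0\n")
y_demo, x_demo, ids_demo = load_csv_data(sample)
```

With the real files, the same call would take a path such as data/train.csv, provided the layout matches the assumption above.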
The project has been developed and tested with python3.6.
The only library required for training and running the models is numpy.
The library used for visualization is matplotlib.
Predictions for the test dataset are generated by running:
python3 run.py
The final results are saved in /data/finalsubmission.csv.
- implementations.py: the implementation of the 6 methods used to train the model.
- run.py: uses the selected model to predict the test data.
- exploration.py: explores the features of the data through visualization.
- process_data.py: preprocesses the data for model training and prediction.
- crossvalidalidation.py: uses cross-validation to test the accuracy of different models.
- select_parameter.py: searches for appropriate parameters (lambda, degree, etc.) for the models.
- main.ipynb: tunes the best parameters for ridge regression and estimates the accuracy of all the methods through cross-validation.
- plots.ipynb: data analysis and visualization of the accuracy and error for different choices of parameters.
- documents/report.pdf: a 2-page report of the complete solution.
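
To illustrate how ridge regression, the degree parameter, and cross-validation fit together, here is a minimal numpy sketch. It is not the code from implementations.py or crossvalidalidation.py: the function names, the polynomial expansion, the 2N scaling of lambda, and the {-1, 1} label encoding are all assumptions chosen for the example.

```python
import numpy as np


def build_poly(x, degree):
    """Expand each feature to powers 1..degree and prepend a bias column."""
    powers = [x ** d for d in range(1, degree + 1)]
    return np.hstack([np.ones((x.shape[0], 1))] + powers)


def ridge_regression(y, tx, lambda_):
    """Solve (X^T X + 2 N lambda I) w = X^T y (the 2N scaling is an assumed convention)."""
    n, d = tx.shape
    a = tx.T @ tx + 2 * n * lambda_ * np.eye(d)
    return np.linalg.solve(a, tx.T @ y)


def cross_validation_accuracy(y, x, k_fold, lambda_, degree, seed=1):
    """Average held-out accuracy of ridge regression over k folds."""
    rng = np.random.RandomState(seed)
    # Shuffle the row indices once, then split them into k roughly equal folds.
    folds = np.array_split(rng.permutation(len(y)), k_fold)
    accuracies = []
    for i in range(k_fold):
        test_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k_fold) if j != i])
        w = ridge_regression(y[train_idx], build_poly(x[train_idx], degree), lambda_)
        # Classify by the sign of the linear score (labels assumed to be -1 / 1).
        pred = np.where(build_poly(x[test_idx], degree) @ w >= 0, 1, -1)
        accuracies.append(np.mean(pred == y[test_idx]))
    return float(np.mean(accuracies))
```

A parameter search in the spirit of select_parameter.py could then call cross_validation_accuracy over a grid of lambda and degree values and keep the pair with the highest average accuracy.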