This project provides a sample solution to the Allstate Claims Severity competition on Kaggle.
- Part 1: Data Discovery. Basic data analysis; similar analyses can be found in dnkirill's data discovery and Santhosh Sharma's exploratory study.
- Part 2: XGBoost model training and tuning. A step-by-step framework for hyperparameter optimization of XGBoost based on Bayesian Optimization.
- Part 3: LightGBM model training and tuning. A step-by-step framework for hyperparameter optimization of LightGBM based on Bayesian Optimization.
- Part 4: ANN model training and tuning. A simple example of parameter tuning for an ANN model.
- Part 5: Final result with blending and stacking. Combine the predictions of XGBoost, LightGBM and ANN with two linear models, Ridge regression and XGBoost gblinear (a minimal blending sketch follows this list).
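To make the Part 5 idea concrete, here is a minimal sketch of blending with Ridge regression: a linear model is fit on the out-of-fold predictions of the three base models, and its weights are applied to their test-set predictions. The file names (`oof_*.txt`, `test_*.txt`, `train.csv`) are illustrative assumptions, not the project's actual artifacts.

```python
# Minimal blending sketch (illustrative file names, not the project's code).
import numpy as np
import pandas as pd
from sklearn.linear_model import Ridge

# Out-of-fold predictions of the base models on the training set, and their
# predictions on the test set, one column per model.
oof = np.column_stack([np.loadtxt('oof_%s.txt' % m)
                       for m in ('xgb', 'lgb', 'ann')])
test = np.column_stack([np.loadtxt('test_%s.txt' % m)
                        for m in ('xgb', 'lgb', 'ann')])
y = pd.read_csv('train.csv')['loss'].values

# Fit the linear blender on out-of-fold predictions to avoid leakage.
blender = Ridge(alpha=1.0)
blender.fit(oof, y)
print('blend weights:', blender.coef_)

final_prediction = blender.predict(test)
```

The same recipe works with XGBoost's gblinear booster in place of Ridge; the key design choice is that the blender only ever sees out-of-fold predictions, so its weights are not biased by overfit base models.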
The dataset is available for free on Kaggle's competition page.
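For a first look at the data, the snippet below (a minimal sketch, assuming `train.csv` has been downloaded from the competition page into the working directory) loads the training set with pandas and summarizes the target.

```python
# Minimal sketch: load the competition data with pandas.
# Assumes train.csv from Kaggle is in the working directory.
import pandas as pd

train = pd.read_csv('train.csv')

# The training set has an id column, 116 categorical features (cat1..cat116),
# 14 continuous features (cont1..cont14) and the target column 'loss'.
print(train.shape)
print(train['loss'].describe())
```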
This project uses the following software (if a version number is omitted, the latest version is recommended):
- Python stack: python 2.7, numpy, scipy, sklearn, pandas, matplotlib.
- XGBoost: XGBoost is short for "Extreme Gradient Boosting"; the term "gradient boosting" was proposed in the paper Greedy Function Approximation: A Gradient Boosting Machine by Friedman.
- LightGBM: A fast, distributed, high performance gradient boosting (GBDT, GBRT, GBM or MART) framework based on decision tree algorithms, used for ranking, classification and many other machine learning tasks.
- Keras: Keras is a high-level neural networks library. The linked post shows how to use GPUs on the Amazon Web Services (AWS) infrastructure to speed up the training of deep learning models.
- BayesianOptimization: A Python implementation of global optimization with Gaussian processes.
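The sketch below illustrates how BayesianOptimization drives the hyperparameter search used in Parts 2 and 3: a cross-validation function returns the (negated) CV error for a candidate parameter set, and the optimizer proposes new candidates via a Gaussian process. It is a minimal sketch, not the project's exact code; the search bounds, fixed parameters, and file name are assumptions.

```python
# Minimal sketch of tuning XGBoost with Bayesian Optimization
# (illustrative bounds and settings, not the project's exact configuration).
import numpy as np
import pandas as pd
import xgboost as xgb
from bayes_opt import BayesianOptimization

train = pd.read_csv('train.csv')
y = np.log1p(train['loss'])  # MAE on the log scale behaves better here
X = pd.get_dummies(train.drop(['id', 'loss'], axis=1))
dtrain = xgb.DMatrix(X, label=y)

def xgb_cv(max_depth, min_child_weight, subsample, colsample_bytree):
    # Objective for the optimizer: mean CV MAE, negated because the
    # optimizer maximizes its target.
    params = {
        'eta': 0.1,
        # 'reg:linear' in older XGBoost versions; 'reg:squarederror' in newer ones
        'objective': 'reg:linear',
        'eval_metric': 'mae',
        'max_depth': int(max_depth),
        'min_child_weight': min_child_weight,
        'subsample': subsample,
        'colsample_bytree': colsample_bytree,
    }
    cv = xgb.cv(params, dtrain, num_boost_round=200, nfold=3,
                early_stopping_rounds=20, seed=0)
    return -cv['test-mae-mean'].iloc[-1]

optimizer = BayesianOptimization(xgb_cv, {
    'max_depth': (4, 12),
    'min_child_weight': (1, 10),
    'subsample': (0.5, 1.0),
    'colsample_bytree': (0.5, 1.0),
})
optimizer.maximize(init_points=5, n_iter=25)
```

The LightGBM search in Part 3 follows the same pattern, swapping in `lightgbm.cv` and that library's parameter names.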