This project explores the Pima Indians Diabetes dataset to classify diabetes.
It can be used as a template for understanding and applying machine learning models to real-world problems.
The project uses scikit-learn, NumPy, and pandas as supporting libraries.
The machine learning process consists of the following steps:
- Collecting the data.
- Cleaning the features and removing strongly correlated columns.
- Understanding the behavior of the features (curves and boundaries).
- Imputing missing values.
- Partitioning the data into training and test sets.
- Selecting the ML algorithm.
- Training the model.
- Evaluating the performance.
- Refining the model.
- Trying more complex algorithms.
- Cross-validation.
- Understanding where to stop.
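The cleaning, imputing, and partitioning steps can be sketched roughly as below. This is not the notebook's code: the helper names `drop_correlated` and `prepare`, the column names (`Glucose`, `BMI`, `Age`, `Outcome`), the 0.9 correlation threshold, treating a `0` reading as a missing measurement, and mean imputation are all illustrative assumptions. The data frame is synthetic, with a deliberately duplicated `AgeMonths` column so the correlation filter has something to remove.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split


def drop_correlated(df: pd.DataFrame, threshold: float = 0.9) -> pd.DataFrame:
    """Drop one column from every pair whose absolute Pearson correlation exceeds threshold."""
    corr = df.corr().abs()
    # Keep only the upper triangle (excluding the diagonal) so each pair is checked once.
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return df.drop(columns=to_drop)


def prepare(df, target="Outcome", zero_as_missing=("Glucose", "BMI"), test_size=0.3, seed=42):
    """Hypothetical helper: clean, split, and impute. Column names are assumptions."""
    features = drop_correlated(df.drop(columns=[target]))
    y = df[target]
    # Assumption: a 0 in these columns means "not measured", so mark it as NaN.
    cols = [c for c in zero_as_missing if c in features.columns]
    features[cols] = features[cols].replace(0, np.nan)
    X_train, X_test, y_train, y_test = train_test_split(
        features, y, test_size=test_size, random_state=seed, stratify=y
    )
    # Fit the imputer on the training split only, so test statistics do not leak in.
    imputer = SimpleImputer(strategy="mean")
    X_train = pd.DataFrame(imputer.fit_transform(X_train), columns=X_train.columns, index=X_train.index)
    X_test = pd.DataFrame(imputer.transform(X_test), columns=X_test.columns, index=X_test.index)
    return X_train, X_test, y_train, y_test


# Synthetic stand-in data (not the real Pima file).
rng = np.random.default_rng(0)
n = 200
glucose = rng.normal(120, 30, n).round()
glucose[:10] = 0  # simulate unrecorded measurements
df = pd.DataFrame({"Glucose": glucose, "BMI": rng.normal(32, 6, n), "Age": rng.integers(21, 70, n)})
df["AgeMonths"] = df["Age"] * 12  # perfectly correlated with Age, so it should be dropped
df["Outcome"] = (df["Glucose"] > 140).astype(int)

X_train, X_test, y_train, y_test = prepare(df)
print(list(X_train.columns), len(X_train), len(X_test))
```

Imputation is done after the split on purpose: computing the fill value from the whole data set would let information from the test rows influence training.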
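The later steps (selecting an algorithm, training, evaluating, trying more complex models, and cross-validating) can be sketched as a comparison loop. This is an assumption-laden illustration, not the notebook's method: the three candidate models, the choice of recall as the metric (on the reasoning that missing a diabetic case is usually costlier than a false alarm), and 5-fold stratified CV are all choices made for this sketch. `make_classification` generates synthetic data shaped like Pima (768 rows, 8 features, roughly 35% positive), so the printed scores say nothing about the real data set.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in: 768 rows, 8 features, about 35% positive class.
X, y = make_classification(
    n_samples=768, n_features=8, n_informative=5, weights=[0.65, 0.35], random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42
)

# Candidates ordered from simple to more complex (the selection is illustrative).
candidates = {
    "naive_bayes": GaussianNB(),
    "logistic_regression": make_pipeline(
        StandardScaler(), LogisticRegression(class_weight="balanced", max_iter=1000)
    ),
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=42),
}

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
cv_recall = {}
for name, model in candidates.items():
    # Cross-validate on the training split only; the test split stays untouched.
    scores = cross_val_score(model, X_train, y_train, cv=cv, scoring="recall")
    cv_recall[name] = scores.mean()
    print(f"{name}: mean CV recall = {scores.mean():.3f} (+/- {scores.std():.3f})")

# Refit the best candidate on the full training split and score it once on the test split.
best_name = max(cv_recall, key=cv_recall.get)
best_model = candidates[best_name].fit(X_train, y_train)
test_recall = recall_score(y_test, best_model.predict(X_test))
print(f"best: {best_name}, held-out test recall = {test_recall:.3f}")
```

One practical signal for "where to stop" is when a more complex candidate no longer improves the mean CV score by more than its fold-to-fold spread, and the held-out test score is close to the CV estimate.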
All of the above steps are applied in the example. Open Pima-Predictions.ipynb to see the complete flow.