This repository contains a Python script for predicting diabetes using the K-Nearest Neighbors (KNN) algorithm. The script uses the scikit-learn machine learning library to preprocess the data (including handling missing values), train a KNN classifier, and evaluate its predictions.
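The dataset used in this project is a file named `diabetes.csv`. It is expected to contain diabetes-related measurements in columns such as Glucose, BloodPressure, SkinThickness, BMI, and Insulin, among others. In several of these columns a value of 0 stands for a missing measurement rather than a real reading.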
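Because the script treats zeros in these columns as missing values, it can help to count them before cleaning. The following is a minimal sketch that assumes the data has already been loaded into a pandas DataFrame (for example with `pd.read_csv("diabetes.csv")`). The helper name `count_zero_placeholders` and the sample values are made up for illustration; they are not taken from the project's script or data.

```python
import pandas as pd

# Columns in which this README treats 0 as a missing measurement.
ZERO_AS_MISSING = ("Glucose", "BloodPressure", "SkinThickness", "BMI", "Insulin")


def count_zero_placeholders(df: pd.DataFrame, columns=ZERO_AS_MISSING) -> pd.Series:
    """Return how many entries are exactly 0 in each of the given columns."""
    return (df[list(columns)] == 0).sum()


# Hypothetical, hand-made rows so the sketch runs without diabetes.csv.
sample = pd.DataFrame({
    "Glucose": [120, 0, 95],
    "BloodPressure": [70, 80, 0],
    "SkinThickness": [20, 0, 0],
    "BMI": [30.0, 25.0, 0.0],
    "Insulin": [0, 0, 100],
})
print(count_zero_placeholders(sample))
```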
- Replace zero values (missing measurements) in the columns 'Glucose', 'BloodPressure', 'SkinThickness', 'BMI', and 'Insulin' with the mean of that column's non-zero values.
- Split the dataset into input features (X) and output labels (y).
- Split the dataset into training and testing sets using the `train_test_split` function from scikit-learn.
- Standardize the features with `StandardScaler`, which rescales each feature to zero mean and unit variance so that all features are on a comparable scale.
- Create a KNN classifier with `n_neighbors=11`, `p=2`, and `metric='euclidean'` (`p=2` is the Minkowski power that corresponds to Euclidean distance, so the two settings agree).
- Train the classifier using the training data.
- Predict the labels for the test set using the trained classifier.
- Evaluate the model's performance using a confusion matrix and the F1 score.
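The zero-replacement step can be sketched with pandas as shown here. This is a minimal illustration, not the project's actual code: the function name `replace_zeros_with_mean` is hypothetical, and it keeps the fractional mean, whereas the real script may round it or otherwise differ.

```python
import numpy as np
import pandas as pd

# Columns in which this README treats 0 as a missing measurement.
ZERO_AS_MISSING = ("Glucose", "BloodPressure", "SkinThickness", "BMI", "Insulin")


def replace_zeros_with_mean(df: pd.DataFrame, columns=ZERO_AS_MISSING) -> pd.DataFrame:
    """Return a copy of df where each 0 is replaced by its column's non-zero mean."""
    out = df.copy()
    for col in columns:
        # Mark zeros as NaN so they do not pull the mean down.
        values = out[col].replace(0, np.nan)
        # Series.mean() skips NaN by default, so this averages only the non-zero entries.
        out[col] = values.fillna(values.mean())
    return out


# Hypothetical example: Glucose mean of non-zeros is (100 + 140) / 2 = 120.
sample = pd.DataFrame({"Glucose": [100, 0, 140], "BMI": [30.0, 0.0, 20.0]})
cleaned = replace_zeros_with_mean(sample, columns=["Glucose", "BMI"])
print(cleaned)
```

Because the steps above replace zeros before splitting the data, the fill values are computed over all rows, including those later used for testing. Computing the means on the training split alone would keep test data out of preprocessing.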
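The remaining steps (split, scale, train, predict, evaluate) might look roughly like the following sketch. It uses randomly generated features and 0/1 labels so it runs without `diabetes.csv`. The 80/20 split, the `random_state` values, and fitting the scaler on the training split only are assumptions, not details confirmed by this README.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, f1_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the cleaned features (X) and labels (y).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)

# Hold out 20% of the rows for testing (assumed split ratio).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Fit the scaler on the training rows only, then apply the same transform to the test rows.
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Hyperparameters from this README. scikit-learn only consults p when
# metric="minkowski"; here the Euclidean metric is set explicitly.
clf = KNeighborsClassifier(n_neighbors=11, p=2, metric="euclidean")
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

cm = confusion_matrix(y_test, y_pred)
f1 = f1_score(y_test, y_pred)

# For binary labels 0/1, ravel() yields TN, FP, FN, TP in that order.
tn, fp, fn, tp = cm.ravel()
manual_f1 = 2 * tp / (2 * tp + fp + fn)

print(cm)
print(f"F1 score: {f1:.3f} (recomputed from the matrix: {manual_f1:.3f})")
```

The last lines recompute F1 from the confusion matrix as 2·TP / (2·TP + FP + FN), which shows how the two reported metrics relate.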
To run this code, make sure you have Python installed along with the required libraries specified in the script. You can install these dependencies using:
pip install pandas numpy scikit-learn