Diabetes Prediction using K-Nearest Neighbors (KNN)

This repository contains a Python script that predicts diabetes with the K-Nearest Neighbors (KNN) algorithm. The script uses scikit-learn and covers data preprocessing, handling of missing values (recorded as zeros in some columns), and training a KNN classifier.

Dataset

The dataset used in this project is the file "diabetes.csv". It is assumed to contain diabetes-related measurements, with columns such as Glucose, BloodPressure, SkinThickness, BMI, Insulin, and others.

Code Overview

Data Preprocessing:

Replace zero values in specific columns ('Glucose', 'BloodPressure', 'SkinThickness', 'BMI', 'Insulin') with the mean of the non-zero values in the same column, since a zero in these measurements is treated as a missing value.
Split the dataset into input features (X) and output labels (y).
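
The two preprocessing steps above could look roughly like the sketch below. It is not the repository's script: the small DataFrame is made up so the example runs without diabetes.csv, and the label column name "Outcome" is an assumption.

```python
import numpy as np
import pandas as pd

# Tiny made-up frame standing in for diabetes.csv (values are illustrative only).
df = pd.DataFrame({
    "Glucose": [148, 0, 183, 89],
    "BloodPressure": [72, 66, 0, 66],
    "SkinThickness": [35, 29, 0, 23],
    "BMI": [33.6, 26.6, 23.3, 0.0],
    "Insulin": [0, 0, 0, 94],
    "Outcome": [1, 0, 1, 0],  # assumed name of the label column
})

zero_not_accepted = ["Glucose", "BloodPressure", "SkinThickness", "BMI", "Insulin"]

for column in zero_not_accepted:
    # Mark zeros as missing, then fill them with the mean of the remaining values.
    df[column] = df[column].replace(0, np.nan)
    df[column] = df[column].fillna(df[column].mean())

# Input features (X) and output labels (y).
X = df.drop(columns="Outcome")
y = df["Outcome"]
```

Converting zeros to NaN first keeps them out of the mean, so the fill value is the mean of the non-zero entries only.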

Train-Test Split:

Split the dataset into training and testing sets using the train_test_split function from scikit-learn.
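
A minimal sketch of the split, using random stand-in data. The values test_size=0.2 and random_state=0 are assumptions for illustration; the README does not state which values the script uses.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Random stand-in data: 100 samples, 8 features, binary labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 8))
y = rng.integers(0, 2, size=100)

# test_size and random_state are assumed values, not taken from the script.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)
```

Fixing random_state makes the split reproducible between runs.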

Feature Scaling:

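Standardize the features using the StandardScaler so that each feature has zero mean and unit variance. KNN relies on distances between samples, so features measured on larger scales would otherwise dominate the result.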
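
One common way to apply the scaler is sketched below, with made-up numbers. Fitting on the training set only and reusing those statistics for the test set is an assumption about how the script is organised.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up training and test features (two columns on different scales).
X_train = np.array([[100.0, 20.0], [120.0, 30.0], [140.0, 40.0]])
X_test = np.array([[120.0, 25.0]])

scaler = StandardScaler()
# Learn mean and standard deviation from the training data, then scale it.
X_train_scaled = scaler.fit_transform(X_train)
# Apply the same training statistics to the test data.
X_test_scaled = scaler.transform(X_test)
```

Calling transform rather than fit_transform on the test set keeps test-set statistics from leaking into the model.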

K-Nearest Neighbors Classification:

Create a KNN classifier with parameters (n_neighbors=11, p=2, metric='euclidean').
Train the classifier using the training data.
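
The classifier setup can be sketched as follows. The parameters are the ones listed above; the two synthetic clusters are made up so the example runs on its own.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Two well-separated synthetic clusters standing in for the scaled training data.
rng = np.random.default_rng(0)
X_train = np.vstack([
    rng.normal(-3.0, 1.0, size=(30, 2)),  # class 0
    rng.normal(3.0, 1.0, size=(30, 2)),   # class 1
])
y_train = np.array([0] * 30 + [1] * 30)

# Parameters as listed in this README.
classifier = KNeighborsClassifier(n_neighbors=11, p=2, metric="euclidean")
classifier.fit(X_train, y_train)

# Each query point is labelled by majority vote among its 11 nearest neighbors.
pred = classifier.predict(np.array([[-3.0, -3.0], [3.0, 3.0]]))
```

An odd n_neighbors avoids tied votes in a two-class problem. In scikit-learn, p is the power parameter of the Minkowski metric, so with metric='euclidean' the p=2 setting looks redundant but harmless.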

Prediction and Evaluation:

Predict the labels for the test set using the trained classifier.
Evaluate the model performance using a confusion matrix and the F1 score.
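
A small worked example of both metrics, using hand-written labels rather than real model output:

```python
from sklearn.metrics import confusion_matrix, f1_score

# Hand-written labels for illustration: 1 = diabetic, 0 = not diabetic.
y_test = [0, 0, 1, 1, 1, 0, 1, 0]
y_pred = [0, 1, 1, 1, 0, 0, 1, 0]

# Rows are true classes, columns are predicted classes: [[TN, FP], [FN, TP]].
cm = confusion_matrix(y_test, y_pred)
print(cm)  # [[3 1]
           #  [1 3]]

# F1 is the harmonic mean of precision (3/4) and recall (3/4).
f1 = f1_score(y_test, y_pred)
print(f1)  # 0.75
```

The F1 score is often preferred over plain accuracy when the two classes are imbalanced, because it accounts for both false positives and false negatives.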

Running the Code

To run this code, make sure Python is installed along with the libraries that the script imports. You can install these dependencies using:

pip install pandas numpy scikit-learn
