Machine Learning Model Evaluation and Comparison

This project was completed as part of the course EEE-595 Statistical Machine Learning at Arizona State University during Fall 2023. This repository contains the code and a comprehensive report that analyze and compare the performance of several machine learning models on multiple datasets. The models implemented are Logistic Regression, Support Vector Machines (SVM), K-Nearest Neighbors (KNN), and Neural Networks (both fully connected and convolutional).

The objective is to evaluate and compare the effectiveness of these algorithms across three distinct datasets. The report covers data preprocessing, hyperparameter tuning, model training, and performance evaluation using accuracy, precision, recall, and F1-score. Both grid search and random search are explored to optimize the hyperparameters of each model.
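For readers unfamiliar with the evaluation metrics named above, the following is a minimal sketch of how accuracy, precision, recall, and F1-score can be derived from the counts of true/false positives and negatives. It is not the code used in the report; the function name `binary_classification_metrics` and the toy labels are assumptions made up for illustration.

```python
def binary_classification_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    # Guard every ratio against an empty denominator.
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy labels, made up for illustration only (not results from the report).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
metrics = binary_classification_metrics(y_true, y_pred)
print(metrics)
```

On imbalanced data, precision, recall, and F1 are usually more informative than accuracy alone, because a model that always predicts the majority class can still score a high accuracy.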

Datasets

Wisconsin Breast Cancer Dataset (binary classification: benign vs. malignant tumors)
- 569 samples with 30 features
- Moderately imbalanced classes: 357 benign and 212 malignant cases
UCI Adult Dataset (income classification: above or below $50,000)
- 48,842 samples with 14 features (8 categorical, 6 continuous)
- Imbalanced classes with fewer high-income instances
Fashion MNIST Dataset (multiclass classification of fashion items)
- 70,000 grayscale images (28x28 pixels)
- 10 distinct classes (e.g., shirts, shoes, dresses)
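Tabular data that mixes categorical and continuous features, as the UCI Adult dataset does, is typically converted to a purely numeric form before training. The sketch below shows one common approach, one-hot encoding for categorical columns and standardization for continuous ones. It is an illustration under stated assumptions, not the preprocessing used in the report: the helper names and the three sample rows are hypothetical.

```python
from statistics import mean, pstdev


def one_hot_encode(values):
    """Map each category to a 0/1 indicator vector (categories sorted by name)."""
    categories = sorted(set(values))
    encoded = [[1.0 if v == c else 0.0 for c in categories] for v in values]
    return categories, encoded


def standardize(values):
    """Rescale a continuous column to zero mean and unit variance."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical rows for illustration only (not taken from the dataset).
ages = [25, 35, 45]
workclass = ["Private", "State-gov", "Private"]

scaled_ages = standardize(ages)
categories, encoded_workclass = one_hot_encode(workclass)

# Concatenate the numeric columns into one feature vector per row.
features = [[a] + oh for a, oh in zip(scaled_ages, encoded_workclass)]
print(categories)
print(features)
```

In practice the mean, standard deviation, and category list should be computed on the training split only and then reused for the validation and test splits, so that no information leaks from held-out data.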

Models Implemented

The following models were implemented and evaluated:

Logistic Regression:
- Trained using both Stochastic Gradient Descent (SGD) and Gradient Descent (GD)
- Hyperparameters tuned: learning rate, regularization parameter (L2), batch size, and number of epochs
Support Vector Machines (SVM):
- Explored linear, polynomial, and radial basis function (RBF) kernels
- Hyperparameters tuned: regularization parameter (C), degree (d), and gamma (γ)
K-Nearest Neighbors (KNN):
- Hyperparameters tuned: number of neighbors (k), weight function, spatial algorithm, and leaf size
- Dimensionality reduction performed using PCA to optimize performance
Neural Networks:
- Fully connected networks (FCNN) for tabular datasets
- Convolutional neural networks (CNN) for image data (LeNet-5 architecture for Fashion MNIST)
- Hyperparameters tuned: number of layers, neurons per layer, learning rate, batch size, and regularization
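To make the tuning procedure concrete, here is a minimal sketch of grid search and random search over the logistic regression hyperparameters listed above (learning rate, L2 regularization, batch size, number of epochs). It is a standard-library illustration, not the implementation in this repository: the function names, the search ranges, and the synthetic two-cluster data are all assumptions. Setting `batch_size` to the training set size turns the mini-batch SGD loop into full-batch gradient descent.

```python
import itertools
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train_logreg(X, y, lr, l2, batch_size, epochs, seed=0):
    """Mini-batch SGD on the L2-regularized logistic loss; returns (w, b)."""
    rng = random.Random(seed)
    w, b = [0.0] * len(X[0]), 0.0
    idx = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(idx)
        for start in range(0, len(idx), batch_size):
            batch = idx[start:start + batch_size]
            grad_w, grad_b = [0.0] * len(w), 0.0
            for i in batch:
                z = sum(wj * xj for wj, xj in zip(w, X[i])) + b
                err = sigmoid(z) - y[i]
                for j, xj in enumerate(X[i]):
                    grad_w[j] += err * xj
                grad_b += err
            n = len(batch)
            # The L2 penalty is applied to the weights only, not the bias.
            w = [wj - lr * (gj / n + l2 * wj) for wj, gj in zip(w, grad_w)]
            b -= lr * grad_b / n
    return w, b


def accuracy(model, X, y):
    w, b = model
    preds = [1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else 0
             for x in X]
    return sum(p == t for p, t in zip(preds, y)) / len(y)


def grid_search(X_tr, y_tr, X_val, y_val, grid):
    """Evaluate every combination in the grid; return (best_score, params)."""
    best = (-1.0, None)
    keys = list(grid)
    for combo in itertools.product(*(grid[k] for k in keys)):
        params = dict(zip(keys, combo))
        score = accuracy(train_logreg(X_tr, y_tr, **params), X_val, y_val)
        if score > best[0]:
            best = (score, params)
    return best


def random_search(X_tr, y_tr, X_val, y_val, n_trials, seed=0):
    """Sample n_trials configurations; lr and l2 are drawn log-uniformly."""
    rng = random.Random(seed)
    best = (-1.0, None)
    for _ in range(n_trials):
        params = {
            "lr": 10 ** rng.uniform(-3, 0),
            "l2": 10 ** rng.uniform(-5, -1),
            "batch_size": rng.choice([1, 4, 8]),
            "epochs": rng.choice([20, 50]),
        }
        score = accuracy(train_logreg(X_tr, y_tr, **params), X_val, y_val)
        if score > best[0]:
            best = (score, params)
    return best


# Synthetic two-cluster data, generated for illustration only.
data_rng = random.Random(42)
X = ([[data_rng.gauss(-2, 1), data_rng.gauss(-2, 1)] for _ in range(40)]
     + [[data_rng.gauss(2, 1), data_rng.gauss(2, 1)] for _ in range(40)])
y = [0] * 40 + [1] * 40
X_tr, y_tr, X_val, y_val = X[::2], y[::2], X[1::2], y[1::2]

grid = {"lr": [0.01, 0.1], "l2": [0.0, 0.01],
        "batch_size": [8, len(X_tr)], "epochs": [30]}
grid_score, grid_params = grid_search(X_tr, y_tr, X_val, y_val, grid)
rand_score, rand_params = random_search(X_tr, y_tr, X_val, y_val, n_trials=8)
print("grid search:", grid_score, grid_params)
print("random search:", rand_score, rand_params)
```

Grid search evaluates every combination, so its cost grows multiplicatively with each added hyperparameter. Random search spends a fixed budget of trials and can cover continuous ranges such as the learning rate more finely. The same two loops apply to the SVM, KNN, and neural network hyperparameters by swapping the training function and the search space.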

SarwanShah/ASU_2023_Machine-Learning-Models-for-Classification
