A collection of machine learning projects implemented in Python with supporting libraries.
Supporting Dataset : pima-indians-diabetes.data.csv
Description : This dataset is originally from the National Institute of Diabetes and Digestive and Kidney Diseases.
Goal : Predict whether or not a patient has diabetes, based on the diagnostic measurements
included in the dataset.
Results : Cross-validation accuracy of 80% is achieved; the project is open to all for further improvement.
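A minimal sketch of how a cross-validation accuracy figure like the one above can be computed. It uses only the standard library and a toy nearest-centroid classifier on made-up data; the function names (`k_fold_indices`, `nearest_centroid_fit_predict`, `cross_val_accuracy`) and the sample values are illustrative assumptions, not the model or code used in this project.

```python
from statistics import mean


def k_fold_indices(n_samples, k):
    """Split range(n_samples) into k contiguous folds of near-equal size."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def nearest_centroid_fit_predict(X_train, y_train, X_test):
    """Toy classifier (an assumption for illustration): assign each test row
    to the class whose training mean is closest in squared distance."""
    centroids = {}
    for label in set(y_train):
        rows = [x for x, y in zip(X_train, y_train) if y == label]
        centroids[label] = [mean(col) for col in zip(*rows)]

    def sq_dist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))

    return [min(centroids, key=lambda c: sq_dist(x, centroids[c])) for x in X_test]


def cross_val_accuracy(X, y, fit_predict, k=5):
    """Mean accuracy over k folds; each fold is held out exactly once."""
    scores = []
    for test_idx in k_fold_indices(len(X), k):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        preds = fit_predict([X[i] for i in train_idx],
                            [y[i] for i in train_idx],
                            [X[i] for i in test_idx])
        correct = sum(p == y[i] for p, i in zip(preds, test_idx))
        scores.append(correct / len(test_idx))
    return mean(scores)


# Made-up, well-separated two-class data (not the Pima dataset), interleaved
# so every contiguous fold contains both classes.
X = [[1.0, 1.2], [8.0, 8.1], [0.9, 1.1], [7.9, 8.3], [1.1, 0.8],
     [8.2, 7.9], [1.3, 1.0], [8.1, 8.0], [0.8, 0.9], [7.8, 8.2]]
y = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]

accuracy = cross_val_accuracy(X, y, nearest_centroid_fit_predict, k=5)
print(f"Cross-validation accuracy: {accuracy:.2f}")
```

Any function with the same `(X_train, y_train, X_test)` signature can be passed in place of the toy classifier.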
Supporting Dataset : cancer.csv
Description : This data was gathered by the University of Wisconsin Hospitals, Madison and by Dr. William H. Wolberg.
The following 9 columns are features that express different types of information connected to the detected tumors.
They represent data related to: Clump Thickness, Uniformity of Cell Size, Uniformity of Cell Shape, Marginal Adhesion,
Single Epithelial Cell Size, Bare Nuclei, Bland Chromatin, Normal Nucleoli and Mitoses.
The last column is the class of the tumor and has two possible values: 2 means that the tumor was found to be benign,
and 4 means that it was found to be malignant.
Goal : Predict if a tumor is benign or malignant, based on the features provided by the data.
Results : Cross-validation accuracy of 96% is achieved; the project is open to all for further improvement.
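A small sketch of how the class column described above might be recoded before training, mapping 2 (benign) to 0 and 4 (malignant) to 1. The helper name `encode_class`, the assumed row layout (9 feature values followed by the class), and the sample rows are hypothetical; the actual layout of cancer.csv may differ.

```python
# Class codes as described above: 2 = benign, 4 = malignant.
CLASS_TO_TARGET = {2: 0, 4: 1}


def encode_class(rows):
    """Split rows into (features, targets), recoding the last column to 0/1.

    Each row is assumed to hold the 9 feature values followed by the class.
    """
    features, targets = [], []
    for row in rows:
        *values, tumor_class = row
        features.append([int(v) for v in values])
        targets.append(CLASS_TO_TARGET[int(tumor_class)])
    return features, targets


# Two made-up rows in the assumed layout (9 features, then class).
sample_rows = [
    [5, 1, 1, 1, 2, 1, 3, 1, 1, 2],
    [8, 10, 10, 8, 7, 10, 9, 7, 1, 4],
]
features, targets = encode_class(sample_rows)
print(targets)
```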
Supporting Dataset : games.csv
Description : Board games have been making a comeback lately, and deeper, more strategic board games like Settlers of Catan
have become hugely popular. BoardGameGeek is a popular site where these types of board games are discussed and reviewed.
Here we have a dataset that contains 80000 board games and their associated review scores.
The data was scraped from BoardGameGeek and compiled into CSV format by Sean Beck.
Goal : The goal of this project was to create a model that can accurately predict game ratings.
Results : 60% is achieved; the project is open to all for further improvement.
Supporting Dataset : train_u6lujuX_CVtuZ9i.csv and test_Y3wMUE5_7gLdaTN.csv
Description : This dataset has 614 rows and 13 columns (including the target), with attributes such as gender, applicant income,
marital status, and credit history.
Goal : Predict whether a loan can be approved based on the features in the dataset, and evaluate the model on the test set.
Results : Cross-validation accuracy of 82% is achieved; the project is open to all for further improvement.
Supporting Dataset : Kaggle's Dogs vs Cats dataset
Description : The Kaggle competition provided 25,000 labeled photos: 12,500 dogs and the same number of cats.
Predictions were then required on a test dataset of 12,500 unlabeled photographs. It is a good beginner-to-intermediate
level project for computer vision tasks.
Goal : The goal of this project is to apply transfer learning to this dataset and classify a given image as a dog or a cat.
Results : 94% classification accuracy is achieved; the project is open to all for further improvement.
Supporting Dataset : Market_Basket_Optimisation.csv
Description : The CSV contains about 7500 transactions from an imaginary supermarket.
These transactions are arranged with items as columns.
Goal : Generate association rules using the Apriori algorithm.
Results : 160 rules are generated, each indicating that item X tends to be bought together with item Y.
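A minimal, standard-library sketch of the Apriori idea: find itemsets that appear in enough transactions, then derive "X is bought with Y" rules from them. The function names, the thresholds, and the toy transactions are assumptions for illustration; this is not the code or the settings that produced the 160 rules reported above.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Level-wise Apriori search: return {itemset: support} for every itemset
    whose support (fraction of transactions containing it) >= min_support."""
    n = len(transactions)
    transactions = [frozenset(t) for t in transactions]

    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    # Level 1: frequent single items.
    items = {item for t in transactions for item in t}
    current = {frozenset([i]) for i in items}
    current = {s for s in current if support(s) >= min_support}
    frequent = {}
    k = 1
    while current:
        frequent.update({s: support(s) for s in current})
        k += 1
        # Join step: union pairs of frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in frequent for s in combinations(c, k - 1))}
        current = {c for c in candidates if support(c) >= min_support}
    return frequent


def association_rules(frequent, min_confidence):
    """Return (antecedent, consequent, support, confidence) for each rule
    X -> Y, where confidence = support(X union Y) / support(X)."""
    rules = []
    for itemset, sup in frequent.items():
        for r in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), r):
                antecedent = frozenset(antecedent)
                confidence = sup / frequent[antecedent]
                if confidence >= min_confidence:
                    rules.append((antecedent, itemset - antecedent, sup, confidence))
    return rules


# Made-up transactions (not taken from Market_Basket_Optimisation.csv).
transactions = [
    ["bread", "milk"],
    ["bread", "butter"],
    ["bread", "milk", "butter"],
    ["milk"],
    ["bread", "milk"],
]
frequent = frequent_itemsets(transactions, min_support=0.3)
rules = association_rules(frequent, min_confidence=0.7)
for antecedent, consequent, sup, conf in rules:
    print(sorted(antecedent), "->", sorted(consequent),
          f"support={sup:.2f} confidence={conf:.2f}")
```

Raising `min_support` or `min_confidence` shrinks the rule set; lowering them produces more, weaker rules.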
Supporting Dataset : fruit_data_with_colors.txt
Description : The fruits dataset was created by Dr. Iain Murray from University of Edinburgh.
Goal : Classify fruits into their respective category.
Results : 100% accuracy with KNN and Decision Tree classifiers.
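A short sketch of how a k-nearest-neighbours (KNN) classifier assigns a fruit to a category. It uses only the standard library; the function name `knn_predict`, the choice of features (mass, width, height), and every sample value are made up for illustration and are not read from fruit_data_with_colors.txt.

```python
from collections import Counter
import math


def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points
    (Euclidean distance)."""
    distances = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical samples: [mass, width, height].
train_X = [
    [180, 8.0, 7.0], [170, 7.6, 7.2], [160, 7.4, 7.0],   # apple
    [120, 6.0, 8.5], [115, 5.9, 8.2], [130, 6.2, 8.8],   # lemon
    [80, 5.9, 4.3], [85, 6.0, 4.5], [78, 5.8, 4.2],      # mandarin
]
train_y = ["apple"] * 3 + ["lemon"] * 3 + ["mandarin"] * 3

print(knn_predict(train_X, train_y, [165, 7.5, 7.1]))
print(knn_predict(train_X, train_y, [82, 5.9, 4.4]))
```

Because the vote is based on raw Euclidean distance, the feature with the largest numeric range (mass in this sketch) dominates unless the features are scaled first.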