This library aims to provide many different models of machine learning algorithms (eg: logistic reg, SVM, neural networks…). The main goal for me is to better understand how those algorithms work by implementing them.
Thus, it does not claim to provide better performances, results or ease of use than any other existing implementation.
Files:
- Model.py: Generic base class for all learning models
- LogisticReg.py: Implementation of logistic regression
- Utils.py: Small utilitaries functions
- Parser.py: Parser for idx file format, used for example in MNIST dataset
It is possible to try this library by testing it against the MNIST dataset:
import Parser as p
import LogisticReg as lr
# Loading datasets
# xval is the cross validation dataset. Eventhough it is not used in
# this code, it should be used for tuning the parameters of the
# algorithm.
train_img = p.IdxParser("mnist/train-img").parse()
train_lbl = p.IdxParser("mnist/train-lbl").parse()
test_img = p.IdxParser("mnist/test-img").parse()
test_img = test_img.reshape((10000, 28*28))
test_lbl = p.IdxParser("mnist/test-lbl").parse()
xval_img = train_img[50000:].reshape((10000, 28*28))
xval_lbl = train_lbl[50000:]
train_img = train_img[:50000].reshape((50000, 28*28))
train_lbl = train_lbl[:50000]
# Creating a Logistic Regression model with default parameters
model = lr.LogisticReg()
# Learning, takes some time...
Theta = model.train(train_img, train_lbl, 10)
# Testing against test set
perf = model.evaluate(Theta, test_img, test_lbl)
This evaluation code gave a success ratio of 0.90 for logistic regression (Meaning that 9 over 10 images in the test set were correctly guessed).
- Improving performances with random batches learning
- More models (linear regression, neural networks, SVM,…)