k-NN Algorithm in Golang, Haskell, Rust, F#, Julia, OCaml, Octave and Factor

This repository contains naive implementations of the k-NN algorithm in several languages (for k = 1), and an extra one in Golang with some optimizations.

Blog Chain

This was inspired by a chain of blog posts:

the last one implemented it in Factor
the one before that did it in Rust.
that one was inspired by a blog post which had it in F# and OCaml, and a follow-up which improves the first OCaml implementation.

This repository adds the naive implementation in Golang and Haskell, an additional implementation in Golang with some optimizations, and thanks to this guy, a couple implementations in Julia and Octave.

Comparison

Performance comparisons between the naive implementations in each language were performed on a freshly spun up c3.xlarge EC2 instance as follows:

Install Golang, Haskell, Rust, F#, OCaml and Octave, either with apt-get or building from source ./configure && make && make install # ...etc. Install the latest nightly Julia build with dpkg. Download Factor.
Write the (naive) code for Golang and Haskell. Copy-paste the code for Rust, F#, OCaml, Julia, Octave, and Factor.
Compile executable binaries for Haskell, Rust, F#, and OCaml. Run the Factor code in the scratchpad REPL and the Octave code in the Octave REPL. Run the Golang code with go run and the Julia code with julia.

Results

Julia: 1.557s - 3.432s*
Octave: 2.540s
Golang: 4.701s
Factor: 6.358s
OCaml: 12.757s
F#: 23.507s
Rust: 78.138s
Haskell: 91.581s

Julia has an asterisk because its initial run is slower than all subsequent runs. I'll have to rethink how to make this a fair experiment...

Golang

$ time go run golang-k-nn.go

0.944

real  0m4.701s
user  0m4.582s
sys 0m0.136s

Haskell

$ time ./haskell-k-nn

Percentage correct: 472

real  1m31.581s
user  1m29.191s
sys 0m2.384s

Rust

$ time ./rust-k-nn

Percentage correct: 94.4%

real	1m18.138s
user	1m17.980s
sys	0m0.155s

F#

$ time ./fsharp-k-nn.exe

start...
Percentage correct:94.400000

real	0m23.507s
user	0m22.751s
sys	0m0.798s

Julia

$ julia julia/julia-k-nn.jl

Percentage correct: 94.4%
elapsed time: 3.438748733 seconds (656566752 bytes allocated, 2.03% gc time)
Percentage correct: 94.4%
elapsed time: 1.560125781 seconds (566037680 bytes allocated, 6.06% gc time)

OCaml

$ time ./ocaml-k-nn

Percentage correct:94.400000

real	0m12.757s
user	0m12.500s
sys	0m0.257s

Octave

$ cp octave/octave-k-nn.m octaveknn.m
$ octave

octave:1> octaveknn;
Percentage correct: 94.400000%
Elapsed time is 2.53968 seconds.

$ rm octaveknn.m

Factor

$ mkdir -p $FACTOR_HOME/work/k-nn
$ cp factor/factor-k-nn.factor $FACTOR_HOME/work/k-nn
$ cp *.csv $FACTOR_HOME/work/k-nn
$ $FACTOR_HOME/factor

IN: scratchpad USE: factor-k-nn
Loading resource:work/k-nn/factor-k-nn.factor
Loading resource:basis/formatting/formatting.factor
Loading resource:basis/formatting/formatting-docs.factor

IN: scratchpad gc [ k-nn ] time
Percentage correct: 94.400000
Running time: 6.357621145 seconds

Optimized implementation in Golang

For Golang, an additional implementation is given which is signficantly faster, but suffers no loss in accuracy. It involves two optimizations:

Short-circuit distance calculations between a test case and a training case that are necessarily suboptimal. In other words, if you know the distance to one potential nearest neighbour is 100, and half-way through calculating the distance to another potential nearest neighbour you already have a distance-so-far of 105, stop calculating and move on to the next candidate for nearest neighbour.
Use goroutines to parallelize the computations. The way this was done was not ideal, because the parallelism isn't in the classification algorithm itself, instead it parellelizes the classification of the members of the validation sample. However, it's is easy enough to "do it right", and what's currently there is good enough to see how significant the gains are when firing on all your cores.

$ time go run golang-k-nn-speedup.go

0.944

real  0m1.375s
user  0m3.314s
sys 0m0.117s

Contributing

Tell me if I should use special compiler flags to improve performance for some of the languages.
Tell my why this experiment is invalid.
Improve naive implementations without changing the spirit of the algorithm (e.g. use eager evaluation in Haskell).
Add optimized implementations of k-NN which improve performance at no cost to accuracy.
Add implementations for other languages (with compilation and/or run instructions).

TODO

Make the Go stuff a useable package
Explore Accuracy vs Time tradeoffs of only considering some of the training set or some of the pixels when classifying a test case
Make it easy to experiment with a matrix of different options
Do it in C
Do it in Python with scikit-learn

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

k-NN Algorithm in Golang, Haskell, Rust, F#, Julia, OCaml, Octave and Factor

Blog Chain

Comparison

Results

Golang

Haskell

Rust

F#

Julia

OCaml

Octave

Factor

Optimized implementation in Golang

Contributing

TODO

About

Releases

Packages

Contributors 2

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 14 Commits
factor		factor
fsharp		fsharp
haskell		haskell
julia		julia
ocaml		ocaml
octave		octave
rust		rust
README.md		README.md
fsharp-k-nn.exe		fsharp-k-nn.exe
golang-k-nn-speedup.go		golang-k-nn-speedup.go
golang-k-nn.go		golang-k-nn.go
haskell-k-nn		haskell-k-nn
ocaml-k-nn		ocaml-k-nn
rust-k-nn		rust-k-nn
trainingsample.csv		trainingsample.csv
validationsample.csv		validationsample.csv

amitkgupta/nearest_neighbour

Folders and files

Latest commit

History

Repository files navigation

k-NN Algorithm in Golang, Haskell, Rust, F#, Julia, OCaml, Octave and Factor

Blog Chain

Comparison

Results

Golang

Haskell

Rust

F#

Julia

OCaml

Octave

Factor

Optimized implementation in Golang

Contributing

TODO

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Languages

Packages