Perform naive Bayesian classification into an arbitrary number of classes on sets of strings. bayesian
also supports term frequency-inverse document frequency calculations (TF-IDF).
Copyright (c) 2011-2017. Jake Brukhman. All rights reserved. See the LICENSE file for the BSD-style license.
This is meant to be a low-barrier-to-entry Go library for basic Bayesian classification. See the code comments for a refresher on naive Bayesian classifiers, and please take some time to understand the underflow edge cases, as overlooking them may result in inaccurate classifications.
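To see why underflow matters, here is a minimal standalone sketch that uses only the standard library. It does not call this package; the helper names `productOfProbs` and `sumOfLogs` are made up for illustration. Multiplying many small probabilities directly collapses to zero in float64, while adding their logarithms stays finite.

```go
package main

import (
	"fmt"
	"math"
)

// productOfProbs multiplies n copies of the probability p directly.
// For small p and large n the result drops below the smallest float64
// and becomes exactly 0.
func productOfProbs(p float64, n int) float64 {
	prod := 1.0
	for i := 0; i < n; i++ {
		prod *= p
	}
	return prod
}

// sumOfLogs accumulates the same quantity in log space, which stays
// comfortably within float64 range.
func sumOfLogs(p float64, n int) float64 {
	sum := 0.0
	for i := 0; i < n; i++ {
		sum += math.Log(p)
	}
	return sum
}

func main() {
	// 1e-5 raised to the 100th power is 1e-500, far below float64 range.
	fmt.Println("direct product:", productOfProbs(1e-5, 100))
	fmt.Println("sum of logs:   ", sumOfLogs(1e-5, 100))
}
```

Once every class score has underflowed to zero, the classes can no longer be told apart, which is why log-scale scores are the safer default for long documents.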
Using the go command:
go get github.com/navossoc/bayesian
go install github.com/navossoc/bayesian
See the GoPkgDoc documentation here.
- Conditional probability and "log-likelihood"-like scoring.
- Underflow detection.
- Simple persistence of classifiers.
- Statistics.
- TF-IDF support.
To use the classifier, first create some classes and train it:
import "github.com/navossoc/bayesian"
const (
	Good bayesian.Class = "Good"
	Bad  bayesian.Class = "Bad"
)
// Create a classifier that distinguishes the two classes.
classifier := bayesian.NewClassifier(Good, Bad)
goodStuff := []string{"tall", "rich", "handsome"}
badStuff := []string{"poor", "smelly", "ugly"}
// Train each class on its sample words.
classifier.Learn(goodStuff, Good)
classifier.Learn(badStuff, Bad)
Then you can ascertain the scores of each class and the most likely class your data belongs to:
scores, likely, _ := classifier.LogScores(
	[]string{"tall", "girl"},
)
A higher score indicates a more likely class. Alternatively (at some risk of float underflow), you can obtain actual probabilities:
probs, likely, _ := classifier.ProbScores(
	[]string{"tall", "girl"},
)
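For readers who want to see what log-scale scores and probabilities look like numerically, the following is a from-scratch sketch. It is not this library's implementation: the `wordCounts` type, the add-one smoothing, the uniform class priors, and the `toProbs` normalization are all assumptions chosen for illustration.

```go
package main

import (
	"fmt"
	"math"
)

// wordCounts maps class -> word -> occurrences. Hypothetical structure for
// illustration; it is not the library's internal representation.
type wordCounts map[string]map[string]int

// learn records each word of a training document under the given class.
func (c wordCounts) learn(words []string, class string) {
	if c[class] == nil {
		c[class] = make(map[string]int)
	}
	for _, w := range words {
		c[class][w]++
	}
}

// logScore sums log P(word|class) using add-one smoothing over a vocabulary
// of vocabSize distinct words. Class priors are assumed uniform and omitted.
func (c wordCounts) logScore(doc []string, class string, vocabSize int) float64 {
	total := 0
	for _, n := range c[class] {
		total += n
	}
	score := 0.0
	for _, w := range doc {
		p := float64(c[class][w]+1) / float64(total+vocabSize)
		score += math.Log(p)
	}
	return score
}

// toProbs converts log scores into probabilities that sum to 1. Subtracting
// the largest score first makes the largest term exp(0) = 1, so the
// normalizing sum can never be zero.
func toProbs(logScores []float64) []float64 {
	hi := math.Inf(-1)
	for _, s := range logScores {
		if s > hi {
			hi = s
		}
	}
	probs := make([]float64, len(logScores))
	sum := 0.0
	for i, s := range logScores {
		probs[i] = math.Exp(s - hi)
		sum += probs[i]
	}
	for i := range probs {
		probs[i] /= sum
	}
	return probs
}

func main() {
	c := wordCounts{}
	c.learn([]string{"tall", "rich", "handsome"}, "Good")
	c.learn([]string{"poor", "smelly", "ugly"}, "Bad")

	doc := []string{"tall", "girl"}
	// 6 is the number of distinct words seen during training.
	scores := []float64{c.logScore(doc, "Good", 6), c.logScore(doc, "Bad", 6)}
	fmt.Println("log scores:   ", scores)
	fmt.Println("probabilities:", toProbs(scores))
}
```

Both log scores are negative, and the less negative one ("Good") wins because "tall" was seen in its training data. Normalizing in log space is one common way to sidestep the underflow risk mentioned above.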
To use the TF-IDF classifier, first create some classes and train it. You must then call ConvertTermsFreqToTfIdf() AFTER training and before calling classification methods such as LogScores, SafeProbScores, and ProbScores:
import "github.com/navossoc/bayesian"
const (
	Good bayesian.Class = "Good"
	Bad  bayesian.Class = "Bad"
)
// Create a classifier with TF-IDF support.
classifier := bayesian.NewClassifierTfIdf(Good, Bad)
goodStuff := []string{"tall", "rich", "handsome"}
badStuff := []string{"poor", "smelly", "ugly"}
classifier.Learn(goodStuff, Good)
classifier.Learn(badStuff, Bad)
// Required: convert the learned term frequencies to TF-IDF before classifying.
classifier.ConvertTermsFreqToTfIdf()
Then you can ascertain the scores of each class and the most likely class your data belongs to:
scores, likely, _ := classifier.LogScores(
	[]string{"tall", "girl"},
)
A higher score indicates a more likely class. Alternatively (at some risk of float underflow), you can obtain actual probabilities:
probs, likely, _ := classifier.ProbScores(
	[]string{"tall", "girl"},
)
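As a refresher on what TF-IDF weighting does, here is a small standalone sketch of the textbook formula: term frequency within a document multiplied by log(N / document frequency). The `tfidf` function is hypothetical, and the exact weighting applied by ConvertTermsFreqToTfIdf() may differ, so treat this as an illustration of the idea rather than a description of the library's internals.

```go
package main

import (
	"fmt"
	"math"
)

// tfidf returns, for each document, a map from word to its TF-IDF weight.
// TF is the word's share of the document; IDF is log(N / df), where N is the
// number of documents and df is how many documents contain the word.
func tfidf(docs [][]string) []map[string]float64 {
	// Count the number of documents each word appears in.
	df := make(map[string]int)
	for _, doc := range docs {
		seen := make(map[string]bool)
		for _, w := range doc {
			if !seen[w] {
				seen[w] = true
				df[w]++
			}
		}
	}

	n := float64(len(docs))
	weights := make([]map[string]float64, len(docs))
	for i, doc := range docs {
		weights[i] = make(map[string]float64)
		// Accumulate term frequency as a fraction of the document length.
		for _, w := range doc {
			weights[i][w] += 1.0 / float64(len(doc))
		}
		// Scale by inverse document frequency.
		for w := range weights[i] {
			weights[i][w] *= math.Log(n / float64(df[w]))
		}
	}
	return weights
}

func main() {
	docs := [][]string{
		{"tall", "rich", "tall"},
		{"tall", "poor"},
	}
	for i, w := range tfidf(docs) {
		fmt.Println("doc", i, w)
	}
}
```

In this example "tall" appears in every document, so its weight is zero, while "rich" and "poor" each appear in only one document and keep a positive weight. That is the intuition behind TF-IDF: words common to all classes contribute little to telling them apart.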
Use wisely.