A set of phonetic encoder implementations in Go.
To install:
$ go get -v github.com/f1monkey/phonetic
Soundex is the fastest algorithm in this library. It encodes words into a phonetic code so that similar-sounding words with different spellings can be matched. It was developed for indexing English names. Wiki page.
Code example:
package main
import (
	"fmt"
	"github.com/f1monkey/phonetic/soundex"
)
func main() {
	e := soundex.NewEncoder()
	result := e.Encode("orange")
	fmt.Println(result)
	// prints: O652
}
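To make the steps behind such a code concrete, here is a minimal standalone sketch of classic American Soundex in plain Go: keep the first letter, map the remaining consonants to digits, let vowels separate repeated digits while H and W do not, collapse adjacent equal digits, and pad to four characters. It does not use or reproduce the library's soundex package; the Soundex function is a hypothetical name, and the library may treat edge cases (non-ASCII letters, empty input) differently.

```go
package main

import (
	"fmt"
	"strings"
)

// soundexCode maps an uppercase ASCII letter to its American Soundex digit.
// Vowels, Y, H and W map to '0', meaning "no digit".
func soundexCode(r rune) byte {
	switch r {
	case 'B', 'F', 'P', 'V':
		return '1'
	case 'C', 'G', 'J', 'K', 'Q', 'S', 'X', 'Z':
		return '2'
	case 'D', 'T':
		return '3'
	case 'L':
		return '4'
	case 'M', 'N':
		return '5'
	case 'R':
		return '6'
	}
	return '0'
}

// Soundex returns the 4-character American Soundex code of word
// (hypothetical helper for illustration, not the library's API).
func Soundex(word string) string {
	out := make([]byte, 0, 4)
	var last byte // digit of the previous coded letter
	for _, r := range strings.ToUpper(word) {
		if r < 'A' || r > 'Z' {
			continue // this sketch ignores anything that is not A-Z
		}
		c := soundexCode(r)
		switch {
		case len(out) == 0:
			out = append(out, byte(r)) // the first letter is kept as-is
			last = c
		case r == 'H' || r == 'W':
			// H and W do not separate two consonants with the same digit.
		case c == '0':
			last = c // vowels do separate them
		case c != last:
			out = append(out, c)
			last = c
		}
		if len(out) == 4 {
			break
		}
	}
	if len(out) == 0 {
		return ""
	}
	for len(out) < 4 {
		out = append(out, '0') // pad short codes with zeros
	}
	return string(out)
}

func main() {
	for _, w := range []string{"orange", "Robert", "Rupert"} {
		fmt.Println(w, Soundex(w))
	}
}
```

Running it prints O652 for orange (the same code as the library example) and R163 for both Robert and Rupert, which is the point of Soundex: different spellings of similar-sounding names share a code.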
Metaphone converts words into a phonetic code that represents their pronunciation, so words can be compared by how they sound rather than how they are spelled. It was designed for English. Wiki page.
Code example:
package main
import (
	"fmt"
	"github.com/f1monkey/phonetic/metaphone"
)
func main() {
	e := metaphone.NewEncoder()
	result := e.Encode("orange")
	fmt.Println(result)
	// prints: ORNJ
}
Cologne phonetics (Kölner Phonetik) is a phonetic algorithm that indexes German words by their sound, which allows names and words to be matched in German-language databases. Wiki page.
Code example:
package main
import (
	"fmt"
	"github.com/f1monkey/phonetic/cologne"
)
func main() {
	e := cologne.NewEncoder()
	result := e.Encode("Großtraktor")
	fmt.Println(result)
	// prints: 47827427
}
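The Cologne rules are context-sensitive: a letter's digit can depend on its neighbours (for example, a non-initial C before A, O or U is 4, but 8 when it follows S or Z). Below is a minimal standalone sketch of the published Kölner Phonetik rules, not the library's code; the Cologne and normalize functions are hypothetical, and folding Ä, Ö, Ü and ß to A, O, U and S is an assumption of this sketch. After the per-letter mapping, runs of equal digits are collapsed and every 0 except a leading one is removed.

```go
package main

import (
	"fmt"
	"strings"
)

// normalize upper-cases s, folds umlauts and ß to plain letters (an assumption
// of this sketch) and drops every character outside A-Z.
func normalize(s string) []rune {
	var out []rune
	for _, r := range strings.ToUpper(s) {
		switch r {
		case 'Ä':
			r = 'A'
		case 'Ö':
			r = 'O'
		case 'Ü':
			r = 'U'
		case 'ß', 'ẞ':
			r = 'S'
		}
		if r >= 'A' && r <= 'Z' {
			out = append(out, r)
		}
	}
	return out
}

// Cologne returns the Kölner Phonetik code of word (hypothetical helper).
func Cologne(word string) string {
	w := normalize(word)
	at := func(i int) rune { // letter at position i, or 0 when out of range
		if i < 0 || i >= len(w) {
			return 0
		}
		return w[i]
	}
	in := func(set string, r rune) bool { return r != 0 && strings.ContainsRune(set, r) }

	// Step 1: map every letter to its digit(s); H maps to nothing.
	var raw []byte
	for i, r := range w {
		prev, next := at(i-1), at(i+1)
		var c string
		switch {
		case in("AEIJOUY", r):
			c = "0"
		case r == 'B', r == 'P' && next != 'H':
			c = "1"
		case in("DT", r) && !in("CSZ", next):
			c = "2"
		case in("FVW", r), r == 'P' && next == 'H':
			c = "3"
		case in("GKQ", r),
			r == 'C' && i == 0 && in("AHKLOQRUX", next),
			r == 'C' && i > 0 && !in("SZ", prev) && in("AHKOQUX", next):
			c = "4"
		case r == 'X' && !in("CKQ", prev):
			c = "48"
		case r == 'L':
			c = "5"
		case in("MN", r):
			c = "6"
		case r == 'R':
			c = "7"
		case in("SZ", r), in("DT", r), r == 'C', r == 'X':
			c = "8" // S, Z; D/T before C, S, Z; all remaining C and X cases
		}
		raw = append(raw, c...)
	}

	// Step 2: collapse runs of equal digits and drop every 0 except a leading one.
	var out []byte
	for i, c := range raw {
		if i > 0 && (c == raw[i-1] || c == '0') {
			continue
		}
		out = append(out, c)
	}
	return string(out)
}

func main() {
	fmt.Println(Cologne("Großtraktor"))
	fmt.Println(Cologne("Müller-Lüdenscheidt"))
}
```

It prints 47827427 for Großtraktor, the same code as the library example, and 65752682 for Müller-Lüdenscheidt.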
Caverphone2 (Caverphone 2.0) is a phonetic algorithm for indexing and matching English names; it was developed in New Zealand. Wiki page.
Code example:
package main
import (
	"fmt"
	"github.com/f1monkey/phonetic/caverphone2"
)
func main() {
	e := caverphone2.NewEncoder()
	result := e.Encode("orange")
	fmt.Println(result)
	// prints: ARNK111111
}
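Caverphone 2.0 is specified as a fixed, ordered list of text rewrites followed by padding to ten characters, which makes it natural to sketch as a rule table. This minimal standalone version follows that published rule list and is not the library's implementation; the Caverphone2 function, the rewrite type and the rw helper are hypothetical names. Lowercase letters are still unprocessed input, uppercase letters are final output, and the digits 2 and 3 are temporary markers that the last rules delete.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// rewrite is one ordered step: every match of re is replaced by repl.
type rewrite struct {
	re   *regexp.Regexp
	repl string
}

func rw(pattern, repl string) rewrite { return rewrite{regexp.MustCompile(pattern), repl} }

// caverphone2Rules holds the Caverphone 2.0 rewrite steps; order matters.
var caverphone2Rules = []rewrite{
	rw(`[^a-z]`, ""), rw(`e$`, ""),
	rw(`^cough`, "cou2f"), rw(`^rough`, "rou2f"), rw(`^tough`, "tou2f"),
	rw(`^enough`, "enou2f"), rw(`^trough`, "trou2f"), rw(`^gn`, "2n"), rw(`mb$`, "m2"),
	rw(`cq`, "2q"), rw(`ci`, "si"), rw(`ce`, "se"), rw(`cy`, "sy"), rw(`tch`, "2ch"),
	rw(`c`, "k"), rw(`q`, "k"), rw(`x`, "k"), rw(`v`, "f"), rw(`dg`, "2g"),
	rw(`tio`, "sio"), rw(`tia`, "sia"), rw(`d`, "t"), rw(`ph`, "fh"), rw(`b`, "p"),
	rw(`sh`, "s2"), rw(`z`, "s"),
	rw(`^[aeiou]`, "A"), rw(`[aeiou]`, "3"), // initial vowel -> A, other vowels -> 3
	rw(`j`, "y"), rw(`^y3`, "Y3"), rw(`^y`, "A"), rw(`y`, "3"),
	rw(`3gh3`, "3kh3"), rw(`gh`, "22"), rw(`g`, "k"),
	rw(`s+`, "S"), rw(`t+`, "T"), rw(`p+`, "P"), rw(`k+`, "K"),
	rw(`f+`, "F"), rw(`m+`, "M"), rw(`n+`, "N"),
	rw(`w3`, "W3"), rw(`wh3`, "Wh3"), rw(`w$`, "3"), rw(`w`, "2"),
	rw(`^h`, "A"), rw(`h`, "2"),
	rw(`r3`, "R3"), rw(`r$`, "3"), rw(`r`, "2"),
	rw(`l3`, "L3"), rw(`l$`, "3"), rw(`l`, "2"),
	rw(`2`, ""), rw(`3$`, "A"), rw(`3`, ""), // drop the temporary markers
}

// Caverphone2 returns the 10-character Caverphone 2.0 code of name.
func Caverphone2(name string) string {
	s := strings.ToLower(name)
	for _, step := range caverphone2Rules {
		s = step.re.ReplaceAllLiteralString(s, step.repl)
	}
	return (s + "1111111111")[:10] // pad with 1s and keep ten characters
}

func main() {
	for _, n := range []string{"orange", "Peter", "Stevenson"} {
		fmt.Println(n, Caverphone2(n))
	}
}
```

For orange it returns ARNK111111, the same code as the library example.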
Beider-Morse Phonetic Matching (BMPM) is a phonetic algorithm for indexing and matching names in multiple languages. This package is a Go port of the original PHP library. BMPM contains a large number of rules for transforming a word into its phonetic representation, so the current implementation is relatively slow.
To reduce the size of the resulting binary, the three rulesets are split into separate packages:
- github.com/f1monkey/phonetic/beidermorse - generic rules (for general usage)
- github.com/f1monkey/phonetic/beidermorse/beidermorseash - Ashkenazi rules
- github.com/f1monkey/phonetic/beidermorse/beidermorsesep - Sephardic rules
Each package contains exact and approx (default) rulesets. To use the exact ruleset, pass an accuracy option to the encoder (see the examples below).
Code examples:
Generic ruleset with approx accuracy:
package main
import (
	"fmt"
	"github.com/f1monkey/phonetic/beidermorse"
)
func main() {
	encoder, _ := beidermorse.NewEncoder()
	result := encoder.Encode("orange")
	fmt.Println(result)
	// prints: [orangi oragi orongi orogi orYngi Yrangi Yrongi YrYngi oranxi oronxi orani oroni oranii oronii oranzi oronzi urangi urongi]
}
Generic ruleset with exact accuracy:
package main
import (
	"fmt"
	"github.com/f1monkey/phonetic/beidermorse"
)
func main() {
	encoder, _ := beidermorse.NewEncoder(beidermorse.WithAccuracy(beidermorse.Exact))
	result := encoder.Encode("orange")
	fmt.Println(result)
	// prints: [orange oranxe oranhe oranje oranZe orandZe]
}
Generic ruleset with exact accuracy, English language and buffer reuse (to reduce GC pressure):
package main
import (
	"fmt"
	"github.com/f1monkey/phonetic/beidermorse"
)
func main() {
	encoder, err := beidermorse.NewEncoder(
		beidermorse.WithAccuracy(beidermorse.Exact),
		beidermorse.WithLang(beidermorse.English),
		beidermorse.WithBufferReuse(true),
	)
	if err != nil {
		panic(err)
	}
	result := encoder.Encode("orange")
	fmt.Println(result)
	// prints: [orenk orenge orendS orendZe oronk oronge orondS orondZe orank orange orandS orandZe arenk arenge arendS arendZe aronk aronge arondS arondZe arank arange arandS arandZe]
}
Ashkenazi ruleset with approx accuracy:
package main
import (
	"fmt"
	"github.com/f1monkey/phonetic/beidermorseash"
)
func main() {
	encoder, _ := beidermorseash.NewEncoder()
	result := encoder.Encode("orange")
	fmt.Println(result)
	// prints: [orangi orongi orYngi Yrangi Yrongi YrYngi oranzi oronzi orani oroni oranxi oronxi urangi urongi]
}
Sephardic ruleset with approx accuracy:
package main
import (
	"fmt"
	"github.com/f1monkey/phonetic/beidermorsesep"
)
func main() {
	encoder, _ := beidermorsesep.NewEncoder()
	result := encoder.Encode("orange")
	fmt.Println(result)
	// prints: [uranzi uranz uranS uranzi uranz uranhi uranh]
}
Benchmarks:
- Soundex
goos: linux
goarch: amd64
pkg: github.com/f1monkey/phonetic/soundex
cpu: AMD Ryzen 9 6900HX with Radeon Graphics
Benchmark_Encoder_Encode-16    14173989    99.21 ns/op    8 B/op    1 allocs/op
PASS
ok    github.com/f1monkey/phonetic/soundex    1.497s
- Metaphone
goos: linux
goarch: amd64
pkg: github.com/f1monkey/phonetic/metaphone
cpu: AMD Ryzen 9 6900HX with Radeon Graphics
Benchmark_Encoder_Encode-16    6451292    267.1 ns/op    48 B/op    3 allocs/op
PASS
ok    github.com/f1monkey/phonetic/metaphone    1.916s
- Cologne phonetics
goos: linux
goarch: amd64
pkg: github.com/f1monkey/phonetic/cologne
cpu: AMD Ryzen 9 6900HX with Radeon Graphics
Benchmark_Encoder_Encode-16    3737944    374.8 ns/op    104 B/op    3 allocs/op
PASS
ok    github.com/f1monkey/phonetic/cologne    1.729s
- Caverphone2
goos: linux
goarch: amd64
pkg: github.com/f1monkey/phonetic/caverphone2
cpu: AMD Ryzen 9 6900HX with Radeon Graphics
Benchmark_Encoder_Encode-16    1864532    641.7 ns/op    40 B/op    3 allocs/op
PASS
ok    github.com/f1monkey/phonetic/caverphone2    1.851s
- Beider-Morse
Without buffer reuse:
goos: linux
goarch: amd64
pkg: github.com/f1monkey/phonetic/beidermorse
cpu: AMD Ryzen 9 6900HX with Radeon Graphics
Benchmark_Encoder_Encode_En_Approx-16     5769    219152 ns/op    21264 B/op    146 allocs/op
Benchmark_Encoder_Encode_En_Exact-16     13203     82072 ns/op     9199 B/op     84 allocs/op
Benchmark_Encoder_Encode_Ru_Approx-16    30060     54323 ns/op     6093 B/op     48 allocs/op
Benchmark_Encoder_Encode_Ru_Exact-16     37522     28353 ns/op     2657 B/op     26 allocs/op
With buffer reuse:
goos: linux
goarch: amd64
pkg: github.com/f1monkey/phonetic/beidermorse
cpu: AMD Ryzen 9 6900HX with Radeon Graphics
Benchmark_Encoder_Encode_BufferReuse_En_Approx-16    10000    129346 ns/op    6126 B/op    130 allocs/op
Benchmark_Encoder_Encode_BufferReuse_En_Exact-16     23198     48813 ns/op    2297 B/op     72 allocs/op
Benchmark_Encoder_Encode_BufferReuse_Ru_Approx-16    48902     29909 ns/op    1297 B/op     41 allocs/op
Benchmark_Encoder_Encode_BufferReuse_Ru_Exact-16     65834     16260 ns/op     485 B/op     22 allocs/op
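In the Beider-Morse benchmarks, the BufferReuse variants allocate far fewer bytes per call (for example 6126 B/op instead of 21264 B/op for English with approx accuracy). The general idea behind such an option, keeping scratch buffers alive between calls instead of allocating new ones, can be sketched with sync.Pool from the standard library. This is a generic illustration, not how WithBufferReuse is implemented; lettersUpper is a hypothetical stand-in for one encoding step.

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
)

// bufPool keeps scratch buffers around between calls so that each call
// does not have to allocate a new one.
var bufPool = sync.Pool{
	New: func() any { return new(bytes.Buffer) },
}

// lettersUpper is a hypothetical encoding step: it copies the ASCII letters
// of word into a pooled scratch buffer, upper-cased, and returns them.
func lettersUpper(word string) string {
	buf := bufPool.Get().(*bytes.Buffer)
	buf.Reset()            // a reused buffer may still hold a previous result
	defer bufPool.Put(buf) // return the buffer once the result is copied out
	for _, r := range word {
		switch {
		case r >= 'a' && r <= 'z':
			buf.WriteByte(byte(r - 'a' + 'A'))
		case r >= 'A' && r <= 'Z':
			buf.WriteByte(byte(r))
		}
	}
	return buf.String() // String copies the bytes, so reusing buf later is safe
}

func main() {
	fmt.Println(lettersUpper("Orange-2"))
	fmt.Println(lettersUpper("Mc Donald"))
}
```

The key detail is that the result is copied out of the buffer before the buffer goes back to the pool; sync.Pool itself is safe for concurrent use.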