This is the new Parsley morphological parser for Latin. It's written in SFST-PL, the transducer specification language of the Stuttgart Finite State Transducer tools (SFST). The morphological data itself is almost entirely the work of Morpheus, from the Perseus Project.
The tools provided here generate two separate morphological parsers, both finite-state transducers (FSTs) in AT&T format. The first is a stemmer, which takes an inflected Latin word and tags it with its part of speech. All possible parses are returned.
The second FST is a lemmatizer, which takes the stem component of the stemmer's output and converts it into a lemma (or headword).
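A lightweight reader has to load these transducers from the AT&T text format: one arc per line, written as tab-separated source state, target state, input symbol and output symbol, plus lines holding only a state number to mark final states (either kind may carry a trailing weight). The Go sketch below parses that format into an adjacency map. It is an illustration under those assumptions, not the repository's own loader; `Arc`, `FST` and `ParseATT` are hypothetical names, and weights are simply ignored.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// Arc is one transition: read In, write Out, move to Target.
type Arc struct {
	In, Out string
	Target  int
}

// FST holds the arcs leaving each state and the set of final states.
type FST struct {
	Arcs  map[int][]Arc
	Final map[int]bool
}

// ParseATT reads AT&T-style text (a hypothetical helper, not the repo's API):
//   "src\tdst\tin\tout[\tweight]"  -> an arc
//   "state[\tweight]"              -> a final state
// Weights are accepted but discarded.
func ParseATT(text string) (*FST, error) {
	f := &FST{Arcs: map[int][]Arc{}, Final: map[int]bool{}}
	sc := bufio.NewScanner(strings.NewReader(text))
	for sc.Scan() {
		// Only strip a Windows line ending; symbols may themselves be spaces.
		line := strings.TrimSuffix(sc.Text(), "\r")
		if line == "" {
			continue
		}
		fields := strings.Split(line, "\t")
		src, err := strconv.Atoi(fields[0])
		if err != nil {
			return nil, fmt.Errorf("bad source state in %q: %v", line, err)
		}
		switch len(fields) {
		case 1, 2: // final state, optionally weighted
			f.Final[src] = true
		case 4, 5: // arc, optionally weighted
			dst, err := strconv.Atoi(fields[1])
			if err != nil {
				return nil, fmt.Errorf("bad target state in %q: %v", line, err)
			}
			f.Arcs[src] = append(f.Arcs[src], Arc{In: fields[2], Out: fields[3], Target: dst})
		default:
			return nil, fmt.Errorf("unexpected line %q", line)
		}
	}
	return f, sc.Err()
}

func main() {
	f, err := ParseATT("0\t1\ta\tb\n1\n")
	if err != nil {
		panic(err)
	}
	fmt.Println(len(f.Arcs[0]), f.Final[1])
}
```

Splitting on tabs rather than arbitrary whitespace keeps symbols that happen to be a literal space intact.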
A sample run of the two transducers using SFST's fst-mor tool:
$ rake
$ fst-mor out/morphology.a
reading transducer...
analyze> filio
$ fst-mor out/lemmas.a
reading transducer...
analyze> fi_l<masc><ius_i>
To build the Latin parser, you'll first need to install the following prerequisites:
- The Stuttgart SFST library
- A reasonably modern version of Rake (the versions included in some releases of Mac OS X are not acceptable)
There's a Debian package for SFST, so all Debian-based Linux distributions (including Ubuntu) are in luck:
$ sudo apt-get install sfst
On Mac OS X, for now you have to build SFST from source, which in turn requires Xcode.
I'm working on a Homebrew formula for SFST to make your life a bit easier.
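Once the prerequisites are installed, clone the repository and build the parser:
$ git clone <repository URL>
$ cd parsley-core/latin
$ rake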
Also included in this repository are lightweight FST implementations in two different languages:
- C/Objective-C
- Go
You're on your own elsewhere. I have an unpublished Ruby FST reader, but its performance suffers from obnoxious limitations such as the Ruby interpreter's lack of tail-call optimization.
SFST itself includes a good C++ FST reader, and there is also the excellent C++ library OpenFst.
If you happen to know of interpreters in other languages, let me know and I'll list them here.
The Go library has no included binary (you can just use fst-mor for that), but it does have good test coverage:
$ go get
$ go test
The C/Objective-C tests are written using SenTestingKit, so you'll need Xcode to run them. Open the .xcodeproj in Xcode and compile away!