About

This project implements NLTK taggers for the Slovene language.

##Requirements

The taggers require Python 2.7 and NLTK.

##Usage

Until these taggers are built into NLTK, you can download them from the folder slovene_taggers/ and use them in NLTK.

An example showing how to use the Slovene taggers is in the file example.py.

An explanation of the tags (in Slovene) is in jos1M/josMSD-canon-sl.tbl.
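
The following is a minimal sketch of how a downloaded tagger might be loaded and applied; it is not the content of example.py. It assumes the taggers in slovene_taggers/ are stored as pickled NLTK tagger objects (the usual output of nltk-trainer) and that NLTK is importable when the file is unpickled. The helper names `load_tagger` and `tag_sentence` and the file name in the usage comment are hypothetical.

```python
import pickle


def load_tagger(path):
    """Load a pickled tagger object from disk (assumed storage format)."""
    with open(path, "rb") as f:
        return pickle.load(f)


def tag_sentence(tagger, sentence):
    """Tag a sentence with any object that offers an NLTK-style tag(tokens) method."""
    # Naive whitespace tokenization; a real tokenizer would also split punctuation.
    tokens = sentence.split()
    return tagger.tag(tokens)


# Usage (hypothetical file name; use a file that actually exists in slovene_taggers/):
# tagger = load_tagger("slovene_taggers/some_tagger.pickle")
# print(tag_sentence(tagger, "To je stavek ."))
```

The result is a list of (word, tag) pairs; the tags can be looked up in jos1M/josMSD-canon-sl.tbl. Only unpickle files from a source you trust, since loading a pickle can execute arbitrary code.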

##Folders and files description

evaluation/ : outputs from the evaluation script. graph.m is Octave code for plotting the evaluation results.
jos100k/ : Slovene corpus from the JOS project with 100,000 tagged words.
jos1M/ : Slovene corpus from the JOS project with one million tagged words.
paper/ : the LaTeX paper about this project.
pos/jos1M.pos : input file for the trainer program in trainer/.
slovene_taggers/ : the result of this project. The Slovene taggers stored here can be used in NLTK.
slides/ : presentation slides in Slovene.
trainer/ : code forked from https://github.com/japerk/nltk-trainer. This trainer is used to train the taggers.
evaluateTaggers.sh : commands for evaluating the accuracy of the taggers.
evaluateTaggersSpeed.py : code for measuring the time spent on tagging.
example.py : an example showing how to use the Slovene taggers in NLTK.
generateTaggers.sh : commands for generating the taggers. Generation uses the data in pos/jos1M.pos and the program trainer/train_tagger.py.
transformJOS.py : code for transforming all .xml corpora from jos1M/ into pos/jos1M.pos.
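
To illustrate what an accuracy evaluation of a tagger typically measures, here is a small sketch of token-level accuracy. It is not the code run by evaluateTaggers.sh; the function name `tagging_accuracy` is hypothetical, and the sketch assumes gold data given as sentences of (word, tag) pairs and a tagger with an NLTK-style `tag(tokens)` method.

```python
def tagging_accuracy(tagger, gold_sentences):
    """Return the fraction of tokens whose predicted tag equals the gold tag."""
    correct = 0
    total = 0
    for sentence in gold_sentences:
        # Strip the gold tags and let the tagger predict them again.
        words = [word for word, _ in sentence]
        predicted = tagger.tag(words)
        for (_, gold_tag), (_, predicted_tag) in zip(sentence, predicted):
            total += 1
            if gold_tag == predicted_tag:
                correct += 1
    # float() keeps the division exact under Python 2.7 as well.
    return float(correct) / total if total else 0.0
```

The gold sentences should come from data the tagger was not trained on; otherwise the measured accuracy overstates how well the tagger handles new text.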
