This library is a set of tools and functions for natural language processing tasks specific to the Portuguese language.
Functions and tools covered:
- OK - Sentence Splitting
- OK - Word Tokenization
- OK - Lemmatization
- OK - Word Morphological Analysis
- OK - Stemming
- Word Spelling Check
- Portuguese vocabulary
- Grammar checker
- Multi-Word Expression Tokenization
- Portuguese language model
- Portuguese word vectors (Word2Vec model)
- Part-of-Speech tagger (two models: one trained on MacMorpho, the other on Google UTB)
- Corpora
  - MacMorpho revisited
  - Machado
  - Bosque
  - Universal Tree Bank
  - Product Reviews
- Lexicons and wordnets
  - WordNet.BR
  - OpenWordNet
  - VerbNet
  - Unitex PB
  - Unitex PT
  - Sentiment Lexicons
  - Gazetteers
  - Named Entities
- Model and interface for Stanford CoreNLP Dependency Parser on UTB
- Interface for PALAVRAS parser
- Named entity detection
- Word Sense Disambiguation
- Sentiment Analysis
- Text Summarization