This library is a set of tools and functions to help with natural language processing tasks specific to the Portuguese language.
Functions and tools covered (items marked OK are already implemented):
OK - Sentence Splitting
OK - Word Tokenization
OK - Lemmatization
OK - Word morphological analysis
OK - Stemming
Word Spelling check
Portuguese vocabulary
Grammar checker
Multi-Word Expression Tokenization
Portuguese language model
Portuguese word vectors (Word2Vec model)
Part-of-Speech tagger (two models, one trained on MacMorpho, the other on Google UTB)
- MacMorpho revisited
- Machado
- Bosque
- Universal Tree Bank
- Product Reviews
Lexicons and wordnets
- WordNet.BR
- OpenWordNet
- VerbNet
- Unitex PB
- Unitex PT
- Sentiment Lexicons
- Gazetteers
- Named Entities
Model trained on UTB and interface for the Stanford CoreNLP dependency parser
Interface for the PALAVRAS parser
Named entity detection
Word Sense Disambiguation
Sentiment Analysis
Text Summarization