Python Library for Natural Language Processing for Portuguese Language

pedrobalage/nlppt

Natural Language Processing for Portuguese (NLPPT)

This library is a set of tools and functions for natural language processing tasks specific to the Portuguese language.

Functions and tools covered:

  • OK - Sentence Splitting

  • OK - Word Tokenization

  • OK - Lemmatization

  • OK - Word Morphological Analysis

  • OK - Stemming

  • Word Spelling check

  • Portuguese vocabulary

  • Grammar checker

  • Multi-Word Expression Tokenization

  • Portuguese language model

  • Portuguese WordVectors (Word2Vec model)

  • Part-of-Speech tagger (two models: one trained on MacMorpho, the other on Google UTB)

  • Corpora

    • MacMorpho revisited
    • Machado
    • Bosque
    • Universal Tree Bank
    • Product Reviews
  • Lexicons and wordnets

    • WordNet.BR
    • OpenWordNet
    • VerbNet
    • Unitex PB
    • Unitex PT
    • Sentiment Lexicons
    • Gazetteers
    • Named Entities
  • Model and interface for Stanford CoreNLP Dependency Parser on UTB

  • Interface for PALAVRAS parser

  • Named entity detection

  • Word Sense Disambiguation

  • Sentiment Analysis

  • Text Summarization
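
To make the first two items in the list concrete, here is a minimal sketch of sentence splitting and word tokenization for Portuguese text using only the Python standard library. It is not the NLPPT API: the function names `split_sentences` and `tokenize` and the `ABBREVIATIONS` set are hypothetical and chosen only for illustration, and the rules are deliberately simplistic.

```python
import re

# Hypothetical abbreviation list (an assumption, not taken from NLPPT).
# Tokens in this set end with a period but do not end a sentence.
ABBREVIATIONS = {"sr.", "sra.", "dr.", "dra.", "prof."}


def split_sentences(text):
    """Split text into sentences at whitespace-delimited tokens ending in . ! or ?"""
    sentences = []
    current = []
    for token in text.split():
        current.append(token)
        # Close the sentence unless the token is a known abbreviation.
        if token[-1] in ".!?" and token.lower() not in ABBREVIATIONS:
            sentences.append(" ".join(current))
            current = []
    if current:
        # Keep trailing text that has no final punctuation.
        sentences.append(" ".join(current))
    return sentences


def tokenize(sentence):
    """Return words and punctuation marks as separate tokens.

    Hyphenated forms such as "disse-me" are kept as a single token.
    In Python 3, \\w matches accented letters in str patterns by default.
    """
    return re.findall(r"\w+(?:-\w+)*|[^\w\s]", sentence)


if __name__ == "__main__":
    text = "O Dr. Silva chegou. Ele disse-me que sim!"
    for sentence in split_sentences(text):
        print(tokenize(sentence))
```

A real splitter needs a much larger abbreviation list and handling for numbers, quotes and ellipses; the sketch only shows the shape of the task.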
