Home
Michael Schmitz edited this page Nov 21, 2013
To provide an easy-to-use, well-engineered NLP stack that will enable researchers to easily engage in higher-level research.
- Benefits
- Collaboration from companies as well as institutions. In my experience, engineering collaboration with institutions is usually quite weak.
- Can be used on grants/programs with license limitations. For example, the IARPA grant at the UW required a very open license (GPL was not OK).
- Compatible with existing university projects (e.g. Stanford CoreNLP, ClearNLP). A research-only license would mean that Stanford could not incorporate our work into their software (a research-only license conflicts with the GPL).
- It limits monetization of components, but monetization may not be our goal. If monetization is our goal, then I believe an open-source base system will help gain attention and allow us to market higher-level applications. Many organizations (GitHub, Typesafe, Travis) do this: they provide their basic software for free.
There are a growing number of NLP stacks.
- OpenNLP. There is no intent for models to work out of the box. Tools are not threadsafe.
- Stanford.
- ClearNLP. Tools make large compromises to exhibit research contributions (e.g. 6 GB of memory for a 0.5% gain in F-measure).
- Breeze. Breeze is being divided. Chalk is the NLP portion. The number of covered tools is quite limited.
- ClearTK. Not a "Swiss Army knife" but a UIMA solution.
- GATE.
- Factorie. Provides some basic NLP tools, but is mostly focused on providing a DSL for probabilistic modeling.
- POS tagger for web text with open license model annotations (OpenNLP).
- Chunker (shallow parser) for web text with open license model annotations (OpenNLP).
- Taggers platform for quickly writing extractors.
- GraphLab
- Spark
- Boom