This repository has been archived by the owner on Feb 21, 2025. It is now read-only.

Available Corpora

knoxa edited this page Aug 3, 2017 · 8 revisions

Below is a list of freely available text corpora that may be useful for developing or testing Baleen. The list is not exhaustive, and Baleen has not been developed specifically to work with any of these corpora, so performance may vary.

A larger list of corpora, along with a list of other NLP-related tools, is available on Stanford University's website: http://www-nlp.stanford.edu/links/statnlp.html#Corpora