Parsed Corpus of Early New High German

The Parsed Corpus of Early New High German provides a syntactic annotation of parts of Luther's Septembertestament, following the specifications of the Penn Treebank annotation family (which have also been applied to other Germanic languages). It was created and published by Caitlin Light (2011); see below for acknowledgements and terms of use.

We provide the originally released version 0.5 of the data in its original form and, except for the addition of this readme, without any alterations, as part of the collection of open source corpora maintained by the Applied Linguistics Labs at Goethe University Frankfurt and the University of Augsburg, Germany. We do so without any intent to imply involvement in the creation of this data or its subsequent development, but as a means of keeping it available to the scientific community: we found that when the original hosting platform shut down public wikis in 2014 (and ultimately ceased operating in 2019), this data was no longer available to the public. Because of its importance to the study of Early Modern High German, we publish the internal copy of the data that we had archived. This copy does not necessarily represent the most recent version of the data.

Also note that only the data is provided; the accompanying documentation (published on the website, not part of the data release) is lost. In particular, this includes the documentation of the specifics of the annotation scheme. Much of this can be recovered from the Penn-style annotations for Old English and Icelandic, as well as from the PhD thesis of Caitlin Light, The syntax and pragmatics of fronting in Germanic, for which this data was created. The thesis is publicly available from ProQuest, and a summary can be found in a presentation by Caitlin Light.
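Since the original documentation is lost, a quick way to get oriented is to read the bracketed trees directly. The following is a minimal sketch, assuming the files use the usual Penn-style bracketing of the form (LABEL child ...). The example tree, its labels (IP-MAT, NP-SBJ, and so on), and the function names are invented for illustration and are not taken from the corpus; the actual tag set should be checked against the data.

```python
# Hypothetical toy tree in Penn-style bracketing; NOT taken from the corpus.
TOY = "(IP-MAT (NP-SBJ (PRO Er)) (VBD sprach) (NP-OB1 (D das) (N Wort)))"


def tokenize(text):
    """Split a bracketed string into '(', ')' and atom tokens."""
    return text.replace("(", " ( ").replace(")", " ) ").split()


def parse(tokens):
    """Parse one bracketed tree into nested lists: [label, child, ...]."""
    pos = 0

    def node():
        nonlocal pos
        if tokens[pos] != "(":
            raise ValueError("expected '(' at token %d" % pos)
        pos += 1
        items = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                items.append(node())  # nested constituent
            else:
                items.append(tokens[pos])  # label or word
                pos += 1
        pos += 1  # consume ')'
        return items

    return node()


def leaves(tree):
    """Return (tag, word) pairs for every node of the form (TAG word)."""
    if len(tree) == 2 and all(isinstance(x, str) for x in tree):
        return [(tree[0], tree[1])]
    pairs = []
    for child in tree:
        if isinstance(child, list):
            pairs.extend(leaves(child))
    return pairs


tree = parse(tokenize(TOY))
print(leaves(tree))
```

Collecting the set of labels that occur in the real files in this way gives a first inventory of the annotation scheme, which can then be compared with the Penn-style annotations for Old English and Icelandic mentioned above.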

Christian Chiarcos

Applied Computational Linguistics

University of Augsburg, Germany

Metadata

Version 0.5

Copyright 2011 Caitlin Light

Website: http://enhgcorpus.wikispaces.com/

Contact: [email protected]

Licensing and acknowledgments

The Parsed Corpus of Early New High German is free software: you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more details.

You should have received a copy of the GNU Lesser General Public License along with this program. If not, see http://www.gnu.org/licenses/.

The Parsed Corpus of Early New High German is a treebank developed in part using Dan Bikel's statistical parser at http://www.cis.upenn.edu/~dbikel/software.html and Beth Randall's CorpusSearch software at http://corpussearch.sourceforge.net/. The original source text used in development of the treebank was acquired from Wikisource at http://de.wikisource.org/wiki/Lutherbibel.

This project was partially funded by the Benjamin Franklin Fellowship and the Dean's Summer Fellowship awarded by the University of Pennsylvania.

The creator asks that you please contact her at the address provided above if any errors are found, in order to aid in the further development and improvement of the treebank.