Skip to content

Latest commit

 

History

History
69 lines (55 loc) · 3.05 KB

PARCReader.md

File metadata and controls

69 lines (55 loc) · 3.05 KB

PARC 3.0 Reader

Overview

Penn Attribution Relation Corpus 3.0 (PARC 3.0) paper link

Contact the owner of the corpus, Silvia Pareti for access to the corpus (You will need valid LDC licenses to PTB & PDTB).

Implementation details

Given an source directory, the reader will look for all files with ".xml" extension in all nested sub-directories. Each document is read into a TextAnnotation instance with the following views defined in the ViewNames class:

  • TOKENS: TokenLabelView that keeps gold tokenization from corpus.
  • SENTENCE: TokenLabelView that keeps gold sentence split from corpus.
  • ATTRIBUTION_RELATION: PredicateArgumentView. Each Attribution Relation corresponds to one predicate argument set. The "Cue" in each Attribution Relation serves as a "predicate", and "source"s and "span"s in that relation serves as arguments.
  • (optional)POS: TokenLabelView that keeps POS tags from corpus
  • (optional)LEMMA: TokenLabelView that keeps lemma of each token from corpus

Usage

Directory Structure

Standard WSJ directory structure.

\PARC3
	\train
   		\00
			wsj-0001.xml
 	      	...
       	\01
        	wsj-0101.xml
            ...
        ...
    \test
    	\23
        	...
    \dev
    	\24
        	...

Usage in Java

import edu.illinois.cs.cogcomp.nlp.corpusreaders.parcReader.PARC3Reader;
import edu.illinois.cs.cogcomp.nlp.corpusreaders.parcReader.PARC3ReaderConfigurator;

// Read all training data, with defualt settings (discard gold POS and LEMMA)
PARC3Reader reader = new PARC3Reader("data/PARC3/train"); 

or specify your own settings by creating a *.properties file. See PARC3ReaderConfigurator for what fields you should specify.

PARC3Reader reader = new PARC3Reader(new ResourceManager("my-parc3-config.properties"))

PARC3Reader implements Iterable<TextAnnotation> interface.

while (reader.hasNext()) {
	TextAnnotation doc = reader.next();
	...
}

or

for (TextAnnotation doc : reader) {
	...
}