Penn Attribution Relation Corpus 3.0 (PARC 3.0); see the corpus paper for details.
To obtain the corpus, contact its owner, Silvia Pareti. Access requires valid LDC licenses for the Penn Treebank (PTB) and the Penn Discourse Treebank (PDTB).
Given a source directory, the reader looks for all files with the ".xml" extension in all nested subdirectories. Each document is read into a TextAnnotation instance with the following views, as defined in the ViewNames class:
- TOKENS: TokenLabelView that keeps the gold tokenization from the corpus.
- SENTENCE: TokenLabelView that keeps the gold sentence split from the corpus.
- ATTRIBUTION_RELATION: PredicateArgumentView. Each attribution relation corresponds to one predicate-argument set: the "cue" of the relation serves as the predicate, and its "source"s and "span"s serve as arguments.
- (optional) POS: TokenLabelView that keeps the POS tags from the corpus.
- (optional) LEMMA: TokenLabelView that keeps the lemma of each token from the corpus.
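To make the ATTRIBUTION_RELATION mapping concrete, here is a minimal, self-contained Java sketch. The Span, PredicateArgumentSet and AttributionExample classes are hypothetical stand-ins, not the cogcomp PredicateArgumentView API; token spans are assumed to be half-open [start, end) ranges, and the "source"/"span" labels simply follow the wording above.

```java
import java.util.ArrayList;
import java.util.List;

// HYPOTHETICAL, simplified model of one attribution relation viewed as a
// predicate-argument set. Illustrative only; not the cogcomp view classes.
final class Span {
    final int start;    // first token index (inclusive)
    final int end;      // token index just past the span (exclusive)
    final String label; // e.g. "cue", "source", "span"

    Span(int start, int end, String label) {
        this.start = start;
        this.end = end;
        this.label = label;
    }

    @Override
    public String toString() {
        return label + "[" + start + "," + end + ")";
    }
}

final class PredicateArgumentSet {
    final Span predicate;                           // the cue
    final List<Span> arguments = new ArrayList<>(); // sources, then spans

    PredicateArgumentSet(Span predicate) {
        this.predicate = predicate;
    }
}

final class AttributionExample {
    // The cue becomes the predicate; every source and span becomes an argument.
    static PredicateArgumentSet fromRelation(Span cue, List<Span> sources, List<Span> spans) {
        PredicateArgumentSet set = new PredicateArgumentSet(cue);
        set.arguments.addAll(sources);
        set.arguments.addAll(spans);
        return set;
    }

    public static void main(String[] args) {
        // Tokens: "The CEO said profits rose ." (indices 0..5)
        Span cue = new Span(2, 3, "cue");
        List<Span> sources = List.of(new Span(0, 2, "source"));
        List<Span> spans = List.of(new Span(3, 5, "span"));
        PredicateArgumentSet set = fromRelation(cue, sources, spans);
        System.out.println(set.predicate + " -> " + set.arguments);
        // prints "cue[2,3) -> [source[0,2), span[3,5)]"
    }
}
```

One relation yields exactly one predicate-argument set, so a document with several attribution relations would contribute several such sets to the view.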
Standard WSJ directory structure:

\PARC3
    \train
        \00
            wsj-0001.xml
            ...
        \01
            wsj-0101.xml
            ...
        ...
    \test
        \23
            ...
    \dev
        \24
            ...
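The recursive ".xml" search described earlier can be sketched with java.nio.file.Files.walk. XmlFileFinder is a hypothetical helper, not part of PARC3Reader, and the sketch assumes a case-sensitive ".xml" suffix; the real reader may filter files differently.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// HYPOTHETICAL helper illustrating a nested ".xml" file search.
final class XmlFileFinder {
    // Returns every regular file ending in ".xml" under root, at any depth,
    // sorted so the iteration order is deterministic.
    static List<Path> findXmlFiles(Path root) throws IOException {
        try (Stream<Path> paths = Files.walk(root)) { // closes directory handles
            return paths
                    .filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString().endsWith(".xml"))
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Build a tiny tree that mirrors the layout above: train/00, train/01.
        Path root = Files.createTempDirectory("PARC3");
        Files.createDirectories(root.resolve("train/00"));
        Files.createDirectories(root.resolve("train/01"));
        Files.createFile(root.resolve("train/00/wsj-0001.xml"));
        Files.createFile(root.resolve("train/01/wsj-0101.xml"));
        Files.createFile(root.resolve("train/01/notes.txt")); // skipped: not .xml

        for (Path p : findXmlFiles(root.resolve("train"))) {
            System.out.println(root.relativize(p));
        }
    }
}
```

Pointing the search at \PARC3\train picks up every section directory below it, which is why a single directory argument is enough to load the whole training split.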
import edu.illinois.cs.cogcomp.nlp.corpusreaders.parcReader.PARC3Reader;
import edu.illinois.cs.cogcomp.nlp.corpusreaders.parcReader.PARC3ReaderConfigurator;
// Read all training data with default settings (gold POS and LEMMA are discarded)
PARC3Reader reader = new PARC3Reader("data/PARC3/train");
Alternatively, specify your own settings by creating a *.properties file. See PARC3ReaderConfigurator for the fields you should specify.
import edu.illinois.cs.cogcomp.core.utilities.configuration.ResourceManager;

PARC3Reader reader = new PARC3Reader(new ResourceManager("my-parc3-config.properties"));
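For orientation, a minimal sketch of how a *.properties file is parsed with java.util.Properties. The keys (corpusDirectory, keepGoldPos, keepGoldLemma) and the class ParcConfigSketch are hypothetical placeholders, not the fields PARC3ReaderConfigurator actually defines; consult that class for the real names.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

// HYPOTHETICAL config sketch. The key names below are placeholders only;
// the real ones are defined in PARC3ReaderConfigurator.
final class ParcConfigSketch {
    static final String EXAMPLE =
            "# my-parc3-config.properties (hypothetical keys)\n"
          + "corpusDirectory = data/PARC3/train\n"
          + "keepGoldPos = true\n"
          + "keepGoldLemma = false\n";

    // Parses key = value lines; whitespace around '=' is ignored by Properties.
    static Properties parse(String text) throws IOException {
        Properties props = new Properties();
        props.load(new StringReader(text));
        return props;
    }

    // Missing flags fall back to false, mirroring the documented default
    // of discarding gold POS and LEMMA.
    static boolean flag(Properties props, String key) {
        return Boolean.parseBoolean(props.getProperty(key, "false"));
    }

    public static void main(String[] args) throws IOException {
        Properties props = parse(EXAMPLE);
        System.out.println(props.getProperty("corpusDirectory"));
        System.out.println("POS: " + flag(props, "keepGoldPos"));
        System.out.println("LEMMA: " + flag(props, "keepGoldLemma"));
    }
}
```

Defaulting absent flags to false keeps a minimal config file equivalent to the default constructor shown earlier.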
PARC3Reader implements the Iterable<TextAnnotation> interface, so documents can be read either with an explicit loop:
while (reader.hasNext()) {
    TextAnnotation doc = reader.next();
    ...
}
or
for (TextAnnotation doc : reader) {
    ...
}
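The snippets above call hasNext()/next() directly on the reader and also use it in a for-each loop, which suggests it acts as both an Iterator and an Iterable. The toy class below is a hypothetical sketch of that pattern under this assumption; ToyDocReader is not part of cogcomp, and its documents are plain strings instead of TextAnnotation objects.

```java
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// HYPOTHETICAL toy reader: both an Iterator (explicit hasNext()/next() loop)
// and an Iterable (for-each loop) over the same documents.
final class ToyDocReader implements Iterable<String>, Iterator<String> {
    private final List<String> docs;
    private int position = 0; // index of the next document to return

    ToyDocReader(List<String> docs) {
        this.docs = docs;
    }

    @Override
    public boolean hasNext() {
        return position < docs.size();
    }

    @Override
    public String next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return docs.get(position++);
    }

    // Returning `this` means the for-each loop and explicit calls share one
    // cursor: once the documents are consumed, a second pass sees nothing.
    @Override
    public Iterator<String> iterator() {
        return this;
    }

    public static void main(String[] args) {
        ToyDocReader reader = new ToyDocReader(List.of("wsj-0001", "wsj-0002"));
        for (String doc : reader) {
            System.out.println(doc);
        }
        System.out.println(reader.hasNext()); // false: the single cursor is exhausted
    }
}
```

With a shared cursor like this, the safe way to make a second pass is to construct a new reader; whether PARC3Reader itself behaves this way is not stated here.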