Skip to content

chemicaltree/tetra

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

10 Commits
 
 
 
 
 
 

Repository files navigation

TETRA (Text Revision of ACL papers) corpus

TETRA consists of documen-level revisions for the articles published at ACL-related venues, and designed based on an annotation scheme that can handle edit types beyond sentences (such as argument flow) in addition to conventional word- and phrase-level edit types.

See the paper for more information.

Data format

The dataset is formatted in xml, consisting of one xml file per paper. Here's sample of annotations in the dataset:

example_of_annotation

Each file contains the following information in xml tags.

  • meta information
    • doc id: ID of the paper (ACL Anthology)
    • editor: ID of the revised human expert
    • format: venues (conference (Conf) or workshop (WS))
    • position: first author's position (Non-student (NS) or Student (S))
    • region: region of the affiliation (Native (N) or Non-native (NN))
  • edit information
    • edit type: edit type
    • crr: edit instance by human expert
    • comments: rationale comments

Citation

Thank you for your interest in our dataset. If you use it in your research, please cite:

@misc{mita2022automated,
      title={Towards Automated Document Revision: Grammatical Error Correction, Fluency Edits, and Beyond}, 
      author={Masato Mita and Keisuke Sakaguchi and Masato Hagiwara and Tomoya Mizumoto and Jun Suzuki and Kentaro Inui},
      year={2022},
      eprint={2205.11484},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

License

This work is licensed under a Creative Commons Attribution 4.0 International License.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published