Releases: CopticScriptorium/corpora

Spring 2024 Release

30 May 12:15
d4d786c

We are pleased to announce release 5.0.0 of Coptic Scriptorium! Our data now includes over 1,288,229 tokens of searchable, linguistically analyzed Coptic data from dozens of ancient Coptic works.

This release also marks the introduction of Bohairic Coptic data to our corpus holdings: the repository now contains Bohairic Bible materials, covering Mark 1-16 and 1 Cor. 1-16, with manually reviewed segmentation for the entire corpus, and manual tagging and treebanking for chapters 1-5 in each book. Segmentation and tagging were reviewed in collaboration with Nicholas Wagner, and treebanking was done in collaboration with Nina Speranskaja. As a result of this work, we are in the process of compiling new NLP tools and guidelines specifically for Bohairic.

In addition, the release includes corrections and updates to existing corpora as well as the addition of several new Sahidic works and documents:

A. Sections of five works by Shenoute of Atripe:

B. New documents were added to existing works:

C. Newly added translation spans for Pistis Sophia, aligned by Randy Komforty

These join the newly treebanked and tagged Bohairic data, which can be found here:

We are very grateful to all of our collaborators and contributors, without whom this project could not function. We welcome Nicholas Wagner to the team and warmly thank Randy Komforty for his work on Pistis Sophia, and Nina Speranskaja for her treebanking work.

As with all our releases, raw machine-readable data for all corpora, including morphological and syntactic analysis as well as named entity recognition and entity linking (currently only for Sahidic), can be found in a variety of popular formats in this GitHub repository:
https://github.com/CopticScriptorium/corpora

You can also search for complex linguistic annotations in the data using our ANNIS server - please see our tutorial here to get started with some query tips and a helpful cheat sheet:
https://copticscriptorium.org/ANNIS_tutorial

Fall 2023 corpus release

25 Oct 13:52
2877117

We are pleased to announce release 4.5.0 of Coptic Scriptorium! Our data now includes over 1,278,500 tokens of searchable, linguistically analyzed Coptic data from dozens of ancient Coptic works (an increase of over 11,500 tokens from the previous release).

This release corrects a large number of consistency errors identified in our existing data, and also adds some new documents:

We are very grateful to all of our collaborators and contributors, without whom this project could not function. We welcome Christine Ayad, Lydia Bremer-McCollum, Adeline Harrington, and Nina Speranskaja.

As with all releases, raw machine-readable data for all corpora, including morphological and syntactic analysis as well as named entity recognition and entity linking, can be found in a variety of popular formats in this GitHub repository.

You can also search for complex linguistic annotations in the data using our ANNIS server - please see our new tutorial here to get started with some query tips and a helpful cheat sheet:

https://copticscriptorium.org/ANNIS_tutorial

Fall 2022 corpus release

14 Oct 15:09
78730ee

We are pleased to announce release 4.4.0 of Coptic Scriptorium. Our data now includes over 1,267,000 tokens of searchable, linguistically analyzed Coptic data from dozens of ancient Coptic works (an increase of almost 100,000 tokens from the previous release). We are very grateful to all of our collaborators and contributors, without whom this project could not function.

This release corrects a large number of consistency errors identified in our existing data, and also adds some new documents:

We would like to thank the Marcion Project for making the underlying digitized text of Pistis Sophia available, and all of the annotators for their hard work. Tamara Siuda, Rebecca Krawiec, Philippe Zaher, and Lance Martin contributed, in addition to Amir and Carrie. As our current DHAG grant ends, we would like to give special thanks to Lance, who has been working as our DH specialist on the project since 2019, for doing an amazing job of keeping track of all the data and the various tasks he’s been in charge of over the past three years!

All documents have metadata for word segmentation, tagging, parsing, entities and identities (Wikipedia identifiers for named entities) to indicate whether those annotations are machine annotations only (automatic), checked for accuracy by an expert in Coptic (checked), or closely reviewed for accuracy, usually as a result of treebanking (gold).
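
As a minimal sketch of how these status values might be used to select documents, the following Python snippet filters a collection by annotation quality. The dictionary layout, the document names, and the `meets_status` helper are hypothetical and chosen only for illustration; the corpus files store this metadata in their own formats. Treating a missing layer as `automatic` is likewise an assumption.

```python
# Hypothetical in-memory layout: one dict per document, mapping an
# annotation layer to its status. Not the actual corpus file format.
STATUS_RANK = {"automatic": 0, "checked": 1, "gold": 2}


def meets_status(doc_meta, layers, minimum="checked"):
    """Return True if every requested layer is at least `minimum`.

    A layer missing from the metadata is treated as "automatic"
    (an assumption made for this sketch).
    """
    threshold = STATUS_RANK[minimum]
    return all(
        STATUS_RANK.get(doc_meta.get(layer, "automatic"), 0) >= threshold
        for layer in layers
    )


# Made-up example documents.
docs = {
    "doc_a": {"segmentation": "gold", "tagging": "gold", "parsing": "gold"},
    "doc_b": {"segmentation": "checked", "tagging": "checked", "parsing": "automatic"},
    "doc_c": {"segmentation": "automatic", "tagging": "automatic", "parsing": "automatic"},
}

# Keep only documents whose segmentation and tagging were reviewed by a person.
reviewed = [
    name for name, meta in docs.items()
    if meets_status(meta, ["segmentation", "tagging"])
]
```

Ranking the three values makes it possible to ask for "at least checked" rather than matching a single status, which may be useful when gold and checked data are to be pooled.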

Spring 2022 corpus release

12 May 13:55
a120735

It is our pleasure to announce release 4.3.0 of Coptic Scriptorium corpora, which currently cover over 1,175,000 tokens of searchable, linguistically analyzed Coptic data from dozens of ancient Coptic works. New in this release:

Corrections and additional annotations:

  • Pilot work adding partial Arabic translations (work by Philippe Zaher)
  • Improvements and error corrections to a variety of works (including Because of You Too O Prince of Evil, Dormition of John, Book of Ruth and Homilies of Proclus)

The newly released material encompasses over 57,000 tokens of semi-automatically annotated data. We would like to give special thanks to the Marcion Project for making much of the underlying digitized text available, and the annotators whose hard work has made this release possible.

All documents have metadata for word segmentation, tagging, parsing, entities and identities (Wikipedia identifiers for named entities) to indicate whether those annotations are machine annotations only (automatic), checked for accuracy by an expert in Coptic (checked), or closely reviewed for accuracy, usually as a result of treebanking (gold).

Fall 2021 corpus release

13 Apr 17:09
6829991

It is our pleasure to announce the latest data release from Coptic Scriptorium, version 4.2.0. This release contains both new Coptic material and additions to older datasets, and it expands our entity annotations and named-entity linking to all of our data, including the semi-automatically annotated Old Testament. This also means automatic updates to all of our interfaces, such as the recently added example usage functionality in the Coptic Dictionary Online, which is linked to the corpora.

The new material, including more digitized data courtesy of the Marcion project, as well as manually digitized and corrected OCR data from out-of-print editions, includes:

All documents have metadata for word segmentation, tagging, parsing, entities and identities (Wikipedia identifiers for named entities) to indicate whether those annotations are machine annotations only (automatic), checked for accuracy by an expert in Coptic (checked), or closely reviewed for accuracy, usually as a result of treebanking (gold).

Spring 2021 corpus release

02 Apr 19:34
ef141be

We are pleased to announce the following additions/updates, as well as the addition of entity annotations and named entity Wikipedia links to the automatically processed New Testament corpus (sahidica.nt):

  • Life of John the Kalybites, parts 1 and 2 (annotations by Lance Martin, Tamara Siuda, and Caroline T. Schroeder)
  • Mysteries of John the Evangelist, parts 1 and 2 (Mitchell Abrams, Lance Martin, Tamara Siuda, Caroline T. Schroeder)
  • Pseudo-Ephrem, The Asceticon of Apa Ephrem, parts 1 and 2 (Lance Martin and Caroline T. Schroeder)
  • Pseudo-Timothy of Alexandria Discourses, Discourse on Abbaton, parts 1 and 2 (Elizabeth Davidson, Lance Martin, Caroline T. Schroeder, and Amir Zeldes)
  • Magical Papyri (Korshi Dosoo, Edward O. D. Love, Markéta Preininger, Lance Martin, Caroline T. Schroeder, and Amir Zeldes)

Expansions and improvements of existing corpora:

  • Apa Johannes Canons (Diliana Atanassova, Caroline T. Schroeder, Lance Martin, and Amir Zeldes)
  • Apophthegmata Patrum (Marina Ghaly, Christine Luckritz Marquis, Caroline T. Schroeder)
  • New release of sahidica.nt, now with semi-automatically disambiguated named entity linking

All documents have metadata for word segmentation, tagging, parsing, entities and identities (Wikipedia identifiers for named entities) to indicate whether those annotations are machine annotations only (automatic), checked for accuracy by an expert in Coptic (checked), or closely reviewed for accuracy, usually as a result of treebanking (gold).

Summer 2020 corpus release

01 Sep 19:43
49d26d2

We are pleased to announce the following additions/updates, as well as the addition of entity annotations and named entity Wikipedia links:

  • John of Constantinople, on Penitence and Abstinence
  • Pseudo-Chrysostom:
    • On the Canaanite Woman
    • On Susanna
  • Pseudo-Basil of Caesarea, on the End of the World and the Temple of Solomon
  • Life of Pisentius, parts 1-2
  • Expansions and improvements of existing corpora:
    • More Apophthegmata Patrum
    • Further material from Shenoute’s works:
      • God Says Through Those Who Are His (including parallel witnesses and new material)
      • Some Kinds of People Sift Dirt

All documents have metadata for word segmentation, tagging, parsing, entities and identities (Wikipedia identifiers for named entities) to indicate whether those annotations are machine annotations only (automatic), checked for accuracy by an expert in Coptic (checked), or closely reviewed for accuracy, usually as a result of treebanking (gold).

Winter 2020 corpus release

16 Mar 19:27
78c41d9

We are pleased to announce the following additions/updates:

  • Saints’ lives and martyrologies
    • Martyrdom of Victor the General (parts 3-8; this work is now complete)
    • Life of Aphou
    • Life of Paul of Tamma
    • Life of Phib
  • More works by Archimandrite Shenoute of Atripe:
    • God Says Through Those Who Are His
    • Whoever Seeks God Will Find
    • Unknown Work 5.1
  • Miscellaneous
    • Three Discourses of Pseudo-Athanasius:
      • Homily on Matthew 20
      • On Mercy and Judgment
      • On the Soul and the Body
    • The Instructions of Apa Pachomius
    • Canons of Apa Johannes (5 new and revised documents, digital edition provided by Diliana Atanassova)

All documents have metadata for word segmentation, tagging, and parsing to indicate whether those annotations are machine annotations only (automatic), checked for accuracy by an expert in Coptic (checked), or closely reviewed for accuracy, usually as a result of manual parsing (gold).

V3.0.1 - Minor corrections

25 Oct 13:39
9d1f537
  • Added missing corpus metadata to I See Your Eagerness and Doc Papyri
  • Minor corrections to one document in each of those corpora

Fall 2019 corpus release

01 Oct 21:14
31905d8

Coptic Scriptorium is happy to announce our latest data release, including a variety of new sources thanks to our collaborators (digitized data courtesy of the Marcion and PAThs projects!). New in this release are:

  • Saints' lives
    • Life of Cyrus
    • Life of Onnophrius
    • Lives of Longinus and Lucius
    • Martyrdom of Victor the General (part 2)
  • Miscellaneous:
    • Dormition of John
    • Homilies of Proclus
    • Letter of Pseudo-Ephrem

We are also releasing expansions to some of our existing corpora, including:

  • Canons of Johannes (new material annotated by Elizabeth Platte and Caroline T. Schroeder, digital edition provided by Diliana Atanassova)
  • Apophthegmata Patrum
  • A large number of corrections to most of our existing corpora, which are being republished in this release.

All documents have metadata for word segmentation, tagging, and parsing to indicate whether those annotations are machine annotations only (automatic), checked for accuracy by an expert in Coptic (checked), or closely reviewed for accuracy, usually as a result of manual parsing (gold).

Also new in this release are stable identifiers and links to PAThs identifiers (https://atlas.paths-erc.eu/cite). Specifically, some corpora now contain metadata identifiers for PAThs manuscripts, PAThs works, and PAThs authors.