IMAGO-Catalogues-Jjanes/TEIcatalogs

TEICatalogs: Corpus of encoded 19th and 20th catalogs

This repository contains TEI files of 19th- and 20th-century exhibition catalogs.

Production

These files were produced with the following pipeline:

Segmentation and transcription were done in eScriptorium, using models trained with Kraken on datasets from here and here.
A Python data-extraction script, accessible here, then transformed the ALTO4 files exported from eScriptorium into TEI files.
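The ALTO4-to-TEI step can be illustrated with a minimal sketch. This is not the project's extraction script: the function names `alto_lines` and `lines_to_tei`, the sample ALTO content, and the flat one-`<p>`-per-line output are all assumptions made for illustration. The real TEI files follow a richer structure defined by the ODD, including a `teiHeader`, which this sketch omits.

```python
import xml.etree.ElementTree as ET

# Namespaces of ALTO v4 and TEI P5.
ALTO_NS = {"alto": "http://www.loc.gov/standards/alto/ns-v4#"}
TEI_NS = "http://www.tei-c.org/ns/1.0"

# Made-up sample, not taken from the corpus.
SAMPLE_ALTO = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v4#">
  <Layout><Page><PrintSpace><TextBlock>
    <TextLine><String CONTENT="EXPOSITION"/><SP/><String CONTENT="DE"/><SP/><String CONTENT="1885"/></TextLine>
    <TextLine><String CONTENT="1. Paysage"/></TextLine>
  </TextBlock></PrintSpace></Page></Layout>
</alto>"""


def alto_lines(alto_xml: str) -> list[str]:
    """Return the text of each TextLine, joining its String/@CONTENT values."""
    root = ET.fromstring(alto_xml)
    lines = []
    for text_line in root.iterfind(".//alto:TextLine", ALTO_NS):
        words = [s.get("CONTENT", "") for s in text_line.iterfind("alto:String", ALTO_NS)]
        lines.append(" ".join(w for w in words if w))
    return lines


def lines_to_tei(lines: list[str]) -> ET.Element:
    """Wrap each transcribed line in a TEI <p> (hypothetical, simplified layout)."""
    ET.register_namespace("", TEI_NS)
    tei = ET.Element(f"{{{TEI_NS}}}TEI")
    text = ET.SubElement(tei, f"{{{TEI_NS}}}text")
    body = ET.SubElement(text, f"{{{TEI_NS}}}body")
    for line in lines:
        p = ET.SubElement(body, f"{{{TEI_NS}}}p")
        p.text = line
    return tei


if __name__ == "__main__":
    tei_root = lines_to_tei(alto_lines(SAMPLE_ALTO))
    print(ET.tostring(tei_root, encoding="unicode"))
```

Reading the text from `String/@CONTENT` works whether the export stores one `String` per word or one per line, which keeps the sketch independent of the export settings.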

Manual correction is carried out between each step of the pipeline.
Because the layout analysis has been corrected for every catalog, the ALTO4 files exported from eScriptorium can in turn be used to train a more accurate segmentation model.

The TEI files were built to conform to the ODD written by Caroline Corbières.

Repository

For each catalog, this repository provides the images, the ALTO4 files exported from eScriptorium, the TEI file and a CSV file.
The CSS file àffichage_TEI.css makes the TEI files easier to correct.
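One common way to apply a CSS file to an XML document is an `xml-stylesheet` processing instruction, which lets a browser render the file with that stylesheet. The sketch below assumes this is how the stylesheet is meant to be used, which the repository does not state; the helper `add_stylesheet` and the file paths in the usage example are hypothetical.

```python
def add_stylesheet(tei_xml: str, css_href: str) -> str:
    """Insert an xml-stylesheet processing instruction before the root element."""
    if "<?xml-stylesheet" in tei_xml:
        return tei_xml  # a stylesheet is already linked; leave the file untouched
    pi = f'<?xml-stylesheet type="text/css" href="{css_href}"?>\n'
    if tei_xml.startswith("<?xml"):
        # The XML declaration must stay on the very first line.
        end = tei_xml.index("?>") + 2
        return tei_xml[:end] + "\n" + pi + tei_xml[end:].lstrip("\n")
    return pi + tei_xml


if __name__ == "__main__":
    # Hypothetical document and stylesheet path, for illustration only.
    doc = '<?xml version="1.0" encoding="UTF-8"?>\n<TEI xmlns="http://www.tei-c.org/ns/1.0"/>'
    print(add_stylesheet(doc, "style.css"))
```

The function works on the serialized text rather than a parsed tree so that the rest of the file, including comments and whitespace, is left exactly as it was.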

Credits

The documents were encoded by Juliette Janes, an intern on the Artl@s project, with the help of Simon Gabay and under the supervision of Béatrice Joyeux-Prunel.

Licence

Images from catalogs published before 1920, and all transcriptions, are CC-BY.
The other images are extracts from catalogs published after 1920 and remain the intellectual property of their producers.

Cite this repository

Juliette Janes, Simon Gabay, Béatrice Joyeux-Prunel, TEICatalogs: Corpus of encoded 19th and 20th catalogs, 2021, Paris: ENS Paris https://github.com/Juliettejns/TEIcatalogs/

Contacts

If you have any questions or remarks, please contact [email protected] and [email protected].