Build a METS file (Page Collections file) to make working with Aletheia easier.
See: Page Collections in the Aletheia User Guide.
You can also use the METS file for the OCR-D framework (https://github.com/OCR-D/core).
- Name and path of the image file (without file extension).
- If available, the matching PAGE XML file.
- Both files should have the same name and only differ in their file extension.
- The files should be stored in relevant folders, e.g. the image files in the folder jpg and the PAGE XML files in the folder page.
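The naming rule above can be checked before building the METS file. The following is only a minimal sketch: the function name check_pairs is hypothetical, and it assumes that the PAGE XML files use the extension .xml, which the text above does not state.

```shell
#!/bin/sh
# Hypothetical helper: report images that have no PAGE XML file with the
# same base name. Assumes PAGE files end in ".xml" (an assumption).
check_pairs() {
    image_dir=$1    # folder with the image files, e.g. jpg
    page_dir=$2     # folder with the PAGE XML files, e.g. page
    image_ext=$3    # image file extension without the dot, e.g. jpg
    missing=0
    for image in "$image_dir"/*."$image_ext"; do
        [ -e "$image" ] || continue    # the glob matched nothing
        base=$(basename "$image" ".$image_ext")
        if [ ! -f "$page_dir/$base.xml" ]; then
            echo "no PAGE file for $base"
            missing=1
        fi
    done
    # Exit status 0 only if every image has a matching PAGE file.
    [ "$missing" -eq 0 ]
}
```

Called as `check_pairs jpg page jpg`, it prints one line per image without a PAGE file and returns a non-zero status if any are missing.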
- imagefolder: name of the image file folder
- pagefolder: name of the PAGE file folder
- imageFormat: format of the image files
- noIMAGE=yes: indication that no image files can be specified
- noPAGE=yes: indication that no PAGE files can be specified or are available
- drive: the drive letter from the Windows file system
The link element contains the path to the image or PAGE file.
Note: See the example file in the example folder. Use only forward slashes to separate folders; do not use backslashes, even on Windows.
<?xml version="1.0" encoding="UTF-8"?>
<gt>
<link>[Path to the Image or PAGE file]/[Name of the File without Extension]</link>
</gt>
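For many images, an input file of this shape can be generated rather than written by hand. The sketch below makes several assumptions: the function name make_gt_xml is hypothetical, one gt element is assumed to hold one link element per image, and the link path is assumed to be the folder that contains the images. Compare the output with the example file before relying on it.

```shell
#!/bin/sh
# Hypothetical helper: print a <gt> input file with one <link> per image,
# file extension stripped. Paths are not XML-escaped, so folder or file
# names containing "&" or "<" would need extra handling.
make_gt_xml() {
    image_dir=$1    # folder with the image files, written with forward slashes
    image_ext=$2    # image file extension without the dot, e.g. jpg
    printf '%s\n' '<?xml version="1.0" encoding="UTF-8"?>' '<gt>'
    for image in "$image_dir"/*."$image_ext"; do
        [ -e "$image" ] || continue    # the glob matched nothing
        # ${image%.*} removes the shortest trailing ".something", i.e. the extension.
        printf '<link>%s</link>\n' "${image%.*}"
    done
    printf '%s\n' '</gt>'
}
```

For example, `make_gt_xml ../example/jpg jpg > input.xml` would write the list to a file named input.xml (a hypothetical name) that can then be passed to the stylesheet with -s:.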
java -jar ../saxon9he.jar -xsl:../xsl/makeAletheia_mets.xsl -s:../example/example.xml imagefolder=jpg imageFormat=jpg pagefolder=page
A variant for when no PAGE files can be specified or are available:
java -jar ../saxon9he.jar -xsl:../xsl/makeAletheia_mets.xsl -s:../example/example.xml imagefolder=tiff imageFormat=tif noPAGE=yes