Add GT Dataset for Malayalam #104

Closed
nidame opened this issue Feb 27, 2023 · 3 comments · Fixed by #124
Labels
project Issues related to new projects

Comments

@nidame

nidame commented Feb 27, 2023

Hi, we have exported the data from Transkribus in ALTO format only, because Transkribus's PAGE XML export produces invalid files: Transkribus has confirmed that the TranskribusMetadata node is not valid with regard to the original XML schema. I suspect this will cause problems when importing the PAGE XML into eScriptorium.
How should we handle this? Transkribus wants to fix it, but I don't know when.
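As a possible stopgap until Transkribus fixes its export, a minimal Python sketch (standard library only) could strip the offending node from each PAGE file before import. It assumes the `TranskribusMetadata` element is the only schema violation, and it matches the element by local name so that no particular PAGE namespace version is assumed. The helper name `strip_transkribus_metadata` is hypothetical, and validating the output against the PAGE schema would still be advisable.

```python
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Return an element tag without its '{namespace}' prefix."""
    return tag.rsplit("}", 1)[-1]


def strip_transkribus_metadata(xml_text: str) -> str:
    """Remove every TranskribusMetadata element from a PAGE XML document (hypothetical helper)."""
    root = ET.fromstring(xml_text)

    # Keep the document's default namespace on output instead of ElementTree's "ns0:" prefixes.
    if root.tag.startswith("{"):
        ET.register_namespace("", root.tag[1:].split("}", 1)[0])

    # ElementTree elements have no parent pointers, so scan each element's children.
    for parent in list(root.iter()):
        for child in list(parent):
            if local_name(child.tag) == "TranskribusMetadata":
                parent.remove(child)

    return ET.tostring(root, encoding="unicode")
```

To process an export, one could read each file in `page/`, pass its text through the function, and write the result to a copy rather than overwriting the original. Note that `ET.tostring` with `encoding="unicode"` does not write an XML declaration.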

@nidame nidame added the project Issues related to new projects label Feb 27, 2023
@alix-tz
Member

alix-tz commented Mar 1, 2023

This situation is linked to #60.

Can you build the description of the dataset with the ALTO files, but leave the PAGE files accessible in your repository?

@alix-tz
Member

alix-tz commented Mar 1, 2023

Maybe simply keep the organization you currently have:

- data/
   - alto/
   - page/
   - image files, ...
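A quick consistency check for that organization could look like the following Python sketch. It assumes, hypothetically, that the image files sit directly in `data/`, that each one has an XML file with the same stem in both `alto/` and `page/`, and that the listed image suffixes cover the export; `check_layout` and `IMAGE_SUFFIXES` are illustrative names, not part of any existing tool.

```python
from pathlib import Path

# Assumed image extensions; adjust to whatever the export actually contains.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".tif", ".tiff"}


def check_layout(data_dir) -> list:
    """Report missing subfolders and images lacking a same-stem ALTO or PAGE file."""
    data = Path(data_dir)
    problems = []

    for sub in ("alto", "page"):
        if not (data / sub).is_dir():
            problems.append(f"missing directory: {sub}/")

    # Images are assumed to live directly in data/, next to alto/ and page/.
    images = sorted(
        p for p in data.iterdir() if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )
    for img in images:
        for sub in ("alto", "page"):
            xml = data / sub / f"{img.stem}.xml"
            if not xml.is_file():
                problems.append(f"{img.name}: no {sub}/{xml.name}")

    return problems
```

An empty list means every image has both transcriptions; each string otherwise names one missing file or folder.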

@PonteIneptique
Member

This entry is currently being reviewed by @alix-tz for the PR. Sorry for the delay!
