Loosen expectation of XML structure when finding the pageId #453
I encountered METS embedded in an OAI-PMH response, and while processing the result with OCR-D works somewhat, it fails to find the pageIds for every file in the METS.
Example OAI-PMH with METS:
https://digital.staatsbibliothek-berlin.de/oai?verb=GetRecord&metadataPrefix=mets&identifier=oai%3Adigital.staatsbibliothek-berlin.de%3APPN719671574
When saving that as `mets.xml`, `ocrd workspace validate` reports lots of errors like this one:

Fix this by loosening the expectation of the XML structure when finding the pageId. (There are more XPath strings in the code that could be reviewed, I think.)
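
To illustrate the idea, here is a minimal sketch using only the Python standard library. It is not the code changed in this PR: the function names, the XPath strings, and the trimmed-down sample document are all made up for illustration. It assumes the strict lookup is anchored at the document root, which stops matching once `mets:mets` is nested inside an OAI-PMH envelope, while a descendant search (`.//`) still finds the page.

```python
import xml.etree.ElementTree as ET

NS = {"mets": "http://www.loc.gov/METS/"}

# Hypothetical, heavily trimmed OAI-PMH response with embedded METS.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <GetRecord><record><metadata>
    <mets:mets xmlns:mets="http://www.loc.gov/METS/">
      <mets:structMap TYPE="PHYSICAL">
        <mets:div TYPE="physSequence">
          <mets:div TYPE="page" ID="PHYS_0001">
            <mets:fptr FILEID="FILE_0001_DEFAULT"/>
          </mets:div>
        </mets:div>
      </mets:structMap>
    </mets:mets>
  </metadata></record></GetRecord>
</OAI-PMH>"""

# Strict: expects mets:structMap to be a direct child of the root element.
STRICT_PAGES = ('mets:structMap[@TYPE="PHYSICAL"]'
                '/mets:div[@TYPE="physSequence"]'
                '/mets:div[@TYPE="page"]')
# Loose: accepts a page div at any depth below the root element.
LOOSE_PAGES = './/mets:div[@TYPE="page"]'


def find_page_id(root, file_id, pages_xpath):
    """Return the ID of the page div that points to file_id, or None."""
    for page in root.iterfind(pages_xpath, NS):
        for fptr in page.iterfind("mets:fptr", NS):
            if fptr.get("FILEID") == file_id:
                return page.get("ID")
    return None


root = ET.fromstring(SAMPLE)  # the root element is OAI-PMH, not mets:mets
strict_result = find_page_id(root, "FILE_0001_DEFAULT", STRICT_PAGES)
loose_result = find_page_id(root, "FILE_0001_DEFAULT", LOOSE_PAGES)
print(strict_result, loose_result)
```

In this sketch the strict path finds nothing when the root is the OAI-PMH envelope, and the loose path returns the page ID. The trade-off of the loose form is that it would also match page divs in other structMaps if a document had any, so a real fix may want to keep part of the context in the expression.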