ocrd-tool schema: be less restrictive on input/output_filegrp #168
I fully agree on the general direction, but...
* Since the core implementation depends on a single input file group (`Processor.input_files` property) we cannot just drop it though (hence the `["PLACEHOLDER"]` default)
Are you referring to OCR-D/core#274 by any chance? It looks like this PR is trying to conflate a) providing a default for the fileGrps in the tool json (and clarifying their semantics) with b) how core handles these when the processor is called without fileGrp parameters. (IMO we could live with `INPUT` and `OUTPUT` as default examples, but should remove them as defaults in the `Processor` runtime.)
Also related: OCR-D/core#364
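To make the distinction between (a) and (b) concrete, here is a minimal sketch in Python. It is not how OCR-D/core implements this: the helper names `describe_tool` and `resolve_runtime_groups`, the `SCHEMA_DEFAULTS` dict, and the comma-separated group syntax are assumptions made for illustration only. Only the `input_file_grp`/`output_file_grp` keys and the `INPUT`/`OUTPUT` example values come from the discussion.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical schema-level example values (assumption for this sketch).
SCHEMA_DEFAULTS: Dict[str, List[str]] = {
    "input_file_grp": ["INPUT"],
    "output_file_grp": ["OUTPUT"],
}


def describe_tool(tool: dict) -> dict:
    """(a) Fill in example fileGrps when the tool description has none.

    The result is only meant for documentation or validation,
    not as a runtime fallback.
    """
    return {**SCHEMA_DEFAULTS, **tool}


def resolve_runtime_groups(
    cli_input: Optional[str], cli_output: Optional[str]
) -> Tuple[List[str], List[str]]:
    """(b) At runtime, require explicit fileGrps.

    Nothing is taken from the tool description here; a missing
    value is an error instead of silently becoming INPUT/OUTPUT.
    """
    if not cli_input or not cli_output:
        raise ValueError("input and output fileGrp must be given explicitly")
    # Assumption: several groups are passed as one comma-separated string.
    return cli_input.split(","), cli_output.split(",")
```

With this split, a tool description that omits the fileGrps still validates and documents itself with the example values, while a processor call without fileGrp parameters fails early.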
Co-authored-by: Robert Sachunsky <[email protected]>
That issue is obsolete, based on a confused understanding of the semantics of […]. I was mostly thinking of […] in […].
It seems this went full circle, but on the general question of relaxing input/output fileGrp conventions: I would be fine with any form of relaxation that still allows us to do a direct round trip DFG-Viewer-METS --> OCR-D --> DFG-Viewer-METS. E.g. could this be just one possible input setting/template for fileGrps?
This PR is not about the relaxation of our fileGrp (METS) conventions, though. It is about the interpretation of the fileGrp section in the ocrd-tool.json, namely it becoming a mere example of how the user could wire that processor in a workflow, and about their defaults in case the developer did not specify any. We've discussed templating fileGrp names for likely or simple workflows for a while, but agreed against this IIRC because it would almost always be impractical anyway, quickly clash, and cause more misconceptions and irritation on the user side than provide actual value. But maybe I misunderstood what you meant by getting that round trip.
Designing the `ocrd-sanitize` processor makes it obvious that the `input_file_grp`/`output_file_grp` are not only obsolete but an anti-pattern:

* […] `mets:file`s from a `mets:fileGrp`.
* Since the core implementation depends on a single input file group (`Processor.input_files` property) we cannot just drop it though (hence the `["PLACEHOLDER"]` default)
* […] `input_file_grp`/`output_file_grp` makes it worse