Additional parameter for custom resolution #376

Open
M3ssman opened this issue Dec 5, 2019 · 6 comments

@M3ssman
Contributor

M3ssman commented Dec 5, 2019

Hello,

Please add a DPI parameter so that a custom resolution can be enforced when using Tesseract.

Tesseract CLI

--dpi 470

ocrd-tesserocr

dpi: 470

@bertsky
Collaborator

bertsky commented Dec 5, 2019

I see the need, too. But we already rely on core's OcrdExif info to pass into Tesseract. Shouldn't this override be available to all processors, @kba?

@kba
Member

kba commented Dec 5, 2019

> Shouldn't this override be available to all processors, @kba?

Yes, but it's not trivial I'm afraid. My idea would be to allow overriding the pixel density values in the OcrdExif constructor but that gets called in workspace methods and model factory function. These would need to accept additional parameters to pass on to the OcrdExif constructor. Not complicated, just a bit convoluted, e.g. even more parameters to Workspace.image_from_page.

I do see the need though, so if the added complexity is fine by you, I'll create a PR in core.
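
A minimal sketch of the pass-through described above, using stand-in classes rather than core's real OcrdExif and Workspace (their actual signatures are not shown in this thread). The names `ExifSketch`, `WorkspaceSketch`, `image_info`, `dpi_override`, the dict-based metadata and the fallback value of 1 are all hypothetical:

```python
from typing import Optional


class ExifSketch:
    """Stand-in for an image metadata wrapper; not core's OcrdExif."""

    def __init__(self, metadata: dict, dpi_override: Optional[int] = None):
        if dpi_override is not None:
            # A manual override wins over whatever the image file claims.
            self.resolution = dpi_override
        else:
            # Fall back to the image's own value (default of 1 is an assumption).
            self.resolution = metadata.get("dpi", 1)


class WorkspaceSketch:
    """Stand-in for a workspace; it only forwards the extra parameter."""

    def image_info(self, metadata: dict,
                   dpi_override: Optional[int] = None) -> ExifSketch:
        # Every layer between the processor and the constructor has to
        # accept and forward the override, which is the convoluted part.
        return ExifSketch(metadata, dpi_override=dpi_override)
```

With this shape, `WorkspaceSketch().image_info({"dpi": 300}, dpi_override=470)` would report 470, while omitting the override keeps the 300 from the metadata.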

@bertsky
Collaborator

bertsky commented Dec 5, 2019

> Yes, but it's not trivial I'm afraid. My idea would be to allow overriding the pixel density values in the OcrdExif constructor but that gets called in workspace methods and model factory function. These would need to accept additional parameters to pass on to the OcrdExif constructor. Not complicated, just a bit convoluted, e.g. even more parameters to Workspace.image_from_page.

But that only gets called by the processor again, so we are still where we started (adding a parameter for every single tool)!

Perhaps we should start adding other mechanisms that affect all processors equally (like the loglevel override):

  1. How about generic parameters (which are added to the tool json automatically)?
  2. Or extra CLI options (which are supported automatically when using ocrd.decorators)?
  3. Or even environment variables?
  4. Or even site-level configuration files (akin to ocrd_logging.py)?
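
One way the four mechanisms listed above could coexist is a fixed precedence, most specific source first. This is a sketch only: the function name `resolve_dpi`, the `dpi` key, the environment variable `OCRD_DPI_OVERRIDE` and the precedence order itself are assumptions for illustration, not anything core provides:

```python
import os
from typing import Any, Mapping, Optional


def resolve_dpi(
    image_dpi: int,
    params: Mapping[str, Any],
    environ: Mapping[str, str] = os.environ,
    site_config: Optional[Mapping[str, Any]] = None,
) -> int:
    """Pick the DPI to use, checking the most specific source first."""
    # 1./2. Generic parameter or CLI option (both end up in `params` here).
    if params.get("dpi"):
        return int(params["dpi"])
    # 3. Environment variable (hypothetical name).
    if environ.get("OCRD_DPI_OVERRIDE"):
        return int(environ["OCRD_DPI_OVERRIDE"])
    # 4. Site-level configuration (hypothetical `dpi` key).
    if site_config and site_config.get("dpi"):
        return int(site_config["dpi"])
    # No override anywhere: trust the image metadata.
    return image_dpi
```

Because the lookup lives in one function, a processor would not need its own `dpi` parameter; it would only call the shared resolver.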

Besides the manual DPI override, this would also allow supporting DPI meta-data validation with different levels of strictness.

Or supporting automatic workspace validation with different levels/sets of checks.

Or supporting processing with --force/--overwrite.

Or supporting processing on multiple CPUs/GPUs with given scalefactor.

Or supporting time constraints on different hierarchy levels.

Just saying!

@bertsky
Collaborator

bertsky commented Dec 6, 2019

Anyway, IMO this issue should be transferred to core or spec, since it involves/affects more people/projects.

@bertsky
Collaborator

bertsky commented Dec 19, 2019

At least for the DPI override, another processor-independent mechanism could be to have a dedicated processor earlier in the pipeline writing /PcGts/Page/@imageXResolution as an override to the image metadata parsed by OcrdExif – a processor like the one proposed here, only with an additional manual override – together with the behavioural changes in core.
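
A rough sketch of what such an override step might do to a PAGE file, using only the standard library instead of core's own PAGE API. The function name `override_resolution` is hypothetical, the 2019-07-15 namespace is assumed, and setting `imageYResolution` and `imageResolutionUnit` alongside `imageXResolution` is an assumption beyond what is named above:

```python
import xml.etree.ElementTree as ET

# Assumed PAGE namespace version; adjust to the files actually in use.
PAGE_NS = "http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15"


def override_resolution(page_xml: str, dpi: int) -> str:
    """Return the PAGE document with the resolution attributes set on Page."""
    root = ET.fromstring(page_xml)  # the PcGts root element
    page = root.find(f"{{{PAGE_NS}}}Page")
    if page is None:
        raise ValueError("no Page element found")
    # /PcGts/Page/@imageXResolution is the attribute named in the thread.
    page.set("imageXResolution", str(dpi))
    # Assumption: keep Y resolution and unit consistent with X.
    page.set("imageYResolution", str(dpi))
    page.set("imageResolutionUnit", "PPI")
    return ET.tostring(root, encoding="unicode")
```

Downstream processors would still need the behavioural change in core so that the PAGE attribute takes priority over the image metadata.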

@kba can you please transfer the issue?

@bertsky
Collaborator

bertsky commented Jun 22, 2022

> Besides the manual DPI override, this would also allow supporting DPI meta-data validation with different levels of strictness.
>
> Or supporting automatic workspace validation with different levels/sets of checks.
>
> Or supporting processing with --force/--overwrite.
>
> Or supporting processing on multiple CPUs/GPUs with given scalefactor.
>
> Or supporting time constraints on different hierarchy levels.
>
> Just saying!

Or enabling/disabling METS caching (or even METS server).
