Make hocr-tools a proper module #42

Open
kba opened this issue Aug 31, 2016 · 3 comments

Comments

@kba
Contributor

kba commented Aug 31, 2016

The README currently states:

> Each command line program is self contained; if you have Python 2.7 with the required packages installed, it should just work. (Unfortunately, that means some code duplication; we may revisit this issue in later revisions.)

I would like to revisit this issue 😄

The advantage of keeping the programs self-contained is that there is no need to install the whole project to run an individual script, provided the requirements were installed by some other means (e.g. apt-get). For simple scripts like hocr-check this is really neat.

The disadvantages of self-contained commands are IMHO:

  • Code redundancy (assoc, get_text etc.). These are small functions but it's considerable boilerplate and keeping them consistent is a hassle. This also makes it hard to spot that e.g. get_text has not been needed for a while.
  • Embedding resources in the source code, such as the invisible font in hocr-pdf, makes it hard to add changes.
  • It makes it harder to keep consistent interfaces. Some commands use optparse, others parse CLI arguments themselves, some read from STDIN on no args, some show the help page on no args, some exit with an error etc. A shared hocrlib module could help reduce boilerplate, though consistent use of a single parser such as argparse could also remedy this situation.
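
As a rough illustration of the last point, a shared module could expose one parser factory plus the common helpers, so that every command handles arguments the same way. The sketch below is hypothetical: `make_parser`, `open_input`, and the module layout are assumptions made for illustration, not existing hocr-tools code, and `get_text` here is a simplified stand-in for the duplicated helper of that name.

```python
# hocrlib.py: hypothetical shared module (all names are illustrative only)
import argparse
import sys


def make_parser(prog, description):
    """Return an argparse parser with conventions shared by all commands."""
    parser = argparse.ArgumentParser(prog=prog, description=description)
    # Assumed convention: one optional positional file, "-" means STDIN.
    parser.add_argument("file", nargs="?", default="-",
                        help="hOCR file to read (default: standard input)")
    return parser


def open_input(path):
    """Open the given path, treating "-" as standard input."""
    return sys.stdin if path == "-" else open(path)


def get_text(node):
    """Concatenate all text below an ElementTree/lxml element."""
    return "".join(node.itertext())
```

A command such as hocr-check would then start with `args = make_parser("hocr-check", "...").parse_args()` and continue with its own logic, so the "no args" behaviour is decided in exactly one place.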

In summary, I would argue for a shared library, resources kept in the file system, and requiring users to install the tools properly (via setup.py).
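
Keeping resources in the file system could look roughly like this: the font lives next to the module as an ordinary file and is read at run time. The `data` directory name, the `invisible.ttf` file name, and the `resource_path`/`load_resource` helpers are assumptions made for this sketch, not the current repository layout.

```python
import os


def resource_path(name, base_dir=None):
    """Return the path of a bundled resource file.

    base_dir defaults to a "data" directory next to this module
    (an assumed layout, not the current repository structure).
    """
    if base_dir is None:
        module_dir = os.path.dirname(os.path.abspath(__file__))
        base_dir = os.path.join(module_dir, "data")
    return os.path.join(base_dir, name)


def load_resource(name, base_dir=None):
    """Read a bundled resource, e.g. the invisible font, as bytes."""
    with open(resource_path(name, base_dir), "rb") as handle:
        return handle.read()
```

Because the path is resolved relative to the module file, the same lookup would work for an installed package and for a plain git checkout, as long as the data directory is shipped alongside the code.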

What do you think?

In particular, is anyone relying on the scripts being self-contained?

@kba kba mentioned this issue Aug 31, 2016
@stweil
Collaborator

stweil commented Aug 31, 2016

We could also write modular code and provide a way to build self contained command files.

@zuphilip
Collaborator

I agree that the advantages of a shared library and resources in the file system outweigh the drawbacks, and we can also require people to run a setup script (maybe adding more options there). Personally, I always run the setup.py script to make the commands available system-wide.

Would it still be possible to go the PyPI way with that?

> We could also write modular code and provide a way to build self contained command files.

Yes, we could perhaps also build them for every release as a separate zip (or rather, check whether we can automate that). Alternatively, the setup script could offer an option to build independent Python commands.

@kba kba mentioned this issue Aug 31, 2016
4 tasks
@kba
Contributor Author

kba commented Aug 31, 2016

> Would it still be possible to go the PyPI way with that?

Yes, all this can be handled by setuptools in setup.py.

> We could also write modular code and provide a way to build self contained command files.

Not sure if I understand. The commands cannot be truly self-contained; they still have external dependencies, e.g. hocr-pdf requires reportlab to be installed.

Creating special versions of the scripts at build time, e.g. "baking in" the invisible font is possible but very finicky.
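
For reference, "baking in" a resource at build time might be sketched as follows: a build step reads the font file and writes a copy of the script with the bytes embedded as a base64 literal. The placeholder line, the function name `bake_resource`, and the variable name `FONT_B64` are all invented for this illustration. The finicky parts (keeping the placeholder in sync with the script, handling several resources, testing both variants) are not addressed here.

```python
import base64

# Hypothetical marker line that the modular script would have to contain.
PLACEHOLDER = "FONT_B64 = None  # replaced at build time"


def bake_resource(script_source, resource_bytes):
    """Return script_source with the placeholder replaced by embedded data."""
    if PLACEHOLDER not in script_source:
        raise ValueError("placeholder line not found in script")
    encoded = base64.b64encode(resource_bytes).decode("ascii")
    # Replace only the first occurrence with a string literal assignment.
    return script_source.replace(PLACEHOLDER, "FONT_B64 = %r" % encoded, 1)
```

The generated script would then decode `FONT_B64` with `base64.b64decode` instead of opening a file, which is exactly the kind of second code path that makes this approach fragile.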

But we can use setuptools to create egg distributions, Windows installers and such. And of course users should still be able to just git clone the repository and run the scripts directly.
