Restructure general workflow, cli, services, processors, docs, tests #40
Changes from 65 commits
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -107,3 +107,6 @@ env2/ | |
ocrd.egg-info | ||
/src | ||
spec | ||
.pytest_cache | ||
.~lock* | ||
/profile |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,13 @@ | ||
[MASTER] | ||
extension-pkg-whitelist=lxml | ||
ignored-modules=cv2,tesserocr | ||
|
||
[MESSAGES CONTROL] | ||
disable = | ||
missing-docstring, | ||
no-self-use, | ||
too-many-arguments, | ||
superfluous-parens, | ||
invalid-name, | ||
line-too-long, | ||
too-few-public-methods, |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,12 @@ | ||
language: python | ||
python: | ||
- 2.7 | ||
- 3.6 | ||
before_install: | ||
- sudo add-apt-repository ppa:alex-p/tesseract-ocr | ||
- sudo apt-get -qq update | ||
- sudo make deps-ubuntu | ||
install: | ||
- make deps-pip test-deps-pip | ||
script: | ||
- make test |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,14 +1,25 @@ | ||
export | ||
|
||
SHELL = /bin/bash | ||
PYTHON = python2 | ||
PYTHONPATH := .:$(PYTHONPATH) | ||
PIP = pip | ||
LOG_LEVEL = INFO | ||
|
||
# BEGIN-EVAL makefile-parser --make-help Makefile | ||
|
||
help: | ||
@echo "" | ||
@echo " Targets" | ||
@echo "" | ||
@echo " deps-ubuntu Dependencies for deployment in an ubuntu/debian linux" | ||
@echo " deps-pip Install python deps via pip" | ||
@echo " spec Clone the spec dir for sample files" | ||
@echo " install (Re)install the tool" | ||
@echo " test-run Test the run command" | ||
@echo " deps-ubuntu Dependencies for deployment in an ubuntu/debian linux" | ||
@echo " deps-pip Install python deps via pip" | ||
@echo " spec Clone the spec dir for sample files" | ||
@echo " install (Re)install the tool" | ||
@echo " test-deps-pip Install test python deps via pip" | ||
@echo " test Run all unit tests" | ||
@echo " docs Build documentation" | ||
@echo " docs-clean Clean docs" | ||
|
||
# END-EVAL | ||
|
||
|
@@ -20,22 +31,51 @@ deps-ubuntu: | |
libtesseract-dev \ | ||
libleptonica-dev \ | ||
libimage-exiftool-perl \ | ||
libxml2-utils \ | ||
tesseract-ocr-eng \ | ||
tesseract-ocr-deu \ | ||
tesseract-ocr-deu-frak | ||
|
||
# Install python deps via pip | ||
deps-pip: | ||
pip3 install --user -r requirements.txt | ||
$(PIP) install -r requirements.txt | ||
|
||
# Clone the spec dir for sample files | ||
spec: | ||
git clone https://github.com/OCR-D/spec | ||
Review comment: Just a note that the example files in spec need updating: the mets.xml should be updated to reflect recent discussions, and ideally we should pick sample images that are a) lightweight and b) already covered by ground truth in ocr-d.de/daten.
Reply: Absolutely, tracking in #41 |
||
|
||
# (Re)install the tool | ||
install: | ||
pip3 install --user . | ||
$(PIP) install . | ||
|
||
test/assets: spec | ||
mkdir -p test/assets | ||
cp -r spec/io/example test/assets/herold | ||
|
||
# Install test python deps via pip | ||
test-deps-pip: | ||
$(PIP) install -r requirements.txt | ||
|
||
.PHONY: test | ||
# Run all unit tests | ||
test: | ||
pytest --log-level=$(LOG_LEVEL) --durations=10 test | ||
|
||
.PHONY: docs | ||
# Build documentation | ||
docs: | ||
sphinx-apidoc -f -o docs/api ocrd | ||
cd docs ; $(MAKE) html | ||
|
||
# Clean docs | ||
docs-clean: | ||
cd docs ; rm -rf _build api | ||
|
||
pyclean: | ||
rm **/*.pyc | ||
rm -rf .pytest_cache | ||
|
||
test-profile: | ||
$(PYTHON) -m cProfile -o profile $$(which py.test) test | ||
$(PYTHON) analyze_profile.py | ||
|
||
# Test the run command | ||
test-run: spec | ||
run-ocrd spec/io/example/mets.xml |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,5 @@ | ||
import pstats | ||
p = pstats.Stats('profile') | ||
p.strip_dirs() | ||
p.sort_stats('tottime') | ||
p.print_stats(50) |
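Note on the profiling workflow: the `test-profile` target writes raw cProfile data to a file named `profile`, and the script above loads that file with `pstats`. The following is a minimal, self-contained sketch of the same dump-then-analyze round trip, not part of this PR. The names `busy_work`, `profile_call` and `summarize_profile` are hypothetical, and `busy_work` only stands in for the test suite.

```python
import cProfile
import os
import pstats
import tempfile


def busy_work(n):
    """Hypothetical workload standing in for the test suite."""
    return sum(i * i for i in range(n))


def profile_call(func, *args):
    """Profile a single call and dump the raw stats to a temporary file."""
    path = os.path.join(tempfile.mkdtemp(), "profile")
    profiler = cProfile.Profile()
    profiler.runcall(func, *args)
    profiler.dump_stats(path)
    return path


def summarize_profile(path, limit=50):
    """Load a dumped profile and print the `limit` entries with the highest tottime."""
    stats = pstats.Stats(path)
    stats.strip_dirs()           # drop directory prefixes from file names
    stats.sort_stats("tottime")  # sort by time spent inside each function itself
    stats.print_stats(limit)
    return stats
```

Calling `summarize_profile(profile_call(busy_work, 100000))` prints a table in the same format that the script above produces for the `profile` file.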
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,20 @@ | ||
# Minimal makefile for Sphinx documentation | ||
# | ||
|
||
# You can set these variables from the command line. | ||
SPHINXOPTS = | ||
SPHINXBUILD = sphinx-build | ||
SPHINXPROJ = pyocrd | ||
SOURCEDIR = . | ||
BUILDDIR = build | ||
|
||
# Put it first so that "make" without argument is like "make help". | ||
help: | ||
@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O) | ||
|
||
.PHONY: help Makefile | ||
|
||
# Catch-all target: route all unknown targets to Sphinx using the new | ||
# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS). | ||
%: Makefile | ||
@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O) |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,7 @@ | ||
ocrd | ||
==== | ||
|
||
.. toctree:: | ||
:maxdepth: 4 | ||
|
||
ocrd |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,30 @@ | ||
ocrd.cli package | ||
================ | ||
|
||
Submodules | ||
---------- | ||
|
||
ocrd.cli.merge\_ocr\_txt module | ||
------------------------------- | ||
|
||
.. automodule:: ocrd.cli.merge_ocr_txt | ||
:members: | ||
:undoc-members: | ||
:show-inheritance: | ||
|
||
ocrd.cli.run module | ||
------------------- | ||
|
||
.. automodule:: ocrd.cli.run | ||
:members: | ||
:undoc-members: | ||
:show-inheritance: | ||
|
||
|
||
Module contents | ||
--------------- | ||
|
||
.. automodule:: ocrd.cli | ||
:members: | ||
:undoc-members: | ||
:show-inheritance: |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,46 @@ | ||
ocrd.model package | ||
================== | ||
|
||
Submodules | ||
---------- | ||
|
||
ocrd.model.ocrd\_file module | ||
---------------------------- | ||
|
||
.. automodule:: ocrd.model.ocrd_file | ||
:members: | ||
:undoc-members: | ||
:show-inheritance: | ||
|
||
ocrd.model.ocrd\_mets module | ||
---------------------------- | ||
|
||
.. automodule:: ocrd.model.ocrd_mets | ||
:members: | ||
:undoc-members: | ||
:show-inheritance: | ||
|
||
ocrd.model.ocrd\_page module | ||
---------------------------- | ||
|
||
.. automodule:: ocrd.model.ocrd_page | ||
:members: | ||
:undoc-members: | ||
:show-inheritance: | ||
|
||
ocrd.model.ocrd\_xml\_base module | ||
--------------------------------- | ||
|
||
.. automodule:: ocrd.model.ocrd_xml_base | ||
:members: | ||
:undoc-members: | ||
:show-inheritance: | ||
|
||
|
||
Module contents | ||
--------------- | ||
|
||
.. automodule:: ocrd.model | ||
:members: | ||
:undoc-members: | ||
:show-inheritance: |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,22 @@ | ||
ocrd.processor.characterize package | ||
=================================== | ||
|
||
Submodules | ||
---------- | ||
|
||
ocrd.processor.characterize.exif module | ||
--------------------------------------- | ||
|
||
.. automodule:: ocrd.processor.characterize.exif | ||
:members: | ||
:undoc-members: | ||
:show-inheritance: | ||
|
||
|
||
Module contents | ||
--------------- | ||
|
||
.. automodule:: ocrd.processor.characterize | ||
:members: | ||
:undoc-members: | ||
:show-inheritance: |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,31 @@ | ||
ocrd.processor package | ||
====================== | ||
|
||
Subpackages | ||
----------- | ||
|
||
.. toctree:: | ||
|
||
ocrd.processor.characterize | ||
ocrd.processor.segment_line | ||
ocrd.processor.segment_region | ||
|
||
Submodules | ||
---------- | ||
|
||
ocrd.processor.base module | ||
-------------------------- | ||
|
||
.. automodule:: ocrd.processor.base | ||
:members: | ||
:undoc-members: | ||
:show-inheritance: | ||
|
||
|
||
Module contents | ||
--------------- | ||
|
||
.. automodule:: ocrd.processor | ||
:members: | ||
:undoc-members: | ||
:show-inheritance: |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,22 @@ | ||
ocrd.processor.segment\_line package | ||
==================================== | ||
|
||
Submodules | ||
---------- | ||
|
||
ocrd.processor.segment\_line.tesseract3 module | ||
---------------------------------------------- | ||
|
||
.. automodule:: ocrd.processor.segment_line.tesseract3 | ||
:members: | ||
:undoc-members: | ||
:show-inheritance: | ||
|
||
|
||
Module contents | ||
--------------- | ||
|
||
.. automodule:: ocrd.processor.segment_line | ||
:members: | ||
:undoc-members: | ||
:show-inheritance: |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,22 @@ | ||
ocrd.processor.segment\_region package | ||
====================================== | ||
|
||
Submodules | ||
---------- | ||
|
||
ocrd.processor.segment\_region.tesseract3 module | ||
------------------------------------------------ | ||
|
||
.. automodule:: ocrd.processor.segment_region.tesseract3 | ||
:members: | ||
:undoc-members: | ||
:show-inheritance: | ||
|
||
|
||
Module contents | ||
--------------- | ||
|
||
.. automodule:: ocrd.processor.segment_region | ||
:members: | ||
:undoc-members: | ||
:show-inheritance: |
Review comment: Tesseract 4.0 for Ubuntu 18.04 should be coming soon; then we can switch to plain apt without a PPA.
See tesseract issue #1423 and the recent release of 4.0.0-beta.1.
Reply: The problem with Travis is that you can only choose between Ubuntu 12.04 and 14.04. Since at least 16.04 / Debian 7, Tesseract 3.04 is shipped, and tesserocr should build against it.
Even with the PPA it doesn't build for me, since Tesseract 4 seems to use some C++11 features that the gcc version in 14.04 doesn't support.
I'm now using the same pre-install setup that tesserocr uses (download and build leptonica and tesseract). It is very inefficient, but it worked: https://travis-ci.org/OCR-D/pyocrd/jobs/358810995
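To make the version constraint in this thread concrete, here is a minimal sketch, not part of this PR, of how a setup script could detect the installed Tesseract version before choosing a build path. The function names are hypothetical. The rule that major version 4 requires a C++11-capable compiler is taken from the comment above rather than verified independently, and the regular expression assumes the version output starts with a line such as `tesseract 3.04.01`.

```python
import re
import subprocess


def parse_tesseract_version(version_output):
    """Return (major, minor, patch) parsed from `tesseract --version` text, or None."""
    match = re.search(r"tesseract\s+v?(\d+)\.(\d+)(?:\.(\d+))?", version_output)
    if match is None:
        return None
    major, minor, patch = match.groups()
    return (int(major), int(minor), int(patch or 0))


def installed_tesseract_version(binary="tesseract"):
    """Run the binary and parse its version; return None if it is unavailable."""
    try:
        # Some releases print the version to stderr, so merge both streams.
        output = subprocess.check_output(
            [binary, "--version"], stderr=subprocess.STDOUT
        )
    except (OSError, subprocess.CalledProcessError):
        return None
    return parse_tesseract_version(output.decode("utf-8", "replace"))


def needs_cxx11_compiler(version):
    """Assumption from the discussion above: Tesseract 4 uses C++11 features."""
    return version is not None and version[0] >= 4
```

A CI script could call `needs_cxx11_compiler(installed_tesseract_version())` and fall back to the source build only when the packaged version is unsuitable.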