Restructure general workflow, cli, services, processors, docs, tests #40
Merged
68 commits
11bff34
Stubs for METS, PAGE, resolver and workspace, pylint, unittests
kba abeb696
OcrdMetsFile in its own file
kba f22758b
wip: ResolverCache
kba c54a433
model.ocrd_page: fix indexing off by one
kba 1fc1ee7
recognize works
kba 01045cf
gitignore libreoffice lock files
kba dbaa95c
wip: travis
kba 3a05a34
typo: segent -> segment
kba e3e46ae
ResolverCache working
kba 919cec9
'make test' to run all unit tests
kba f3cc1ae
add EXIF constants
kba db9eb66
OcrdMets: cache fileGrps
kba 98ef09a
:memo: Update README
kba f018657
python3 compat (make PYTHON=python3 test)
kba 18d12ee
create processor class, port exif to new api, extend page, test files…
kba 6d81ea5
rename ocrd.log -> ocrd.utils to contain reusable static code
kba 9641bbd
move to utils, export getLogger, coordinate_string_from_xywh
kba 905aa1e
lazy logging
kba fef9b5c
pylint: stop complaining about lxml
kba ff1e92d
page tag constants
kba 62684c1
OcrdPage: methods for listing/creating regions/lines
kba 728564e
workspace: + save_mets method
kba 8ecce18
utils: xywh_from_coordinate_string as opposite of coordinate_string_f…
kba fb09eba
WIP port segmenting to new api
kba cf724a8
pylint: stop complaining about tesserocr/cv2
kba 9dbe659
MIMETYPE_PAGE = text/page+xml
kba cfb536a
processor: helpers for input/output of files
kba 6cbb851
OcrdPage: prefer "X is not None" over "not X"
kba 48b3cee
tests: assets module
kba b8a942b
mirror module structure in tests
kba 98c099b
segment*/tesseract: use processor shortcuts
kba bb8832f
xsl namespace
kba 50b1f60
workspace: output files are saved with file:// if no url
kba 751f0c2
OcrdPage: typos
kba f323355
xml prettify
kba 9244464
remove cruft from ocrd_xml_base
kba 19e0c19
tests: run all with unittest discover
kba e4c9fc4
test with pytest
kba 93dfbba
test OcrdPage
kba 3de7a6a
:fire: remove original characterizing/segmenting
kba 468c698
:memo: docstrings in OcrdPage
kba 83de2f7
:fire: remove initializing
kba 02e01c9
start with cli
kba 271d0bf
cli
kba 26b9fff
run_process in ocrd.processor to flexibly create workspace and run pr…
kba 3a9391c
Expose existing processors on web service, extend run-server script
kba 4568bd8
rename binary to 'ocrd', merge run and run_server, update setup.py
kba 3da3e46
CLI: ocrd process is now chainable
kba 4bfd81b
:fire: remove ocrd.webservices
kba e712196
minimal repository web service
kba 0c80139
optionally symlink instead of copy in resolver
kba 74db476
:memo: docs, move code in processor/__init__.py to processor/base.py
kba 30a97c9
basic setup for documentation with sphinx
kba 32c2ce7
move image manipulation to workspace.resolve_image_as_pil
kba b58e6f6
resolver: allow setting workspace directory explicitly (for testing)
kba fa92fa9
use xmllint --format to optionally canonicalize/pretty print XML
kba cabb273
canonical ID for mets:file: fileGrp@USE + 4-zero-padded index within grp
kba bfdf09a
page: helpers to work with TextLine
kba 38c85a7
processor.add_output_file: pass on ID
kba 9f3c82c
WIP recognition with tesseract3
kba 80d33da
test assets
kba c8232b7
:bug: resolver: use hyper-verbose but unique filenames based on url
kba 3693299
:art: remove obsolete pylint exceptions
kba 4e0fa6f
rename tesseract3 -> tesserocr
kba 38d29ab
make 'test-profile' to list most time-consuming lines
kba f911afa
workspace: remove hard-coded reference to INPUT fileGrp
kba 1a2bd9b
:green_heart: travis add @alex-p's tesseract-ocr PPA
kba daade7f
properly skip recognize test, travis
kba
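One of the commits above describes a canonical ID for `mets:file` as "fileGrp@USE + 4-zero-padded index within grp". A minimal sketch of what such an ID scheme could look like follows. The function name `canonical_file_id`, the `_` separator and the 1-based index in the example are assumptions made for illustration; the commit title does not state them, and this is not taken from the PR's code.

```python
def canonical_file_id(file_grp_use, index):
    """Combine a fileGrp USE value with a 4-digit, zero-padded index.

    Hypothetical helper: the "_" separator is an assumption, since the
    commit title only says "USE + 4-zero-padded index within grp".
    """
    return "%s_%04d" % (file_grp_use, index)


# Example with an assumed USE value and an assumed 1-based index.
print(canonical_file_id("OCR-D-IMG", 1))
```

Zero-padding to a fixed width keeps the IDs sorting lexicographically in the same order as their numeric index, at least for groups of fewer than 10000 files.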
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -107,3 +107,6 @@ env2/ | |
ocrd.egg-info | ||
/src | ||
spec | ||
.pytest_cache | ||
.~lock* | ||
/profile |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,13 @@ | ||
[MASTER] | ||
extension-pkg-whitelist=lxml | ||
ignored-modules=cv2,tesserocr | ||
|
||
[MESSAGES CONTROL] | ||
disable = | ||
missing-docstring, | ||
no-self-use, | ||
too-many-arguments, | ||
superfluous-parens, | ||
invalid-name, | ||
line-too-long, | ||
too-few-public-methods, |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,30 @@ | ||
language: python | ||
python: | ||
- 2.7 | ||
- 3.6 | ||
before_install: | ||
- sudo apt-get -qq update | ||
- sudo apt-get install -y autoconf automake libtool | ||
- sudo apt-get install -y libpng12-dev | ||
- sudo apt-get install -y libjpeg62-dev | ||
- sudo apt-get install -y libtiff4-dev | ||
- sudo apt-get install -y zlib1g-dev | ||
- wget http://www.leptonica.org/source/leptonica-1.73.tar.gz -O /tmp/leptonica.tar.gz | ||
- tar -xvf /tmp/leptonica.tar.gz | ||
- pushd leptonica-1.73 && ./configure && make && sudo make install && popd | ||
- wget https://github.com/tesseract-ocr/tesseract/archive/3.04.01.tar.gz -O /tmp/tesseract.tar.gz | ||
- tar -xvf /tmp/tesseract.tar.gz | ||
- cd tesseract-3.04.01 | ||
- ./autogen.sh && ./configure | ||
- LDFLAGS="-L/usr/local/lib" CFLAGS="-I/usr/local/include" make | ||
- sudo make install && sudo ldconfig | ||
- cd .. | ||
- wget https://github.com/tesseract-ocr/tessdata/archive/3.04.00.tar.gz -O /tmp/tessdata.tar.gz | ||
- tar -xvf /tmp/tessdata.tar.gz | ||
- sudo mkdir -p /usr/local/share/tessdata/ | ||
- sudo rsync -a tessdata-3.04.00/ /usr/local/share/tessdata | ||
- sudo apt-get install -y libimage-exiftool-perl libxml2-utils | ||
install: | ||
- make deps-pip test-deps-pip | ||
script: | ||
- make test |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,5 @@ | ||
import pstats | ||
p = pstats.Stats('profile') | ||
p.strip_dirs() | ||
p.sort_stats('tottime') | ||
p.print_stats(50) |
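The script above reads a stats file named `profile`, but this diff does not show how that file is produced. As a rough sketch, such a file could be written with the standard-library `cProfile` module and then summarized in the same way. The `busy_work` workload and the helper names below are hypothetical stand-ins, not part of the PR.

```python
import cProfile
import io
import os
import pstats
import tempfile


def busy_work(n):
    """Hypothetical workload standing in for the code under test."""
    return sum(i * i for i in range(n))


def profile_to_file(path, n=10000):
    """Profile busy_work and dump the raw stats to `path`."""
    profiler = cProfile.Profile()
    profiler.enable()
    busy_work(n)
    profiler.disable()
    profiler.dump_stats(path)


def summarize(path, limit=50):
    """Return the `limit` entries with the highest total time as text."""
    out = io.StringIO()
    stats = pstats.Stats(path, stream=out)
    stats.strip_dirs().sort_stats('tottime').print_stats(limit)
    return out.getvalue()


def demo():
    """Write a temporary stats file, summarize it, then clean up."""
    fd, path = tempfile.mkstemp(suffix=".prof")
    os.close(fd)
    try:
        profile_to_file(path)
        return summarize(path)
    finally:
        os.remove(path)
```

Sorting by `tottime` ranks functions by time spent in their own body, excluding callees, which is usually what is wanted when hunting for the most expensive code.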
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,20 @@ | ||
# Minimal makefile for Sphinx documentation | ||
# | ||
|
||
# You can set these variables from the command line. | ||
SPHINXOPTS = | ||
SPHINXBUILD = sphinx-build | ||
SPHINXPROJ = pyocrd | ||
SOURCEDIR = . | ||
BUILDDIR = build | ||
|
||
# Put it first so that "make" without argument is like "make help". | ||
help: | ||
@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O) | ||
|
||
.PHONY: help Makefile | ||
|
||
# Catch-all target: route all unknown targets to Sphinx using the new | ||
# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS). | ||
%: Makefile | ||
@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O) |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,7 @@ | ||
ocrd | ||
==== | ||
|
||
.. toctree:: | ||
:maxdepth: 4 | ||
|
||
ocrd |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,30 @@ | ||
ocrd.cli package | ||
================ | ||
|
||
Submodules | ||
---------- | ||
|
||
ocrd.cli.merge\_ocr\_txt module | ||
------------------------------- | ||
|
||
.. automodule:: ocrd.cli.merge_ocr_txt | ||
:members: | ||
:undoc-members: | ||
:show-inheritance: | ||
|
||
ocrd.cli.run module | ||
------------------- | ||
|
||
.. automodule:: ocrd.cli.run | ||
:members: | ||
:undoc-members: | ||
:show-inheritance: | ||
|
||
|
||
Module contents | ||
--------------- | ||
|
||
.. automodule:: ocrd.cli | ||
:members: | ||
:undoc-members: | ||
:show-inheritance: |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,46 @@ | ||
ocrd.model package | ||
================== | ||
|
||
Submodules | ||
---------- | ||
|
||
ocrd.model.ocrd\_file module | ||
---------------------------- | ||
|
||
.. automodule:: ocrd.model.ocrd_file | ||
:members: | ||
:undoc-members: | ||
:show-inheritance: | ||
|
||
ocrd.model.ocrd\_mets module | ||
---------------------------- | ||
|
||
.. automodule:: ocrd.model.ocrd_mets | ||
:members: | ||
:undoc-members: | ||
:show-inheritance: | ||
|
||
ocrd.model.ocrd\_page module | ||
---------------------------- | ||
|
||
.. automodule:: ocrd.model.ocrd_page | ||
:members: | ||
:undoc-members: | ||
:show-inheritance: | ||
|
||
ocrd.model.ocrd\_xml\_base module | ||
--------------------------------- | ||
|
||
.. automodule:: ocrd.model.ocrd_xml_base | ||
:members: | ||
:undoc-members: | ||
:show-inheritance: | ||
|
||
|
||
Module contents | ||
--------------- | ||
|
||
.. automodule:: ocrd.model | ||
:members: | ||
:undoc-members: | ||
:show-inheritance: |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,22 @@ | ||
ocrd.processor.characterize package | ||
=================================== | ||
|
||
Submodules | ||
---------- | ||
|
||
ocrd.processor.characterize.exif module | ||
--------------------------------------- | ||
|
||
.. automodule:: ocrd.processor.characterize.exif | ||
:members: | ||
:undoc-members: | ||
:show-inheritance: | ||
|
||
|
||
Module contents | ||
--------------- | ||
|
||
.. automodule:: ocrd.processor.characterize | ||
:members: | ||
:undoc-members: | ||
:show-inheritance: |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,31 @@ | ||
ocrd.processor package | ||
====================== | ||
|
||
Subpackages | ||
----------- | ||
|
||
.. toctree:: | ||
|
||
ocrd.processor.characterize | ||
ocrd.processor.segment_line | ||
ocrd.processor.segment_region | ||
|
||
Submodules | ||
---------- | ||
|
||
ocrd.processor.base module | ||
-------------------------- | ||
|
||
.. automodule:: ocrd.processor.base | ||
:members: | ||
:undoc-members: | ||
:show-inheritance: | ||
|
||
|
||
Module contents | ||
--------------- | ||
|
||
.. automodule:: ocrd.processor | ||
:members: | ||
:undoc-members: | ||
:show-inheritance: |
Just a note that the example files in spec need updating - the mets.xml should be updated to reflect recent discussions and ideally we should pick some sample images that are a) lightweight and b) for which we already have ground truth in ocr-d.de/daten.
Absolutely, tracking in #41