Versioned according to Semantic Versioning.
2.8.0 - 2020-06-04
Added:
ocrd-dummy
, a minimal processor that copies input to output, #468- OcrdPage:
get_AllRegions
: retrieve all regions, sorted by document or reading order, #479 - OcrdPage:
sort_AllIndexed
: sort all children by@index
in-place - OcrdPage:
clear_AllIndexed
: clear all@index
children - OcrdPage:
extend_AllIndexed
: Add elements with incrementing@index
- OcrdPage: Replace empty reading order groups with equivalent
RegionRef
on export - OcrdPage:
get_UnorderedGroupChildren
: get reading order elements of anUnorderedGroup
Changed:
- OcrdPage:
get_AllIndexed
: allow filtering by child type - OcrdPage:
get_AllIndexed
: index_sort parameter to enable/disable sorting
2.7.1 - 2020-05-27
Fixed:
ocrd workspace find
: UseOcrdMets.get_physical_pages
method, fix #491
2.7.0 - 2020-05-27
Changed:
- 🔥
Workspace.image_from_page
no longer treatsPrintSpace
as functionally equivalent toBorder
, #490 OcrdMets.get_physical_pages
method companion toOcrdMets.physical_pages
property, #484- OcrdMets/ocrd workspace find: Search for multiple pages by fileid, ht @bertsky, #463, #484
- PAGE validation: respect reading order in consistency checks, #442
2.6.1 - 2020-05-14
Added:
- OcrdPage: new method
get_AllIndexed
for OrderedGroup and OrderedGroupIndexed that lists their children, sorted by index, #478
Changed:
- Improved log message for profiling processors, #477
Fixed:
- Search for
--page-id
now orders of magnitutde faster, ht @bertsky, #481 - Not all generateDS types were exported by ocrd_page, now they are, #480
2.6.0 - 2020-05-12
Fixed:
- image files no longer cached in workspace, #446, #448
Added
- Many mets:file search fields support regex now, #458, #448
2.5.3 - 2020-04-30
Fixed:
- OcrdPage: hacks to make XML namespace output consistent, #474
2.5.2 - 2020-04-29
Fixed:
- logging: format is spec-conformant again, #466
Added:
- ocrd_page: generateDS-generated code has
__hash__
method now, #443
2.5.1 - 2020-04-23
Fixed:
- logging: disable propagation for loggers with handler, #463
2.5.0 - 2020-04-23
Changed:
- Logging configuration via configuration file, not script, #460 (HT @M3ssman @bertsky)
Added:
- Execution of processors is tracked with logger 'orcd.process.profile', #461
2.4.4 - 2020-03-17
Fixed:
- #437 broke simple types in PAGE model, (really) revert 3a0a3a8, #451
2.4.3 - 2020-03-12
Fixed:
- bashlib: ocrd-tool.json-related errors no longer lead to silent exit, #456
- #437 broke simple types in PAGE model, revert 3a0a3a8, #451
2.4.2 - 2020-02-20
Fixed:
- JSON strings longer than OS-allowed filename size crash fixed, #444
2.4.1 - 2020-02-19
Changed:
- Updated PAGE bindings generated using generateDS, #437
2.4.0 - 2020-02-17
Fixed:
- concatenation of PAGE elements was based on
@index
starting with1
, #430 - CLI:
--help
should work for processors w/oinput_file_grp
, #440 - CLI:
--version
more robust, #433
Changed:
- processor
--help
lists parameter enum values if available, #427 - 🔥
workspace.save_image_file
takesmimetype
instead offormat
, #441
2.3.1 - 2020-01-23
Changed:
- More expressive coordination validation, #418
ocrd workspace init
shortcut forocrd workspace init .
, #419ocrd workspace validate
shortcut forocrd workspace validate mets.xml
, #419ocrd workspace clone <METS_URL>
shortcut forocrd workspace <METS_URL> .
, #419- workspace validation: check of dimensions in image and PAGE only with
--download
, #423 - behavior of
clobber_mets=False
changed to silently skip re-downloading METS instead of making a fuss, #425
Fixed:
- regression from #419 for
ocrd workspace validate
- regression from #384 for task sequence validation, #424
2.3.0 - 2020-01-21
Changed:
- layout of --help, #411
- OcrdMets: Removing the last file for a physical page will remove the physical page
Fixed:
mets_basename
was not respected inocrd workspace init
, #415, #417- typos, #414
Added:
- OcrdMets.remove_physical_page to remove
structMap[@TYPE="physical"]
entries ocrd validate
CLI to validate task groups, PAGE XML and processor parameters, #245, #384
2.2.2 - 2020-01-16
Added:
- Validation of input/output file groups before running a processor/task sequence, #392
- Improved
--help
for both python and bashlib processors, #402, #408
Fixed:
- bashlib: Calling bashlib processor w/o parameters, #381, #400
- bashlib: syntax error, regression from 2d89c22ae3. #410
2.2.1 - 2020-01-14
Fixed:
- OcrdExif: PNG metadata extraction was broken, #395, #396
- Remove the trivial and error-prone image caching feature in resolver, #399
- When creating files with workspace.add_file, single-component file paths (i.e. just the basename of a file) were treated as directories, #404
- When downloading files to a workspace, check first whether those files to be added already exist on disk and are within the workspace directory., #404
2.2.0 - 2020-01-10
Fixed:
- PIL.Image.open'ed files weren't closed, #390
- resolver: if mets_url is relative path, resolve before anything else, #319, #397
- Resolver.workspace_from_url: Create dst_dir before resolve for py
<=
3.5, #330, #393 - fix help string for -m/--mets, fix #263, #391
Changed:
- downgrade filegrp syntax errors to warnings, #364, #389
2.1.3 - 2020-01-08
Changed:
- bagit-profile matches changes from spec v3.4.2 (metadata dir)
2.1.2 - 2020-01-06
Changed:
- have save_mets use UTF-8 encoding for byte-serialization (no entities), #388
Fixed:
- regression from #374, #387
2.1.1 - 2020-01-02
Added:
- PAGE validator: coordinate self-validity and mutual consistency, #374
Fixed:
- Add more related mime types and fix image/jpeg, #382
2.1.0 - 2019-12-20
Added:
- Workspace validation will check cardinality of images per file is 1, #243, OCR-D/spec#132
Changed:
- bashlib will no longer warn about "non-conformant" file group names, #365
- Invalid
file:/
URL will now raise exceptions, #373 - image_from_*: increase tolerance for size mismatch after rotation to 2px, #371
2.0.2 - 2019-12-02
Changed:
ocrd process
: Validate parameters when validating a task- Dockerfile: Revert to Ubuntu 18.04 for LTS compatibility, #344
- Parameter validation: Raise exception for unknown parameters
ocrd ocrd-tool validate
: Raise exception for unknown keys in JSON
2.0.1 - 2019-11-26
Fixed:
- METS
CREATEDATE
date format now ISO8601, #360 ocrd workspace find
allow outputting file group, #359- processor decorator:
--version
should succeed independent of parameters, #358
Changed:
ocrd process
uses the ocrd-tool.json of the tools to check whether output file group necessary, #296- Dockerfile: Revert to Ubuntu 18.04 for LTS compatibility, #344
- pixel density warnings downgraded further to "notice", #361
2.0.0 - 2019-11-05
Changed:
- image_from_page etc: allow filling with background or transparency
- 🔥 API changes, #311, #327
- Dockerfile: Omit
ENTRYPOINT
, OCR-D/spec#130, #340 - Relax pixel density validation errors to warnings, OCR-D/spec#129, #339
1.0.1 - 2019-10-25
Fixed:
- Add
dimension
to workspace validation skip list, #329 - Update ocrd-tool.json schema to spec 3.3.0 (no output_file_grp, no syntax restriction on content-type)
- PAGE XML output references xsi:schemaLocation, #331
- Update Pillow to 6.2.0
Changed:
ocrd process
: task validation takes processor's ocrd-tool into account, #296
1.0.0 - 2019-10-18
- Workspace validation: Validate that files mentioned in pc:Page/@imageFilename exist in METS and on FS, #309
ocrd ocrd-tool parse-params
has the string-or-filepath logic for -p/--parameter as for the CLI
1.0.0b19 - 2019-09-10
- image_from_page etc: allow filtering by feature (@comments), #294
1.0.0b18 - 2019-09-06
Changed:
-m/--mets
is not required anymore, #301ocrd workspace prune-files
: Throw on error removing non-existent file-p/--parameter
argument accepts raw JSON as well now, #239
Fixed:
- OcrdFile: Default fileGrp to
TEMP
- OcrdFile: Accept url constructor arg
- Workspace: Simplify file download code, add extensions to files
- Processor:
chdir
to workspace directory on init so relative files resolve properly - typos in docstrings
- README: 'module' -> 'package'
- workspace.image_from_page etc: logic with rotation/angle
- Adapted test suite to OCR-D/assets now with file extensions
Added:
- utils:
MIME_TO_EXT
to map mime types to preferred extension - Validation of imageHeight/imageWidth in PAGE vs. actual image height/width, #229
1.0.0b17 - 2019-08-21
Fixed:
- Require
Pillow == 5.4.1
throughout
1.0.0b16 - 2019-08-21
Added:
- many utility methods for image manipulation and coordinate handling, #268, OCR-D/ocrd_tesserocr#49
bbox_from_points
bbox_from_xywh
bbox_from_polygon
coordinates_for_segment
coordinates_of_segment
crop_image
membername
image_from_polygon
points_from_bbox
points_from_polygon
points_from_xywh
polygon_from_bbox
polygon_from_x0y0x1y1
polygon_from_xywh
polygon_mask
rotate_coordinates
xywh_from_bbox
- Spec-conformant handling of AlternativeImage, OCR-D/spec#116, OCR-D/ocrd_tesserocr#33, #284
Changed:
- workspace bagger will create files with extension
save_mets
is atomic now, #278, #285
1.0.0b15 - 2019-08-14
Fixed:
- regression in namespace handling of PAGE output, #277
1.0.0b14 - 2019-08-14
Fixed:
- METS is serialized as Unicode instead of character entities, #279
1.0.0b13 - 2019-08-13
Added:
ocrd workspace remove
to remove files, #275, #245ocrd workspace remove-group
to remove file groups, #275, #245ocrd workspace prune-files
1.0.0b12 - 2019-08-08
Fixed:
- Regression with ocrd_page data types, #269
- Segfault issue with Pillow >= 6.0.0, #270
1.0.0b11 - 2019-07-29
Changed:
- Improve pixel density logic in OcrdExif, #256, #37, OCR-D/ocrd_tesserocr#54
- 🔥 stop supporting python
<= 3.4
- Support only 2019-07-15 PAGE version
1.0.0b10 - 2019-06-25
Fixed:
- Handle TIFF ResolutionUnit not being set #250
- bashlib:
--mets-file
should be--mets
Changed:
- missing required parameters should raise exception, fix #244 #247
1.0.0b9 - 2019-05-20
Changed:
- export additional region types from generated code, #241
Added:
points_from_y0x0y1x1
for inverted x/y pairs
Added:
ocrd workspace list-page
to list all page IDs
Changed:
- Extended page with TextStyle for Page, , PRImA-Research-Lab/PAGE-XML#8
1.0.0b6 - 2019-03-19
ocrd workspace set-id
case in argument error- fix DeprecationWarning for PyYAML 5.1+
- use headless opencv
1.0.0b1 - 2019-02-27
First beta of 1.0.0
Changed:
- 🔥 Drop Python2 support
- 🔥 Refactored project into 5 modules with little dependencies each
- Implement 3.2.0 of the spec
Removed:
- 🔥 Move factory methods from OcrdPage and OcrdExif to new module
ocrd.model_factory
- Factor out XML constants to
ocrd.constants.xml
- 🔥 BaseProcessor.add_output_file removed
0.15.2 - 2019-01-07
Added:
- workspace validator: make url tests (java, non-http) checks skippable
0.15.1 - 2019-01-04
Fixed:
resolver: fall back to "mets.xml" as basename if not provided
Added:
- PageValidator: Check consistency of PAGE according to spec 3.0.0, #223
Changed:
- 🔥 Change validators to use a single static
validate
method where applicable, #224
0.14.0 - 2018-12-13
Changed:
- 🔥 Use mets:structMap[@TYPE="PHYSICAL"] instead of GROUPID to group by page, #221, #222, OCR-D/spec#81
- 🔥 Rename group_id -> page_id, throughout, OCR-D/spec#101
- 🔥 CLI: Rename --group-id -> --page-id (
-g
)
0.13.3 - 2018-12-05
Fixed:
ocrd workspace clone
fixed to have the pre-0.13.1 behavior- Missing metsHdr made adding agents fail, #218
0.13.2 - 2018-11-29
Changed:
- various fixes to workspace and OCRD-ZIP validation, #217
0.13.1 - 2018-11-26
Fixed:
- Relative files in workspace were resolved to target, not source dir of METS, #215
0.13.0 - 2018-11-23
Changed:
- Adapt bagit profile to spec v2.6.2: allow fetch.txt
- Adapt bagit profile to spec v2.6.3: Tag-Files-Allowed
Removed:
- Code related to generating Open API / Swagger definitions, #210
0.12.0 - 2018-11-22
Changed:
- Depend on pillow >= 5.3.0 to mitigate python-pillow/Pillow#2926
Added:
- backup functionality for
ocrd workspace
, #204
0.11.0 - 2018-11-21
Changed:
ocrd process
interface change, #205
Fixed:
- Grayscale images can be handled by workspace correctly now, #211
0.10.0 - 2018-11-15
Added:
- OCRD-ZIP implementation based on OCR-D/spec#70, #207
- CLI:
ocrd zip
to bag, spill and validate OCRD-ZIP
Fixed:
- Adapted tests to work with bagit-based structure, OCR-D/assets#18
- Read
VERSION
constant from setup.py, #209
0.9.0 - 2018-10-30
Fixed:
- CLI:
ocrd process
, #199
Changed:
- Implement changes from spec v2.5.0: ocrd-tool must have in/output groups
0.8.8 - 2018-10-24
Fixed:
KeyError
because ocrd_tool not saved on processor instance, #192- another
KeyError
becausepnginfo
isn't set for all PNG by exiftool, #194 - support 1-bit bitonal images, #196
0.8.7 - 2018-10-23
Changed:
- Calling
workspace.save_mets()
will now save processor information in the header, #147, #191
0.8.6 - 2018-10-19
Changed:
- Update ocrd-tool schema to spec v2.4.0
Fixed:
- Calls to
click.argument
had too many arguments, #186
0.8.5 - 2018-09-26
Changed:
- Logger setup according to v2.3.0 of the spec, #185
0.8.4 - 2018-09-21
Removed:
- remove
workspace_from_folder
, #180
Changed:
- Creating METS from scratch will set creator agent and creation date, #147
Fixed:
- Schema regression in 0.8.3 (description)
0.8.3 - 2018-09-20
Changed:
- Make logging logic spec-compliant according to #173
Removed:
prefer_symlink
mechanism, #179
0.8.2 - 2018-09-04
Fixed:
- Handle missing parameters in ocrd-tool.json like empty array #160
- cli:
ocrd workspace validate
#175
0.8.1 - 2018-08-31
Removed:
merge_ocr_txt.py
script to be reborn as its own project, #170
0.8.0 - 2018-08-22
Fixed:
- file://-URL will not be "downloaded" if already present, #165
Changed:
- Logging can be configured with a file
ocrd_logging.py
in either $PWD, $HOME or /etc, #164 Processor.add_output_file
deprecated, #166
0.7.5 - 2019-08-16
Fixed:
- bashlib: invalid parsers caused cryptic error, #161
0.7.4 - 2019-08-15
Fixed:
- Sort coordinates clockwise, #159, OCR-D/assets#11
0.7.3 - 2019-08-14
Changed:
- Missing
description
for parameters in schema, #155
0.7.2 - 2019-08-01
Fixed
- Regression in XML pretty printing, #152
0.7.1 - 2019-07-25
Changed:
- Remove dependency on
xmllint
command line tool, #72, #151 - Remove dependency on
exiftool
, #71, #150
0.7.0 - 2018-07-25
Changed:
- Behavior of workspace generation from URL changes (#149)
- when cloning from a
file://
URL and no directory is given do NOT create a temporary directory but reuse the existing directory - When not providing
mets_basename
, assume the last URL path segment to be the METS basename instead of the fixed stringmets.xml
- when cloning from a
- incoroporate changes to ocrd_tool schema from spec/v2.2.1
0.6.0 - 2018-07-23
Changed:
- Harmonized URL handling according to https://ocr-d.github.io/cli#urlfile-convention / spec v2.2.0
0.5.0 - 2018-07-19
Fixed:
- EXIF supports JFIF metadata, #141
- typo: repository -> workspace
Changed:
- sh wrapper: set
working_dir
always, default to$PWD
- Drop all caching, #132, #143, #126, #100, #92, #45, #85
Added:
- CLI:
--force
option forworkspace add
to replace existing ID, #134
0.4.4 - 2018-07-03
Fixes:
- CLI:
-k
onworkspace find
for non-existent fields, #133 - CLI: Persist downloads in METS, #136
- CLI:
workspace find --download
will download to subdir of fileGrp, #137
Added:
OcrdFile
has getterfileGrp
for theUSE
attribute of parentmets:fileGrp
, #139
0.4.3 - 2018-06-27
Fixed:
- bash 4.3 compat fixes, #131
ocrd workspace find
default was wrong, #130
0.4.2 - 2018-06-25
Added:
- CLI:
-k
onworkspace find
repeatable
Changed:
- Simplified bashlib
0.4.1 - 2018-06-22
Changed:
- CLI: rename
workspace create
->workspace init
and align with clone syntax
0.4.0 - 2018-06-18
Changed:
- Caching is disabled by default
Added:
- CLI:
ocrd ocrd-tool version
to show version - API:
OcrdMets
getters/setters forunique_identifier
- CLI:
ocrd workspace set-id MODS_IDENTIFIER_PURL
to set mods:identifier - CLI:
ocrd workspace get-id
to get mods:identifier - CLI:
ocrd workspace clone
syntax changed - CLI:
ocrd workspace clone/find
support downloading files - CLI:
ocrd workspace find -L
for local files - CLI:
ocrd workspace find -i ID
to search by ID
0.3.2 - 2018-06-19
Fixed:
- CLI:
cord ocrd-tool tool parse-params
validate as well as merges with default - Parameter validation: Check whether parameter is
required: true
0.3.1 - 2018-06-18
Added
- CLI support
--version
Fixed:
- CLI: validate of ocrd-tool.json didn't parse JSON, #115
Changed:
mets_file_id
in utils is nowconcat_padded
and more flexible
0.3.0 - 2018-06-18
Added
- find_files supports
ID
Changed
- Adding a file with an existing
ID
will raise an exception, #110
Fixed
- Wrapped tools support
-J
/--dump-json
GROUPID
was not passed when adding files
Removed
--output-mets
, spec v2.0.0
0.2.5 - 2018-06-18
Added
- Bash library
- CLI command
ocrd-tool
- CLI command
workspace find
- CLI command
workspace list-group
- CLI command
workspace create
- CLI
workspace
flag --no-cache - bash library to build wrappers
- CLI:
--version
global flag Changed - Downloading a file will set the URL in the METS Fixed
- run_processor will save_mets after process
0.2.4 - 2018-06-04
Fixed
GT
is a valid category for metsfileGrp@USE
Changed:
- Adapted to spec v1.2.0
0.2.3 - 2018-05-17
Fixed:
- Adapted to spec v1.1.5
- Ensure python 2.7 backwards compatibility
Changed:
- ocrd_tool must be passed to
Processor
constructor
0.2.2 - 2018-05-15
Added:
- points_from_x0y0x1y1 util
Fixed:
- EXIF tags for PNG
0.2.1 - 2018-05-08
Added
- CLI:
ocrd workspace {add,clone,pack,unpack,validate}
, #75
0.2.0 - 2018-05-08
Changed:
- Use 2018 PAGE namespace http://schema.primaresearch.org/PAGE/gts/pagecontent/2018-07-15, PRImA-Research-Lab/PAGE-XML#4
0.1.0 - 2018-04-26
Changed:
- ocrd_page API based on XSD
- rename/fix coordinate code
0.0.7 - 2018-04-18
Changed
- Adapted schemas to v1.1.1 of OCR-D/spec
0.0.6 - 2018-04-18
Changed
- Support parameters in run_processor
0.0.5 - 2018-04-18
Changed
- Adapted schemas to v1.1.0 of OCR-D/spec
0.0.4 - 2018-04-18
Changed
- Adapted click wrapper in decorators to v1.0.0 of OCR-D/spec
0.0.2 - 2018-04-17
Fixed
- setup.py install_requires mirrors requirements.txt
0.0.1 - 2018-04-17
Initial Release