PageType.get_AllRegions to list all kinds of regions #479

Merged
merged 36 commits into from
Jun 4, 2020
Changes from 13 commits
36 commits
abef352
PageType.get_AllRegions to list all kinds of regions
kba May 13, 2020
3445f87
Update ocrd_models/ocrd_page_user_methods.py
bertsky May 14, 2020
a48b8c1
update generateds page, add region filter if using reading order, wip
kba May 14, 2020
f51a2e4
Merge branch 'hotfix-ocrd-page-exports' into get-all-regions
kba May 14, 2020
8da3f3c
Merge branch 'get-all-regions' of https://github.com/kba/ocrd-core in…
kba May 14, 2020
d2a01bb
refactoring: move generateDS methods to their own files
kba May 15, 2020
be7f026
get_AllRegions: adapt to signature proposed in #240, test with order=…
kba May 15, 2020
e1740f7
README: explain how to add user methods to PAGE API
kba May 15, 2020
6f9163e
Update ocrd_models/README.md
kba May 28, 2020
0c73b3e
Update ocrd_models/README.md
kba May 28, 2020
5c2f3a8
Update ocrd_models/README.md
kba May 28, 2020
6a57506
recursion (with both finite or arbitrary depth) for get_AllRegions
kba May 28, 2020
a9072c8
regenerate PAGE API
kba May 28, 2020
ac62b85
get_AllRegions: clean-up merge artifacts and reorganize
kba May 28, 2020
fd6d545
Update ocrd_models/ocrd_page_user_methods/get_AllRegions.py
kba May 28, 2020
86a7133
get_AllRegions: _region_id method unneccessary now
kba May 28, 2020
ce06392
Merge branch 'get-all-regions' of https://github.com/kba/ocrd-core in…
kba May 28, 2020
5c8d89b
regenerate PAGE API
kba May 28, 2020
f6e3da5
:art: pylint
kba May 28, 2020
8351056
add_AllIndexed -> extend_AllIndexed
kba May 28, 2020
f202205
get_AllRegions: differentiate "reading-order"/"reading-order-only"
kba May 28, 2020
ffba6f9
get_AllRegions: catch negative depth, test depth==0
kba May 29, 2020
207f396
:memo: get_AllRegions: document example
bertsky May 29, 2020
9ced315
get_AllRegions: fix recursion
kba May 29, 2020
629f38d
get_AllRegions: Update example
kba May 29, 2020
e958559
wip
kba May 29, 2020
1964563
reading order test sample: add unorderedgroups for testing
kba May 29, 2020
27e256f
add get_UnorderedGroupChildren, let get_AllIndexed handle UnorderedGr…
kba May 29, 2020
1b17e3f
get_AllIndexed: allow filtering by child type
kba May 29, 2020
ae613cf
get_AllIndexed: index_sort parameter to enable/disable sorting
kba May 29, 2020
b1df95f
add sort_AllIndexed to sort in-place
kba May 29, 2020
fd9dc83
extend_AllIndexed: increment @index when adding elements
kba May 29, 2020
9d0e539
Merge branch 'master' into get-all-regions
kba May 29, 2020
84f1d33
:memo: changelog
kba May 29, 2020
0e14633
Document extend_AllIndexed validate_contiunuity param
kba Jun 3, 2020
b79474a
Merge branch 'master' into get-all-regions
kba Jun 4, 2020
28 changes: 28 additions & 0 deletions ocrd_models/README.md
@@ -3,3 +3,31 @@
> OCR-D framework - file format APIs and schemas

See https://github.com/OCR-D/core

## Adding user methods to the generated PAGE API

Let's say you want to add a method `get_FirstTextRegion` on the pc:Page element:

1. Create a file `ocrd_models/ocrd_page_user_methods/get_FirstTextRegion.py`

```python
def get_FirstTextRegion(self):
    return self.get_TextRegion()[0]
```

(Note that the method name and file name must be identical.)
2. Edit `ocrd_models/ocrd_page_user_methods.py` and append to the `METHOD_SPECS` list:

```python
METHOD_SPECS = (
    # ...
    _add_method(r'^PageType$', 'get_FirstTextRegion'),
    # ...
)
```

3. Regenerate the PAGE API:

```sh
make generate-page
```
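
To make step 1 more concrete, here is a minimal sketch of what the added method does once it has been pasted into the class. `FakePage` is a hypothetical stand-in, not the generated `PageType`; only the `get_TextRegion()` getter name follows the generated API. The empty-list guard is an addition for illustration and is not part of the README example.

```python
# Hypothetical stand-in for the generated PageType (NOT the real ocrd_models class).
class FakePage:
    def __init__(self, text_regions):
        self._text_regions = list(text_regions)

    def get_TextRegion(self):
        # generateDS-style getter: returns the list of child elements
        return self._text_regions


def get_FirstTextRegion(self):
    # The getter is a method, so it must be called before indexing.
    regions = self.get_TextRegion()
    # Guard against pages without text regions (illustrative addition).
    return regions[0] if regions else None


# generateDS pastes the method source into the class body; here we simply bind it.
FakePage.get_FirstTextRegion = get_FirstTextRegion

page = FakePage(['r1', 'r2'])
print(page.get_FirstTextRegion())
```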
150 changes: 103 additions & 47 deletions ocrd_models/ocrd_models/ocrd_page_generateds.py
@@ -2,8 +2,8 @@
# -*- coding: utf-8 -*-

#
# Generated Wed May 13 16:09:07 2020 by generateDS.py version 2.35.20.
# Python 3.7.6 (default, Jan 8 2020, 19:59:22) [GCC 7.3.0]
# Generated Thu May 28 14:25:40 2020 by generateDS.py version 2.35.20.
# Python 3.6.9 (default, Apr 18 2020, 01:56:04) [GCC 8.4.0]
#
# Command line options:
# ('-f', '')
@@ -16,7 +16,7 @@
# repo/assets/data/schema/data/2019.xsd
#
# Command line:
# /home/kba/miniconda3/bin/generateDS -f --root-element="PcGts" -o "ocrd_models/ocrd_models/ocrd_page_generateds.py" --disable-generatedssuper-lookup --user-methods="ocrd_models/ocrd_page_user_methods.py" repo/assets/data/schema/data/2019.xsd
# /home/kba/ocrd_all/venv/bin/generateDS -f --root-element="PcGts" -o "ocrd_models/ocrd_models/ocrd_page_generateds.py" --disable-generatedssuper-lookup --user-methods="ocrd_models/ocrd_page_user_methods.py" repo/assets/data/schema/data/2019.xsd
#
# Current working directory (os.getcwd()):
# core
@@ -2850,6 +2850,60 @@ def buildChildren(self, child_, node, nodeName_, fromsubclass_=False, gds_collec
            obj_.original_tagname_ = 'CustomRegion'
    def __hash__(self):
        return hash(self.id)
    def get_AllRegions(self, classes=None, order='document', depth=0):
        """
        Get all the *Region elements, or only those listed in ``classes``.
        Returned in document order unless ``order`` is set to ``reading-order``.
        Arguments:
            classes (list) Classes of regions that shall be returned, e.g. ['Text', 'Image']
            order ("document"|"reading-order") Whether to return regions sorted by document order (default) or by reading order
            depth (integer) Recursion depth; a falsy value such as 0 recurses without limit
        """
        if order not in ['document', 'reading-order']:
            raise Exception("Argument 'order' must be either 'document' or 'reading-order', not '{}'".format(order))
        def region_class(x):
            return x.__class__.__name__.replace('RegionType', '')
        def get_recursive_regions(regions, level):
            if level == 1:
                # stop recursion, filter classes
                if classes:
                    return [r for r in regions if region_class(r) in classes]
                else:
                    return regions
            # find more regions recursively
            more_regions = []
            for region in regions:
                more_regions.append([])
                for class_ in ['Advert', 'Chart', 'Chem', 'Custom', 'Graphic', 'Image', 'LineDrawing', 'Map', 'Maths', 'Music', 'Noise', 'Separator', 'Table', 'Text', 'Unknown']:
                    if class_ == 'Map' and not isinstance(region, PageType):
                        # 'Map' is not recursive in 2019 schema
                        continue
                    more_regions[-1] += getattr(region, 'get_{}Region'.format(class_))()
            if not any(more_regions):
                return get_recursive_regions(regions, 1)
            regions = [region for r, more in zip(regions, more_regions) for region in [r] + more]
            return get_recursive_regions(regions, level - 1 if level else 0)
        ret = get_recursive_regions([self], depth + 1 if depth else 0)
        if order == 'reading-order':
            reading_order = self.get_ReadingOrder()
            if reading_order:
                reading_order = reading_order.get_OrderedGroup() or reading_order.get_UnorderedGroup()
            if reading_order:
                def get_recursive_reading_order(rogroup):
                    if isinstance(rogroup, (OrderedGroupType, OrderedGroupIndexedType)):
                        elements = rogroup.get_AllIndexed()
                    if isinstance(rogroup, (UnorderedGroupType, UnorderedGroupIndexedType)):
                        elements = (rogroup.get_RegionRef() + rogroup.get_OrderedGroup() + rogroup.get_UnorderedGroup())
                    regionrefs = list()
                    for elem in elements:
                        regionrefs.append(elem.get_regionRef())
                        if not isinstance(elem, (RegionRefType, RegionRefIndexedType)):
                            regionrefs.extend(get_recursive_reading_order(elem))
                    return regionrefs
                reading_order = get_recursive_reading_order(reading_order)
            if reading_order:
                id2region = dict([(region.id, region) for region in ret])
                ret = [id2region[region_id] for region_id in reading_order if region_id in id2region]
        return ret
# end class PageType
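
The two ideas in `get_AllRegions` (a document-order traversal with an optional class filter, followed by an optional re-sort along the reading order) can be sketched in isolation. This is a simplified illustration under stated assumptions, not the PR's implementation: `Region`, `all_regions` and `by_reading_order` are hypothetical names, and the nested regions are modelled as a plain `children` list rather than per-class getters.

```python
class Region:
    """Hypothetical stand-in for a PAGE *RegionType element."""
    def __init__(self, id, kind, children=()):
        self.id = id
        self.kind = kind              # e.g. 'Text', 'Image'
        self.children = list(children)


def all_regions(parent, classes=None, depth=0):
    """List nested regions in document order; depth=0 means no limit."""
    if depth < 0:
        raise ValueError("depth must be non-negative")
    found = []

    def walk(node, level):
        for child in node.children:
            found.append(child)
            # descend while unlimited or below the requested depth
            if depth == 0 or level < depth:
                walk(child, level + 1)

    walk(parent, 1)
    if classes:
        found = [r for r in found if r.kind in classes]
    return found


def by_reading_order(regions, region_refs):
    """Re-sort regions by a flat list of region IDs, skipping unknown IDs."""
    id2region = {r.id: r for r in regions}
    return [id2region[ref] for ref in region_refs if ref in id2region]


page = Region('page', 'Page', [
    Region('r1', 'Text', [Region('r1a', 'Image')]),
    Region('r2', 'Table'),
])
print([r.id for r in all_regions(page)])
print([r.id for r in by_reading_order(all_regions(page), ['r2', 'missing', 'r1'])])
```

Note that, as in the method above, regions that are not referenced by the reading order drop out of the re-sorted result.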


@@ -5347,7 +5401,28 @@ def buildChildren(self, child_, node, nodeName_, fromsubclass_=False, gds_collec
            obj_.original_tagname_ = 'UnorderedGroupIndexed'
    def __hash__(self):
        return hash(self.id)

    def get_AllIndexed(self):
        return sorted(self.get_RegionRefIndexed() + self.get_OrderedGroupIndexed() + self.get_UnorderedGroupIndexed(), key=lambda x : x.index)

    def clear_AllIndexed(self):
        ret = self.get_AllIndexed()
        self.set_RegionRefIndexed([])
        self.set_OrderedGroupIndexed([])
        self.set_UnorderedGroupIndexed([])
        return ret

    def add_AllIndexed(self, elements):
        if not isinstance(elements, list):
            elements = [elements]
        for element in sorted(elements, key=lambda x : x.index):
            if isinstance(element, RegionRefIndexedType):
                self.add_RegionRefIndexed(element)
            elif isinstance(element, OrderedGroupIndexedType):
                self.add_OrderedGroupIndexed(element)
            elif isinstance(element, UnorderedGroupIndexedType):
                self.add_UnorderedGroupIndexed(element)
        return self.get_AllIndexed()

    def exportChildren(self, outfile, level, namespaceprefix_='', namespacedef_='xmlns:pc="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15"', name_='OrderedGroupType', fromsubclass_=False, pretty_print=True):
        eol_ = '\n' if pretty_print else ''
        namespaceprefix_ = 'pc:'
@@ -5367,27 +5442,7 @@ def exportChildren(self, outfile, level, namespaceprefix_='', namespacedef_='xml
                cleaned.append(entry)
        for entry in cleaned:
            entry.export(outfile, level, namespaceprefix_, namespacedef_='', name_=entry.__class__.__name__[:-4], pretty_print=pretty_print)

    def get_AllIndexed(self):
        return sorted(self.get_RegionRefIndexed() + self.get_OrderedGroupIndexed() + self.get_UnorderedGroupIndexed(), key=lambda x : x.index)
    def add_AllIndexed(self, elements):
        if not isinstance(elements, list):
            elements = [elements]
        for element in sorted(elements, key=lambda x : x.index):
            if isinstance(element, RegionRefIndexedType):
                self.add_RegionRefIndexed(element)
            elif isinstance(element, OrderedGroupIndexedType):
                self.add_OrderedGroupIndexed(element)
            elif isinstance(element, UnorderedGroupIndexedType):
                self.add_UnorderedGroupIndexed(element)
        return self.get_AllIndexed()

    def clear_AllIndexed(self):
        ret = self.get_AllIndexed()
        self.set_RegionRefIndexed([])
        self.set_OrderedGroupIndexed([])
        self.set_UnorderedGroupIndexed([])
        return ret

# end class OrderedGroupIndexedType
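
The three `*_AllIndexed` helpers treat the separate child lists of an ordered group as one sequence keyed by `@index`. The following sketch mirrors that behaviour on hypothetical dataclasses (`RegionRefIndexed`, `OrderedGroupIndexed` with `refs`/`groups` attributes); the real generated classes use getters and setters per child type and also handle `UnorderedGroupIndexed`, which is omitted here for brevity.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class RegionRefIndexed:            # hypothetical stand-in
    index: int
    regionRef: str


@dataclass
class OrderedGroupIndexed:         # hypothetical stand-in
    index: int
    regionRef: str = ''
    refs: List[RegionRefIndexed] = field(default_factory=list)
    groups: List['OrderedGroupIndexed'] = field(default_factory=list)

    def get_AllIndexed(self):
        # merge the per-type child lists and sort by @index
        return sorted(self.refs + self.groups, key=lambda x: x.index)

    def clear_AllIndexed(self):
        # return the previous children, then empty every child list
        ret = self.get_AllIndexed()
        self.refs, self.groups = [], []
        return ret

    def add_AllIndexed(self, elements):
        # accept a single element or a list; dispatch by type
        if not isinstance(elements, list):
            elements = [elements]
        for element in sorted(elements, key=lambda x: x.index):
            if isinstance(element, RegionRefIndexed):
                self.refs.append(element)
            elif isinstance(element, OrderedGroupIndexed):
                self.groups.append(element)
        return self.get_AllIndexed()


group = OrderedGroupIndexed(index=0)
group.add_AllIndexed([
    RegionRefIndexed(2, 'r2'),
    OrderedGroupIndexed(1, 'g1'),
    RegionRefIndexed(0, 'r0'),
])
indices = [e.index for e in group.get_AllIndexed()]
cleared = group.clear_AllIndexed()
print(indices, len(cleared), group.get_AllIndexed())
```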


@@ -6136,7 +6191,28 @@ def buildChildren(self, child_, node, nodeName_, fromsubclass_=False, gds_collec
            obj_.original_tagname_ = 'UnorderedGroupIndexed'
    def __hash__(self):
        return hash(self.id)

    def get_AllIndexed(self):
        return sorted(self.get_RegionRefIndexed() + self.get_OrderedGroupIndexed() + self.get_UnorderedGroupIndexed(), key=lambda x : x.index)

    def clear_AllIndexed(self):
        ret = self.get_AllIndexed()
        self.set_RegionRefIndexed([])
        self.set_OrderedGroupIndexed([])
        self.set_UnorderedGroupIndexed([])
        return ret

    def add_AllIndexed(self, elements):
        if not isinstance(elements, list):
            elements = [elements]
        for element in sorted(elements, key=lambda x : x.index):
            if isinstance(element, RegionRefIndexedType):
                self.add_RegionRefIndexed(element)
            elif isinstance(element, OrderedGroupIndexedType):
                self.add_OrderedGroupIndexed(element)
            elif isinstance(element, UnorderedGroupIndexedType):
                self.add_UnorderedGroupIndexed(element)
        return self.get_AllIndexed()

    def exportChildren(self, outfile, level, namespaceprefix_='', namespacedef_='xmlns:pc="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15"', name_='OrderedGroupType', fromsubclass_=False, pretty_print=True):
        eol_ = '\n' if pretty_print else ''
        namespaceprefix_ = 'pc:'
@@ -6156,27 +6232,7 @@ def exportChildren(self, outfile, level, namespaceprefix_='', namespacedef_='xml
                cleaned.append(entry)
        for entry in cleaned:
            entry.export(outfile, level, namespaceprefix_, namespacedef_='', name_=entry.__class__.__name__[:-4], pretty_print=pretty_print)

    def get_AllIndexed(self):
        return sorted(self.get_RegionRefIndexed() + self.get_OrderedGroupIndexed() + self.get_UnorderedGroupIndexed(), key=lambda x : x.index)
    def add_AllIndexed(self, elements):
        if not isinstance(elements, list):
            elements = [elements]
        for element in sorted(elements, key=lambda x : x.index):
            if isinstance(element, RegionRefIndexedType):
                self.add_RegionRefIndexed(element)
            elif isinstance(element, OrderedGroupIndexedType):
                self.add_OrderedGroupIndexed(element)
            elif isinstance(element, UnorderedGroupIndexedType):
                self.add_UnorderedGroupIndexed(element)
        return self.get_AllIndexed()

    def clear_AllIndexed(self):
        ret = self.get_AllIndexed()
        self.set_RegionRefIndexed([])
        self.set_OrderedGroupIndexed([])
        self.set_UnorderedGroupIndexed([])
        return ret

# end class OrderedGroupType


99 changes: 17 additions & 82 deletions ocrd_models/ocrd_page_user_methods.py
@@ -3,6 +3,8 @@
# source: https://bitbucket.org/dkuhlman/generateds/src/default/gends_user_methods.py

import re
import codecs
from os.path import dirname, join

#
# You must include the following class definition at the top of
@@ -80,94 +82,27 @@ def show(self):
# generated superclass file and also section "User Methods" in
# the documentation, as well as the examples below.

#
# Replace the following method specifications with your own.

#
# List all *Indexed children sorted by @index
#
get_AllIndexed = MethodSpec(name='get_AllIndexed',
    source=r'''
    def get_AllIndexed(self):
        return sorted(self.get_RegionRefIndexed() + self.get_OrderedGroupIndexed() + self.get_UnorderedGroupIndexed(), key=lambda x : x.index) ''', class_names=r'^(OrderedGroupType|OrderedGroupIndexedType)$')

#
# Clear all *Indexed children sorted by @index
#
clear_AllIndexed = MethodSpec(name='clear_AllIndexed',
    source=r'''
    def clear_AllIndexed(self):
        ret = self.get_AllIndexed()
        self.set_RegionRefIndexed([])
        self.set_OrderedGroupIndexed([])
        self.set_UnorderedGroupIndexed([])
        return ret
    ''', class_names=r'^(OrderedGroupType|OrderedGroupIndexedType)$')

#
# Add all *Indexed children sorted by @index
#
add_AllIndexed = MethodSpec(name='add_AllIndexed',
    source=r'''
    def add_AllIndexed(self, elements):
        if not isinstance(elements, list):
            elements = [elements]
        for element in sorted(elements, key=lambda x : x.index):
            if isinstance(element, RegionRefIndexedType):
                self.add_RegionRefIndexed(element)
            elif isinstance(element, OrderedGroupIndexedType):
                self.add_OrderedGroupIndexed(element)
            elif isinstance(element, UnorderedGroupIndexedType):
                self.add_UnorderedGroupIndexed(element)
        return self.get_AllIndexed()
    ''', class_names=r'^(OrderedGroupType|OrderedGroupIndexedType)$')
def _add_method(class_re, method_name):
    """
    Loads a file ./ocrd_page_user_methods/{{ method_name }}.py and defines a MethodSpec applying to class_re
    """
    source = []
    with codecs.open(join(dirname(__file__), 'ocrd_page_user_methods', '%s.py' % method_name)) as f:
        for line in f.readlines():
            source.append('    %s' % line if line else line)
    return MethodSpec(name=method_name, class_names=class_re, source=''.join(source))
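
The loading step of `_add_method` can be tried out without generateDS. The sketch below is an assumption-laden illustration: `load_method_source` is a hypothetical helper name, the file is written to a temporary directory instead of `ocrd_page_user_methods/`, and blank lines are left unindented (the original indents every line returned by `readlines()`). It shows why the method files start at column 0: the loader shifts them by four spaces so they fit into a class body.

```python
import tempfile
from os.path import join


def load_method_source(directory, method_name, indent='    '):
    """Read <directory>/<method_name>.py and indent it for use inside a class body."""
    with open(join(directory, method_name + '.py'), encoding='utf-8') as f:
        lines = f.readlines()
    # indent only lines with content, keep blank lines untouched
    return ''.join(indent + line if line.strip() else line for line in lines)


with tempfile.TemporaryDirectory() as tmp:
    # write a user-method file the way the PR lays them out: def at column 0
    with open(join(tmp, '__hash__.py'), 'w', encoding='utf-8') as f:
        f.write("def __hash__(self):\n    return hash(self.id)\n")
    source = load_method_source(tmp, '__hash__')

# Pasting the indented source under a class statement yields a valid class.
namespace = {}
exec("class Demo:\n    id = 'r1'\n" + source, namespace)
print(source)
print(hash(namespace['Demo']()) == hash('r1'))
```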


#
# export children sorted by index of the childelement
#
exportChildren = MethodSpec(name='exportChildren',
    source=r'''
    def exportChildren(self, outfile, level, namespaceprefix_='', namespacedef_='xmlns:pc="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15"', name_='OrderedGroupType', fromsubclass_=False, pretty_print=True):
        eol_ = '\n' if pretty_print else ''
        namespaceprefix_ = 'pc:'
        if self.UserDefined is not None:
            self.UserDefined.export(outfile, level, namespaceprefix_, namespacedef_='', name_='UserDefined', pretty_print=pretty_print)
        for Labels_ in self.Labels:
            Labels_.export(outfile, level, namespaceprefix_, namespacedef_='', name_='Labels', pretty_print=pretty_print)
        cleaned = []
        # remove empty groups and replace with RegionRefIndexedType
        for entry in self.get_AllIndexed():
            if isinstance(entry, (UnorderedGroupIndexedType, OrderedGroupIndexedType)) and not entry.get_AllIndexed():
                rri = RegionRefIndexedType.factory(parent_object_=self)
                rri.index = entry.index
                rri.regionRef = entry.regionRef
                cleaned.append(rri)
            else:
                cleaned.append(entry)
        for entry in cleaned:
            entry.export(outfile, level, namespaceprefix_, namespacedef_='', name_=entry.__class__.__name__[:-4], pretty_print=pretty_print)
    ''', class_names=r'^(OrderedGroupType|OrderedGroupIndexedType)$')
#
# Hash by memory address/id()
#
hash_by_id = MethodSpec(name='hash',
    source='''\
    def __hash__(self):
        return hash(self.id)
''',
    class_names=r'^.*$',
    )
#
# Provide a list of your method specifications.
# This list of specifications must be named METHOD_SPECS.
#
METHOD_SPECS = (
    hash_by_id,
    exportChildren,
    get_AllIndexed,
    add_AllIndexed,
    clear_AllIndexed,
    _add_method(r'^.*$', '__hash__'),
    _add_method(r'^(OrderedGroupType|OrderedGroupIndexedType)$', 'get_AllIndexed'),
    _add_method(r'^(OrderedGroupType|OrderedGroupIndexedType)$', 'clear_AllIndexed'),
    _add_method(r'^(OrderedGroupType|OrderedGroupIndexedType)$', 'add_AllIndexed'),
    _add_method(r'^(OrderedGroupType|OrderedGroupIndexedType)$', 'exportChildren'),
    _add_method(r'^(PageType)$', 'get_AllRegions'),
)


2 changes: 2 additions & 0 deletions ocrd_models/ocrd_page_user_methods/__hash__.py
@@ -0,0 +1,2 @@
def __hash__(self):
    return hash(self.id)
12 changes: 12 additions & 0 deletions ocrd_models/ocrd_page_user_methods/add_AllIndexed.py
@@ -0,0 +1,12 @@
def add_AllIndexed(self, elements):
    if not isinstance(elements, list):
        elements = [elements]
    for element in sorted(elements, key=lambda x : x.index):
        if isinstance(element, RegionRefIndexedType):
            self.add_RegionRefIndexed(element)
        elif isinstance(element, OrderedGroupIndexedType):
            self.add_OrderedGroupIndexed(element)
        elif isinstance(element, UnorderedGroupIndexedType):
            self.add_UnorderedGroupIndexed(element)
    return self.get_AllIndexed()

7 changes: 7 additions & 0 deletions ocrd_models/ocrd_page_user_methods/clear_AllIndexed.py
@@ -0,0 +1,7 @@
def clear_AllIndexed(self):
    ret = self.get_AllIndexed()
    self.set_RegionRefIndexed([])
    self.set_OrderedGroupIndexed([])
    self.set_UnorderedGroupIndexed([])
    return ret

20 changes: 20 additions & 0 deletions ocrd_models/ocrd_page_user_methods/exportChildren.py
@@ -0,0 +1,20 @@
def exportChildren(self, outfile, level, namespaceprefix_='', namespacedef_='xmlns:pc="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15"', name_='OrderedGroupType', fromsubclass_=False, pretty_print=True):
    eol_ = '\n' if pretty_print else ''
    namespaceprefix_ = 'pc:'
    if self.UserDefined is not None:
        self.UserDefined.export(outfile, level, namespaceprefix_, namespacedef_='', name_='UserDefined', pretty_print=pretty_print)
    for Labels_ in self.Labels:
        Labels_.export(outfile, level, namespaceprefix_, namespacedef_='', name_='Labels', pretty_print=pretty_print)
    cleaned = []
    # remove empty groups and replace with RegionRefIndexedType
    for entry in self.get_AllIndexed():
        if isinstance(entry, (UnorderedGroupIndexedType, OrderedGroupIndexedType)) and not entry.get_AllIndexed():
            rri = RegionRefIndexedType.factory(parent_object_=self)
            rri.index = entry.index
            rri.regionRef = entry.regionRef
            cleaned.append(rri)
        else:
            cleaned.append(entry)
    for entry in cleaned:
        entry.export(outfile, level, namespaceprefix_, namespacedef_='', name_=entry.__class__.__name__[:-4], pretty_print=pretty_print)
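
The clean-up step inside `exportChildren` replaces every group that has no indexed children with a plain region reference carrying the same `index` and `regionRef`. A minimal sketch of just that step, using hypothetical `Ref` and `Group` dataclasses in place of the generated `RegionRefIndexedType` and `*GroupIndexedType` classes:

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Ref:                         # hypothetical RegionRefIndexed stand-in
    index: int
    regionRef: str


@dataclass
class Group:                       # hypothetical (Un)OrderedGroupIndexed stand-in
    index: int
    regionRef: str
    children: List[object] = field(default_factory=list)


def clean_for_export(entries):
    """Sort by index and degrade empty groups to plain region references."""
    cleaned = []
    for entry in sorted(entries, key=lambda e: e.index):
        if isinstance(entry, Group) and not entry.children:
            # an empty group carries no ordering information beyond its own ref
            cleaned.append(Ref(index=entry.index, regionRef=entry.regionRef))
        else:
            cleaned.append(entry)
    return cleaned


result = clean_for_export([
    Group(1, 'g_empty'),
    Ref(0, 'r0'),
    Group(2, 'g_full', [Ref(0, 'r9')]),
])
print(result)
```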

3 changes: 3 additions & 0 deletions ocrd_models/ocrd_page_user_methods/get_AllIndexed.py
@@ -0,0 +1,3 @@
def get_AllIndexed(self):
    return sorted(self.get_RegionRefIndexed() + self.get_OrderedGroupIndexed() + self.get_UnorderedGroupIndexed(), key=lambda x : x.index)
