Add label groups for hierarchical classification in ImageNet (#1645)

### Summary
This PR groups labels (directories, in the ImageNet case) into label
groups named after their parent directories.
For example, for the following folder structure:
```
.
├── label_1
│   └── label_1_1
│       └── 1.jpg
└── label_2
    └── label_2_1
        └── 2.jpg
```
label groups will be `label_1` and `label_2`.
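
The grouping rule for this two-level case can be sketched with plain `pathlib`. This is only an illustration of the rule described above, not the importer's code; `group_by_parent` is a hypothetical helper and the label paths are hard-coded rather than read from disk.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def group_by_parent(label_dirs):
    """Map each parent directory to the names of its child label directories.

    Hypothetical helper for illustration; it is not part of Datumaro.
    """
    groups = defaultdict(list)
    for label_dir in sorted(label_dirs):
        path = PurePosixPath(label_dir)
        if len(path.parts) > 1:  # top-level directories have no parent group
            groups[str(path.parent)].append(path.name)
    return dict(groups)


label_groups = group_by_parent(["label_1/label_1_1", "label_2/label_2_1"])
print(label_groups)
```

Each key is a group name (the parent directory) and each value lists the labels that belong to it.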


**Note**: with deeper nesting, the name of a group is the relative path
of the second-to-last directory, that is, the immediate parent of the
label directory. For the following case:
```
.
├── label_1
│   └── label_1_1
│       └── label_1_1_1
│           └── 1.jpg
│        
└── label_2
    └── label_2_1
        └── label_2_1_1
            └── 2.jpg
```
label groups will be `label_1/label_1_1` and `label_2/label_2_1`.
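
A rough way to explore the deeper case is to build the tree in a temporary directory and walk it, loosely mirroring the directory loop in the `imagenet.py` change. This is a sketch under assumptions: `collect_label_groups` is a hypothetical stand-in that returns a plain dict instead of a `LabelCategories` object, and group names are normalized to forward slashes. Because the walk visits every nested directory, the intermediate directories (`label_1`, `label_2`) also appear as groups in this sketch, alongside the two groups named above.

```python
import tempfile
from pathlib import Path


def collect_label_groups(root):
    """Walk every directory under root and file it under its parent's relative path.

    Hypothetical stand-in for illustration; it is not the Datumaro importer.
    """
    root = Path(root)
    groups = {}
    for dirname in sorted(d for d in root.rglob("*") if d.is_dir()):
        rel = dirname.relative_to(root)
        if len(rel.parts) > 1:  # only nested directories belong to a group
            groups.setdefault(rel.parent.as_posix(), []).append(rel.name)
    return groups


with tempfile.TemporaryDirectory() as tmp:
    for leaf in ("label_1/label_1_1/label_1_1_1", "label_2/label_2_1/label_2_1_1"):
        (Path(tmp) / leaf).mkdir(parents=True)
    deep_groups = collect_label_groups(tmp)

for name, labels in deep_groups.items():
    print(name, labels)
```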

### How to test
<!-- Describe the testing procedure for reviewers, if changes are
not fully covered by unit tests or manual testing can be complicated.
-->

### Checklist
<!-- Put an 'x' in all the boxes that apply -->
- [x] I have added unit tests to cover my changes.
- [ ] I have added integration tests to cover my changes.
- [x] I have added the description of my changes into
[CHANGELOG](https://github.com/openvinotoolkit/datumaro/blob/develop/CHANGELOG.md).
- [ ] I have updated the
[documentation](https://github.com/openvinotoolkit/datumaro/tree/develop/docs)
accordingly.

### License

- [x] I submit _my code changes_ under the same [MIT
License](https://github.com/openvinotoolkit/datumaro/blob/develop/LICENSE)
that covers the project.
  Feel free to contact the maintainers if that's a concern.
- [ ] I have updated the license header for each file (see an example
below).

```python
# Copyright (C) 2024 Intel Corporation
#
# SPDX-License-Identifier: MIT
```

---------

Signed-off-by: Ilya Trushkin <[email protected]>
itrushkin authored Oct 23, 2024
1 parent 3d533b9 commit e79cca2
Showing 4 changed files with 31 additions and 6 deletions.
2 changes: 2 additions & 0 deletions CHANGELOG.md
@@ -14,6 +14,8 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
   (<https://github.com/openvinotoolkit/datumaro/pull/1594>)
 - Convert Cuboid2D annotation to/from 3D data
   (<https://github.com/openvinotoolkit/datumaro/pull/1639>)
+- Add label groups for hierarchical classification in ImageNet
+  (<https://github.com/openvinotoolkit/datumaro/pull/1645>)

 ### Enhancements
 - Enhance 'id_from_image_name' transform to ensure each identifier is unique
12 changes: 11 additions & 1 deletion src/datumaro/plugins/data_formats/imagenet.py
@@ -48,8 +48,18 @@ def _load_categories(self, path):
         path = Path(path)
         for dirname in sorted(d for d in path.rglob("*") if d.is_dir()):
             dirname = dirname.relative_to(path)
+            level = len(dirname.parts)
             if str(dirname) != ImagenetPath.IMAGE_DIR_NO_LABEL:
-                label_cat.add(str(dirname))
+                parent = None
+                if level > 1:
+                    parent = str(dirname.parents[0])
+                    if not any([g.name == parent for g in label_cat.label_groups]):
+                        label_cat.add_label_group(parent, [str(dirname.name)], group_type=0)
+                    else:
+                        g = next(x for x in label_cat.label_groups if x.name == parent)
+                        g.labels.append(str(dirname.name))
+                label_cat.add(str(dirname), parent)
+
         return {AnnotationType.label: label_cat}

     def _load_items(self, path):
12 changes: 7 additions & 5 deletions tests/unit/test_imagenet_format.py
@@ -182,6 +182,12 @@ class ImagenetImporterTest:
     IMPORTER_NAME = ImagenetImporter.NAME

     def _create_expected_dataset(self):
+        label_categories = LabelCategories.from_iterable(
+            ("label_0", "label_1", f"{Path('label_1', 'label_1_1')}")
+        )
+        label_categories[-1].parent = "label_1"
+        label_categories.add_label_group(name="label_1", labels=["label_1_1"], group_type=0)
+
         return Dataset.from_iterable(
             [
                 DatasetItem(
@@ -204,11 +210,7 @@ def _create_expected_dataset(self):
                     annotations=[Label(1)],
                 ),
             ],
-            categories={
-                AnnotationType.label: LabelCategories.from_iterable(
-                    ("label_0", "label_1", f"{Path('label_1', 'label_1_1')}")
-                ),
-            },
+            categories={AnnotationType.label: label_categories},
         )

     @mark_requirement(Requirements.DATUM_GENERAL_REQ)
11 changes: 11 additions & 0 deletions tests/utils/test_utils.py
@@ -108,6 +108,17 @@ def compare_categories(test, expected, actual):
             sorted(expected[AnnotationType.label].items, key=lambda t: t.name),
             sorted(actual[AnnotationType.label].items, key=lambda t: t.name),
         )
+        if expected[AnnotationType.label].label_groups:
+            assert len(expected[AnnotationType.label].label_groups) == len(
+                actual[AnnotationType.label].label_groups
+            )
+            for expected_group, actual_group in zip(
+                expected[AnnotationType.label].label_groups,
+                actual[AnnotationType.label].label_groups,
+            ):
+                test.assertEqual(set(expected_group.labels), set(actual_group.labels))
+                test.assertEqual(expected_group.group_type, actual_group.group_type)
+
     if AnnotationType.mask in expected:
         test.assertEqual(
             expected[AnnotationType.mask].colormap,
