Release v0.1.5 #92

Merged 45 commits on Jan 23, 2021
b716103
Relax importer for Pascal VOC dataset (search in subdirectories) (#50)
Nov 13, 2020
5d3f3e5
Allow missing supercategory in COCO annotations (#54)
Nov 23, 2020
c675cbb
Add CamVid format support (#55)
yasakova-anastasia Dec 1, 2020
c4a0c19
Fix ImageNet format
yasakova-anastasia Dec 1, 2020
2447e7f
Merge pull request #58 from openvinotoolkit/ay/fix-imagenet
yasakova-anastasia Dec 2, 2020
cd74a87
Fix CamVid format (#57)
yasakova-anastasia Dec 3, 2020
9102d74
Ability to install opencv-python-headless instead of opencv-python (#62)
Dec 10, 2020
50ca547
Release 0.1.4 (#63)
Dec 10, 2020
ed9ab29
Add function to transform labels
yasakova-anastasia Dec 16, 2020
0e48bb8
Add Wider Face format support (#65)
yasakova-anastasia Dec 16, 2020
64ace3f
some fixes
yasakova-anastasia Dec 16, 2020
8fd5c53
update changelog
yasakova-anastasia Dec 17, 2020
4702754
Little refactoring
Dec 17, 2020
7d708b4
Merge pull request #66 from openvinotoolkit/ay/transform-labels
yasakova-anastasia Dec 17, 2020
77fdd4d
Fixed WiderFace (#67)
yasakova-anastasia Dec 18, 2020
893dd96
Add VGGFace2 format support (#69)
yasakova-anastasia Dec 25, 2020
6f1f494
Kate/splitter (#68)
Dec 30, 2020
909e3b2
Refactor Environment class (#70)
Jan 5, 2021
7f9767c
Fix yolo extractor on windows
Dec 17, 2020
2d66984
Fix windows installation
Dec 17, 2020
7f363d3
Update setup
Jan 5, 2021
e7759e8
Fix windows setup (#73)
Jan 5, 2021
54107ed
Extend Dataset class, allow Extractor-based datasets (#71)
Jan 5, 2021
c473ba9
Move dataset tests to a separate file (#74)
Jan 6, 2021
f8cdef5
Fixes (#76)
Jan 6, 2021
9d66c57
Snyk integration (#78)
Jan 11, 2021
7772613
Update dataset importing (#79)
Jan 11, 2021
69ae42c
Update dataset export (#80)
Jan 11, 2021
e4d333b
CLI: move project context commands to CLI root (#84)
Jan 13, 2021
8e602fc
Add more coco dataset examples (#83)
Jan 13, 2021
1ee908f
Kate/splitter cli (#81)
Jan 14, 2021
9ff4611
Support more image formats in ImageNet (#85)
yasakova-anastasia Jan 14, 2021
9644426
Allow importer-based project sources (#86)
Jan 15, 2021
8804931
Update setup script (#88)
Jan 15, 2021
911af5d
Add unique image count statistic (#87)
Jan 15, 2021
c754d7e
Fixed VGGFace2 (#82)
yasakova-anastasia Jan 18, 2021
8b1a6b4
Add a folder for unlabeled items in VggFace2 dataset format (#89)
yasakova-anastasia Jan 19, 2021
f489c17
Add label support in WiderFace dataset format (#90)
yasakova-anastasia Jan 19, 2021
fb9a638
Release 0.1.5 (dev) (#91)
Jan 19, 2021
0c906ef
Merge branch 'master' into zm/sync-master
Jan 19, 2021
3fdbf37
Merge pull request #93 from openvinotoolkit/zm/sync-master
Jan 19, 2021
529cc6a
Fix model docs (#94)
Jan 19, 2021
30c0648
Little fix in function for recursive find sources (#95)
yasakova-anastasia Jan 20, 2021
e1ed5f6
Update changelog and docs (#98)
Jan 23, 2021
c7e1fdf
Push version to sync with pip (#99)
Jan 23, 2021
25 changes: 20 additions & 5 deletions CHANGELOG.md
@@ -6,21 +6,36 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).


## [Unreleased]
## 01/23/2021 - Release v0.1.5
### Added
-
- `WiderFace` dataset format (<https://github.com/openvinotoolkit/datumaro/pull/65>, <https://github.com/openvinotoolkit/datumaro/pull/90>)
- Function to transform annotations to labels (<https://github.com/openvinotoolkit/datumaro/pull/66>)
- Dataset splits for classification, detection and re-id tasks (<https://github.com/openvinotoolkit/datumaro/pull/68>, <https://github.com/openvinotoolkit/datumaro/pull/81>)
- `VGGFace2` dataset format (<https://github.com/openvinotoolkit/datumaro/pull/69>, <https://github.com/openvinotoolkit/datumaro/pull/82>)
- Unique image count statistic (<https://github.com/openvinotoolkit/datumaro/pull/87>)
- Installation with pip by name `datumaro`

### Changed
-
- `Dataset` class extended with new operations: `save`, `load`, `export`, `import_from`, `detect`, `run_model` (<https://github.com/openvinotoolkit/datumaro/pull/71>)
- Allowed importing `Extractor`-only defined formats (in `Project.import_from`, `dataset.import_from` and CLI/`project import`) (<https://github.com/openvinotoolkit/datumaro/pull/71>)
- `datum project ...` commands replaced with `datum ...` commands (<https://github.com/openvinotoolkit/datumaro/pull/84>)
- Supported more image formats in `ImageNet` extractors (<https://github.com/openvinotoolkit/datumaro/pull/85>)
- Allowed adding `Importer`-defined formats as project sources (`source add`) (<https://github.com/openvinotoolkit/datumaro/pull/86>)
- Added max search depth in `ImageDir` format and importers (<https://github.com/openvinotoolkit/datumaro/pull/86>)

### Deprecated
-
- `datum project ...` CLI context (<https://github.com/openvinotoolkit/datumaro/pull/84>)

### Removed
-

### Fixed
-
- Allow plugins inherited from `Extractor` (instead of only `SourceExtractor`) (<https://github.com/openvinotoolkit/datumaro/pull/70>)
- Windows installation with `pip` for `pycocotools` (<https://github.com/openvinotoolkit/datumaro/pull/73>)
- `YOLO` extractor path matching on Windows (<https://github.com/openvinotoolkit/datumaro/pull/73>)
- Fixed inplace file copying when saving images (<https://github.com/openvinotoolkit/datumaro/pull/76>)
- Fixed `labelmap` parameter type checking in `VOC` converter (<https://github.com/openvinotoolkit/datumaro/pull/76>)
- Fixed model copying on addition in CLI (<https://github.com/openvinotoolkit/datumaro/pull/94>)

### Security
-
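
For scripts that still call the deprecated `datum project ...` context, the rename listed under "Changed" can be sketched as a small rewriting helper. This is only an illustration and not part of Datumaro: `migrate_command` and `MOVED_COMMANDS` are hypothetical names, and the set of moved subcommands is an assumption based on the commands this PR registers at the CLI root.

```python
import shlex

# Assumption: subcommands that this release exposes directly under `datum`
# (previously reached through `datum project <cmd>`).
MOVED_COMMANDS = {
    "import", "export", "filter", "transform", "merge",
    "diff", "ediff", "stats", "info",
}


def migrate_command(line: str) -> str:
    """Rewrite `datum project <cmd> ...` to `datum <cmd> ...` when <cmd> moved."""
    parts = shlex.split(line)
    is_deprecated = (
        len(parts) >= 3
        and parts[0] == "datum"
        and parts[1] == "project"
        and parts[2] in MOVED_COMMANDS
    )
    if is_deprecated:
        parts = [parts[0]] + parts[2:]  # drop the "project" context
    return shlex.join(parts)  # requires Python 3.8+


print(migrate_command("datum project export --format coco"))
```

Commands in other contexts, such as `datum model run`, pass through unchanged.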
55 changes: 41 additions & 14 deletions README.md
@@ -44,23 +44,23 @@ CVAT annotations ---> Publication, statistics etc.
- Convert only non-`occluded` annotations from a [CVAT](https://github.com/opencv/cvat) project to TFrecord:
```bash
# export Datumaro dataset in CVAT UI, extract somewhere, go to the project dir
datum project filter -e '/item/annotation[occluded="False"]' \
datum filter -e '/item/annotation[occluded="False"]' \
--mode items+anno --output-dir not_occluded
datum project export --project not_occluded \
datum export --project not_occluded \
--format tf_detection_api -- --save-images
```

- Annotate MS COCO dataset, extract image subset, re-annotate it in [CVAT](https://github.com/opencv/cvat), update old dataset:
```bash
# Download COCO dataset http://cocodataset.org/#download
# Put images to coco/images/ and annotations to coco/annotations/
datum project import --format coco --input-path <path/to/coco>
datum project export --filter '/image[images_I_dont_like]' --format cvat \
datum import --format coco --input-path <path/to/coco>
datum export --filter '/image[images_I_dont_like]' --format cvat \
--output-dir reannotation
# import dataset and images to CVAT, re-annotate
# export Datumaro project, extract to 'reannotation-upd'
datum project project merge reannotation-upd
datum project export --format coco
datum merge reannotation-upd
datum export --format coco
```

- Annotate instance polygons in [CVAT](https://github.com/opencv/cvat), export as masks in COCO:
@@ -72,18 +72,18 @@ CVAT annotations ---> Publication, statistics etc.
- Apply an OpenVINO detection model to some COCO-like dataset,
then compare annotations with ground truth and visualize in TensorBoard:
```bash
datum project import --format coco --input-path <path/to/coco>
datum import --format coco --input-path <path/to/coco>
# create model results interpretation script
datum model add mymodel openvino \
--weights model.bin --description model.xml \
--interpretation-script parse_results.py
datum model run --model mymodel --output-dir mymodel_inference/
datum project diff mymodel_inference/ --format tensorboard --output-dir diff
datum diff mymodel_inference/ --format tensorboard --output-dir diff
```

- Change colors in PASCAL VOC-like `.png` masks:
```bash
datum project import --format voc --input-path <path/to/voc/dataset>
datum import --format voc --input-path <path/to/voc/dataset>

# Create a color map file with desired colors:
#
@@ -93,24 +93,42 @@ CVAT annotations ---> Publication, statistics etc.
#
# Save as mycolormap.txt

datum project export --format voc_segmentation -- --label-map mycolormap.txt
datum export --format voc_segmentation -- --label-map mycolormap.txt
# add "--apply-colormap=0" to save grayscale (indexed) masks
# check "--help" option for more info
# use "datum --loglevel debug" for extra conversion info
```

- Create a custom COCO-like dataset:
```python
import numpy as np
from datumaro.components.extractor import (DatasetItem,
Bbox, LabelCategories, AnnotationType)
from datumaro.components.dataset import Dataset

dataset = Dataset(categories={
AnnotationType.label: LabelCategories.from_iterable(['cat', 'dog'])
})
dataset.put(DatasetItem(id=0, image=np.ones((5, 5, 3)), annotations=[
Bbox(1, 2, 3, 4, label=0),
]))
dataset.export('test_dataset', 'coco')
```

<!--lint enable list-item-bullet-indent-->
<!--lint enable list-item-indent-->

## Features

[(Back to top)](#table-of-contents)

- Dataset reading, writing, conversion in any direction. Supported formats:
- Dataset reading, writing, conversion in any direction. [Supported formats](docs/user_manual.md#supported-formats):
- [COCO](http://cocodataset.org/#format-data) (`image_info`, `instances`, `person_keypoints`, `captions`, `labels`*)
- [PASCAL VOC](http://host.robots.ox.ac.uk/pascal/VOC/voc2012/htmldoc/index.html) (`classification`, `detection`, `segmentation`, `action_classification`, `person_layout`)
- [YOLO](https://github.com/AlexeyAB/darknet#how-to-train-pascal-voc-data) (`bboxes`)
- [TF Detection API](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/using_your_own_dataset.md) (`bboxes`, `masks`)
- [WIDER Face](http://shuoyang1213.me/WIDERFACE/) (`bboxes`)
- [VGGFace2](https://github.com/ox-vgg/vgg_face2) (`landmarks`, `bboxes`)
- [MOT sequences](https://arxiv.org/pdf/1906.04567.pdf)
- [MOTS PNG](https://www.vision.rwth-aachen.de/page/mots)
- [ImageNet](http://image-net.org/)
@@ -129,6 +147,14 @@ CVAT annotations ---> Publication, statistics etc.
- polygons to instance masks and vice versa
- apply a custom colormap for mask annotations
- rename or remove dataset labels
- Splitting a dataset into multiple subsets like `train`, `val`, and `test`:
- random split
- task-specific splits based on annotations,
which keep initial label and attribute distributions
- for classification task, based on labels
- for detection task, based on bboxes
- for re-identification task, based on labels,
avoiding having same IDs in training and test splits
- Dataset quality checking
- Simple checking for errors
- Comparison with model inference
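
The task-specific splits described in the feature list above keep the initial label distribution in every subset. A minimal pure-Python sketch of that idea for the classification case follows. It is not Datumaro's splitter: `stratified_split` and its `ratios` and `seed` parameters are hypothetical, and the rounding rule is just one simple choice.

```python
import random
from collections import defaultdict


def stratified_split(labels, ratios, seed=0):
    """Split item indices into subsets, label by label.

    labels: one label per item; ratios: {subset_name: fraction}, summing to 1.
    Each label group is shuffled and cut by the same fractions, so every
    subset keeps roughly the original label distribution.
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)

    names = list(ratios)
    subsets = {name: [] for name in names}
    for indices in by_label.values():
        rng.shuffle(indices)
        start = 0
        for pos, name in enumerate(names):
            if pos == len(names) - 1:
                end = len(indices)  # last subset takes the remainder
            else:
                end = start + round(len(indices) * ratios[name])
            subsets[name].extend(indices[start:end])
            start = end
    return subsets


labels = ["cat"] * 10 + ["dog"] * 20
split = stratified_split(labels, {"train": 0.5, "val": 0.2, "test": 0.3})
print({name: len(items) for name, items in split.items()})
```

A re-identification split would additionally need to keep all items of one identity in the same subset, which this sketch does not attempt.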
@@ -162,7 +188,7 @@ python -m virtualenv venv
Install Datumaro package:

``` bash
pip install 'git+https://github.com/openvinotoolkit/datumaro'
pip install datumaro
```

## Usage
@@ -208,13 +234,14 @@ dataset = dataset.transform(project.env.transforms.get('remap_labels'),
{'cat': 'dog', # rename cat to dog
'truck': 'car', # rename truck to car
'person': '', # remove this label
}, default='delete')
}, default='delete') # remove everything else

# iterate over dataset elements
for item in dataset:
print(item.id, item.annotations)

# export the resulting dataset in COCO format
project.env.converters.get('coco').convert(dataset, save_dir='dst/dir')
dataset.export('dst/dir', 'coco')
```
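
The `remap_labels` call in the snippet above renames `cat` and `truck`, removes `person`, and, with `default='delete'`, removes everything else. The semantics can be sketched on a plain list of label names. This is a hypothetical stand-in, not the Datumaro transform, and the `'keep'` mode is included only for contrast as an assumed alternative.

```python
def remap_labels(labels, mapping, default="keep"):
    """Apply a label mapping to a list of label names.

    mapping: {old_name: new_name}; an empty new name removes the label.
    default: what to do with labels missing from the mapping,
             'keep' (leave as-is) or 'delete' (remove).
    """
    if default not in ("keep", "delete"):
        raise ValueError("default must be 'keep' or 'delete'")

    result = []
    for label in labels:
        if label in mapping:
            new_name = mapping[label]
        elif default == "keep":
            new_name = label
        else:
            new_name = ""  # unmapped label under 'delete'
        if new_name:
            result.append(new_name)
    return result


mapping = {"cat": "dog", "truck": "car", "person": ""}
print(remap_labels(["cat", "truck", "person", "bus"], mapping, default="delete"))
```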

> Check our [developer guide](docs/developer_guide.md) for additional information.
2 changes: 1 addition & 1 deletion datumaro/cli/__init__.py
@@ -1,4 +1,4 @@

# Copyright (C) 2019-2020 Intel Corporation
# Copyright (C) 2019-2021 Intel Corporation
#
# SPDX-License-Identifier: MIT
25 changes: 16 additions & 9 deletions datumaro/cli/__main__.py
@@ -1,5 +1,5 @@

# Copyright (C) 2019-2020 Intel Corporation
# Copyright (C) 2019-2021 Intel Corporation
#
# SPDX-License-Identifier: MIT

@@ -58,18 +58,25 @@ def make_parser():
_LogManager._define_loglevel_option(parser)

known_contexts = [
('project', contexts.project, "Actions on projects (datasets)"),
('source', contexts.source, "Actions on data sources"),
('model', contexts.model, "Actions on models"),
('project', contexts.project, "Actions with project (deprecated)"),
('source', contexts.source, "Actions with data sources"),
('model', contexts.model, "Actions with models"),
]
known_commands = [
('create', commands.create, "Create project"),
('add', commands.add, "Add source to project"),
('remove', commands.remove, "Remove source from project"),
('export', commands.export, "Export project"),
('import', commands.import_, "Create project from existing dataset"),
('add', commands.add, "Add data source to project"),
('remove', commands.remove, "Remove data source from project"),
('export', commands.export, "Export project in some format"),
('filter', commands.filter, "Filter project"),
('transform', commands.transform, "Transform project"),
('merge', commands.merge, "Merge projects"),
('convert', commands.convert, "Convert dataset into another format"),
('diff', commands.diff, "Compare projects with intersection"),
('ediff', commands.ediff, "Compare projects for equality"),
('stats', commands.stats, "Compute project statistics"),
('info', commands.info, "Print project info"),
('explain', commands.explain, "Run Explainable AI algorithm for model"),
('merge', commands.merge, "Merge datasets"),
('convert', commands.convert, "Convert dataset"),
]

# Argparse doesn't support subparser groups:
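
The `known_contexts` and `known_commands` tables above are (name, module, description) triples that presumably end up as `argparse` subcommands. A self-contained sketch of that registration pattern, using only the standard library, could look like this. The builder functions, the `handler` default and the `--format` option are hypothetical and do not mirror Datumaro's actual `build_parser` signatures.

```python
import argparse


def build_export(subparsers, name, help_text):
    # Hypothetical builder: registers the subcommand and its handler.
    parser = subparsers.add_parser(name, help=help_text)
    parser.add_argument("--format", default="coco")
    parser.set_defaults(handler=lambda args: "export as %s" % args.format)


def build_stats(subparsers, name, help_text):
    parser = subparsers.add_parser(name, help=help_text)
    parser.set_defaults(handler=lambda args: "stats")


# Same shape as the tables in the diff: (name, builder, description).
KNOWN_COMMANDS = [
    ("export", build_export, "Export project in some format"),
    ("stats", build_stats, "Compute project statistics"),
]


def make_parser():
    parser = argparse.ArgumentParser(prog="datum")
    subparsers = parser.add_subparsers(dest="command", required=True)
    for name, builder, help_text in KNOWN_COMMANDS:
        builder(subparsers, name, help_text)
    return parser


args = make_parser().parse_args(["export", "--format", "voc"])
print(args.handler(args))
```

Keeping the table as data means a new root-level command only needs one extra tuple.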
13 changes: 10 additions & 3 deletions datumaro/cli/commands/__init__.py
@@ -1,6 +1,13 @@

# Copyright (C) 2019-2020 Intel Corporation
# Copyright (C) 2019-2021 Intel Corporation
#
# SPDX-License-Identifier: MIT

from . import add, create, explain, export, remove, merge, convert
# pylint: disable=redefined-builtin

from . import (
create, add, remove, import_,
explain,
export, merge, convert, transform, filter,
diff, ediff, stats,
info
)
3 changes: 1 addition & 2 deletions datumaro/cli/commands/add.py
@@ -1,5 +1,4 @@

# Copyright (C) 2019-2020 Intel Corporation
# Copyright (C) 2020-2021 Intel Corporation
#
# SPDX-License-Identifier: MIT

49 changes: 12 additions & 37 deletions datumaro/cli/commands/convert.py
@@ -1,5 +1,4 @@

# Copyright (C) 2019-2020 Intel Corporation
# Copyright (C) 2019-2021 Intel Corporation
#
# SPDX-License-Identifier: MIT

@@ -9,6 +8,7 @@
import os.path as osp

from datumaro.components.project import Environment
from datumaro.components.dataset import Dataset

from ..contexts.project import FilterModes
from ..util import CliException, MultilineFormatter, make_file_name
@@ -63,51 +63,29 @@ def convert_command(args):
env = Environment()

try:
converter = env.converters.get(args.output_format)
converter = env.converters[args.output_format]
except KeyError:
raise CliException("Converter for format '%s' is not found" % \
args.output_format)
extra_args = converter.from_cmdline(args.extra_args)
def converter_proxy(extractor, save_dir):
return converter.convert(extractor, save_dir, **extra_args)
extra_args = converter.parse_cmdline(args.extra_args)

filter_args = FilterModes.make_filter_args(args.filter_mode)

fmt = args.input_format
if not args.input_format:
matches = []
for format_name in env.importers.items:
log.debug("Checking '%s' format...", format_name)
importer = env.make_importer(format_name)
try:
match = importer.detect(args.source)
if match:
log.debug("format matched")
matches.append((format_name, importer))
except NotImplementedError:
log.debug("Format '%s' does not support auto detection.",
format_name)

matches = env.detect_dataset(args.source)
if len(matches) == 0:
log.error("Failed to detect dataset format. "
"Try to specify format with '-if/--input-format' parameter.")
return 1
elif len(matches) != 1:
log.error("Multiple formats match the dataset: %s. "
"Try to specify format with '-if/--input-format' parameter.",
', '.join(m[0] for m in matches))
', '.join(matches))
return 2

format_name, importer = matches[0]
args.input_format = format_name
fmt = matches[0]
log.info("Source dataset format detected as '%s'", fmt)
else:
try:
importer = env.make_importer(args.input_format)
if hasattr(importer, 'from_cmdline'):
extra_args = importer.from_cmdline()
except KeyError:
raise CliException("Importer for format '%s' is not found" % \
args.input_format)

source = osp.abspath(args.source)

@@ -121,15 +99,12 @@ def converter_proxy(extractor, save_dir):
(osp.basename(source), make_file_name(args.output_format)))
dst_dir = osp.abspath(dst_dir)

project = importer(source)
dataset = project.make_dataset()
dataset = Dataset.import_from(source, fmt)

log.info("Exporting the dataset")
dataset.export_project(
save_dir=dst_dir,
converter=converter_proxy,
filter_expr=args.filter,
**filter_args)
if args.filter:
dataset = dataset.filter(args.filter, **filter_args)
dataset.export(format=args.output_format, save_dir=dst_dir, **extra_args)

log.info("Dataset exported to '%s' as '%s'" % \
(dst_dir, args.output_format))
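
The format resolution in `convert_command` reduces to three cases: no match, several matches, or exactly one. A standalone sketch of that control flow follows. `resolve_input_format` is a hypothetical helper, and `detect` stands in for `env.detect_dataset`, assumed here to return a list of format names.

```python
import logging as log


def resolve_input_format(source, input_format, detect):
    """Return (format_name, exit_code) for a dataset at `source`.

    input_format: explicit format from the command line, or None.
    detect: callable returning the list of format names matching `source`.
    """
    if input_format:
        return input_format, 0

    matches = detect(source)
    if len(matches) == 0:
        log.error("Failed to detect dataset format. "
                  "Try to specify format with '-if/--input-format' parameter.")
        return None, 1
    if len(matches) != 1:
        log.error("Multiple formats match the dataset: %s. "
                  "Try to specify format with '-if/--input-format' parameter.",
                  ", ".join(matches))
        return None, 2

    fmt = matches[0]
    log.info("Source dataset format detected as '%s'", fmt)
    return fmt, 0


print(resolve_input_format("path/to/dataset", None, lambda src: ["coco"]))
```

The exit codes 1 and 2 follow the return values used in the diff above.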
3 changes: 1 addition & 2 deletions datumaro/cli/commands/create.py
@@ -1,5 +1,4 @@

# Copyright (C) 2019-2020 Intel Corporation
# Copyright (C) 2019-2021 Intel Corporation
#
# SPDX-License-Identifier: MIT

7 changes: 7 additions & 0 deletions datumaro/cli/commands/diff.py
@@ -0,0 +1,7 @@
# Copyright (C) 2019-2021 Intel Corporation
#
# SPDX-License-Identifier: MIT

# pylint: disable=unused-import

from ..contexts.project import build_diff_parser as build_parser
7 changes: 7 additions & 0 deletions datumaro/cli/commands/ediff.py
@@ -0,0 +1,7 @@
# Copyright (C) 2019-2021 Intel Corporation
#
# SPDX-License-Identifier: MIT

# pylint: disable=unused-import

from ..contexts.project import build_ediff_parser as build_parser
3 changes: 1 addition & 2 deletions datumaro/cli/commands/explain.py
@@ -1,5 +1,4 @@

# Copyright (C) 2019-2020 Intel Corporation
# Copyright (C) 2019-2021 Intel Corporation
#
# SPDX-License-Identifier: MIT

3 changes: 1 addition & 2 deletions datumaro/cli/commands/export.py
@@ -1,5 +1,4 @@

# Copyright (C) 2019-2020 Intel Corporation
# Copyright (C) 2019-2021 Intel Corporation
#
# SPDX-License-Identifier: MIT

7 changes: 7 additions & 0 deletions datumaro/cli/commands/filter.py
@@ -0,0 +1,7 @@
# Copyright (C) 2020-2021 Intel Corporation
#
# SPDX-License-Identifier: MIT

# pylint: disable=unused-import

from ..contexts.project import build_filter_parser as build_parser