Merge branch 'develop' into resnext-atss

openvinotoolkit · Jul 7, 2023 · 761806d
2 parents f54dd48 + 7027132

Showing 206 changed files with 4,544 additions and 1,961 deletions.
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -12,7 +12,7 @@ All notable changes to this project will be documented in this file.
 - Add per-class XAI saliency maps for Mask R-CNN model (https://github.com/openvinotoolkit/training_extensions/pull/2227)
 - Add new object detector Deformable DETR (<https://github.com/openvinotoolkit/training_extensions/pull/2249>)
 - Add new object detector DINO(<https://github.com/openvinotoolkit/training_extensions/pull/2266>)
-- Add new visual prompting task (https://github.com/openvinotoolkit/training_extensions/pull/2203)
+- Add new visual prompting task (https://github.com/openvinotoolkit/training_extensions/pull/2203), (https://github.com/openvinotoolkit/training_extensions/pull/2274)
 - Add new object detector ResNeXt101-ATSS (<https://github.com/openvinotoolkit/training_extensions/pull/2309>)
 
 ### Enhancements
@@ -22,6 +22,7 @@ All notable changes to this project will be documented in this file.
 - Set persistent_workers and pin_memory as True in detection task (<https://github.com/openvinotoolkit/training_extensions/pull/2224>)
 - New algorithm for Semi-SL semantic segmentation based on metric learning via class prototypes (https://github.com/openvinotoolkit/training_extensions/pull/2156)
 - Self-SL for classification now can receive just folder with any images to start contrastive pretraining (https://github.com/openvinotoolkit/training_extensions/pull/2219)
+- Update OpenVINO version to 2023.0, and NNCF version to 2.5 (<https://github.com/openvinotoolkit/training_extensions/pull/2090>)
 - Improve XAI saliency map generation for tiling detection and tiling instance segmentation (https://github.com/openvinotoolkit/training_extensions/pull/2240)
 
 ### Bug fixes
@@ -31,7 +32,7 @@ All notable changes to this project will be documented in this file.
 
 ### Known issues
 
-- OpenVINO(==2022.3) IR inference is not working well on 2-stage models (e.g. Mask-RCNN) exported from torch==1.13.1
+- OpenVINO(==2023.0) IR inference is not working well on 2-stage models (e.g. Mask-RCNN) exported from torch==1.13.1
 
 ## \[v1.3.1\]
 

diff --git a/docs/source/guide/explanation/additional_features/models_optimization.rst b/docs/source/guide/explanation/additional_features/models_optimization.rst
@@ -4,14 +4,14 @@ Models Optimization
 OpenVINO™ Training Extensions provides two types of optimization algorithms: `Post-training Optimization Tool (POT) <https://docs.openvino.ai/latest/pot_introduction.html#doxid-pot-introduction>`_ and `Neural Network Compression Framework (NNCF) <https://github.com/openvinotoolkit/nncf>`_.
 
 *******************************
-Post-training Optimization Tool 
+Post-training Optimization Tool
 *******************************
 
 POT is designed to optimize the inference of models by applying post-training methods that do not require model retraining or fine-tuning. If you want to know more details about how POT works and to be more familiar with model optimization methods, please refer to `documentation <https://docs.openvino.ai/latest/pot_introduction.html#doxid-pot-introduction>`_.
 
 To run Post-training optimization it is required to convert the model to OpenVINO™ intermediate representation (IR) first. To perform fast and accurate quantization we use ``DefaultQuantization Algorithm`` for each task. Please, see the `DefaultQuantization Parameters <https://docs.openvino.ai/latest/pot_compression_algorithms_quantization_default_README.html#doxid-pot-compression-algorithms-quantization-default-r-e-a-d-m-e>`_ for further information about configuring the optimization.
 
-POT parameters can be found and configured in ``template.yaml`` and ``configuration.yaml`` for each task. For Anomaly and Semantic Segmentation tasks, we have separate configuration files for POT, that can be found in the same directory with ``template.yaml``, for example for `PaDiM <https://github.com/openvinotoolkit/training_extensions/blob/develop/src/otx/algorithms/anomaly/configs/classification/padim/pot_optimization_config.json>`_, `OCR-Lite-HRNe-18-mod2 <https://github.com/openvinotoolkit/training_extensions/blob/develop/src/otx/algorithms/segmentation/configs/ocr_lite_hrnet_18_mod2/pot_optimization_config.json>`_ model.
+POT parameters can be found and configured in ``template.yaml`` and ``configuration.yaml`` for each task. For Anomaly and Semantic Segmentation tasks, we have separate configuration files for POT, that can be found in the same directory with ``template.yaml``, for example for `PaDiM <https://github.com/openvinotoolkit/training_extensions/blob/develop/src/otx/algorithms/anomaly/configs/classification/padim/ptq_optimization_config.py>`_, `OCR-Lite-HRNe-18-mod2 <https://github.com/openvinotoolkit/training_extensions/blob/develop/src/otx/algorithms/segmentation/configs/ocr_lite_hrnet_18_mod2/ptq_optimization_config.py>`_ model.
 
 ************************************
 Neural Network Compression Framework
@@ -23,8 +23,8 @@ The process of optimization is controlled by the NNCF configuration file. A JSON
 You can refer to configuration files for default templates for each task accordingly: `Classification <https://github.com/openvinotoolkit/training_extensions/blob/develop/src/otx/algorithms/classification/configs/efficientnet_b0_cls_incr/compression_config.json>`_, `Object Detection <https://github.com/openvinotoolkit/training_extensions/blob/develop/src/otx/algorithms/detection/configs/detection/mobilenetv2_atss/compression_config.json>`_, `Semantic segmentation <https://github.com/openvinotoolkit/training_extensions/blob/develop/src/otx/algorithms/segmentation/configs/ocr_lite_hrnet_18_mod2/compression_config.json>`_, `Instance segmentation <https://github.com/openvinotoolkit/training_extensions/blob/develop/src/otx/algorithms/detection/configs/instance_segmentation/efficientnetb2b_maskrcnn/compression_config.json>`_, `Anomaly classification <https://github.com/openvinotoolkit/training_extensions/blob/develop/src/otx/algorithms/anomaly/configs/classification/padim/compression_config.json>`_, `Anomaly Detection <https://github.com/openvinotoolkit/training_extensions/blob/develop/src/otx/algorithms/anomaly/configs/detection/padim/compression_config.json>`_, `Anomaly segmentation <https://github.com/openvinotoolkit/training_extensions/blob/develop/src/otx/algorithms/anomaly/configs/segmentation/padim/compression_config.json>`_. Configs for other templates can be found in the same directory.
 
 
-NNCF tends to provide better quality in terms of preserving accuracy as it uses training compression approaches. 
-Compression results achievable with the NNCF can be found `here <https://github.com/openvinotoolkit/nncf#nncf-compressed-model-zoo>`_ . 
+NNCF tends to provide better quality in terms of preserving accuracy as it uses training compression approaches.
+Compression results achievable with the NNCF can be found `here <https://github.com/openvinotoolkit/nncf#nncf-compressed-model-zoo>`_ .
 Meanwhile, the POT is faster but can degrade accuracy more than the training-enabled approach.
 
 .. note::

diff --git a/docs/source/guide/explanation/algorithms/segmentation/instance_segmentation.rst b/docs/source/guide/explanation/algorithms/segmentation/instance_segmentation.rst
@@ -58,15 +58,21 @@ Models
 
 We support the following ready-to-use model templates:
 
-+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------+---------------------+-----------------+
-| Template ID                                                                                                                                                                                                                                    | Name                       | Complexity (GFLOPs) | Model size (MB) |
-+================================================================================================================================================================================================================================================+============================+=====================+=================+
-| `Custom_Counting_Instance_Segmentation_MaskRCNN_EfficientNetB2B <https://github.com/openvinotoolkit/training_extensions/blob/develop/src/otx/algorithms/detection/configs/instance_segmentation/efficientnetb2b_maskrcnn/template.yaml>`_      | MaskRCNN-EfficientNetB2B   | 68.48               | 13.27           |
-+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------+---------------------+-----------------+
-| `Custom_Counting_Instance_Segmentation_MaskRCNN_ResNet50 <https://github.com/openvinotoolkit/training_extensions/blob/develop/src/otx/algorithms/detection/configs/instance_segmentation/resnet50_maskrcnn/template.yaml>`_                    | MaskRCNN-ResNet50          | 533.80              | 177.90          |
-+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------+---------------------+-----------------+
++--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------+---------------------+-----------------+
+| Template ID                                                                                                                                                                                                                                | Name                       | Complexity (GFLOPs) | Model size (MB) |
++============================================================================================================================================================================================================================================+============================+=====================+=================+
+| `Custom_Counting_Instance_Segmentation_MaskRCNN_EfficientNetB2B <https://github.com/openvinotoolkit/training_extensions/blob/develop/src/otx/algorithms/detection/configs/instance_segmentation/efficientnetb2b_maskrcnn/template.yaml>`_      | MaskRCNN-EfficientNetB2B   | 68.48           | 13.27           |
++--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------+---------------------+-----------------+
+| `Custom_Counting_Instance_Segmentation_MaskRCNN_ResNet50 <https://github.com/openvinotoolkit/training_extensions/blob/develop/src/otx/algorithms/detection/configs/instance_segmentation/resnet50_maskrcnn/template.yaml>`_                    | MaskRCNN-ResNet50          | 533.80          | 177.90          |
++--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------+---------------------+-----------------+
+| `Custom_Counting_Instance_Segmentation_MaskRCNN_ConvNeXt <https://github.com/openvinotoolkit/training_extensions/blob/develop/src/otx/algorithms/detection/configs/instance_segmentation/convnext_maskrcnn/template.yaml>`_                    | MaskRCNN-ConvNeXt          | 266.78          | 192.4          |
++--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------+---------------------+-----------------+
 
-``MaskRCNN-ResNet50`` uses `ResNet-50 <https://arxiv.org/abs/1512.03385>`_ as the backbone network for the image features extraction. It has more parameters and FLOPs and needs more time to train, meanwhile providing superior performance in terms of accuracy. ``MaskRCNN-EfficientNetB2B`` uses `EfficientNet-B2 <https://arxiv.org/abs/1905.11946>`_ as the backbone network. It is a good trade-off between accuracy and speed. It is a better choice when training time and computational cost are in priority.
+MaskRCNN-ResNet50 utilizes the `ResNet-50 <https://arxiv.org/abs/1512.03385>`_ architecture as the backbone network for extracting image features. This choice of backbone network results in a higher number of parameters and FLOPs, which consequently requires more training time. However, the model offers superior performance in terms of accuracy.
+
+On the other hand, MaskRCNN-EfficientNetB2B employs the `EfficientNet-B2 <https://arxiv.org/abs/1905.11946>`_ architecture as the backbone network. This selection strikes a balance between accuracy and speed, making it a preferable option when prioritizing training time and computational cost.
+
+Recently, we have made updates to MaskRCNN-ConvNeXt, incorporating the `ConvNeXt backbone <https://arxiv.org/abs/2201.03545>`_. Through our experiments, we have observed that this variant achieves better accuracy compared to MaskRCNN-ResNet50 while utilizing less GPU memory. However, it is important to note that the training time and inference duration may slightly increase. If minimizing training time is a significant concern, we recommend considering a switch to MaskRCNN-EfficientNetB2B.
 
 .. In the table below the `mAP <https://en.wikipedia.org/wiki/S%C3%B8rensen%E2%80%93Dice_coefficient>`_ metric on some academic datasets using our :ref:`supervised pipeline <instance_segmentation_supervised_pipeline>` is presented. The results were obtained on our templates without any changes. We use 1024x1024 image resolution, for other hyperparameters, please, refer to the related template. We trained each model with single Nvidia GeForce RTX3090.
 
@@ -77,6 +83,8 @@ We support the following ready-to-use model templates:
 .. +---------------------------+--------------+------------+-----------------+
 .. | MaskRCNN-ResNet50         | N/A          | N/A        | N/A             |
 .. +---------------------------+--------------+------------+-----------------+
+.. | MaskRCNN-ConvNeXt         | N/A          | N/A        | N/A             |
+.. +---------------------------+--------------+------------+-----------------+
 
 .. *******************
 .. Tiling Pipeline

diff --git a/docs/source/guide/tutorials/base/how_to_train/instance_segmentation.rst b/docs/source/guide/tutorials/base/how_to_train/instance_segmentation.rst
@@ -136,6 +136,7 @@ The list of supported templates for instance segmentation is available with the
   +-----------------------+----------------------------------------------------------------+--------------------------+---------------------------------------------------------------------------------------------------+
   | INSTANCE_SEGMENTATION |    Custom_Counting_Instance_Segmentation_MaskRCNN_ResNet50     |    MaskRCNN-ResNet50     |     src/otx/algorithms/detection/configs/instance_segmentation/resnet50_maskrcnn/template.yaml    |
   | INSTANCE_SEGMENTATION | Custom_Counting_Instance_Segmentation_MaskRCNN_EfficientNetB2B | MaskRCNN-EfficientNetB2B | src/otx/algorithms/detection/configs/instance_segmentation/efficientnetb2b_maskrcnn/template.yaml |
+  | INSTANCE_SEGMENTATION | Custom_Counting_Instance_Segmentation_MaskRCNN_ConvNeXt        | MaskRCNN-ConvNeXt        | src/otx/algorithms/detection/configs/instance_segmentation/convnext_maskrcnn/template.yaml        |
   +-----------------------+----------------------------------------------------------------+--------------------------+---------------------------------------------------------------------------------------------------+
 
 2. We need to create

diff --git a/requirements/base.txt b/requirements/base.txt
@@ -4,7 +4,7 @@ natsort>=6.0.0
 prettytable
 protobuf>=3.20.0
 pyyaml
-datumaro==1.3.2
+datumaro@ git+https://github.com/openvinotoolkit/datumaro@3e77b3138d063db68a4efba3c03a6bac7df086b1#egg=datumaro
 psutil
 scipy>=1.8
 bayesian-optimization>=1.2.0

diff --git a/requirements/openvino.txt b/requirements/openvino.txt
@@ -1,8 +1,8 @@
 # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # #
 # OpenVINO Requirements.                                                      #
-nncf==2.4.0
+nncf==2.5.0
 onnx==1.13.0
-openmodelzoo-modelapi==2022.3.0
-openvino==2022.3.0
-openvino-dev==2022.3.0
+openvino-model-api==0.1.2
+openvino==2023.0
+openvino-dev==2023.0
 openvino-telemetry>=2022.1.0
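
Note: the two requirements hunks above mix three specifier styles: exact pins (`nncf==2.5.0`), lower bounds (`openvino-telemetry>=2022.1.0`), and a direct reference to a git commit (`datumaro@ git+https://...`). The sketch below is only an illustration of how such lines could be told apart when auditing the files. `Pin` and `parse_requirement` are hypothetical names, not part of this repository or of pip, and the regular expression covers just the forms that appear in this diff rather than the full requirement grammar.

```python
import re
from typing import NamedTuple, Optional


class Pin(NamedTuple):
    """One parsed requirement line (hypothetical helper type)."""

    name: str
    operator: Optional[str]  # "==", ">=", "@", or None for a bare name
    target: Optional[str]    # version string or URL, None for a bare name


# Assumption: only the three specifier forms seen in this diff are handled.
_SPEC = re.compile(r"^([A-Za-z0-9_.\-]+)\s*(==|>=|@)?\s*(.*)$")


def parse_requirement(line: str) -> Optional[Pin]:
    """Return a Pin for a requirement line, or None for blanks and comments."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    match = _SPEC.match(line)
    if match is None:
        return None
    name, operator, target = match.groups()
    return Pin(name, operator, target or None)


if __name__ == "__main__":
    for raw in ("nncf==2.5.0", "openvino-telemetry>=2022.1.0", "pyyaml"):
        print(parse_requirement(raw))
```

A direct reference such as the new `datumaro` line parses with operator `@` and the URL as its target, which makes commit-pinned dependencies easy to list separately from versioned ones.
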
diff --git a/src/otx/algorithms/action/adapters/mmaction/task.py b/src/otx/algorithms/action/adapters/mmaction/task.py
@@ -467,6 +467,7 @@ def dummy_dump_saliency_hook(model, inp, out):
 
     def _export_model(self, precision: ModelPrecision, export_format: ExportType, dump_features: bool):
         """Main export function."""
+        self._data_cfg = None
         self._init_task(export=True)
 
         cfg = self.configure(False, "test", None)

diff --git a/src/otx/algorithms/action/adapters/openvino/dataloader.py b/src/otx/algorithms/action/adapters/openvino/dataloader.py
@@ -18,13 +18,12 @@
 from typing import Dict, List
 
 import numpy as np
-from compression.api import DataLoader
 
 from otx.api.entities.annotation import AnnotationSceneEntity
 from otx.api.entities.datasets import DatasetEntity, DatasetItemEntity
 
 
-def get_ovdataloader(dataset: DatasetEntity, task_type: str, clip_len: int, width: int, height: int) -> DataLoader:
+def get_ovdataloader(dataset: DatasetEntity, task_type: str, clip_len: int, width: int, height: int):
     """Find proper dataloader for dataset and task type.
 
     If dataset has only a single video, this returns DataLoader for online demo
@@ -49,7 +48,7 @@ def _is_multi_video(dataset: DatasetEntity) -> bool:
     return False
 
 
-class ActionOVDemoDataLoader(DataLoader):
+class ActionOVDemoDataLoader:
     """DataLoader for online demo purpose.
 
     Since it is for online demo purpose it selects background frames from neighbor of key frame
@@ -91,7 +90,7 @@ def add_prediction(self, data: List[DatasetItemEntity], prediction: AnnotationSc
             dataset_item.append_annotations(prediction.annotations)
 
 
-class ActionOVClsDataLoader(DataLoader):
+class ActionOVClsDataLoader:
     """DataLoader for evaluation of action classification models.
 
     It iterates through clustered video, and it samples frames from given video
@@ -151,7 +150,7 @@ def add_prediction(self, dataset: DatasetEntity, data: List[DatasetItemEntity],
                 dataset_item.append_labels(prediction.annotations[0].get_labels())
 
 
-class ActionOVDetDataLoader(DataLoader):
+class ActionOVDetDataLoader:
     """DataLoader for evaluation of spatio-temporal action detection models.
 
     It iterates through DatasetEntity, which only contains non-empty frame(frame with actor annotation)
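
Note: the hunks above drop the `compression.api.DataLoader` base class, so the three action data loaders become plain classes. The diff does not show which methods their consumers call, so the following is only a minimal sketch of the general pattern, assuming the consumer needs nothing more than `len()` and integer indexing. `PlainClipDataLoader` and its fields are hypothetical and are not the OTX implementation.

```python
from typing import Any, Dict, List, Tuple


class PlainClipDataLoader:
    """Hypothetical loader that relies on duck typing instead of a base class."""

    def __init__(self, clips: List[List[int]]) -> None:
        # Each clip is a list of frame indices (an assumption for illustration).
        self._clips = clips

    def __len__(self) -> int:
        return len(self._clips)

    def __getitem__(self, index: int) -> Tuple[List[int], Dict[str, Any]]:
        # Raising IndexError lets Python's legacy iteration protocol stop
        # cleanly, so `for item in loader` works without defining __iter__.
        if not 0 <= index < len(self._clips):
            raise IndexError(index)
        clip = self._clips[index]
        return clip, {"index": index, "num_frames": len(clip)}


if __name__ == "__main__":
    loader = PlainClipDataLoader([[0, 1, 2], [3, 4]])
    for frames, meta in loader:
        print(frames, meta)
```

Because nothing is inherited, any consumer that only uses `len()` and indexing accepts this object, which is presumably why the return annotation on `get_ovdataloader` could be dropped along with the import.
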

diff --git a/src/otx/algorithms/action/adapters/openvino/model_wrappers/openvino_models.py b/src/otx/algorithms/action/adapters/openvino/model_wrappers/openvino_models.py
@@ -19,22 +19,16 @@
 from typing import Any, Dict, List
 
 import numpy as np
+from openvino.model_api.adapters import OpenvinoAdapter
+from openvino.model_api.models.model import Model
+from openvino.model_api.models.utils import (
+    RESIZE_TYPES,
+    Detection,
+    InputTransform,
+)
 
 from otx.api.entities.datasets import DatasetItemEntity
 
-try:
-    from openvino.model_zoo.model_api.adapters import OpenvinoAdapter
-    from openvino.model_zoo.model_api.models.model import Model
-    from openvino.model_zoo.model_api.models.utils import (
-        RESIZE_TYPES,
-        Detection,
-        InputTransform,
-    )
-except ImportError as e:
-    import warnings
-
-    warnings.warn(f"{e}, ModelAPI was not found.")
-
 
 def softmax_numpy(x: np.ndarray):
     """Softmax numpy."""