
Commit

GPU-Suport: Mask-RCNN + Minor GPU fixes (#2714)
* fixed cpu mask rcnn+preparation for gpu
* fix-limit gpu memory to 30% of total memory per worker

Co-authored-by: Nikita Manovich
jahaniam and Nikita Manovich authored Feb 16, 2021
1 parent daedff4 commit 59c3b28
Showing 9 changed files with 97 additions and 59 deletions.
5 changes: 3 additions & 2 deletions CHANGELOG.md
@@ -10,6 +10,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
### Added

- CVAT-3D: support lidar data on the server side (<https://github.com/openvinotoolkit/cvat/pull/2534>)
- GPU support for Mask-RCNN and improvement in its deployment time (<https://github.com/openvinotoolkit/cvat/pull/2714>)
- CVAT-3D: Load all frames corresponding to the job instance
(<https://github.com/openvinotoolkit/cvat/pull/2645>)
- Intelligent scissors with OpenCV javascript (<https://github.com/openvinotoolkit/cvat/pull/2689>)
@@ -23,7 +24,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

- Updated HTTPS install README section (cleanup and described more robust deploy)
- Logstash is improved for using with configurable elasticsearch outputs (<https://github.com/openvinotoolkit/cvat/pull/2531>)
- Bumped nuclio version to 1.5.16
- Bumped nuclio version to 1.5.16 (<https://github.com/openvinotoolkit/cvat/pull/2578>)
- All methods for interative segmentation accept negative points as well
- Persistent queue added to logstash (<https://github.com/openvinotoolkit/cvat/pull/2744>)

@@ -36,7 +37,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
-

### Fixed

- More robust execution of nuclio GPU functions by limiting the GPU memory consumption per worker (<https://github.com/openvinotoolkit/cvat/pull/2714>)
- Kibana startup initialization (<https://github.com/openvinotoolkit/cvat/pull/2659>)
- The cursor jumps to the end of the line when renaming a task (<https://github.com/openvinotoolkit/cvat/pull/2669>)
- SSLCertVerificationError when remote source is used (<https://github.com/openvinotoolkit/cvat/pull/2683>)
4 changes: 2 additions & 2 deletions CONTRIBUTING.md
@@ -122,10 +122,10 @@ You develop CVAT under WSL (Windows subsystem for Linux) following next steps.

### DL models as serverless functions

Install [nuclio platform](https://github.com/nuclio/nuclio):
Follow this [guide](/cvat/apps/documentation/installation_automatic_annotation.md) to install Nuclio:

- You have to install `nuctl` command line tool to build and deploy serverless
functions. Download [the latest release](https://github.com/nuclio/nuclio/blob/development/docs/reference/nuctl/nuctl.md#download).
functions.
- The simplest way to explore Nuclio is to run its graphical user interface (GUI)
of the Nuclio dashboard. All you need in order to run the dashboard is Docker. See
[nuclio documentation](https://github.com/nuclio/nuclio#quick-start-steps)
2 changes: 1 addition & 1 deletion README.md
@@ -80,7 +80,7 @@ For more information about supported formats look at the
| [f-BRS](/serverless/pytorch/saic-vul/fbrs/nuclio) | interactor | PyTorch | X | |
| [Inside-Outside Guidance](/serverless/pytorch/shiyinzhang/iog/nuclio) | interactor | PyTorch | X | |
| [Faster RCNN](/serverless/tensorflow/faster_rcnn_inception_v2_coco/nuclio) | detector | TensorFlow | X | X |
| [Mask RCNN](/serverless/tensorflow/matterport/mask_rcnn/nuclio) | detector | TensorFlow | X | |
| [Mask RCNN](/serverless/tensorflow/matterport/mask_rcnn/nuclio) | detector | TensorFlow | X | X |

<!--lint enable maximum-line-length-->

2 changes: 1 addition & 1 deletion cvat/apps/documentation/installation.md
@@ -290,7 +290,7 @@ docker-compose -f docker-compose.yml \

### Semi-automatic and automatic annotation

Please follow [instructions](/cvat/apps/documentation/installation_automatic_annotation.md)
Please follow this [guide](/cvat/apps/documentation/installation_automatic_annotation.md).

### Stop all containers

73 changes: 53 additions & 20 deletions cvat/apps/documentation/installation_automatic_annotation.md
@@ -53,47 +53,80 @@
- See [deploy_cpu.sh](/serverless/deploy_cpu.sh) for more examples.

#### GPU Support

You will need to install Nvidia Container Toolkit and make sure your docker supports GPU. Follow [Nvidia docker instructions](https://www.tensorflow.org/install/docker#gpu_support).
Also you will need to add `--resource-limit nvidia.com/gpu=1` to the nuclio deployment command.
You will need to install [Nvidia Container Toolkit](https://www.tensorflow.org/install/docker#gpu_support).
Also you will need to add `--resource-limit nvidia.com/gpu=1 --triggers '{"myHttpTrigger": {"maxWorkers": 1}}'` to
the nuclio deployment command. You can increase the maxWorker if you have enough GPU memory.
As an example, below will run on the GPU:

```bash
nuctl deploy tf-faster-rcnn-inception-v2-coco-gpu \
--project-name cvat --path "serverless/tensorflow/faster_rcnn_inception_v2_coco/nuclio" --platform local \
--base-image tensorflow/tensorflow:2.1.1-gpu \
--desc "Faster RCNN from Tensorflow Object Detection GPU API" \
--image cvat/tf.faster_rcnn_inception_v2_coco_gpu \
nuctl deploy --project-name cvat \
--path `pwd`/tensorflow/matterport/mask_rcnn/nuclio \
--platform local --base-image tensorflow/tensorflow:1.15.5-gpu-py3 \
--desc "GPU based implementation of Mask RCNN on Python 3, Keras, and TensorFlow." \
--image cvat/tf.matterport.mask_rcnn_gpu
--triggers '{"myHttpTrigger": {"maxWorkers": 1}}' \
--resource-limit nvidia.com/gpu=1
```
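
The GPU deployment above comes down to a handful of `nuctl deploy` flags. As a minimal sketch (a hypothetical helper, not part of CVAT), the Python below assembles the Mask RCNN GPU command from flags that appear in this commit and serializes the `--triggers` value with `json.dumps`, so the JSON quoting is always valid. It assumes it is run from the repository root and only prints the command; it does not execute `nuctl`. `shlex.join` needs Python 3.8 or newer.

```python
import json
import shlex


def build_gpu_deploy_command(path, image, base_image, max_workers=1, gpus=1):
    """Assemble a `nuctl deploy` argument list for a GPU nuclio function.

    Hypothetical helper: it only uses flags shown in this commit and
    returns the arguments instead of running them.
    """
    # maxWorkers on the HTTP trigger caps concurrent workers, and so GPU memory use.
    triggers = {"myHttpTrigger": {"maxWorkers": max_workers}}
    return [
        "nuctl", "deploy",
        "--project-name", "cvat",
        "--path", path,
        "--platform", "local",
        "--base-image", base_image,
        "--image", image,
        "--triggers", json.dumps(triggers),
        "--resource-limit", f"nvidia.com/gpu={gpus}",
    ]


if __name__ == "__main__":
    args = build_gpu_deploy_command(
        path="serverless/tensorflow/matterport/mask_rcnn/nuclio",
        image="cvat/tf.matterport.mask_rcnn_gpu",
        base_image="tensorflow/tensorflow:1.15.5-gpu-py3",
    )
    # shlex.join single-quotes the JSON argument so the line can be pasted into a shell.
    print(shlex.join(args))
```

Building the argument list in one place also avoids the easy-to-miss failure mode of a shell command split over many lines, where a single missing trailing backslash silently cuts off the remaining flags.
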

**Note:**

- Since the model is loaded during deployment, the number of GPU functions you can deploy will be limited to your GPU memory.

- The number of GPU deployed functions will be limited to your GPU memory.
- See [deploy_gpu.sh](/serverless/deploy_gpu.sh) script for more examples.
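
The memory limit can be made concrete with a little arithmetic. Both TensorFlow model loaders in this commit cap each process at `per_process_gpu_memory_fraction=0.333` (about a third of the card, although the code comment says 30%), and with `allow_growth=True` memory is allocated on demand up to that cap. So the caps of all workers of all deployed GPU functions should sum to at most 1.0. The sketch below assumes one process per nuclio worker, so that each worker gets its own cap; `total_fraction` and `fits_on_gpu` are hypothetical helpers and the dictionary keys are only labels.

```python
# Per-process GPU memory cap used by both TensorFlow model loaders in this commit.
GPU_FRACTION = 0.333


def total_fraction(max_workers_by_function, fraction=GPU_FRACTION):
    """Sum the GPU memory share claimed by all workers of all functions.

    Assumes each nuclio worker is a separate process capped at `fraction`.
    """
    return sum(workers * fraction for workers in max_workers_by_function.values())


def fits_on_gpu(max_workers_by_function, fraction=GPU_FRACTION):
    # Small tolerance so 3 * 0.333 is not rejected because of float rounding.
    return total_fraction(max_workers_by_function, fraction) <= 1.0 + 1e-9


# Illustrative labels; maxWorkers = 1 as in deploy_gpu.sh.
deployed = {"faster_rcnn_gpu": 1, "mask_rcnn_gpu": 1}
print(round(total_fraction(deployed), 3))              # 0.666
print(fits_on_gpu(deployed))                           # True
print(fits_on_gpu({**deployed, "another_gpu_fn": 2}))  # False: 4 workers * 0.333 > 1
```

Raising `maxWorkers` for one function therefore uses up the same budget as deploying another function, which is why the number of GPU functions you can run is bounded by GPU memory.
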

####Debugging Nuclio Functions:
**Troubleshooting Nuclio Functions:**

- You can open nuclio dashboard at [localhost:8070](http://localhost:8070). Make sure status of your functions are up and running without any error.
- Test your deployed DL model as a serverless function. The command below should work on Linux and Mac OS.

```bash
image=$(curl https://upload.wikimedia.org/wikipedia/en/7/7d/Lenna_%28test_image%29.png --output - | base64 | tr -d '\n')
cat << EOF > /tmp/input.json
{"image": "$image"}
EOF
cat /tmp/input.json | nuctl invoke openvino.omz.public.yolo-v3-tf -c 'application/json'
```
- To check for internal server errors, run `docker ps -a` to see the list of containers. Find the container that you are interested, e.g. `nuclio-nuclio-tf-faster-rcnn-inception-v2-coco-gpu`. Then check its logs by
<details>
```bash
docker logs <name of your container>
20.07.17 12:07:44.519 nuctl.platform.invoker (I) Executing function {"method": "POST", "url": "http://:57308", "headers": {"Content-Type":["application/json"],"X-Nuclio-Log-Level":["info"],"X-Nuclio-Target":["openvino.omz.public.yolo-v3-tf"]}}
20.07.17 12:07:45.275 nuctl.platform.invoker (I) Got response {"status": "200 OK"}
20.07.17 12:07:45.275 nuctl (I) >>> Start of function logs
20.07.17 12:07:45.275 ino.omz.public.yolo-v3-tf (I) Run yolo-v3-tf model {"worker_id": "0", "time": 1594976864570.9353}
20.07.17 12:07:45.275 nuctl (I) <<< End of function logs
> Response headers:
Date = Fri, 17 Jul 2020 09:07:45 GMT
Content-Type = application/json
Content-Length = 100
Server = nuclio
> Response body:
[
{
"confidence": "0.9992254",
"label": "person",
"points": [
39,
124,
408,
512
],
"type": "rectangle"
}
]
```
</details>
- To check for internal server errors, run `docker ps -a` to see the list of containers.
Find the container that you are interested, e.g., `nuclio-nuclio-tf-faster-rcnn-inception-v2-coco-gpu`.
Then check its logs by `docker logs <name of your container>`
e.g.,

```bash
docker logs nuclio-nuclio-tf-faster-rcnn-inception-v2-coco-gpu
```
- If you would like to debug a code inside a container, you can use vscode to directly attach to a container [instructions](https://code.visualstudio.com/docs/remote/attach-container). To apply your changes, make sure to restart the container.

- To debug a code inside a container, you can use vscode to attach to a container [instructions](https://code.visualstudio.com/docs/remote/attach-container).
To apply your changes, make sure to restart the container.
```bash
docker restart <name_of_the_container>
```

> **⚠ WARNING:**
> Do not use nuclio dashboard to stop the container because with any modifications, it rebuilds the container and you will lose your changes.
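
For scripted checks, the `curl | base64` pipeline and the response shown in the log can be reproduced in Python. This is an illustrative sketch, not part of CVAT: `build_payload` produces the same `{"image": "<base64>"}` body that the shell example writes to `/tmp/input.json`, and `parse_detections` turns a response body like the one in the log into tuples, converting the string-valued `confidence` to `float` before filtering. Both helper names and the client-side `threshold` are assumptions; sending the request (for example with `nuctl invoke`) is left out.

```python
import base64
import json


def build_payload(image_bytes):
    """Return the request body {"image": "<base64 of the raw image bytes>"}."""
    return json.dumps({"image": base64.b64encode(image_bytes).decode("ascii")})


def parse_detections(body, threshold=0.5):
    """Parse a detector response body and keep shapes with confidence >= threshold.

    The function in the log returns "confidence" as a string, so it is
    converted to float before the comparison.
    """
    kept = []
    for shape in json.loads(body):
        confidence = float(shape["confidence"])
        if confidence >= threshold:
            kept.append((shape["label"], shape["type"], confidence, shape["points"]))
    return kept


if __name__ == "__main__":
    print(build_payload(b"abc"))  # {"image": "YWJj"}
    # Response body from the nuctl invoke log, collapsed onto one line.
    sample = ('[{"confidence": "0.9992254", "label": "person", '
              '"points": [39, 124, 408, 512], "type": "rectangle"}]')
    print(parse_detections(sample))
    # [('person', 'rectangle', 0.9992254, [39, 124, 408, 512])]
```

In practice `image_bytes` would be the raw bytes of the test image, for example `open(path, "rb").read()`.
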
14 changes: 12 additions & 2 deletions serverless/deploy_gpu.sh
@@ -8,8 +8,18 @@ nuctl create project cvat
nuctl deploy --project-name cvat \
--path "$SCRIPT_DIR/tensorflow/faster_rcnn_inception_v2_coco/nuclio" \
--platform local --base-image tensorflow/tensorflow:2.1.1-gpu \
--desc "Faster RCNN from Tensorflow Object Detection GPU API" \
--desc "GPU based Faster RCNN from Tensorflow Object Detection API" \
--image cvat/tf.faster_rcnn_inception_v2_coco_gpu \
--resource-limit nvidia.com/gpu=1
--triggers '{"myHttpTrigger": {"maxWorkers": 1}}' \
--resource-limit nvidia.com/gpu=1 --verbose

nuctl deploy --project-name cvat \
--path "$SCRIPT_DIR/tensorflow/matterport/mask_rcnn/nuclio" \
--platform local --base-image tensorflow/tensorflow:1.15.5-gpu-py3 \
--desc "GPU based implementation of Mask RCNN on Python 3, Keras, and TensorFlow." \
--image cvat/tf.matterport.mask_rcnn_gpu\
--triggers '{"myHttpTrigger": {"maxWorkers": 1}}' \
--resource-limit nvidia.com/gpu=1 --verbose


nuctl get function
@@ -15,9 +15,10 @@ def __init__(self, model_path):
serialized_graph = fid.read()
od_graph_def.ParseFromString(serialized_graph)
tf.import_graph_def(od_graph_def, name='')

config = tf.ConfigProto()
config.gpu_options.allow_growth = True
gpu_fraction = 0.333
gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=gpu_fraction,
allow_growth=True)
config = tf.ConfigProto(gpu_options=gpu_options)
self.session = tf.Session(graph=detection_graph, config=config)

self.image_tensor = detection_graph.get_tensor_by_name('image_tensor:0')
11 changes: 4 additions & 7 deletions serverless/tensorflow/matterport/mask_rcnn/nuclio/function.yaml
@@ -102,22 +102,19 @@ spec:
value: /opt/nuclio/Mask_RCNN
build:
image: cvat/tf.matterport.mask_rcnn
baseImage: tensorflow/tensorflow:2.1.0-py3
baseImage: tensorflow/tensorflow:1.13.1-py3
directives:
postCopy:
- kind: WORKDIR
value: /opt/nuclio
- kind: RUN
value: apt update && apt install --no-install-recommends -y git curl libsm6 libxext6 libgl1-mesa-glx
value: apt update && apt install --no-install-recommends -y git curl
- kind: RUN
value: git clone https://github.com/matterport/Mask_RCNN.git
value: git clone --depth 1 https://github.com/matterport/Mask_RCNN.git
- kind: RUN
value: curl -L https://github.com/matterport/Mask_RCNN/releases/download/v2.0/mask_rcnn_coco.h5 -o Mask_RCNN/mask_rcnn_coco.h5
- kind: RUN
value: pip3 install scipy cython matplotlib scikit-image opencv-python-headless h5py \
imgaug IPython[all] tensorflow==1.13.1 keras==2.1.0 pillow pyyaml
- kind: RUN
value: pip3 install pycocotools
value: pip3 install numpy cython pyyaml keras==2.1.0 scikit-image Pillow

triggers:
myHttpTrigger:
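
The `build.directives.postCopy` list in `function.yaml` is a sequence of Dockerfile-style instructions (`WORKDIR`, `RUN`) that nuclio applies on top of `baseImage` when it builds the function image. As a rough illustration only (nuclio's generated build contains more than this, and `render_directives` is a hypothetical helper), the sketch below renders the directive list that uses `git clone --depth 1` into the instructions it corresponds to, which makes it easy to see what the slimmer dependency list installs.

```python
# Directive list transcribed from function.yaml (the variant using `git clone --depth 1`).
POST_COPY = [
    {"kind": "WORKDIR", "value": "/opt/nuclio"},
    {"kind": "RUN", "value": "apt update && apt install --no-install-recommends -y git curl"},
    {"kind": "RUN", "value": "git clone --depth 1 https://github.com/matterport/Mask_RCNN.git"},
    {"kind": "RUN", "value": "curl -L https://github.com/matterport/Mask_RCNN/releases/"
                             "download/v2.0/mask_rcnn_coco.h5 -o Mask_RCNN/mask_rcnn_coco.h5"},
    {"kind": "RUN", "value": "pip3 install numpy cython pyyaml keras==2.1.0 scikit-image Pillow"},
]


def render_directives(base_image, directives):
    """Render a base image plus build directives as Dockerfile-style lines (sketch)."""
    lines = [f"FROM {base_image}"]
    lines += [f"{d['kind']} {d['value']}" for d in directives]
    return "\n".join(lines)


print(render_directives("tensorflow/tensorflow:1.13.1-py3", POST_COPY))
```
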
38 changes: 17 additions & 21 deletions serverless/tensorflow/matterport/mask_rcnn/nuclio/model_loader.py
@@ -1,42 +1,40 @@
# Copyright (C) 2018-2020 Intel Corporation
# Copyright (C) 2020-2021 Intel Corporation
#
# SPDX-License-Identifier: MIT

import os
import numpy as np
import sys
from skimage.measure import find_contours, approximate_polygon

# workaround for tf.placeholder() is not compatible with eager execution
# https://github.com/tensorflow/tensorflow/issues/18165
import tensorflow as tf
tf.compat.v1.disable_eager_execution()
#import tensorflow.compat.v1 as tf
# tf.disable_v2_behavior()

# The directory should contain a clone of
# https://github.com/matterport/Mask_RCNN repository and
# downloaded mask_rcnn_coco.h5 model.
MASK_RCNN_DIR = os.path.abspath(os.environ.get('MASK_RCNN_DIR'))
if MASK_RCNN_DIR:
sys.path.append(MASK_RCNN_DIR) # To find local version of the library
sys.path.append(os.path.join(MASK_RCNN_DIR, 'samples/coco'))

from mrcnn import model as modellib
import coco
from mrcnn.config import Config


class ModelLoader:
def __init__(self, labels):
COCO_MODEL_PATH = os.path.join(MASK_RCNN_DIR, "mask_rcnn_coco.h5")
if COCO_MODEL_PATH is None:
raise OSError('Model path env not found in the system.')

class InferenceConfig(coco.CocoConfig):
# Set batch size to 1 since we'll be running inference on
# one image at a time. Batch size = GPU_COUNT * IMAGES_PER_GPU
class InferenceConfig(Config):
NAME = "coco"
NUM_CLASSES = 1 + 80 # COCO has 80 classes
GPU_COUNT = 1
IMAGES_PER_GPU = 1

# Limit gpu memory to 30% to allow for other nuclio gpu functions. Increase fraction as you like
import keras.backend.tensorflow_backend as ktf
def get_session(gpu_fraction=0.333):
gpu_options = tf.GPUOptions(
per_process_gpu_memory_fraction=gpu_fraction,
allow_growth=True)
return tf.Session(config=tf.ConfigProto(gpu_options=gpu_options))

ktf.set_session(get_session())
# Print config details
self.config = InferenceConfig()
self.config.display()
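
As the comment in `ModelLoader.__init__` notes, Mask RCNN's effective batch size is `GPU_COUNT * IMAGES_PER_GPU`, so setting both to 1 gives one image per inference call. Subclassing `Config` rather than `coco.CocoConfig` means `NAME` and `NUM_CLASSES` (1 background class plus the 80 COCO classes) have to be spelled out in `InferenceConfig` itself. The stand-in below uses a hypothetical `BaseConfig` in place of `mrcnn.config.Config` so that it runs without TensorFlow; its defaults are illustrative and only the fields used here are modelled.

```python
class BaseConfig:
    """Hypothetical stand-in for mrcnn.config.Config (only the fields used here)."""
    NAME = None
    NUM_CLASSES = 1
    GPU_COUNT = 1
    IMAGES_PER_GPU = 2

    def __init__(self):
        # Effective batch size = GPU_COUNT * IMAGES_PER_GPU.
        self.BATCH_SIZE = self.IMAGES_PER_GPU * self.GPU_COUNT


class InferenceConfig(BaseConfig):
    NAME = "coco"
    NUM_CLASSES = 1 + 80  # background + 80 COCO classes
    GPU_COUNT = 1
    IMAGES_PER_GPU = 1    # one image per request


config = InferenceConfig()
print(config.BATCH_SIZE, config.NUM_CLASSES)  # 1 81
```
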
@@ -54,7 +52,7 @@ def infer(self, image, threshold):
for i in range(len(output["rois"])):
score = output["scores"][i]
class_id = output["class_ids"][i]
mask = output["masks"][:,:,i]
mask = output["masks"][:, :, i]
if score >= threshold:
mask = mask.astype(np.uint8)
contours = find_contours(mask, MASK_THRESHOLD)
@@ -74,6 +72,4 @@ def infer(self, image, threshold):
"type": "polygon",
})

return results


return results
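
`infer` walks the per-instance outputs (`rois`, `scores`, `class_ids`, and the mask stack sliced as `masks[:, :, i]`) and keeps only instances whose score reaches the threshold before tracing their contours into polygons. The dependency-free sketch below reproduces just that selection step on plain lists; `select_instances`, the label mapping and the sample values are hypothetical, and contour extraction (`find_contours` and `approximate_polygon` from scikit-image in the original) is left out.

```python
def select_instances(output, labels, threshold):
    """Return (label, score, index) for every instance with score >= threshold.

    `output` mimics the per-instance result dict used in infer(): parallel
    sequences under "rois", "scores" and "class_ids".
    """
    selected = []
    for i in range(len(output["rois"])):
        score = output["scores"][i]
        class_id = output["class_ids"][i]
        if score >= threshold:
            # Unknown ids fall back to a placeholder label instead of raising.
            selected.append((labels.get(class_id, "unknown"), score, i))
    return selected


# Hypothetical sample output with three detected instances.
output = {
    "rois": [[10, 10, 50, 50], [0, 0, 5, 5], [20, 30, 60, 90]],
    "scores": [0.98, 0.20, 0.75],
    "class_ids": [1, 3, 3],
}
labels = {1: "person", 3: "car"}
print(select_instances(output, labels, threshold=0.5))
# [('person', 0.98, 0), ('car', 0.75, 2)]
```
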
