
[Bug] mmengine - WARNING - Failed to search registry with scope "mmdet" in the "Codebases" registry tree #2334

Closed
cozeybozey opened this issue Aug 7, 2023 · 25 comments
@cozeybozey

Checklist

  • I have searched related issues but cannot get the expected help.
  • I have read the FAQ documentation but cannot get the expected help.
  • The bug has not been fixed in the latest version.

Describe the bug

I am trying to fix the bug in RTMDET that causes batch inference to fail for instance segmentation on a deployed ONNX model. I already opened an issue about this bug: #2075.

To fix it I am trying to change the code in the rtmdet_ins_head.py file, but none of the code I put there gets executed when I run deploy.py. I suspect this is because the registries are not working, so my question is how I can fix these registry warnings.
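One way to rule out the registry warnings as the culprit is to check which copy of the file Python actually imports; if mmdet was pip-installed into site-packages, edits to a separate source checkout never run. A small sketch (the dotted module path below is an assumption about where rtmdet_ins_head.py lives):

```python
import importlib
import inspect

def source_of(module_name: str) -> str:
    """Return the file a module was actually imported from."""
    return inspect.getsourcefile(importlib.import_module(module_name))

# Substitute the module you edited; this dotted path is an assumption:
#   print(source_of("mmdet.models.dense_heads.rtmdet_ins_head"))
# Demonstrated here on a stdlib module so the snippet runs anywhere:
print(source_of("json"))
```

If the printed path is not the checkout you are editing, reinstalling with `pip install -e .` from the checkout is the usual fix.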

Reproduction

python tools/deploy.py .../mmdeploy/configs/mmdet/instance-seg/instance-seg_onnxruntime_dynamic.py .../mmdetection/configs/rtmdet/rtmdet-ins_s_8xb32-300e_coco.py .../mmdetection/checkpoints/rtmdet-ins_s_8xb32-300e_coco_20221121_212604-fdc5d7ec.pth .../mmdetection/demo/demo.jpg

Environment

08/07 11:57:51 - mmengine - INFO - 

08/07 11:57:51 - mmengine - INFO - **********Environmental information**********
08/07 11:57:52 - mmengine - INFO - sys.platform: linux
08/07 11:57:52 - mmengine - INFO - Python: 3.8.16 (default, Mar  2 2023, 03:21:46) [GCC 11.2.0]
08/07 11:57:52 - mmengine - INFO - CUDA available: True
08/07 11:57:52 - mmengine - INFO - numpy_random_seed: 2147483648
08/07 11:57:52 - mmengine - INFO - GPU 0,1: NVIDIA GeForce GTX 1080
08/07 11:57:52 - mmengine - INFO - CUDA_HOME: /usr
08/07 11:57:52 - mmengine - INFO - NVCC: Cuda compilation tools, release 11.8, V11.8.89
08/07 11:57:52 - mmengine - INFO - GCC: gcc (Debian 12.2.0-14) 12.2.0
08/07 11:57:52 - mmengine - INFO - PyTorch: 2.0.0
08/07 11:57:52 - mmengine - INFO - PyTorch compiling details: PyTorch built with:
  - GCC 9.3
  - C++ Version: 201703
  - Intel(R) oneAPI Math Kernel Library Version 2023.1-Product Build 20230303 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v2.7.3 (Git Hash 6dbeffbae1f23cbbeae17adb7b5b13f1f37c080e)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - LAPACK is enabled (usually provided by MKL)
  - NNPACK is enabled
  - CPU capability usage: AVX2
  - CUDA Runtime 11.8
  - NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_90,code=sm_90;-gencode;arch=compute_37,code=compute_37
  - CuDNN 8.7
  - Magma 2.6.1
  - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.8, CUDNN_VERSION=8.7.0, CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, CXX_FLAGS= -D_GLIBCXX_USE_CXX11_ABI=0 -fabi-version=11 -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOROCTRACER -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wunused-local-typedefs -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_DISABLE_GPU_ASSERTS=ON, TORCH_VERSION=2.0.0, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF, 

08/07 11:57:52 - mmengine - INFO - TorchVision: 0.15.0
08/07 11:57:52 - mmengine - INFO - OpenCV: 4.7.0
08/07 11:57:52 - mmengine - INFO - MMEngine: 0.7.3
08/07 11:57:52 - mmengine - INFO - MMCV: 2.0.0
08/07 11:57:52 - mmengine - INFO - MMCV Compiler: GCC 9.3
08/07 11:57:52 - mmengine - INFO - MMCV CUDA Compiler: 11.8
08/07 11:57:52 - mmengine - INFO - MMDeploy: 1.0.0+7f3e816
08/07 11:57:52 - mmengine - INFO - 

08/07 11:57:52 - mmengine - INFO - **********Backend information**********
08/07 11:57:52 - mmengine - INFO - tensorrt:	None
08/07 11:57:52 - mmengine - INFO - ONNXRuntime:	None
08/07 11:57:52 - mmengine - INFO - ONNXRuntime-gpu:	1.8.1
08/07 11:57:52 - mmengine - INFO - ONNXRuntime custom ops:	Available
08/07 11:57:52 - mmengine - INFO - pplnn:	None
08/07 11:57:52 - mmengine - INFO - ncnn:	None
08/07 11:57:52 - mmengine - INFO - snpe:	None
08/07 11:57:52 - mmengine - INFO - openvino:	None
08/07 11:57:52 - mmengine - INFO - torchscript:	2.0.0
08/07 11:57:52 - mmengine - INFO - torchscript custom ops:	NotAvailable
08/07 11:57:52 - mmengine - INFO - rknn-toolkit:	None
08/07 11:57:52 - mmengine - INFO - rknn-toolkit2:	None
08/07 11:57:52 - mmengine - INFO - ascend:	None
08/07 11:57:52 - mmengine - INFO - coreml:	None
08/07 11:57:52 - mmengine - INFO - tvm:	None
08/07 11:57:52 - mmengine - INFO - vacc:	None
08/07 11:57:52 - mmengine - INFO - 

08/07 11:57:52 - mmengine - INFO - **********Codebase information**********
08/07 11:57:52 - mmengine - INFO - mmdet:	3.0.0
08/07 11:57:52 - mmengine - INFO - mmseg:	None
08/07 11:57:52 - mmengine - INFO - mmcls:	None
08/07 11:57:52 - mmengine - INFO - mmocr:	None
08/07 11:57:52 - mmengine - INFO - mmedit:	None
08/07 11:57:52 - mmengine - INFO - mmdet3d:	None
08/07 11:57:52 - mmengine - INFO - mmpose:	None
08/07 11:57:52 - mmengine - INFO - mmrotate:	None
08/07 11:57:52 - mmengine - INFO - mmaction:	None
08/07 11:57:52 - mmengine - INFO - mmrazor:	None

Error traceback

08/07 11:47:58 - mmengine - WARNING - Failed to search registry with scope "mmdet" in the "Codebases" registry tree. As a workaround, the current "Codebases" registry in "mmdeploy" is used to build instance. This may cause unexpected failure when running the built modules. Please check whether "mmdet" is a correct scope, or whether the registry is initialized.
08/07 11:47:58 - mmengine - WARNING - Failed to search registry with scope "mmdet" in the "mmdet_tasks" registry tree. As a workaround, the current "mmdet_tasks" registry in "mmdeploy" is used to build instance. This may cause unexpected failure when running the built modules. Please check whether "mmdet" is a correct scope, or whether the registry is initialized.
@RunningLeon
Collaborator

Hi, there's something wrong with rtmdet-ins + onnxruntime. Will debug and fix later.

@RunningLeon RunningLeon self-assigned this Aug 8, 2023
@RunningLeon RunningLeon added the bug, onnxruntime and mmdet labels Aug 8, 2023
@cozeybozey
Author

Great, thanks!

@RunningLeon
Collaborator

RunningLeon commented Aug 9, 2023

Hi, the registry warnings can be safely ignored: the modules live in mmdet and the other codebases, while the default scope is mmdeploy.
BTW, you can check #2328 to see if you have the same issue. For batch inference, I'll double-check.
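For illustration only (this is not mmengine's actual implementation), the fallback behavior that warning describes can be mimicked in a few lines of plain Python: look the requested scope up in the registry tree, and fall back to the current registry when the scope is missing.

```python
import warnings

# Toy model of a registry tree with scope fallback; names are invented
# for the demo and mmengine's real Registry is more involved.
class Registry:
    def __init__(self, name, scope):
        self.name = name
        self.scope = scope
        self.children = {}   # scope name -> child Registry
        self.modules = {}    # type name -> class

    def register(self, cls):
        self.modules[cls.__name__] = cls
        return cls

    def build(self, cfg, scope=None):
        registry = self
        if scope is not None and scope != self.scope:
            child = self.children.get(scope)
            if child is not None:
                registry = child
            else:
                # Mirrors the warning from the log: fall back to the
                # current registry instead of failing outright.
                warnings.warn(
                    f'Failed to search registry with scope "{scope}" in the '
                    f'"{self.name}" registry tree; using "{self.scope}".')
        return registry.modules[cfg["type"]]()

codebases = Registry("Codebases", scope="mmdeploy")

@codebases.register
class MMDetection:   # registered under the mmdeploy scope
    pass

# No child registry for "mmdet" exists, so the lookup warns and falls
# back, yet the build still succeeds: that is why the warning is ignorable.
obj = codebases.build({"type": "MMDetection"}, scope="mmdet")
print(type(obj).__name__)
```

The build succeeding despite the warning is the behavior the maintainer describes above.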

@cozeybozey
Author

I thought the warnings might be the problem, because changing the code in rtmdet_ins_head.py does not seem to do anything. Do you have any idea how that is possible, if it is not because of the registry warnings? Also, I don't think the problem mentioned in issue #2328 is the one I am having: I do get correct masks, but only with a batch size of 1. With a batch size larger than one, session.run simply crashes, so I don't get any output masks at all.

@RunningLeon
Collaborator

@cozeybozey Hi, could you try PR #2343? Batch inference should be fixed for ort.

@cozeybozey
Author

Thanks for the PR! But unfortunately I still get the same error with an RTMDet export from your forked repository. Although I am not sure: should I do an entirely fresh install in a new conda environment for the PR to work?

@RunningLeon
Collaborator

Hi, you have to rebuild mmdeploy and rerun deploy.py.

@cozeybozey
Author

I see; I am relatively new to building packages and such. Since I am using Linux, I am following this guide: https://github.com/open-mmlab/mmdeploy/blob/main/docs/en/01-how-to-build/linux-x86_64.md

But the guide seems to be intended for Ubuntu users, while I am using Debian. Does that mean I am out of luck? For now I am specifically stuck on this command:

# Add repository if ubuntu < 18.04
sudo add-apt-repository ppa:ubuntu-toolchain-r/test
sudo apt-get update
sudo apt-get install gcc-7
sudo apt-get install g++-7

The first line seems to be for Ubuntu users only, but when I run the last 3 lines I get this:

Reading package lists... Done
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
E: Unable to locate package gcc-7
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
Package g++-7 is not available, but is referred to by another package.
This may mean that the package is missing, has been obsoleted, or
is only available from another source

@RunningLeon
Collaborator

Hi, you could just use your old env and run this script to build, after a git pull of the code:

          bash .github/scripts/linux/build.sh "cuda" "ort" \
              -Dpplcv_DIR=${pplcv_DIR} \
              -DONNXRUNTIME_DIR=${ONNXRUNTIME_DIR}

@cozeybozey
Author

Ah, that makes building a lot easier, thanks!
However, if I run that I get the following error:

-- CMAKE_INSTALL_PREFIX: /user/mmdeploy/build/install
-- The CXX compiler identification is unknown
CMake Error at CMakeLists.txt:12 (project):
  The CMAKE_CXX_COMPILER:

    g++-7

  is not a full path and was not found in the PATH.

  Tell CMake where to find the compiler by setting either the environment
  variable "CXX" or the CMake cache entry CMAKE_CXX_COMPILER to the full path
  to the compiler, or to the compiler name if it is in the PATH.


-- Configuring incomplete, errors occurred!
See also "/user/mmdeploy/build/CMakeFiles/CMakeOutput.log".
See also "/user/mmdeploy/build/CMakeFiles/CMakeError.log".
make: *** No targets specified and no makefile found.  Stop.

So maybe my not being able to install g++-7, as mentioned in my previous comment, is still the problem.

@RunningLeon
Collaborator

You can run g++ --version; if it reports >= 7, you can use the default g++ instead of g++-7.
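A sketch of that check, wired into the CMake setting named in the earlier error (the build-directory layout is assumed):

```shell
# Check the default g++ and, if it is new enough, hand it to CMake
# instead of the hard-coded g++-7 (CMAKE_CXX_COMPILER is the cache entry
# the CMake error asked for).
version=$(g++ -dumpversion 2>/dev/null || echo "0")
major=${version%%.*}
if [ "$major" -ge 7 ]; then
    export CXX=$(command -v g++)
    echo "using CXX=$CXX (g++ $version)"
    # cmake .. -DCMAKE_CXX_COMPILER="$CXX"
fi
```

Setting the CXX environment variable only takes effect on a fresh configure, so clear the build directory (or the CMakeCache.txt entry) first.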

@cozeybozey
Author

When I run g++ --version I get this:

g++ (Debian 12.2.0-14) 12.2.0

So I changed g++-7 to g++ in CMakeCache.txt, but then I get the following error:

 /usr/include/crt/host_config.h:132:2: error: #error -- unsupported GNU
  version! gcc versions later than 11 are not supported! The nvcc flag
  '-allow-unsupported-compiler' can be used to override this version check;
  however, using an unsupported host compiler may cause compilation failure
  or incorrect run time execution.  Use at your own risk.
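For what it's worth, the nvcc message itself names an escape hatch. A hedged sketch of the two usual ways out (the -allow-unsupported-compiler flag is quoted from the error text; the CMake variables are assumptions about this build setup):

```shell
# Option A: use a host compiler this CUDA version supports (e.g. g++-11
# for CUDA 11.8, as tried later in this thread).
# Option B (at your own risk, per the nvcc warning itself): override the
# host-compiler version check when configuring.
cmake .. \
    -DCMAKE_CXX_COMPILER=g++-11 \
    -DCMAKE_CUDA_FLAGS="-allow-unsupported-compiler"
```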

@cozeybozey
Author

I fixed the above problem, so now it uses the g++-11 compiler (g++-12 would not work either, because it is incompatible with my CUDA version). However, after doing that I still had issues with it not recognizing OpenCV, ONNX Runtime, pplcv, and the third-party folders that appear to contain links according to this GitHub page: https://github.com/RunningLeon/mmdeploy/tree/fix_rtmdet_inst/third_party

I have now fixed the OpenCV and ONNX Runtime issues (by installing them both), but I still get errors after building pplcv, and I have not tried installing the third-party packages yet. I was wondering, is it normal that I have to do all this? Because you said that if I already had a working environment, all I have to do is run:

          bash .github/scripts/linux/build.sh "cuda" "ort" \
              -Dpplcv_DIR=${pplcv_DIR} \
              -DONNXRUNTIME_DIR=${ONNXRUNTIME_DIR}

But now it seems I am rebuilding and installing the entire environment from scratch, so I think I might be doing something wrong.

@RunningLeon
Collaborator

Hi, how did you install mmdeploy before? If you have successfully used mmdeploy on your local machine, then you should already have a runnable env, and all you need to do is git pull the new code and rebuild mmdeploy.

@cozeybozey
Author

I followed this guide: https://github.com/open-mmlab/mmdeploy/blob/main/docs/en/get_started.md
As far as I can see, following this guide should take care of ONNX Runtime, but where would OpenCV, pplcv, and the third-party dependencies get installed? I also only really care about the conversion, not necessarily the inference part, so maybe OpenCV and pplcv are redundant, but the build does seem to require them.

@RunningLeon
Collaborator

Hi, if you only need to do torch2onnx without inference, you can skip the build step and just run pip install -e . after pulling the repo. Then use tools/torch2onnx.py instead of tools/deploy.py.
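A sketch of that conversion-only path (the script name comes from the comment above, but the argument order is an assumption modeled on the deploy.py invocation earlier in this thread, so check python tools/torch2onnx.py --help before relying on it):

```shell
# Conversion only; no C++ build or inference backends needed.
# All paths below are placeholders for your local checkouts.
cd mmdeploy
pip install -e .
python tools/torch2onnx.py \
    configs/mmdet/instance-seg/instance-seg_onnxruntime_dynamic.py \
    path/to/rtmdet-ins_s_8xb32-300e_coco.py \
    path/to/rtmdet-ins_s_8xb32-300e_coco_20221121_212604-fdc5d7ec.pth \
    path/to/demo.jpg \
    --work-dir work_dirs/rtmdet-ins
```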

@cozeybozey
Author

Thanks for all the help! I finally got it working, and I can confirm that this PR indeed fixes the batch inference issue with RTMDet.

@cozeybozey
Author

Do you know why there are multiple identity blocks in the beginning that don't seem to do anything?
(screenshot of the exported ONNX graph)

@RunningLeon
Collaborator

Don't worry about it. They are constant tensors.

@cozeybozey
Author

Maybe I misunderstand, but if they are not connected to anything, then they don't do anything, right? Also, I am having a bit of a memory problem: inference crashes when I use a batch size of 12 or larger. I found this strange, since I can easily run a batch size of 128 with YOLOv8, and I am using the smallest model in both cases. Finally, I find the output a bit confusing: there is a dimension for the number of detections, which seems to be dynamic, but it also seems capped at 99, and when I feed it a set of black images it becomes equal to 5 * batch_size. Can you give some insight into how the number of detections is produced?

@RunningLeon
Collaborator

Hi, for batch inference, you can refer to the following code:

# batch all
batched_dets = dets.unsqueeze(0).repeat(batch_size, 1, 1)
batch_template = torch.arange(
    0, batch_size, dtype=batch_inds.dtype, device=batch_inds.device)
batched_dets = batched_dets.where(
    (batch_inds == batch_template.unsqueeze(1)).unsqueeze(-1),
    batched_dets.new_zeros(1))
batched_labels = cls_inds.unsqueeze(0).repeat(batch_size, 1)
batched_labels = batched_labels.where(
    (batch_inds == batch_template.unsqueeze(1)),
    batched_labels.new_ones(1) * -1)
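To make the masking trick concrete, here is a small NumPy re-sketch (illustration only, not mmdeploy's code): detections from all images sit in one flat array, batch_inds records which image each one belongs to, and broadcasting a per-image template against it zeroes out the rows that belong to other images.

```python
import numpy as np

batch_size = 2
# Three flat detections as [x1, y1, x2, y2, score]:
dets = np.array([[0., 0., 10., 10., 0.9],
                 [5., 5., 20., 20., 0.8],
                 [1., 1.,  4.,  4., 0.7]], dtype=np.float32)
batch_inds = np.array([0, 0, 1])   # dets 0 and 1 belong to image 0, det 2 to image 1

# Tile the flat detections once per image, then zero out every row that
# belongs to a different image (the role of Tensor.where in the torch code).
batched_dets = np.repeat(dets[None], batch_size, axis=0)    # (B, N, 5)
batch_template = np.arange(batch_size)                      # (B,)
keep = batch_inds[None, :] == batch_template[:, None]       # (B, N) bool mask
batched_dets = np.where(keep[..., None], batched_dets, 0.0)

per_image_counts = keep.sum(axis=1)   # real detections per image
print(per_image_counts)               # [2 1]
```

The padded rows are zeroed rather than dropped, so the output stays rectangular across the batch; the per-image detection count falls out of the mask as keep.sum(axis=1).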

@cozeybozey
Author

Okay, thanks, now I understand how the number of detections is produced. However, I still don't really understand why the model takes up so much memory. Have you experienced this issue as well?

@RunningLeon
Collaborator

Hi, we have not tested runtime memory usage at large batch sizes.

@github-actions

This issue is marked as stale because it has been marked as invalid or awaiting response for 7 days without any further response. It will be closed in 5 days if the stale label is not removed or if there is no further response.

@github-actions github-actions bot added the Stale label Aug 29, 2023
@github-actions

github-actions bot commented Sep 3, 2023

This issue is closed because it has been stale for 5 days. Please open a new issue if you have similar issues or you have any new updates now.

@github-actions github-actions bot closed this as not planned (won't fix, can't repro, duplicate, stale) Sep 3, 2023