This repository has been archived by the owner on Nov 21, 2023. It is now read-only.

Method AffineChannel is not a registered operator #454

Closed
II-Matto opened this issue May 29, 2018 · 11 comments

@II-Matto

Expected results

I am trying to run Detectron on Windows. After successfully building Caffe2 (with CUDA, cuDNN, OpenCV), COCOAPI and the Detectron modules, I ran tools/train_net.py to train a network on Pascal VOC.

I have modified import_detectron_ops() in detectron/utils/c2.py to use my caffe2_detectron_ops_gpu.dll path.

I have added the following paths with sys.path.insert(0, path) (in tools/train_net.py).

  • pytorch build directory
  • detectron root directory
  • COCOAPI PythonAPI directory

I have added the following paths to my PATH variable.

  • cuDNN bin directory
  • pytorch build bin directory (pytorch/build/bin/Release), which contains caffe2_detectron_ops_gpu.dll
  • OpenCV bin directory

With all these prepared, I was hoping I could successfully train Faster R-CNN using Detectron.
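
A quick way to confirm that the DLL is really reachable through PATH is to walk the PATH entries and look for the file. The following is only a minimal sketch; the helper name find_on_path is hypothetical and is not part of Detectron or Caffe2.

```python
import os


def find_on_path(filename, path_value=None):
    """Return every directory in a PATH-style string that contains `filename`.

    `find_on_path` is a hypothetical helper written for this illustration.
    """
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    hits = []
    for directory in path_value.split(os.pathsep):
        directory = directory.strip('"')  # PATH entries are sometimes quoted on Windows
        if directory and os.path.isfile(os.path.join(directory, filename)):
            hits.append(directory)
    return hits
```

Calling find_on_path("caffe2_detectron_ops_gpu.dll") should return the pytorch/build/bin/Release directory if the PATH entry above took effect. An empty list would point to a path problem rather than a build problem.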

Actual results

It seems to have succeeded in all import commands. But it then results in the following error:

...
  File "D:/repo/github/Detectron_facebookresearch\detectron\utils\train.py", line 53, in train_model
    model, weights_file, start_iter, checkpoints, output_dir = create_model()
  File "D:/repo/github/Detectron_facebookresearch\detectron\utils\train.py", line 132, in create_model
    model = model_builder.create(cfg.MODEL.TYPE, train=True)
  File "D:/repo/github/Detectron_facebookresearch\detectron\modeling\model_builder.py", line 124, in create
    return get_func(model_type_func)(model)
  File "D:/repo/github/Detectron_facebookresearch\detectron\modeling\model_builder.py", line 89, in generalized_rcnn
    freeze_conv_body=cfg.TRAIN.FREEZE_CONV_BODY
  File "D:/repo/github/Detectron_facebookresearch\detectron\modeling\model_builder.py", line 229, in build_generic_detection_model
    optim.build_data_parallel_model(model, _single_gpu_build_func)
  File "D:/repo/github/Detectron_facebookresearch\detectron\modeling\optimizer.py", line 40, in build_data_parallel_model
    all_loss_gradients = _build_forward_graph(model, single_gpu_build_func)
  File "D:/repo/github/Detectron_facebookresearch\detectron\modeling\optimizer.py", line 63, in _build_forward_graph
    all_loss_gradients.update(single_gpu_build_func(model))
  File "D:/repo/github/Detectron_facebookresearch\detectron\modeling\model_builder.py", line 169, in _single_gpu_build_func
    blob_conv, dim_conv, spatial_scale_conv = add_conv_body_func(model)
  File "D:/repo/github/Detectron_facebookresearch\detectron\modeling\ResNet.py", line 36, in add_ResNet50_conv4_body
    return add_ResNet_convX_body(model, (3, 4, 6))
  File "D:/repo/github/Detectron_facebookresearch\detectron\modeling\ResNet.py", line 98, in add_ResNet_convX_body
    p, dim_in = globals()[cfg.RESNETS.STEM_FUNC](model, 'data')
  File "D:/repo/github/Detectron_facebookresearch\detectron\modeling\ResNet.py", line 252, in basic_bn_stem
    p = model.AffineChannel(p, 'res_conv1_bn', dim=dim, inplace=True)
  File "D:/repo/github/Detectron_facebookresearch\detectron\modeling\detector.py", line 103, in AffineChannel
    return self.net.AffineChannel([blob_in, scale, bias], blob_in)
  File "D:/repo/github/pytorch/build\caffe2\python\core.py", line 2067, in __getattr__
    ",".join(workspace.C.nearby_opnames(op_type)) + ']'
AttributeError: Method AffineChannel is not a registered operator. Did you mean: []

I have checked out issue #320, and ensured that there is c2_utils.import_detectron_ops(), which should have imported the Detectron operators.
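
One way to narrow this down is to ask Caffe2 directly which operators it knows about after import_detectron_ops() has run. The sketch below assumes that workspace.RegisteredOperators() is available in the Caffe2 build in use; the two helper functions are hypothetical names introduced only for this illustration.

```python
def missing_operators(required, registered):
    """Return the names in `required` that do not appear in `registered`."""
    registered = set(registered)
    return [name for name in required if name not in registered]


def registered_operator_names():
    """Return Caffe2's registered operator names, or None if Caffe2 cannot be imported.

    Assumption: the installed Caffe2 exposes workspace.RegisteredOperators().
    """
    try:
        from caffe2.python import workspace
    except ImportError:
        return None
    return list(workspace.RegisteredOperators())
```

For example, missing_operators(["AffineChannel"], registered_operator_names() or []) would return ["AffineChannel"] if the operator is still unregistered after the library has been loaded, which matches the error above.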

Detailed steps to reproduce

  1. Build Caffe2 with CUDA, cuDNN, OpenCV.
  2. Build COCOAPI modules.
  3. Build Detectron modules.
  4. Add all sorts of paths properly (as described above).
  5. Run tools/train_net.py with proper arguments.

System information

  • Operating system: Windows 10 Home Edition
  • Compiler version: VS 2015
  • CUDA version: 8.0
  • cuDNN version: cudnn-8.0-windows10-x64-v7
  • NVIDIA driver version: 382.05
  • GPU models (for all devices if they are not all the same): (GTX 1050)
  • PYTHONPATH environment variable: (sys.path.insert() is used instead as described above)
  • python --version output: Python 2.7.12 :: Anaconda custom (64-bit)
  • Anything else that seems relevant: OpenCV 3.2.0-vc14
@II-Matto
Author

I used the dumpbin tool to examine my caffe2_detectron_ops_gpu.dll, which only has a size of ~5.5MB.

With the EXPORTS option, the results are as follows:

Microsoft (R) COFF/PE Dumper Version 14.00.24215.1
Copyright (C) Microsoft Corporation.  All rights reserved.


Dump of file D:\repo\github\pytorch\build\bin\Release\caffe2_detectron_ops_gpu.dll

File Type: DLL

  Section contains the following exports for caffe2_detectron_ops_gpu.dll

    00000000 characteristics
    5B0CBE13 time date stamp Tue May 29 10:42:27 2018
        0.00 version
           1 ordinal base
           1 number of functions
           1 number of names

    ordinal hint RVA      name

          1    0 003393E8 NvOptimusEnablementCuda

  Summary

       13000 .data
        1000 .gfids
        1000 .nvFatBi
      20C000 .nv_fatb
       23000 .pdata
       E6000 .rdata
        5000 .reloc
        1000 .rsrc
      252000 .text
        1000 .tls

With the SYMBOLS option, the results are as follows:

Microsoft (R) COFF/PE Dumper Version 14.00.24215.1
Copyright (C) Microsoft Corporation.  All rights reserved.


Dump of file D:\repo\github\pytorch\build\bin\Release\caffe2_detectron_ops_gpu.dll

File Type: DLL

  Summary

       13000 .data
        1000 .gfids
        1000 .nvFatBi
      20C000 .nv_fatb
       23000 .pdata
       E6000 .rdata
        5000 .reloc
        1000 .rsrc
      252000 .text
        1000 .tls

Does this mean the Detectron operators are actually not compiled? If so, what could possibly be the reason and how can I make them compile?
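
The export table only lists symbols that the DLL exports, so a nearly empty table does not by itself show that the operator code is missing. A rough complementary check is to search the binary for the operator name. This sketch assumes that a compiled-in operator leaves its name in the file as a plain ASCII string, which is a heuristic and not a guarantee; file_contains is a hypothetical helper.

```python
def file_contains(path, needle, chunk_size=1 << 20):
    """Return True if the byte string `needle` occurs anywhere in the file.

    The file is read in chunks, and the end of each chunk is carried over
    so that a match spanning two chunks is still found.
    """
    overlap = len(needle) - 1
    tail = b""
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                return False
            window = tail + chunk
            if needle in window:
                return True
            tail = window[-overlap:] if overlap > 0 else b""
```

For instance, file_contains(r"D:\repo\github\pytorch\build\bin\Release\caffe2_detectron_ops_gpu.dll", b"AffineChannel") returning True would suggest that the operator sources were compiled into the DLL and that the problem lies in registration or loading.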

@ir413
Contributor

ir413 commented May 29, 2018

Hi @II-Matto, we do not support building on Windows at this time (see the list of requirements here; see also #25 and #276). Sorry for the inconvenience.

@ir413 ir413 closed this as completed May 29, 2018
@II-Matto
Author

@ir413 Thanks for your reply. I have managed to build Caffe2 and obtain the (probably incorrect) caffe2_detectron_ops_gpu.dll. Does this mean the compilation is OK?

What are the extra modifications that may be needed for building on Windows? If it is not that complex, I may give it a shot myself. Would you be so kind to give me some hints?

@ir413
Contributor

ir413 commented May 29, 2018

I don't have any experience with using Caffe2 on Windows so I don't really have a very clear idea of what may be required. I suspect that if you can get Caffe2 to work on Windows then it should be possible to run Detectron as well but I may be missing something.

Just to double-check, you verified your Caffe2 installation by running Caffe2 tests (e.g. this one), right?

AttributeError: Method AffineChannel is not a registered operator. Did you mean: []

It seems that the problem has to do with loading the Detectron operators. Detectron ops are built as a Caffe2 module (located here) and loaded dynamically from Python (by this function). I guess the issue may have to do with (1) building the Detectron ops or (2) loading them (or both). For (1), I would look into this CMake file and check whether Windows-specific modifications are required. For (2), I would look into this function and make sure that it is handling Windows libraries correctly.

@II-Matto
Author

@ir413 I have just tried relu_op_test (Sorry I only did the two simple import tests previously). It succeeds with the CPU-only caffe2 (removing caffe2_pybind11_state_gpu.pyd), but reports the following errors with GPU-enabled caffe2.

...
  File "D:/repo/github/pytorch/build/caffe2/python/operator_test/relu_op_test.py", line 24, in test_relu
    engine=st.sampled_from(["", "CUDNN"]),
  File "C:\Program Files\Anaconda2\lib\site-packages\hypothesis\core.py", line 524, in test
    result = self.test(*args, **kwargs)
  File "D:/repo/github/pytorch/build/caffe2/python/operator_test/relu_op_test.py", line 31, in test_relu
    self.assertDeviceChecks(dc, op, [X], [0])
  File "D:/repo/github/pytorch/build\caffe2\python\hypothesis_test_util.py", line 350, in assertDeviceChecks
    dc.CheckSimple(op, inputs, outputs_to_check, input_device_options)
  File "D:/repo/github/pytorch/build\caffe2\python\device_checker.py", line 49, in CheckSimple
    workspace.RunOperatorOnce(op)
  File "D:/repo/github/pytorch/build\caffe2\python\workspace.py", line 165, in RunOperatorOnce
    return C.run_operator_once(StringifyProto(operator))
RuntimeError: [enforce fail at context_gpu.h:156] . Encountered CUDA error: invalid device function Error from operator:
input: "X" output: "Y" name: "" type: "Relu" device_option { device_type: 1 } engine: ""
...

All test cases fail with the invalid device function error. Could it be that my GPU-enabled Caffe2 was built for the wrong architecture? The GPU I use is a GTX 1050, and according to the NVIDIA website its architecture is Pascal. The build_windows.bat script in Caffe2 passes -DCUDA_ARCH_NAME=Maxwell to cmake, which I did not notice before (my mistake). Now I am changing this option to Pascal and rebuilding Caffe2.

I do not know if this may lead to failure of loading Detectron operators. I will let you know as soon as the building is completed.
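
The architecture name can be derived from the GPU's compute capability (6.1 for the GTX 1050). The lookup below is a small sketch; arch_name is a hypothetical helper, and whether a given Caffe2 CMake version accepts each of these names for -DCUDA_ARCH_NAME is an assumption.

```python
# NVIDIA architecture names keyed by compute capability major version.
ARCH_BY_MAJOR = {2: "Fermi", 3: "Kepler", 5: "Maxwell", 6: "Pascal", 7: "Volta"}


def arch_name(compute_capability):
    """Map a compute capability string such as "6.1" to an architecture name.

    Returns None for capabilities that are not in the table.
    """
    major, minor = (int(part) for part in compute_capability.split("."))
    if (major, minor) == (7, 5):
        return "Turing"  # 7.5 is Turing; the other 7.x values are Volta
    return ARCH_BY_MAJOR.get(major)
```

With this table, arch_name("6.1") gives "Pascal", while a binary built only for Maxwell (5.x) contains no kernels for a 6.1 device, which is consistent with the invalid device function error.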

@II-Matto
Author

II-Matto commented May 29, 2018

@ir413 The relu_op_test now runs successfully for Caffe2 with both CPU and GPU. But the Detectron operators still fail to be registered.

FYI, the cmake command for my Caffe2 build is as follows. The CMAKE_GENERATOR is Visual Studio 14 2015 Win64.

cmake .. ^
  -G%CMAKE_GENERATOR% ^
  -DBUILD_TEST=OFF ^
  -DCMAKE_BUILD_TYPE=%CMAKE_BUILD_TYPE% ^
  -DUSE_CUDA=ON ^
  -DCUDA_ARCH_NAME=Pascal ^
  -DUSE_NNPACK=OFF ^
  -DUSE_CUB=OFF ^
  -DUSE_GLOG=OFF ^
  -DUSE_GFLAGS=OFF ^
  -DUSE_LMDB=OFF ^
  -DUSE_LEVELDB=OFF ^
  -DUSE_ROCKSDB=OFF ^
  -DUSE_OPENCV=ON ^
  -DBUILD_SHARED_LIBS=OFF ^
  -DBUILD_PYTHON=ON ^
  -DUSE_CUDNN=ON ^
  -DCUDNN_INCLUDE_DIR=D:/lib/cudnn/cudnn-8.0-windows10-x64-v7/include ^
  -DCUDNN_LIBRARY=D:/lib/cudnn/cudnn-8.0-windows10-x64-v7/lib/x64/cudnn.lib ^
  -DOpenCV_DIR=D:/lib/opencv-3.2.0-vc14/build

@ir413
Contributor

ir413 commented May 29, 2018

Thanks for the updates @II-Matto. You can also use -DCUDA_ARCH_NAME=All option to build Caffe2 for all supported architectures. Glad to hear Caffe2 is working now. Regarding Detectron ops, have you looked into (1) and (2) from my previous message?

@II-Matto
Author

II-Matto commented May 31, 2018

@ir413 For (2), I think the code for loading the DLL should be fine. I have checked the corresponding code in MXNet, i.e. the _load_lib() function in mxnet/base.py, and it is similar to the code used in PyTorch (the mode parameter ctypes.RTLD_LOCAL is actually ignored on Windows, as explained in the Python docs). Since I can successfully load and run MXNet on my laptop, I guess the code for library loading is correct.
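
The loading step itself can be tried in isolation with ctypes. This is a minimal sketch of the same kind of call, not the actual Caffe2 or MXNet loader; try_load is a hypothetical helper.

```python
import ctypes


def try_load(path):
    """Try to load a shared library; return the handle, or None on failure."""
    try:
        # The mode argument is ignored on Windows, so passing RTLD_LOCAL is harmless there.
        return ctypes.CDLL(path, mode=ctypes.RTLD_LOCAL)
    except OSError as error:
        print("Could not load %s: %s" % (path, error))
        return None
```

A successful load only shows that the DLL and its dependencies can be found. It does not show that the operators inside it ended up in the registry that the Caffe2 Python module consults.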

For (1), currently I do not feel there should be anything special on Windows. But as I am not that familiar with CMake, I am not sure about this. I have been reading CMakeLists files in other repositories and documents of CMake, but have not found any useful information.

I notice that Caffe2 is built as a static library by default on Windows, with -DBUILD_SHARED_LIBS=OFF. Could this be the cause of the problem? I tried to build Caffe2 as a shared library, but the build failed due to errors in caffe2.pb.h (something like kIndexInFileMessages used before initialization). The build error is described in pytorch/pytorch#7962.

@ir413
Contributor

ir413 commented May 31, 2018

Thanks for the info @II-Matto. I'm not sure about this and I suggest asking on the PyTorch/Caffe2 page. In the meantime to get unblocked, you could try moving the ops from the Detectron module to the standard Caffe2 operators to avoid building them separately and loading them dynamically.

@II-Matto
Author

@ir413 Thanks for the advice. I guess this may be the optimal solution for me right now.

@TneitaP

TneitaP commented Oct 28, 2018

Succeeded in building Detectron on Windows 10 (Python 3.6, VS 2015, CUDA 8.0, cuDNN 7.0.5) by following the advice in this issue. Main procedure:

  1. Build Caffe2 from source correctly; put python36.lib into the generated directory "pytorch\build\caffe2" while running "build_windows.bat".
  2. Open caffe2.sln, rebuild all, work through the errors in the output, and rebuild the projects that failed.
  3. Add "CUDA 8.0" to caffe2_gpu under Build Dependencies -> Build Customizations.
    Copy all files (.cc, .cu, .h) from "pytorch\modules\detectron" to "pytorch\caffe2\operators".
    Add the .cu files from that directory to the caffe2_gpu project and the .cc files to the caffe2 project.
  4. Open caffe2.sln, rebuild all, work through the errors in the output (especially for caffe2 and caffe2_gpu), rebuild the projects that failed, then build the solution. (A full rebuild is only needed after the first change; after debugging, a normal build is enough.)
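
The copy in step 3 can be scripted. The following is a minimal sketch that uses the directory names from the steps above; copy_op_sources is a hypothetical helper, and adding the copied files to the Visual Studio projects still has to be done by hand.

```python
import os
import shutil


def copy_op_sources(src_dir, dst_dir, extensions=(".cc", ".cu", ".h")):
    """Copy operator source files from src_dir into dst_dir.

    Returns the sorted list of copied file names. Existing files with the
    same name in dst_dir are overwritten.
    """
    extensions = tuple(extensions)  # str.endswith needs a tuple, not a list
    copied = []
    for name in sorted(os.listdir(src_dir)):
        source = os.path.join(src_dir, name)
        if name.endswith(extensions) and os.path.isfile(source):
            shutil.copy2(source, os.path.join(dst_dir, name))
            copied.append(name)
    return copied
```

For example, copy_op_sources(r"pytorch\modules\detectron", r"pytorch\caffe2\operators") returns the names that were copied, which can then be split by extension when adding the .cu files to caffe2_gpu and the .cc files to caffe2.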
