Method AffineChannel is not a registered operator #454

II-Matto · 2018-05-29T03:17:47Z

Expected results

I am trying to run Detectron in Windows. After successfully building caffe2 (with CUDA, cuDNN, OpenCV), COCOAPI and Detectron modules, I ran the tools/train_net.py, trying to train a network on Pascal VOC.

I have modified import_detectron_ops() in detectron/utils/c2.py to use my caffe2_detectron_ops_gpu.dll path.

I have added the following paths with sys.path.insert(0, path) (in tools/train_net.py):

pytorch build directory
detectron root directory
COCOAPI PythonAPI directory

I have added the following paths to my PATH variable:

cuDNN bin directory
pytorch build bin directory (pytorch/build/bin/Release), which contains caffe2_detectron_ops_gpu.dll
OpenCV bin directory
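
A minimal sketch of the path setup described above, for illustration only. `prepend_paths` is a hypothetical helper (it is not part of Detectron or Caffe2), and the directories in the usage note are placeholders rather than the exact locations on my machine.

```python
import os
import sys


def prepend_paths(python_dirs, dll_dirs, environ=None, path_list=None):
    """Put python_dirs at the front of the module search path and
    dll_dirs at the front of PATH, skipping entries already present."""
    environ = os.environ if environ is None else environ
    path_list = sys.path if path_list is None else path_list

    # Insert in reverse so the first entry ends up first in the search order.
    for directory in reversed(python_dirs):
        if directory not in path_list:
            path_list.insert(0, directory)

    current = [p for p in environ.get("PATH", "").split(os.pathsep) if p]
    new_entries = [d for d in dll_dirs if d not in current]
    environ["PATH"] = os.pathsep.join(new_entries + current)
    return path_list, environ["PATH"]
```

It would be called once at the top of tools/train_net.py, before any Caffe2 import, for example `prepend_paths([pytorch_build_dir, detectron_root, cocoapi_python_dir], [cudnn_bin_dir, pytorch_bin_dir, opencv_bin_dir])`, where all six names stand for the directories listed above.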

With all these prepared, I was hoping I could successfully train Faster R-CNN using Detectron.

Actual results

All import statements appear to succeed, but the run then fails with the following error:

...
  File "D:/repo/github/Detectron_facebookresearch\detectron\utils\train.py", line 53, in train_model
    model, weights_file, start_iter, checkpoints, output_dir = create_model()
  File "D:/repo/github/Detectron_facebookresearch\detectron\utils\train.py", line 132, in create_model
    model = model_builder.create(cfg.MODEL.TYPE, train=True)
  File "D:/repo/github/Detectron_facebookresearch\detectron\modeling\model_builder.py", line 124, in create
    return get_func(model_type_func)(model)
  File "D:/repo/github/Detectron_facebookresearch\detectron\modeling\model_builder.py", line 89, in generalized_rcnn
    freeze_conv_body=cfg.TRAIN.FREEZE_CONV_BODY
  File "D:/repo/github/Detectron_facebookresearch\detectron\modeling\model_builder.py", line 229, in build_generic_detection_model
    optim.build_data_parallel_model(model, _single_gpu_build_func)
  File "D:/repo/github/Detectron_facebookresearch\detectron\modeling\optimizer.py", line 40, in build_data_parallel_model
    all_loss_gradients = _build_forward_graph(model, single_gpu_build_func)
  File "D:/repo/github/Detectron_facebookresearch\detectron\modeling\optimizer.py", line 63, in _build_forward_graph
    all_loss_gradients.update(single_gpu_build_func(model))
  File "D:/repo/github/Detectron_facebookresearch\detectron\modeling\model_builder.py", line 169, in _single_gpu_build_func
    blob_conv, dim_conv, spatial_scale_conv = add_conv_body_func(model)
  File "D:/repo/github/Detectron_facebookresearch\detectron\modeling\ResNet.py", line 36, in add_ResNet50_conv4_body
    return add_ResNet_convX_body(model, (3, 4, 6))
  File "D:/repo/github/Detectron_facebookresearch\detectron\modeling\ResNet.py", line 98, in add_ResNet_convX_body
    p, dim_in = globals()[cfg.RESNETS.STEM_FUNC](model, 'data')
  File "D:/repo/github/Detectron_facebookresearch\detectron\modeling\ResNet.py", line 252, in basic_bn_stem
    p = model.AffineChannel(p, 'res_conv1_bn', dim=dim, inplace=True)
  File "D:/repo/github/Detectron_facebookresearch\detectron\modeling\detector.py", line 103, in AffineChannel
    return self.net.AffineChannel([blob_in, scale, bias], blob_in)
  File "D:/repo/github/pytorch/build\caffe2\python\core.py", line 2067, in __getattr__
    ",".join(workspace.C.nearby_opnames(op_type)) + ']'
AttributeError: Method AffineChannel is not a registered operator. Did you mean: []

I have checked issue #320 and confirmed that c2_utils.import_detectron_ops() is present, which should have imported the Detectron operators.
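
One way to narrow this down is to compare the operators a model needs against the operators Caffe2 reports as registered. The sketch below is only an illustration: `missing_operators` is a hypothetical helper, and the stand-in list replaces the real registry so the snippet runs without Caffe2. In a real session I believe the list can be obtained from `workspace.RegisteredOperators()`, but treat that as an assumption to verify against your build.

```python
def missing_operators(required, registered):
    """Return the names in `required` that are absent from `registered`,
    preserving the order of `required`."""
    registered_set = set(registered)
    return [name for name in required if name not in registered_set]


# Assumption: with Caffe2 available, the list could come from
#   from caffe2.python import workspace
#   registered = workspace.RegisteredOperators()
# A stand-in list is used here so the sketch is self-contained.
registered = ["Relu", "Conv", "Softmax"]
print(missing_operators(["AffineChannel", "Relu"], registered))  # prints ['AffineChannel']
```

Running the check once before and once after the call to import_detectron_ops() would show whether loading the library registers anything at all.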

Detailed steps to reproduce

Build Caffe2 with CUDA, cuDNN, OpenCV.
Build COCOAPI modules.
Build Detectron modules.
Add the required paths (as described above).
Run tools/train_net.py with proper arguments.

System information

Operating system: Windows 10 Home Edition
Compiler version: VS 2015
CUDA version: 8.0
cuDNN version: cudnn-8.0-windows10-x64-v7
NVIDIA driver version: 382.05
GPU models (for all devices if they are not all the same): (GTX 1050)
PYTHONPATH environment variable: (sys.path.insert() is used instead as described above)
python --version output: Python 2.7.12 :: Anaconda custom (64-bit)
Anything else that seems relevant: OpenCV 3.2.0-vc14


II-Matto · 2018-05-29T08:23:32Z

I used the dumpbin tool to examine my caffe2_detectron_ops_gpu.dll, which only has a size of ~5.5MB.

With the EXPORTS option, the results are as follows:

Microsoft (R) COFF/PE Dumper Version 14.00.24215.1
Copyright (C) Microsoft Corporation.  All rights reserved.


Dump of file D:\repo\github\pytorch\build\bin\Release\caffe2_detectron_ops_gpu.dll

File Type: DLL

  Section contains the following exports for caffe2_detectron_ops_gpu.dll

    00000000 characteristics
    5B0CBE13 time date stamp Tue May 29 10:42:27 2018
        0.00 version
           1 ordinal base
           1 number of functions
           1 number of names

    ordinal hint RVA      name

          1    0 003393E8 NvOptimusEnablementCuda

  Summary

       13000 .data
        1000 .gfids
        1000 .nvFatBi
      20C000 .nv_fatb
       23000 .pdata
       E6000 .rdata
        5000 .reloc
        1000 .rsrc
      252000 .text
        1000 .tls

With the SYMBOLS option, the results are as follows:

Microsoft (R) COFF/PE Dumper Version 14.00.24215.1
Copyright (C) Microsoft Corporation.  All rights reserved.


Dump of file D:\repo\github\pytorch\build\bin\Release\caffe2_detectron_ops_gpu.dll

File Type: DLL

  Summary

       13000 .data
        1000 .gfids
        1000 .nvFatBi
      20C000 .nv_fatb
       23000 .pdata
       E6000 .rdata
        5000 .reloc
        1000 .rsrc
      252000 .text
        1000 .tls

Does this mean the Detectron operators are actually not compiled? If so, what could possibly be the reason and how can I make them compile?
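
For repeated checks, the export table can be pulled out of the dumpbin text with a few lines of Python. This is a rough sketch that assumes the column layout shown above (ordinal, hint, RVA, name); `exported_names` is a hypothetical helper and does not handle forwarded or name-less exports. Note that an export table lists only symbols marked for export, so on its own it may not show whether the operator code was compiled in.

```python
import re

# One export row: decimal ordinal, hex hint, 8-digit hex RVA, symbol name.
EXPORT_ROW = re.compile(r"^\s*(\d+)\s+([0-9A-Fa-f]+)\s+([0-9A-Fa-f]{8})\s+(\S+)")


def exported_names(dumpbin_text):
    """Return the symbol names listed in the export table of a
    `dumpbin /EXPORTS` text dump."""
    names = []
    in_table = False
    for line in dumpbin_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("ordinal") and "name" in stripped:
            in_table = True  # header row of the export table
            continue
        if stripped == "Summary":
            break  # the section summary follows the export table
        if in_table:
            match = EXPORT_ROW.match(line)
            if match:
                names.append(match.group(4))
    return names
```

Applied to the EXPORTS dump above, this returns a single name, NvOptimusEnablementCuda.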

ir413 · 2018-05-29T09:20:12Z

Hi @II-Matto, we do not support building on Windows at this time (see the list of requirements here; see also #25 and #276). Sorry for the inconvenience.

II-Matto · 2018-05-29T09:50:20Z

@ir413 Thanks for your reply. I have managed to build Caffe2 and obtain the (probably incorrect) caffe2_detectron_ops_gpu.dll. Does this mean the compilation is OK?

What extra modifications may be needed for building on Windows? If it is not that complex, I may give it a shot myself. Would you be so kind as to give me some hints?

ir413 · 2018-05-29T10:33:57Z

I don't have any experience with using Caffe2 on Windows so I don't really have a very clear idea of what may be required. I suspect that if you can get Caffe2 to work on Windows then it should be possible to run Detectron as well but I may be missing something.

Just to double-check, you verified your Caffe2 installation by running Caffe2 tests (e.g. this one), right?

AttributeError: Method AffineChannel is not a registered operator. Did you mean: []

It seems that the problem has to do with loading Detectron operators. Detectron ops are built as a Caffe2 module (located here) and loaded dynamically from Python (by this function). I guess the issue may have to do with (1) building Detectron ops or (2) loading them (or both). For (1), I would look into this CMake file and check if Windows-specific modifications are required. For (2), I would look into this function and make sure that it is handling Windows libraries correctly.
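
For point (2), a rough sketch of what platform-aware lookup of the ops library could look like. This is not the actual Detectron code: `ops_lib_filename` and `find_ops_lib` are hypothetical helpers, and the Linux and macOS file names are assumptions based on the usual `lib` prefix convention. Only the Windows name, caffe2_detectron_ops_gpu.dll, comes from this thread.

```python
import os
import sys


def ops_lib_filename(platform=None, base="caffe2_detectron_ops_gpu"):
    """Return the shared-library file name expected on the given platform."""
    platform = sys.platform if platform is None else platform
    if platform.startswith("win"):
        return base + ".dll"  # Windows: no "lib" prefix
    if platform == "darwin":
        return "lib" + base + ".dylib"
    return "lib" + base + ".so"


def find_ops_lib(search_dirs, platform=None):
    """Return the first existing library path under search_dirs, or None."""
    name = ops_lib_filename(platform)
    for directory in search_dirs:
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate):
            return candidate
    return None
```

The returned path would then be handed to whatever loads the library (I believe Detectron uses `dyndep.InitOpsLibrary` for this, but please verify). A `None` result points to a lookup problem rather than a build problem.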

II-Matto · 2018-05-29T11:23:18Z

@ir413 I have just tried relu_op_test (Sorry I only did the two simple import tests previously). It succeeds with the CPU-only caffe2 (removing caffe2_pybind11_state_gpu.pyd), but reports the following errors with GPU-enabled caffe2.

...
  File "D:/repo/github/pytorch/build/caffe2/python/operator_test/relu_op_test.py", line 24, in test_relu
    engine=st.sampled_from(["", "CUDNN"]),
  File "C:\Program Files\Anaconda2\lib\site-packages\hypothesis\core.py", line 524, in test
    result = self.test(*args, **kwargs)
  File "D:/repo/github/pytorch/build/caffe2/python/operator_test/relu_op_test.py", line 31, in test_relu
    self.assertDeviceChecks(dc, op, [X], [0])
  File "D:/repo/github/pytorch/build\caffe2\python\hypothesis_test_util.py", line 350, in assertDeviceChecks
    dc.CheckSimple(op, inputs, outputs_to_check, input_device_options)
  File "D:/repo/github/pytorch/build\caffe2\python\device_checker.py", line 49, in CheckSimple
    workspace.RunOperatorOnce(op)
  File "D:/repo/github/pytorch/build\caffe2\python\workspace.py", line 165, in RunOperatorOnce
    return C.run_operator_once(StringifyProto(operator))
RuntimeError: [enforce fail at context_gpu.h:156] . Encountered CUDA error: invalid device function Error from operator:
input: "X" output: "Y" name: "" type: "Relu" device_option { device_type: 1 } engine: ""
...

All test cases fail with the "invalid device function" error. Could it be that my GPU-enabled Caffe2 is built for the wrong architecture? The GPU I use is a GTX 1050, and according to the NVIDIA website its architecture is Pascal. The build_windows.bat script in Caffe2 passes the option -DCUDA_ARCH_NAME=Maxwell to cmake, which I did not notice before (my bad). I am now changing this option to Pascal and rebuilding Caffe2.
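
As a quick sanity check, the architecture family can be derived from the GPU's compute capability. The sketch below is illustrative only: `arch_name` is a hypothetical helper, the table covers just three families, and using the family name directly as the CUDA_ARCH_NAME value is an assumption based on the Maxwell and Pascal values mentioned in this thread.

```python
# Architecture family by compute capability major version (partial table).
ARCH_BY_MAJOR = {3: "Kepler", 5: "Maxwell", 6: "Pascal"}


def arch_name(compute_capability):
    """Return the architecture family for a compute capability such as
    "6.1", or None if the major version is not in the table."""
    major = int(str(compute_capability).split(".")[0])
    return ARCH_BY_MAJOR.get(major)


# The GTX 1050 has compute capability 6.1.
print(arch_name("6.1"))  # prints Pascal
```

A binary built only for Maxwell (5.x) carries no device code for a 6.1 GPU, which would be consistent with the "invalid device function" error above.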

I do not know whether this could also be what prevents the Detectron operators from loading. I will let you know as soon as the build is completed.

II-Matto · 2018-05-29T13:10:47Z

@ir413 The relu_op_test now runs successfully for Caffe2 with both CPU and GPU. But the Detectron operators still fail to be registered.

FYI, the cmake command for my Caffe2 build is as follows. The CMAKE_BUILD_TYPE is Visual Studio 14 2015 Win64.

cmake .. ^
  -G%CMAKE_GENERATOR% ^
  -DBUILD_TEST=OFF ^
  -DCMAKE_BUILD_TYPE=%CMAKE_BUILD_TYPE% ^
  -DUSE_CUDA=ON ^
  -DCUDA_ARCH_NAME=Pascal ^
  -DUSE_NNPACK=OFF ^
  -DUSE_CUB=OFF ^
  -DUSE_GLOG=OFF ^
  -DUSE_GFLAGS=OFF ^
  -DUSE_LMDB=OFF ^
  -DUSE_LEVELDB=OFF ^
  -DUSE_ROCKSDB=OFF ^
  -DUSE_OPENCV=ON ^
  -DBUILD_SHARED_LIBS=OFF ^
  -DBUILD_PYTHON=ON ^
  -DUSE_CUDNN=ON ^
  -DCUDNN_INCLUDE_DIR=D:/lib/cudnn/cudnn-8.0-windows10-x64-v7/include ^
  -DCUDNN_LIBRARY=D:/lib/cudnn/cudnn-8.0-windows10-x64-v7/lib/x64/cudnn.lib ^
  -DOpenCV_DIR=D:/lib/opencv-3.2.0-vc14/build

ir413 · 2018-05-29T15:29:57Z

Thanks for the updates @II-Matto. You can also use -DCUDA_ARCH_NAME=All option to build Caffe2 for all supported architectures. Glad to hear Caffe2 is working now. Regarding Detectron ops, have you looked into (1) and (2) from my previous message?

II-Matto · 2018-05-31T05:23:09Z

@ir413 For (2), I think the code for loading the DLL should be fine. I have checked the corresponding code in MXNet, i.e. the _load_lib() function in mxnet/base.py, and it is similar to the code used in PyTorch (the mode parameter ctypes.RTLD_LOCAL is actually ignored on Windows, as explained in the Python docs). Since I can successfully load and run MXNet on my laptop, I guess the code for library loading is correct.

For (1), I currently do not see why anything special should be needed on Windows. But as I am not that familiar with CMake, I am not sure about this. I have been reading CMakeLists files in other repositories and the CMake documentation, but have not found any useful information.

I notice that Caffe2 is built as a static library by default on Windows with -DBUILD_SHARED_LIBS=OFF. Could that be the cause of the problem? I tried to build Caffe2 as a shared library, but it failed due to errors in caffe2.pb.h (something like kIndexInFileMessages used before initialization). The build error is described in pytorch/pytorch#7962.

ir413 · 2018-05-31T14:47:11Z

Thanks for the info @II-Matto. I'm not sure about this and I suggest asking on the PyTorch/Caffe2 page. In the meantime to get unblocked, you could try moving the ops from the Detectron module to the standard Caffe2 operators to avoid building them separately and loading them dynamically.

II-Matto · 2018-05-31T17:33:36Z

@ir413 Thanks for the advice. I guess this may be the optimal solution for me right now.

TneitaP · 2018-10-28T09:08:14Z

Succeeded in building Detectron on Win10 (Python 3.6, VS 2015, CUDA 8.0, cuDNN 7.0.5) by following the advice in this issue. Main procedure:

Build Caffe2 from source correctly; put python36.lib into the generated directory "pytorch\build\caffe2" while running "build_windows.bat".
Open caffe2.sln, rebuild all, work through the errors in the output, and rebuild the projects that failed.
Add "CUDA 8.0" to caffe2_gpu under Build Dependencies -> Build Customizations.
Copy all files (.cc, .cu, .h) from "pytorch\modules\detectron" to "pytorch\caffe2\operators".
Add the .cu files in that directory to the caffe2_gpu project and the .cc files to the caffe2 project.
Open caffe2.sln, rebuild all, work through the errors in the output (especially for caffe2 and caffe2_gpu), rebuild the projects that failed, then build the solution. (A full rebuild of the solution is only needed the first time after these changes; after debugging, a normal build is enough.)
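
The copy step in the procedure above could be scripted roughly as follows. This is only a sketch: `copy_detectron_ops` is a hypothetical helper, it overwrites files of the same name in the destination, and the files still have to be added to the caffe2 and caffe2_gpu projects by hand.

```python
import os
import shutil

# File types named in the procedure above.
SOURCE_EXTENSIONS = (".cc", ".cu", ".h")


def copy_detectron_ops(src_dir, dst_dir):
    """Copy operator sources from src_dir into dst_dir and return the
    sorted list of copied file names."""
    copied = []
    for name in sorted(os.listdir(src_dir)):
        src_path = os.path.join(src_dir, name)
        if os.path.isfile(src_path) and name.endswith(SOURCE_EXTENSIONS):
            shutil.copy2(src_path, os.path.join(dst_dir, name))
            copied.append(name)
    return copied
```

For the layout described above, the call would be `copy_detectron_ops(r"pytorch\modules\detectron", r"pytorch\caffe2\operators")`, run from the directory that contains the pytorch checkout.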

ir413 closed this as completed May 29, 2018

II-Matto mentioned this issue May 31, 2018

[Caffe2] Operators of Detectron module not registered/compiled when built on windows pytorch/pytorch#7912

Open

ir413 mentioned this issue Jul 12, 2018

Command line error D8021 : invalid numeric argument '/Wno-cpp' #552

Closed

ir413 mentioned this issue Dec 24, 2018

how to implement Detectron in windows #776

Closed
