ImVoteNet 2-stage model reproduction error #448

Closed
lji72 opened this issue Apr 15, 2021 · 19 comments

@lji72
lji72 commented Apr 15, 2021

Thanks for your error report and we appreciate it a lot.

Checklist

  1. I have searched related issues but cannot get the expected help.
  2. The bug has not been fixed in the latest version.

Describe the bug
I want to train an ImVoteNet model. I successfully trained the first-stage config "imvotenet_faster_rcnn_r50_fpn_2x4_sunrgbd-3d-10class.py", but training the second-stage config "imvotenet_stage2_16x8_sunrgbd-3d-10class.py" raises a CUDA error at the beginning of the training process.

Reproduction

  1. What command or script did you run?

     ./tools/dist_train.sh configs/imvotenet/imvotenet_stage2_16x8_sunrgbd-3d-10class.py 4 --work-dir ./train_log_imvotenet/

  2. Did you make any modifications on the code or config? Did you understand what you have modified?
     I made one modification: I changed the "load_from" value from an HTTP link to a local path on the server.
  3. What dataset did you use?
     SUNRGBD

Environment

  1. Please run python mmdet3d/utils/collect_env.py to collect necessary environment information and paste it here.

Python: 3.7.10 | packaged by conda-forge | (default, Feb 19 2021, 16:07:37) [GCC 9.3.0]
CUDA available: True
GPU 0,1,2,3: Tesla P100-PCIE-16GB
CUDA_HOME: /usr/local/cuda
NVCC: Build cuda_11.2.r11.2/compiler.29373293_0
GCC: gcc (Ubuntu 5.4.0-6ubuntu1~16.04.12) 5.4.0 20160609
PyTorch: 1.4.0
PyTorch compiling details: PyTorch built with:

  • GCC 7.3
  • Intel(R) Math Kernel Library Version 2019.0.4 Product Build 20190411 for Intel(R) 64 architecture applications
  • Intel(R) MKL-DNN v0.21.1 (Git Hash 7d2fd500bc78936d1d648ca713b901012f470dbc)
  • OpenMP 201511 (a.k.a. OpenMP 4.5)
  • NNPACK is enabled
  • CUDA Runtime 10.1
  • NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_37,code=compute_37
  • CuDNN 7.6.3
  • Magma 2.5.1
  • Build settings: BLAS=MKL, BUILD_NAMEDTENSOR=OFF, BUILD_TYPE=Release, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -fopenmp -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -O2 -fPIC -Wno-narrowing -Wall -Wextra -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Wno-stringop-overflow, DISABLE_NUMA=1, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, USE_CUDA=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_STATIC_DISPATCH=OFF,

TorchVision: 0.5.0
OpenCV: 4.5.1
MMCV: 1.2.7
MMCV Compiler: GCC 5.4
MMCV CUDA Compiler: 11.2
MMDetection: 2.10.0
MMDetection3D: 0.11.0+

  2. You may add additional information that may be helpful for locating the problem, such as:
     1. I installed the environment in conda.
     2. I used the generated SUNRGBD dataset to train VoteNet successfully.
     3. I trained the first-stage model of ImVoteNet successfully.
     4. I used the Python scripts instead of the MATLAB scripts to preprocess the SUNRGBD data, but the generated data trained VoteNet and the first-stage model of ImVoteNet successfully.

Error traceback
If applicable, paste the error traceback here.

```
/pytorch/aten/src/THC/THCTensorScatterGather.cu:100: void THCudaTensor_gatherKernel(TensorInfo<Real, IndexType>, TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = 2]: block: [1,0,0], thread: [32,0,0] Assertion `indexValue >= 0 && indexValue < src.sizes[dim]` failed.
... (the same assertion repeats for threads [33,0,0] through [37,0,0])
```

Bug fix
If you have already identified the reason, you can provide the information here. If you are willing to create a PR to fix it, please also leave a comment here and that would be much appreciated!

@yezhen17
Collaborator

Hi @lji72 ,

I've run the code and nothing unexpected happens. Can you provide more of the error traceback, such as which function triggered this? Since CUDA kernels launch asynchronously, a device-side assert often surfaces at a later, unrelated call, so you can try `CUDA_LAUNCH_BLOCKING=1 ./tools/dist_train.sh configs/imvotenet/imvotenet_stage2_16x8_sunrgbd-3d-10class.py 4 --work-dir ./train_log_imvotenet/` if you currently cannot see a more meaningful error trace.

yezhen17 self-assigned this Apr 15, 2021
@lji72
Author

lji72 commented Apr 15, 2021

1.

```
/scratch/workspace/xxxx/mmdet3d/mmdetection3d-master/mmdet3d/models/fusion_layers/coord_transform.py:33: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
  if 'pcd_rotation' in img_meta else torch.eye(
/pytorch/aten/src/THC/THCTensorScatterGather.cu:100: void THCudaTensor_gatherKernel(TensorInfo<Real, IndexType>, TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = 2]: block: [0,0,0], thread: [64,0,0] Assertion `indexValue >= 0 && indexValue < src.sizes[dim]` failed.
... (the same assertion repeats for threads [65,0,0] through [69,0,0])
```

2.

```
Traceback (most recent call last):
  File "./tools/train.py", line 212, in <module>
    main()
  File "./tools/train.py", line 208, in main
    meta=meta)
  File "/scratch/workspace/xxx/mmdet3d/mmdetection-2.10.0/mmdet/apis/train.py", line 170, in train_detector
    runner.run(data_loaders, cfg.workflow)
  File "/scratch/workspace/xxx/anaconda3/envs/mmdet3D/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 125, in run
    epoch_runner(data_loaders[i], **kwargs)
  File "/scratch/workspace/xxx/anaconda3/envs/mmdet3D/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 50, in train
    self.run_iter(data_batch, train_mode=True)
  File "/scratch/workspace/xxx/anaconda3/envs/mmdet3D/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 30, in run_iter
    **kwargs)
  File "/scratch/workspace/xxx/anaconda3/envs/mmdet3D/lib/python3.7/site-packages/mmcv/parallel/distributed.py", line 46, in train_step
    output = self.module.train_step(*inputs[0], **kwargs[0])
  File "/scratch/workspace/xxx/mmdet3d/mmdetection-2.10.0/mmdet/models/detectors/base.py", line 247, in train_step
    losses = self(**data)
  File "/scratch/workspace/xxx/anaconda3/envs/mmdet3D/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/scratch/workspace/xxx/anaconda3/envs/mmdet3D/lib/python3.7/site-packages/mmcv/runner/fp16_utils.py", line 84, in new_func
    return old_func(*args, **kwargs)
  File "/scratch/workspace/xxx/mmdet3d/mmdetection3d-master/mmdet3d/models/detectors/base.py", line 58, in forward
    return self.forward_train(**kwargs)
  File "/scratch/workspace/xxx/mmdet3d/mmdetection3d-master/mmdet3d/models/detectors/imvotenet.py", line 453, in forward_train
    img_metas, calib)
  File "/scratch/workspace/xxx/anaconda3/envs/mmdet3D/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/scratch/workspace/xxx/mmdet3d/mmdetection3d-master/mmdet3d/models/fusion_layers/vote_fusion.py", line 202, in forward
    txt_cue = torch.gather(img_flatten, dim=-1, index=uv_expanded)
RuntimeError: cuda runtime error (710) : device-side assert triggered at /pytorch/aten/src/THC/generic/THCTensorScatterGather.cu:67
```

3.

```
frame #0: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x33 (0x7ff9b3022193 in /scratch/workspace/xxx/anaconda3/envs/mmdet3D/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #1: + 0x17f66 (0x7ff9b325ff66 in /scratch/workspace/xxx/anaconda3/envs/mmdet3D/lib/python3.7/site-packages/torch/lib/libc10_cuda.so)
frame #2: + 0x19cbd (0x7ff9b3261cbd in /scratch/workspace/xxx/anaconda3/envs/mmdet3D/lib/python3.7/site-packages/torch/lib/libc10_cuda.so)
frame #3: c10::TensorImpl::release_resources() + 0x4d (0x7ff9b301263d in /scratch/workspace/xxx/anaconda3/envs/mmdet3D/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #4: + 0x67bac2 (0x7ff9fe439ac2 in /scratch/workspace/xxx/anaconda3/envs/mmdet3D/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #5: + 0x67bb66 (0x7ff9fe439b66 in /scratch/workspace/xxx/anaconda3/envs/mmdet3D/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #6: + 0x183dd6 (0x56407457edd6 in /scratch/workspace/xxx/anaconda3/envs/mmdet3D/bin/python)
frame #7: + 0xe730f (0x5640744e230f in /scratch/workspace/xxx/anaconda3/envs/mmdet3D/bin/python)
frame #8: + 0xe605b (0x5640744e105b in /scratch/workspace/xxx/anaconda3/envs/mmdet3D/bin/python)
frame #9: + 0xe605b (0x5640744e105b in /scratch/workspace/xxx/anaconda3/envs/mmdet3D/bin/python)
frame #10: + 0xe5928 (0x5640744e0928 in /scratch/workspace/xxx/anaconda3/envs/mmdet3D/bin/python)
frame #11: + 0xe62c8 (0x5640744e12c8 in /scratch/workspace/xxx/anaconda3/envs/mmdet3D/bin/python)
frame #12: + 0xe62de (0x5640744e12de in /scratch/workspace/xxx/anaconda3/envs/mmdet3D/bin/python)
frame #13: + 0xe62de (0x5640744e12de in /scratch/workspace/xxx/anaconda3/envs/mmdet3D/bin/python)
frame #14: + 0xe62de (0x5640744e12de in /scratch/workspace/xxx/anaconda3/envs/mmdet3D/bin/python)
frame #15: + 0xe62de (0x5640744e12de in /scratch/workspace/xxx/anaconda3/envs/mmdet3D/bin/python)
frame #16: + 0xe62de (0x5640744e12de in /scratch/workspace/xxx/anaconda3/envs/mmdet3D/bin/python)
frame #17: + 0xe62de (0x5640744e12de in /scratch/workspace/xxx/anaconda3/envs/mmdet3D/bin/python)
frame #18: PyDict_SetItem + 0x4bf (0x56407452661f in /scratch/workspace/xxx/anaconda3/envs/mmdet3D/bin/python)
frame #19: PyDict_SetItemString + 0x66 (0x5640745268c6 in /scratch/workspace/xxx/anaconda3/envs/mmdet3D/bin/python)
frame #20: PyImport_Cleanup + 0x9c (0x56407462e4cc in /scratch/workspace/xxx/anaconda3/envs/mmdet3D/bin/python)
frame #21: Py_FinalizeEx + 0x67 (0x56407462e897 in /scratch/workspace/xxx/anaconda3/envs/mmdet3D/bin/python)
frame #22: + 0x2484db (0x5640746434db in /scratch/workspace/xxx/anaconda3/envs/mmdet3D/bin/python)
frame #23: _Py_UnixMain + 0x3c (0x56407464385c in /scratch/workspace/xxx/anaconda3/envs/mmdet3D/bin/python)
frame #24: __libc_start_main + 0xe7 (0x7ffa050d4b97 in /lib/x86_64-linux-gnu/libc.so.6)
```

Is that enough? I also have a question: are the generated datasets used for VoteNet and ImVoteNet the same?

@yezhen17
Collaborator

yezhen17 commented Apr 15, 2021

I'm not sure about the problem. Please check uv_expanded and see if there is something obviously wrong. Maybe you can also first check that the dataset preparation is correct. As for your question, the datasets should be the same; the image data and calibration information are simply unused by VoteNet.
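
A quick way to narrow it down (just a sketch, assuming the variable names used in vote_fusion.py) is to assert on the index range right before the gather:

```python
# Hypothetical debugging lines for vote_fusion.py (not committed code):
# verify every flattened pixel index is inside the image before gathering.
num_pixels = img_shape[0] * img_shape[1]  # H * W
print('uv_flatten range: [%d, %d], valid: [0, %d)'
      % (uv_flatten.min().item(), uv_flatten.max().item(), num_pixels))
assert uv_flatten.min() >= 0 and uv_flatten.max() < num_pixels, \
    'projected pixel coordinates fall outside the image'
```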

@Divadi
Contributor

Divadi commented Apr 18, 2021

I encountered the same problem, albeit while training with v1 data, though perhaps v2 has the same issue (since the calibs didn't change between v1 and v2 SUNRGBD).
Here:

```python
uv_flatten = uv_rescaled[:, 1].round() * \
    img_shape[1] + uv_rescaled[:, 0].round()
uv_expanded = uv_flatten.unsqueeze(0).expand(3, -1).long()
```

I added torch clamps to keep the projections within image bounds:

```python
uv_rescaled[:, 0] = torch.clamp(uv_rescaled[:, 0].round(), 0, img_shape[1] - 1)
uv_rescaled[:, 1] = torch.clamp(uv_rescaled[:, 1].round(), 0, img_shape[0] - 1)
uv_flatten = uv_rescaled[:, 1].round() * \
    img_shape[1] + uv_rescaled[:, 0].round()
uv_expanded = uv_flatten.unsqueeze(0).expand(3, -1).long()
```

And this solved the issue. I was able to achieve similar numbers to the reference model with this method.
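
To see why the unclamped version trips the device-side assert, here is a self-contained sketch (made-up shapes and coordinates, not repository code) of the same flatten-and-gather pattern. A projection that rounds to one pixel past the right edge of the last row yields a flat index equal to H*W, which is exactly the `indexValue < src.sizes[dim]` assertion failing above; clamping first keeps every index valid:

```python
import torch

# Hypothetical standalone illustration; shapes and coordinates are made up.
H, W = 480, 640
img_flatten = torch.rand(3, H * W)                   # C x (H*W) flattened image

# The first (u, v) sits half a pixel past the last column of the last row;
# without clamping, round() gives u = 640 and flat index 479*640+640 = H*W.
uv = torch.tensor([[639.5, 479.0], [10.0, 20.0]])

u = torch.clamp(uv[:, 0].round(), 0, W - 1)
v = torch.clamp(uv[:, 1].round(), 0, H - 1)
uv_flatten = (v * W + u).long()                      # row-major flat indices
uv_expanded = uv_flatten.unsqueeze(0).expand(3, -1)  # one index row per channel
txt_cue = torch.gather(img_flatten, dim=-1, index=uv_expanded)
print(txt_cue.shape)  # torch.Size([3, 2])
```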

@yezhen17
Collaborator

> I added torch clamps to keep the projections within image bounds: […] And this solved the issue.

Hi @Divadi ,

Thanks for the solution! Originally I thought that something might be wrong with the dataset, making the calculated image coordinates go out of bounds. But if clamping the coordinates achieves similar performance, then perhaps the error is caused by some small deviation in the calculation. Can you kindly check the range of uv_rescaled and see how far it gets out of bounds?

@lji72
Author

lji72 commented Apr 18, 2021

Thanks for your replies. @Divadi @THU17cyz

@lji72
Author

lji72 commented Apr 18, 2021

It works for me. @Divadi

@Divadi
Contributor

Divadi commented Apr 18, 2021

> Can you kindly check the range of uv_rescaled and see how far it gets out of bounds?

It did not seem to be very much.
Printing a number of cases, it seemed like uv_rescaled[:, 0] was sometimes perhaps a pixel or so higher than the maximum image coordinate.
I had assumed it was some issue with reversing the data augmentation (but when I wrote my own pipeline for projecting points back to the image without augmentation, there was an extremely close fit with the corresponding pixels).

I was using info files generated via a February version of the repo (though I went through everything and checked that calib files were identical)
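
For reference, the core of such a check can be as small as this (a sketch with made-up intrinsics, not my actual pipeline): project camera-frame points through the intrinsic matrix and verify the resulting pixels land inside the image.

```python
import torch

# Hypothetical pinhole projection check; the K values are made up.
def project_to_image(points, K):
    """points: (N, 3) camera-frame xyz; K: (3, 3) intrinsic matrix."""
    uvw = points @ K.t()             # (N, 3) homogeneous pixel coordinates
    return uvw[:, :2] / uvw[:, 2:3]  # perspective divide by depth

K = torch.tensor([[529.5, 0.0, 365.0],
                  [0.0, 529.5, 265.0],
                  [0.0, 0.0, 1.0]])
points = torch.tensor([[0.5, 0.2, 2.0],
                       [-0.3, 0.1, 1.5]])
print(project_to_image(points, K))  # should land inside [0, W) x [0, H)
```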

@Divadi
Contributor

Divadi commented Apr 18, 2021

While we're on the subject of SUNRGB-D, I just wanted to bring attention to another thing:

```matlab
assert(strcmp(data2d.groundtruth2DBB(j).classname, classname));
```

This line tries to force consistency between 2D and 3D bounding boxes (there is not a bijection between them). However, I believe it is often the case that a number of 3D bounding boxes are just dropped.
Indeed, they don't have corresponding 2D annotations, but for 3D-only methods, this is not an issue, and having complete 3D boxes can likely help performance.

@yezhen17
Collaborator

yezhen17 commented Apr 19, 2021

> Printing a number of cases, it seemed like uv_rescaled[:, 0] was sometimes perhaps a pixel or so higher than the maximum image coordinate. […]
I guess it's because reversing a rotation (taking the transpose) is not precise enough (and we are using float precision). Nevertheless, since you reproduced the results, I believe this is not a problem, and clamping the coordinates is a good solution. Are you willing to open a pull request to fix this?
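
As a rough illustration of that suspicion (not a reproduction of the actual numbers): undoing a rotation with its transpose in float32 leaves a tiny residual, and for a point sitting near the .5 rounding boundary of a border pixel, even a tiny perturbation can push the rounded coordinate one pixel past the edge.

```python
import math
import torch

# Hypothetical illustration: R @ R.T is not exactly the identity in float32.
theta = 0.3
R = torch.tensor([[math.cos(theta), -math.sin(theta), 0.0],
                  [math.sin(theta),  math.cos(theta), 0.0],
                  [0.0, 0.0, 1.0]])
residual = R @ R.t() - torch.eye(3)
print(residual.abs().max())  # typically a small nonzero value, not exactly 0
```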

@Divadi
Contributor

Divadi commented Apr 19, 2021

Sure, I can do it.

@yezhen17
Collaborator

> This line tries to force consistency between 2D and 3D bounding boxes (there is not a bijection between them). However, I believe it is often the case that a number of 3D bounding boxes are just dropped. […]

I'm not sure what you are implying. Since SUNRGBD data take the form of depth maps, the point clouds are partial, so shouldn't each 3D annotation correspond to a 2D one?

@Divadi
Contributor

Divadi commented Apr 19, 2021

I don't precisely remember the name, but each 2D bbox annotation struct seems to have a "has_3d_box" parameter (or perhaps it is the other way around), so they don't seem to have a one-for-one correspondence.
I can look into this more when I have time in a few days.

@yezhen17
Collaborator

> … each 2D bbox annotation struct seems to have a "has_3d_box" parameter, so they don't seem to have a one-for-one correspondence.

I can look into this later too. We follow the data preprocessing of the original VoteNet and ImVoteNet repos, so we haven't dug deep into this.

@lji72
Author

lji72 commented Apr 21, 2021

@THU17cyz I found that there was something wrong with my calibration data due to differences between the Python and MATLAB preprocessing code. After fixing the issue, accuracy rises from 59.80% mAP (using the fix from @Divadi) to 65.3% mAP@0.25.
Many thanks for your work, guys!

ZwwWayne pushed a commit that referenced this issue Apr 21, 2021: "Clamp 2D Projection of 3D Votes to Image Boundaries" (co-authored by Yezhen Cong)
@yezhen17
Collaborator

yezhen17 commented Apr 21, 2021

> I found that there was something wrong with my calibration data due to differences between the Python and MATLAB preprocessing code. […]

Great to hear that. Can you explain more about why the calibration data was wrong? We want to figure out the root cause of the out-of-bounds bug, since we did not encounter it ourselves. Thanks!

@Tai-Wang
Member

Fixed via #463

@yezhen17
Collaborator

yezhen17 commented May 2, 2021

Hi @lji72 , @Divadi , just curious: did you also run into #507?

@Divadi
Contributor

Divadi commented Aug 28, 2021

@THU17cyz
I never saw the notification for this; I apologize.

I had generated my pickle files before the ImVoteNet commit (which introduced the #507 issue), so #448 was not caused by the Rt vs K issue.
Besides, I had visualized the projections and they looked reasonable, and the Rt and K values in my generated pickle files are also reasonable.
