
About vis_rpn_anchors #21

Open
a5372935 opened this issue Mar 16, 2020 · 17 comments

@a5372935

❓ Questions and Help

Which one should I care about, match_anchor or anchor_proposal?

Also, why does an image get two or more bboxes on the same target when predicting with inference_demo.py, and how can I make it output only one bbox?

@mrlooi
Owner

mrlooi commented Mar 16, 2020

anchor_proposal is used to generate the initial proposals for the network, before the RROI layer refines the (rotated) bounding boxes.

You would get two rotated bboxes on the same target if their IoU is below the NMS IoU threshold, since neither suppresses the other. Try decreasing the ROI IoU threshold.
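
To make the suppression behaviour concrete, here is a minimal sketch using torchvision's axis-aligned NMS (the repo applies the same idea to rotated boxes; the coordinates and scores below are made up for illustration):

```python
import torch
from torchvision.ops import nms

# two detections covering the same target; the IoU between them is ~0.76
boxes = torch.tensor([[10., 10., 60., 60.],
                      [14., 12., 66., 62.]])
scores = torch.tensor([0.9, 0.8])

print(nms(boxes, scores, iou_threshold=0.8))  # tensor([0, 1]): both kept (0.76 < 0.8)
print(nms(boxes, scores, iou_threshold=0.5))  # tensor([0]): duplicate suppressed (0.76 > 0.5)
```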

@a5372935
Author

Thanks. Then what does match_anchors mean?

@a5372935
Author

Is it that anchors are first matched (match_anchor), and then the regression is trained on them to produce anchor_proposal?

@mrlooi
Owner

mrlooi commented Mar 16, 2020

It's been a long time since I last looked at the code, but based on the naming, it probably means the anchors with IoU > the RPN IoU threshold. These anchors are fed to the RPN regression layer.
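
For intuition, a minimal sketch of what "matched anchors" typically means in an RPN. This is not the repo's code, and it uses axis-aligned IoU instead of the rotated IoU the repo computes; the anchors, ground-truth box, and threshold are illustrative:

```python
import numpy as np

def iou(a, b):
    # axis-aligned IoU between [x1, y1, x2, y2] boxes; the repo uses a
    # rotated-box IoU, but the matching logic is the same
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union

anchors = np.array([[0, 0, 32, 32], [8, 8, 40, 40], [100, 100, 164, 164]])
gt_box = np.array([10, 10, 42, 42])
POS_IOU = 0.7  # a typical RPN positive-match threshold

# only these matched anchors contribute targets to the RPN regression loss
matched = [a for a in anchors if iou(a, gt_box) >= POS_IOU]
print(matched)  # -> [array([ 8,  8, 40, 40])]
```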

@a5372935
Author

I need help with my case.
This is the output config: https://drive.google.com/open?id=1AhByUq5SHwmo8xIadWziPG5_UPRR5vU2
My initial config: https://drive.google.com/open?id=1AhByUq5SHwmo8xIadWziPG5_UPRR5vU2
My log: https://drive.google.com/open?id=1HQfS0Fhqf-ABMcQfg9OOGeyOLTcQcett
My predicted image: https://drive.google.com/open?id=1lupmX2EsgxJ5GA33knmsRusINB8vJ3Do

My training loss is already very low, so why are the results still so bad? Is my parameter tuning bad, or is it just not enough training?

@mrlooi
Owner

mrlooi commented Mar 17, 2020

From the image, the target objects are really small. My guess is that there is a significant class imbalance, where there are many more invalid region proposals (rotated RPN proposals) than valid ones. A possible fix is to remove very large anchor sizes (e.g. 256) or very small ones (e.g. 20) that don't fit the objects in the dataset, and to start with a simpler model (R-50-FPN). It's generally good to reduce the total number of anchors to around 9-15.
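
As a hedged sketch of what that trimming could look like with the upstream maskrcnn_benchmark config keys (MODEL.RPN.ANCHOR_SIZES and MODEL.RPN.ASPECT_RATIOS exist in upstream maskrcnn_benchmark; this rotated fork may name them differently and add an angle dimension):

```python
from maskrcnn_benchmark.config import cfg

# 3 sizes x 3 aspect ratios = 9 anchors per location, within the 9-15
# range suggested above; pick the values from your dataset's object sizes
cfg.merge_from_list([
    "MODEL.RPN.ANCHOR_SIZES", (32, 64, 128),
    "MODEL.RPN.ASPECT_RATIOS", (0.5, 1.0, 2.0),
])
```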

@a5372935
Author

Let me try.

@a5372935
Author

@mrlooi Sometimes I get:

File "/home/lab602/桌面/rotated_maskrcnn-master/maskrcnn_benchmark/modeling/roi_heads/maskiou_head/roi_maskiou_feature_extractors.py", line 66, in forward
    mask_pool = self.max_pool2d(mask)
File "/home/lab602/anaconda3/envs/rotated/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
File "/home/lab602/anaconda3/envs/rotated/lib/python3.6/site-packages/torch/nn/modules/pooling.py", line 146, in forward
    self.return_indices)
File "/home/lab602/anaconda3/envs/rotated/lib/python3.6/site-packages/torch/_jit_internal.py", line 133, in fn
    return if_false(*args, **kwargs)
File "/home/lab602/anaconda3/envs/rotated/lib/python3.6/site-packages/torch/nn/functional.py", line 494, in _max_pool2d
    input, kernel_size, stride, padding, dilation, ceil_mode)
RuntimeError: invalid argument 2: non-empty 3D or 4D input tensor expected but got: [0 x 1 x 28 x 28] at /opt/conda/conda-bld/pytorch_1556653099582/work/aten/src/THCUNN/generic/SpatialDilatedMaxPooling.cu:37

Why does this happen?

@mrlooi
Owner

mrlooi commented Mar 19, 2020

The error looks to originate from pooling.py. My guess is that the number of initial proposals was small or empty, and after filtering none of the proposals met the passing criterion (likely IoU with the ground truth), so an empty batch reached the pooling layer.
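
A minimal standalone reproduction of that failure mode (max_pool2d on torch versions of that era rejects a zero-sized batch), plus the kind of guard one could add before the pooling call; this is a hypothetical fix, not the repo's code:

```python
import torch
import torch.nn.functional as F

mask = torch.empty(0, 1, 28, 28)  # zero proposals survived filtering
# F.max_pool2d(mask, kernel_size=2)  # -> RuntimeError: non-empty 3D or 4D input tensor expected

if mask.size(0) > 0:
    pooled = F.max_pool2d(mask, kernel_size=2)
else:
    # skip pooling and propagate an empty result of the matching shape
    pooled = mask.new_zeros(0, 1, 14, 14)
```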

@a5372935
Author

@mrlooi Thank you, I understand. I also want to ask a few questions about RRPN Faster:


restore from pretrained_weighs in IMAGE_NET
2020-03-19 10:49:16.049380: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2020-03-19 10:49:16.199289: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:998] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-03-19 10:49:16.199738: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x5564e3590510 executing computations on platform CUDA. Devices:
2020-03-19 10:49:16.199753: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (0): GeForce RTX 2080, Compute Capability 7.5
2020-03-19 10:49:16.221193: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3696000000 Hz
2020-03-19 10:49:16.223662: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x5564e35fb270 executing computations on platform Host. Devices:
2020-03-19 10:49:16.223733: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (0): <undefined>, <undefined>
2020-03-19 10:49:16.224412: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties:
name: GeForce RTX 2080 major: 7 minor: 5 memoryClockRate(GHz): 1.8
pciBusID: 0000:01:00.0
totalMemory: 7.76GiB freeMemory: 6.34GiB
2020-03-19 10:49:16.224473: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2020-03-19 10:49:16.230070: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-03-19 10:49:16.230131: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0
2020-03-19 10:49:16.230159: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0: N
2020-03-19 10:49:16.230719: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6162 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2080, pci bus id: 0000:01:00.0, compute capability: 7.5)
WARNING:tensorflow:From /home/lab602/anaconda3/envs/faster/lib/python3.6/site-packages/tensorflow/python/training/saver.py:1266: checkpoint_exists (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file APIs to check for files with this prefix.
restore model
WARNING:tensorflow:From train.py:170: start_queue_runners (from tensorflow.python.training.queue_runner_impl) is deprecated and will be removed in a future version.
Instructions for updating:
To construct input pipelines, use the tf.data module.
2020-03-19 10:49:22.027217: I tensorflow/stream_executor/dso_loader.cc:152] successfully opened CUDA library libcublas.so.10.0 locally

When I train RRPN Faster it gets stuck here. Is there anything I haven't changed?

@mrlooi
Owner

mrlooi commented Mar 19, 2020

Hmm, not sure why, but you've posted TensorFlow logs; this repo is PyTorch-based, so those must be from a different codebase.

@NimaDL

NimaDL commented Mar 20, 2020

@mrlooi Thank you. How can I solve @a5372935's problem when the number of initial proposals is small/empty? I got the same error:
RuntimeError: invalid argument 2: non-empty 3D or 4D input tensor expected but got: [0 x 1 x 28 x 28] at /opt/conda/conda-bld/pytorch_1556653099582/work/aten/src/THCUNN/generic/SpatialDilatedMaxPooling.cu:37

@mrlooi
Owner

mrlooi commented Mar 20, 2020

I would recommend starting with good RPN anchors. Use the vis_rpn_anchors.py file to visualize the anchors for your dataset.
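
For anyone without the repo handy, a generic sketch of the idea behind such a visualization: overlay a few anchor sizes/angles on a training image to eyeball the fit. This is not vis_rpn_anchors.py itself; the image path, center, sizes, and angles are placeholders, and rotation_point="center" needs matplotlib >= 3.4:

```python
import matplotlib.pyplot as plt
from matplotlib.patches import Rectangle

fig, ax = plt.subplots()
ax.imshow(plt.imread("sample_train_image.jpg"))  # placeholder path

cx, cy = 300, 200                  # one anchor center, for illustration
for size in (32, 64, 128):         # candidate anchor sizes
    for angle in (-45, 0, 45):     # candidate anchor angles (degrees)
        ax.add_patch(Rectangle((cx - size / 2, cy - size / 2), size, size,
                               angle=angle, rotation_point="center",
                               fill=False, edgecolor="red"))
plt.show()
```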

@a5372935
Author

@mrlooi I forgot to ask: do the values in brackets after each loss refer to val_loss?

@mrlooi
Owner

mrlooi commented Mar 23, 2020

If I remember correctly, it's the loss for that minibatch.

Actually, I took another look at your log: https://drive.google.com/open?id=1HQfS0Fhqf-ABMcQfg9OOGeyOLTcQcett
The loss values in brackets are certainly way too high; the training was unstable and will not work.

@a5372935
Author

Yes, the loss for that minibatch is really high, but I think the vis_rpn_anchors output all looks correct. Why is this?

@mrlooi
Owner

mrlooi commented Mar 23, 2020

Possibly due to version differences; I used torch 1.0 - 1.1.
Or it could be a faulty dataset issue: the default pipeline does not handle missing, faulty, or empty ground truth very well.
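
One way to guard against that is to pre-filter the annotations, sketched here for a COCO-style annotation file (the file names are placeholders, and the non-degenerate-box check assumes a [x, y, w, h] bbox field; adapt to the actual dataset format):

```python
import json

with open("annotations.json") as f:  # placeholder path
    coco = json.load(f)

# keep only images that have at least one annotation with a non-degenerate box
valid_ids = {a["image_id"] for a in coco["annotations"]
             if a.get("bbox") and a["bbox"][2] > 0 and a["bbox"][3] > 0}
coco["images"] = [im for im in coco["images"] if im["id"] in valid_ids]
coco["annotations"] = [a for a in coco["annotations"] if a["image_id"] in valid_ids]

with open("annotations_filtered.json", "w") as f:
    json.dump(coco, f)
```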
