INCREASING NMS SPEED #679

glenn-jocher · 2019-12-03T01:11:47Z

Non-Maximum Suppression (NMS) of bounding boxes is a significant speed constraint during testing. I am opening this issue to explore options for speeding up this operation. I will compare the default NMS method, 'MERGE', with two newly available PyTorch (torchvision) methods. If anyone has additional methods we could test, please post them here.

yolov3/utils/utils.py

Line 456 in cadd2f7

def non_max_suppression(prediction, conf_thres=0.5, nms_thres=0.5):
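For readers new to the operation, here is a minimal pure-Python sketch of sequential (greedy) NMS. It is an illustration only, not the repo's non_max_suppression(): the helper names iou and greedy_nms and the example boxes are assumptions made for this sketch.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, iou_thres=0.5):
    """Return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap any already-kept box too much.
        if all(iou(boxes[i], boxes[k]) <= iou_thres for k in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
scores = [0.9, 0.8, 0.7]
kept = greedy_nms(boxes, scores, iou_thres=0.5)
print(kept)  # [0, 2]: box 1 overlaps box 0 with IoU of about 0.68
```

The inner loop over already-kept boxes is the sequential part that the faster methods in this thread try to avoid or push into compiled code.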

The test code is below. Hardware is a 2080Ti.

python3 test.py --weights ultralytics68.pt --nms-thres 0.6 --img-size 512 --device 0

UPDATE: THESE ARE OLD RESULTS, SEE BOTTOM OF THREAD FOR IMPROVED RESULTS

NMS method	Speed mm:ss	COCO mAP @0.5...0.95	COCO mAP @0.5
ultralytics `'OR'`	8:20	39.7	60.3
ultralytics `'AND'`	7:38	39.6	60.1
ultralytics `'SOFT'`	12:00	39.1	58.7
ultralytics `'MERGE'`	11:25	40.2	60.4
torchvision.ops.boxes.nms()	5:08	39.7	60.3
torchvision.ops.boxes.batched_nms()	6:00	39.7	60.3

glenn-jocher · 2019-12-03T04:21:26Z

The result of the test is that torchvision.ops.boxes.nms() is fastest but does not give the highest mAP. The Ultralytics MERGE method increases mAP @0.5...0.95 by 0.5 (40.2 vs 39.7), so I will keep it for testing (when calling test.py directly with --conf-thres 0.001), and use torchvision.ops.boxes.nms() for calculating mAP during training with --conf-thres 0.10 (to increase training speed).

yolov3/utils/utils.py

Lines 513 to 517 in 1e9ddc5

    
    # Set NMS method https://github.com/ultralytics/yolov3/issues/679
    # 'OR', 'AND', 'MERGE', 'VISION', 'VISION_BATCHED'
    method = 'MERGE' if conf_thres <= 0.01 else 'VISION'  # MERGE is highest mAP, VISION is fastest

FranciscoReveriano · 2019-12-03T14:17:19Z

I will look more into this during the weekend.

developer0hye · 2019-12-05T01:55:49Z

great works!

omizonly · 2020-01-15T15:12:19Z

torchvision.ops implements operators that are specific for Computer Vision. Those operators currently do not support TorchScript. Performs non-maximum suppression (NMS) on the boxes according to their intersection-over-union (IoU)

AttributeError: module 'torchvision' has no attribute 'ops'

what should I do?

glenn-jocher · 2020-01-15T17:16:23Z

@omizonly what is your use case for TorchScript?

omizonly · 2020-01-15T17:30:48Z

@omizonly what is your use case for TorchScript?

tensorflow= 1.3.1

glenn-jocher · 2020-01-15T18:31:17Z

@omizonly I don't understand, can you elaborate? This repo only runs PyTorch, and exports to ONNX for onward use in other formats; however, we clearly cannot support you with problems in those other formats. I suggest you raise an issue on the PyTorch or TF repos.

glenn-jocher · 2020-01-16T19:16:06Z

I'll close this issue for now as the original issue appears to have been resolved, and/or no activity has been seen for some time. Feel free to comment if this is not the case.

glenn-jocher · 2020-01-19T21:16:18Z

Quick update with latest code on one T4 GPU. Second line is current default.

python3 test.py --weights yolov3-spp-ultralytics.pt --cfg yolov3-spp.cfg --img 608

NMS method	Time ms/image	Time mm:ss	COCO mAP @0.5...0.95	COCO mAP @0.5
`'vision_batched', multi_cls=False`	43ms	3:36	40.2	60.4
`'vision_batched', multi_cls=True`	48ms	4:01	40.9	61.4
`'merge', multi_cls=True`	172ms	14:23	41.3	61.7

FranciscoReveriano · 2020-01-28T00:28:18Z

Is there a way to make the model print the JSON file if it detects an object regardless of classification?

Zzh-tju · 2020-03-03T07:28:06Z

Hi, I saw a Fast NMS proposed by YOLACT. How is it? https://arxiv.org/abs/1912.06218

glenn-jocher · 2020-03-03T19:44:06Z

@Zzh-tju yes, that seems an interesting approach. They apply NMS as a matrix operation to remove the for loop, which they say runs much faster with a minimal mAP penalty.

Depending on the conf-thres used, NMS may or may not be a very expensive operation in this repo. For most actual use applications with conf-thres around 0.1-0.9, NMS is not a speed concern, taking <10% of the total processing time for an image, but when calculating mAP near conf-thres = 0.0001 for example, NMS may take up 90% of the processing time.
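To make the dependence on conf-thres concrete, the sketch below counts how many candidate boxes survive different thresholds. The score distribution is synthetic and purely an assumption for illustration (it is not measured from the model), and the quadratic cost applies most directly to methods that build a full pairwise IoU matrix.

```python
import random

random.seed(0)
# Hypothetical confidences for 10,000 raw predictions, skewed toward zero.
scores = [random.random() ** 6 for _ in range(10_000)]


def candidates(scores, conf_thres):
    """Number of predictions that pass the confidence filter and reach NMS."""
    return sum(s > conf_thres for s in scores)


for conf_thres in (0.0001, 0.001, 0.1, 0.5):
    n = candidates(scores, conf_thres)
    # A full pairwise IoU matrix has n * n entries.
    print(f"conf_thres={conf_thres:<7} candidates={n:5d} IoU matrix entries={n * n}")
```

Because the pairwise work grows with the square of the candidate count, a threshold near zero (as used for mAP evaluation) is far more expensive than a typical deployment threshold.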

If you can try to implement a fast NMS experiment here that would be very useful. The NMS function is here. In the meantime I will update this thread with the latest speeds on a T4 colab instance.

yolov3/utils/utils.py

Lines 504 to 512 in dce753e

    
    def non_max_suppression(prediction, conf_thres=0.5, iou_thres=0.5, multi_cls=True, classes=None, agnostic=False):
        """
        Removes detections with lower object confidence score than 'conf_thres'
        Non-Maximum Suppression to further filter detections.
        Returns detections with shape:
            (x1, y1, x2, y2, object_conf, conf, class)
        """
        # NMS methods https://github.com/ultralytics/yolov3/issues/679 'or', 'and', 'merge', 'vision', 'vision_batch'

UPDATE: I've posted an issue on yolact repo for this dbolya/yolact#366 (comment)

glenn-jocher · 2020-03-04T06:42:09Z

Update: I discovered a majority of time in test.py was spent building pycocotools JSON files for official mAPs. If I turn off this functionality (compute mAP only with repo code) I get the following times for the 5k COCO2014 val images. Machine is a 12-vCPU V100 instance.

python3 test.py --weights yolov3-spp-ultralytics.pt --cfg yolov3-spp --img 608

NMS method	Time ms/img	Time mm:ss	mAP @0.5:0.95	mAP @0.5
`'vision_batched'` (default)	15.2 ms	1:16	41.9	61.8
`'merge'`	103 ms	8:35	42.3	62.0
`'fast_batched'`	14.6 ms	1:13	41.5	61.5

glenn-jocher · 2020-03-04T08:19:40Z

@Zzh-tju FastNMS updates have been committed and pushed now after testing.

yolov3/utils/utils.py

Lines 564 to 571 in f915bf1

    
            elif method == 'fast_batch':  # FastNMS from https://github.com/dbolya/yolact
                boxes += c.view(-1, 1) * max_wh
                iou = box_iou(boxes, boxes).triu_(diagonal=1)  # upper triangular iou matrix (diagonal and below zeroed)
                i = iou.max(dim=0)[0] < iou_thres

            output[image_i] = pred[i]
            continue
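The matrix formulation in the committed snippet can be sketched without tensors. The pure-Python version below is an illustration under assumptions (the names fast_nms and greedy_nms and the example boxes are made up here); it mirrors the column-max test on the upper-triangular IoU matrix and compares it with sequential NMS on a chain of overlapping boxes.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def fast_nms(boxes, scores, iou_thres=0.5):
    """Keep box j if its max IoU with ANY higher-scored box is below the threshold."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    b = [boxes[i] for i in order]
    keep = []
    for j in range(len(b)):
        # Column j of the upper-triangular IoU matrix: rows i < j only.
        col_max = max((iou(b[i], b[j]) for i in range(j)), default=0.0)
        if col_max < iou_thres:
            keep.append(order[j])
    return keep


def greedy_nms(boxes, scores, iou_thres=0.5):
    """Sequential NMS: only boxes that were actually kept can suppress others."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= iou_thres for k in keep):
            keep.append(i)
    return keep


# A chain: A overlaps B, B overlaps C, but A and C overlap only slightly.
boxes = [(0, 0, 10, 10), (3, 0, 13, 10), (6, 0, 16, 10)]
scores = [0.9, 0.8, 0.7]
print(greedy_nms(boxes, scores))  # [0, 2]
print(fast_nms(boxes, scores))    # [0]
```

In the matrix form an already-suppressed box (B) can still suppress a lower-scored one (C), so this variant never returns more boxes than sequential NMS and can return fewer.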

glenn-jocher · 2020-03-04T17:29:04Z

@Zzh-tju to clear up the timing a bit more, I added profiling code to test.py that specifically tracks inference and NMS times in e482392. This can be accessed with the --profile flag:

python3 test.py --weights yolov3-spp-ultralytics.pt --img 608 --conf 0.001 --profile

I ran with both default torchvision NMS and the yolact FastNMS, and actually saw a slight speed decrease with FastNMS:

Default: Profile results: 1.3/6.9/8.1 ms inference/NMS/total per image
FastNMS: Profile results: 1.3/7.1/8.4 ms inference/NMS/total per image

So perhaps the slight speed increase from FastNMS observed in the total test time is due simply to a reduced box count produced by this NMS method, which results in less postprocessing work during testing (mAP calculation etc.).

The other surprise was the great amount of total time spent on NMS vs inference. Even under the default settings 6.9/8.1 = 85% of the total time is spent on NMS!

glenn-jocher · 2020-03-04T19:18:51Z

CORRECTION: My previous analysis was incorrect; it lacked the torch.cuda.synchronize() operations necessary when profiling CUDA operations. I've fixed this in 1430a1e. Corrected results, consistent across several runs:

python3 test.py --weights yolov3-spp-ultralytics.pt --img 608 --conf 0.001 --profile

Default: Profile results: 6.6/1.6/8.2 ms inference/NMS/total per image
FastNMS: Profile results: 6.6/1.9/8.5 ms inference/NMS/total per image

Conclusion is that inference uses most (80%) of the runtime in both cases, and that FastNMS appears to run slightly slower than default torchvision.ops.boxes.batched_nms().
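The reason synchronization matters can be shown without a GPU. In the sketch below a background thread stands in for an asynchronous CUDA kernel launch; FakeAsyncDevice and timed are hypothetical names invented for this illustration and are not part of test.py.

```python
import threading
import time


class FakeAsyncDevice:
    """Stand-in for a GPU stream: launch() returns at once, the work finishes later."""

    def __init__(self):
        self._pending = []

    def launch(self, seconds):
        worker = threading.Thread(target=time.sleep, args=(seconds,))
        worker.start()
        self._pending.append(worker)

    def synchronize(self):
        # Block until all queued work has finished.
        for worker in self._pending:
            worker.join()
        self._pending.clear()


def timed(fn, sync=None):
    """Wall-clock seconds for fn(); sync() is called before and after when given."""
    if sync is not None:
        sync()
    start = time.perf_counter()
    fn()
    if sync is not None:
        sync()
    return time.perf_counter() - start


device = FakeAsyncDevice()
unsynced = timed(lambda: device.launch(0.05))
device.synchronize()  # drain the queue before the second measurement
synced = timed(lambda: device.launch(0.05), sync=device.synchronize)
print(f"without sync: {unsynced * 1e3:.1f} ms, with sync: {synced * 1e3:.1f} ms")
```

Without the synchronization step, the time of the queued work is charged to whichever later step first waits on the result, which is consistent with inference and NMS times appearing swapped in the uncorrected profile. With PyTorch on a CUDA device, torch.cuda.synchronize would play the role of sync.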

glenn-jocher · 2020-03-05T18:31:48Z

Inference can be sped up with larger batch sizes, but NMS is run per image in all cases, so the only ways to affect its speed currently are the ones listed below. Note that the 1.6 ms profile time uses all default settings (none of these speedups are applied).

Increase your conf_thres
Turn off multi_cls
Decrease iou_thres

yolov3/utils/utils.py

Line 504 in 1dc1761

def non_max_suppression(prediction, conf_thres=0.1, iou_thres=0.6, multi_cls=True, classes=None, agnostic=False):
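As a rough illustration of why conf_thres and multi_cls change the NMS workload, the sketch below builds the candidate list that would be handed to NMS. It assumes the score of a (box, class) pair is object confidence times class confidence; the function name candidates and the input layout are assumptions for this sketch, not the repo's tensor code.

```python
def candidates(pred, conf_thres=0.1, multi_cls=True):
    """pred rows are (box, obj_conf, [class_conf, ...]); returns (box, score, class) tuples."""
    out = []
    for box, obj_conf, cls_conf in pred:
        scores = [obj_conf * c for c in cls_conf]
        if multi_cls:
            # Every class above the threshold becomes its own candidate.
            out += [(box, s, k) for k, s in enumerate(scores) if s > conf_thres]
        else:
            # Only the best class per box can become a candidate.
            k = max(range(len(scores)), key=scores.__getitem__)
            if scores[k] > conf_thres:
                out.append((box, scores[k], k))
    return out


pred = [((0, 0, 10, 10), 0.9, [0.8, 0.6, 0.05])]
print(len(candidates(pred, conf_thres=0.1, multi_cls=True)))   # 2
print(len(candidates(pred, conf_thres=0.1, multi_cls=False)))  # 1
print(len(candidates(pred, conf_thres=0.6, multi_cls=True)))   # 1
```

Fewer candidates mean fewer pairwise IoU comparisons, which is where the time saved by a higher conf_thres or by multi_cls=False comes from.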

glenn-jocher · 2020-03-08T19:49:57Z

Running a few tests to document effects on speed. These are with a V100 from a docker container, which is slightly slower than running natively.

python3 test.py --cfg yolov3-spp.cfg --weights yolov3-spp-ultralytics.pt --img 608

rect=False
cudnn.deterministic=True, cudnn.benchmark = False:
12.9/1.8/14.8 ms inference/NMS/total per 608x608 image at batch-size 32
cudnn.deterministic=False, cudnn.benchmark = False:
9.9/1.7/11.6 ms inference/NMS/total per 608x608 image at batch-size 32
cudnn.deterministic=False, cudnn.benchmark = True:
9.5/1.7/11.1 ms inference/NMS/total per 608x608 image at batch-size 32

rect=True
cudnn.deterministic=True, cudnn.benchmark = False:
9.8/1.7/11.5 ms inference/NMS/total per 608x608 image at batch-size 32
cudnn.deterministic=False, cudnn.benchmark = False: (default)
6.8/1.7/8.6 ms inference/NMS/total per 608x608 image at batch-size 32
cudnn.deterministic=False, cudnn.benchmark = True:
18.2/1.7/19.9 ms inference/NMS/total per 608x608 image at batch-size 32
cudnn.deterministic=False, cudnn.benchmark = False, bs64
7.0/1.7/8.8 ms inference/NMS/total per 608x608 image at batch-size 64
cudnn.deterministic=False, cudnn.benchmark = False, bs1
14.0/2.0/16.0 ms inference/NMS/total per 608x608 image at batch-size 1
cudnn.deterministic=False, cudnn.benchmark = False, no contiguous() in models.py L207
6.8/1.7/8.5 ms inference/NMS/total per 608x608 image at batch-size 32
cudnn.deterministic=False, cudnn.benchmark = False, no contiguous(), reshape in models.py L207
6.8/1.7/8.5 ms inference/NMS/total per 608x608 image at batch-size 32

Running default natively:
Speed: 6.7/1.6/8.2 ms inference/NMS/total per 608x608 image at batch-size 32
no contiguous():
Speed: 6.6/1.6/8.2 ms inference/NMS/total per 608x608 image at batch-size 32
no contiguous() bs1:
Speed: 12.8/1.8/14.6 ms inference/NMS/total per 608x608 image at batch-size 1
yes contiguous() bs1:
Speed: 12.7/1.8/14.5 ms inference/NMS/total per 608x608 image at batch-size 1
no contiguous() bs1 img-size 512
Speed: 12.5/1.8/14.3 ms inference/NMS/total per 512x512 image at batch-size 1
no contiguous() bs1 img-size 416
Speed: 12.8/1.8/14.6 ms inference/NMS/total per 416x416 image at batch-size 1
no contiguous() bs1 img-size 608 yolov3-tiny
Speed: 3.2/1.8/4.9 ms inference/NMS/total per 608x608 image at batch-size 1

glenn-jocher · 2020-03-10T17:38:06Z

V100:
Speed: 6.6/1.5/8.1 ms inference/NMS/total per 608x608 image at batch-size 32
Speed: 17.2/1.5/18.8 ms inference/NMS/total per 800x800 image at batch-size 1
Speed: 11.8/1.5/13.3 ms inference/NMS/total per 608x608 image at batch-size 1
Speed: 11.6/1.5/13.1 ms inference/NMS/total per 512x512 image at batch-size 1
Speed: 11.6/1.5/13.1 ms inference/NMS/total per 416x416 image at batch-size 1
Speed: 11.6/1.5/13.1 ms inference/NMS/total per 320x320 image at batch-size 1

2080Ti:
Speed: 9.2/1.2/10.4 ms inference/NMS/total per 608x608 image at batch-size 32
Speed: 13.9/1.5/15.4 ms inference/NMS/total per 608x608 image at batch-size 1

CPU:
Speed: 753.0/2.9/756.0 ms inference/NMS/total per 608x608 image at batch-size 1

Zzh-tju · 2020-03-11T21:34:18Z

Does batch_size=32 mean testing 32 images simultaneously, including NMS?

glenn-jocher · 2020-03-11T22:14:59Z

@Zzh-tju batch-size 32 means for example a 32x3x608x608 tensor is passed to the model for inference. The inference outputs are passed to NMS, which operates sequentially over the images:
for img in range(32):

yolov3/utils/utils.py

Line 508 in 4089735

    
def non_max_suppression(prediction, conf_thres=0.1, iou_thres=0.6, multi_label=True, classes=None, agnostic=False):
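The per-image loop described above can be sketched as follows. This is a simplified stand-in, not the repo's function: nms_per_image and the nms_fn hook are hypothetical, and the default nms_fn only sorts by confidence without suppressing anything.

```python
def nms_per_image(batch_pred, conf_thres=0.1, nms_fn=None):
    """batch_pred holds one list of (x1, y1, x2, y2, conf) rows per image."""
    if nms_fn is None:
        # Placeholder only: sort by confidence, suppress nothing.
        nms_fn = lambda rows: sorted(rows, key=lambda r: r[4], reverse=True)
    output = [None] * len(batch_pred)  # None marks an image with no detections
    for image_i, rows in enumerate(batch_pred):
        rows = [r for r in rows if r[4] > conf_thres]
        if not rows:
            continue
        output[image_i] = nms_fn(rows)
    return output


batch = [
    [(0, 0, 10, 10, 0.9), (1, 1, 11, 11, 0.3)],
    [(5, 5, 20, 20, 0.05)],
    [(0, 0, 4, 4, 0.6), (8, 8, 12, 12, 0.7), (20, 20, 30, 30, 0.2)],
]
result = nms_per_image(batch)
print([None if r is None else len(r) for r in result])  # [2, None, 3]
```

Because each image keeps a different number of rows after filtering, the result is a Python list with one entry per image rather than a single rectangular tensor.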

glenn-jocher · 2020-03-16T01:28:53Z

Test-time augmentation study #931:

Default + 0 ops: 11.8/1.5/13.3 ms inference/NMS/total per 608x608 image at batch-size 1
Default + 1 ops: 18.7/1.6/20.3 ms inference/NMS/total per 608x608 image at batch-size 1
Default + 2 ops: 26.4/1.8/28.2 ms inference/NMS/total per 608x608 image at batch-size 1

glenn-jocher · 2020-03-20T03:26:08Z

Updated V100 speeds with fused inference:
Speed: 11.1/1.7/12.8 ms inference/NMS/total per 608x608 image at batch-size 1 NEW RECORD
Speed: 6.5/1.5/8.1 ms inference/NMS/total per 608x608 image at batch-size 32 NEW RECORD

Default + 0 ops: 11.1/1.7/12.8 ms inference/NMS/total per 608x608 image at batch-size 1
Default + 2 ops: 26.1/1.9/28.1 ms inference/NMS/total per 608x608 image at batch-size 1

glenn-jocher · 2020-03-26T00:48:20Z

SOLOv2 Table 7: Matrix NMS:
https://arxiv.org/pdf/2003.10152.pdf

UPDATE: Unable to reproduce using this code:

            elif method == 'matrix_batch':  # Matrix NMS from https://arxiv.org/abs/2003.10152
                iou = box_iou(boxes, boxes).triu_(diagonal=1)  # upper triangular iou matrix
                m = iou.max(0)[0].view(-1, 1)  # max values
                decay = torch.exp(-(iou ** 2 - m ** 2) / 0.5).min(0)[0]  # gauss with sigma=0.5
                scores *= decay
                i = torch.full((boxes.shape[0],), fill_value=1).bool()
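For reference, the decay formula in the snippet above can be transcribed into plain Python as below. This is only a reading of that snippet (same Gaussian form, same sigma of 0.5); it is not a verified reproduction of the SOLOv2 paper, and the name matrix_nms_decay and the example boxes are assumptions.

```python
import math


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def matrix_nms_decay(boxes, sigma=0.5):
    """boxes must be sorted by descending score; returns one decay factor per box."""
    n = len(boxes)
    # m[i]: largest IoU between box i and any higher-scored box (0 for the top box).
    m = [max((iou(boxes[k], boxes[i]) for k in range(i)), default=0.0) for i in range(n)]
    decay = []
    for j in range(n):
        factors = [
            math.exp(-(iou(boxes[i], boxes[j]) ** 2 - m[i] ** 2) / sigma)
            for i in range(j)
        ]
        decay.append(min(factors, default=1.0))
    return decay


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
decay = matrix_nms_decay(boxes)
print([round(d, 3) for d in decay])
```

Scores are multiplied by these factors rather than being hard-suppressed, so a final confidence threshold is still needed to drop the decayed boxes.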

qtw1998 · 2020-04-19T13:03:52Z

torchvision.ops implements operators that are specific for Computer Vision. Those operators currently do not support TorchScript. Performs non-maximum suppression (NMS) on the boxes according to their intersection-over-union (IoU)

AttributeError: module 'torchvision' has no attribute 'ops'

what should I do?

Have you solved it? I met the same problems

github-actions · 2020-05-20T00:09:20Z

This issue is stale because it has been open 30 days with no activity. Remove Stale label or comment or this will be closed in 5 days.

Zzh-tju · 2020-07-01T21:35:45Z

@glenn-jocher Hi, could you tell me why we cannot do NMS across a batch? Currently, NMS is done on images one by one, even though batch testing is turned on.

The number of detections differs from image to image; is that the reason why we cannot perform real batched NMS?

glenn-jocher · 2020-07-01T22:11:37Z

@Zzh-tju feel free to play around with the NMS code and try your idea out. If you see performance improvements please submit a PR! Thank you.

Zzh-tju · 2020-07-03T00:21:24Z

@glenn-jocher I have just found a speed improvement and will send you a PR later. You can try it and optimize it further.

Torchvision NMS cannot run across images (if we add an image-related offset to the boxes, the IoU matrix grows quadratically), so I tried Cluster-NMS. I kept the NMS preprocessing unchanged and just replaced the core part of your merge NMS with Cluster-Weighted NMS.
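The quadratic growth mentioned here is simple arithmetic. The sketch below compares the number of pairwise IoU entries for per-image matrices against one matrix over the whole batch; the figure of 300 detections per image is an arbitrary assumption and iou_matrix_entries is a name made up for this illustration.

```python
def iou_matrix_entries(dets_per_image):
    """Return (entries with one matrix per image, entries with one matrix per batch)."""
    per_image = sum(n * n for n in dets_per_image)
    batched = sum(dets_per_image) ** 2
    return per_image, batched


per_image, batched = iou_matrix_entries([300] * 32)
print(per_image)             # 2880000
print(batched)               # 92160000
print(batched // per_image)  # 32
```

With equal detection counts the single batch-wide matrix is larger by a factor equal to the batch size, and almost all of the extra entries are cross-image pairs that can never suppress each other.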

Metric	Batch size	torchvision merge NMS	Batch-mode Cluster-Weighted NMS	Cluster-Weighted NMS
AP	-	42.9	42.9	42.9
time	4	3.0ms	4.4ms	5.5ms
time	32	2.3ms	3.0ms	4.7ms

Now I want to ask: why does NMS time decrease as batch size increases (for torchvision NMS)?
What is the maximum batch size we can use? I run on two 2080Ti GPUs, and batch size 32 takes about 6~7 GB of memory per GPU.
I guess that if we keep increasing the batch size when testing, the batch-mode Cluster-NMS series may benefit more.
However, my coding ability is limited, so the code can probably be optimized further.

I think the best way may be to integrate the NMS preprocessing into batch mode as well, even if it brings a slight performance drop. Preprocessing now takes about 1.3~1.5 ms, and your torchvision merge NMS just 0.8 ms, so there is still room for acceleration.

glenn-jocher · 2020-07-03T00:30:38Z

@Zzh-tju ah! Thanks for the interesting study. We've actually discovered that in yolov5 the regression is improved enough that we can stop using merge and simply use the default PyTorch NMS to get the same results. So the current NMS strategy in the yolov5 function is to not use merge anymore.

It is an interesting idea to do a batched NMS approach instead of calling the nms function once per image. Your results show a significant improvement, 2.3 / 3.0 is about 25% faster (!). This would make a huge improvement on yolov5s for example, which has inference time of 2.1ms per image at batch-size 32 FP16, about half of which is used up with NMS. See speeds here. NMS is about 1 ms per image in these numbers, so a 25% speedup there would be noticeable in the table.
https://github.com/ultralytics/yolov5#pretrained-checkpoints

glenn-jocher · 2020-07-03T00:34:54Z

Right now the boxes are offset by (class * max_image_size) to get batched per image (so different classes never overlap). I suppose to run once per batch we would offset boxes by (class * max_image_size * image_index)? Are you using torchvision.ops.nms() or torchvision.ops.batched_nms()?
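The offset trick can be sketched in a few lines. The code below is an illustration under assumptions: offset_boxes is a made-up helper, max_wh=4096 is an assumed upper bound on box coordinates, and the batch-wide grouping uses a unique integer per (image, class) pair instead of a plain product, which is a choice of this sketch rather than anything confirmed in the thread.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def offset_boxes(boxes, groups, max_wh=4096):
    """Shift every box by group * max_wh so boxes of different groups cannot overlap."""
    return [tuple(v + g * max_wh for v in box) for box, g in zip(boxes, groups)]


boxes = [(10, 10, 50, 50), (10, 10, 50, 50), (12, 12, 52, 52)]
classes = [0, 1, 0]

# Per-image case: the group is the class index.
shifted = offset_boxes(boxes, classes)
print(iou(shifted[0], shifted[1]))  # 0.0, identical boxes but different classes
print(iou(shifted[0], shifted[2]) == iou(boxes[0], boxes[2]))  # True, same class

# Hypothetical batch-wide case: one group per (image, class) pair.
num_classes = 80
image_index = [0, 1, 1]
groups = [i * num_classes + c for i, c in zip(image_index, classes)]
batched = offset_boxes(boxes, groups)
print(groups)  # [0, 81, 80]
```

A single NMS call on the shifted boxes then behaves like separate calls per group, as long as all original coordinates stay below max_wh.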

Zzh-tju · 2020-07-03T00:41:34Z

@glenn-jocher no, you misunderstand me. My question is: why does NMS speed also increase as batch size increases?

glenn-jocher · 2020-07-03T00:57:59Z

@Zzh-tju in my experiments with yolov5, NMS speed is the same no matter the batch size. For example from the notebook:

Notebook

!python test.py --weights yolov5s.pt --data coco128.yaml --img 640 --batch 1
!python test.py --weights yolov5s.pt --data coco128.yaml --img 640 --batch 8
!python test.py --weights yolov5s.pt --data coco128.yaml --img 640 --batch 32

Output:

Namespace(augment=False, batch_size=1, conf_thres=0.001, data='./data/coco128.yaml', device='', img_size=640, iou_thres=0.65, merge=False, save_json=False, single_cls=False, task='val', verbose=False, weights='yolov5s.pt')
Using CUDA device0 _CudaDeviceProperties(name='Tesla T4', total_memory=15079MB)

Model Summary: 191 layers, 7.46816e+06 parameters, 7.46816e+06 gradients
Fusing layers...
Model Summary: 140 layers, 7.45958e+06 parameters, 7.45958e+06 gradients
Caching labels ../coco128/labels/train2017 (126 found, 0 missing, 2 empty, 0 duplicate, for 128 images): 100% 128/128 [00:00<00:00, 8725.21it/s]
               Class      Images     Targets           P           R      mAP@.5  mAP@.5:.95: 100% 128/128 [00:03<00:00, 37.63it/s]
                 all         128         929       0.379        0.74       0.676        0.44
Speed: 9.3/1.8/11.1 ms inference/NMS/total per 640x640 image at batch-size 1


Namespace(augment=False, batch_size=8, conf_thres=0.001, data='./data/coco128.yaml', device='', img_size=640, iou_thres=0.65, merge=False, save_json=False, single_cls=False, task='val', verbose=False, weights='yolov5s.pt')
Using CUDA device0 _CudaDeviceProperties(name='Tesla T4', total_memory=15079MB)

Model Summary: 191 layers, 7.46816e+06 parameters, 7.46816e+06 gradients
Fusing layers...
Model Summary: 140 layers, 7.45958e+06 parameters, 7.45958e+06 gradients
Caching labels ../coco128/labels/train2017 (126 found, 0 missing, 2 empty, 0 duplicate, for 128 images): 100% 128/128 [00:00<00:00, 5722.17it/s]
               Class      Images     Targets           P           R      mAP@.5  mAP@.5:.95: 100% 16/16 [00:02<00:00,  5.41it/s]
                 all         128         929       0.381       0.744        0.68       0.442
Speed: 4.1/2.2/6.3 ms inference/NMS/total per 640x640 image at batch-size 8


Namespace(augment=False, batch_size=32, conf_thres=0.001, data='./data/coco128.yaml', device='', img_size=640, iou_thres=0.65, merge=False, save_json=False, single_cls=False, task='val', verbose=False, weights='yolov5s.pt')
Using CUDA device0 _CudaDeviceProperties(name='Tesla T4', total_memory=15079MB)

Model Summary: 191 layers, 7.46816e+06 parameters, 7.46816e+06 gradients
Fusing layers...
Model Summary: 140 layers, 7.45958e+06 parameters, 7.45958e+06 gradients
Caching labels ../coco128/labels/train2017 (126 found, 0 missing, 2 empty, 0 duplicate, for 128 images): 100% 128/128 [00:00<00:00, 9776.04it/s]
               Class      Images     Targets           P           R      mAP@.5  mAP@.5:.95: 100% 4/4 [00:04<00:00,  1.12s/it]
                 all         128         929       0.385       0.752       0.692       0.452
Speed: 4.2/2.1/6.3 ms inference/NMS/total per 640x640 image at batch-size 32

So 1.8ms, 2.2ms, 2.1ms at batch sizes 1, 8, 32. Basically NMS speed per image is not correlated to batch size.

Zzh-tju · 2020-07-03T09:44:14Z

got it @glenn-jocher , I will do more test with batchsize.

Zzh-tju · 2020-09-07T16:19:38Z

@glenn-jocher Hi, I have just finished a small piece of work on Batch Mode Weighted Cluster-NMS for speeding up NMS. You can check https://github.com/Zzh-tju/yolov5 for details. My conclusion is that Batch Mode Weighted Cluster-NMS is beneficial when TTA is used.

glenn-jocher · 2020-09-07T22:34:37Z

@Zzh-tju ah, very interesting! I'll check out the forked repo.

glenn-jocher · 2020-09-08T02:32:57Z

@Zzh-tju I looked things over. You've clearly done a lot of work and experimentation!

I see it's hard to provide substantial gains over the basic NMS unfortunately. I think this is because box regression is improving over past works, so perhaps the gains presented by merging two 0.90 IoU boxes are less than, for example, merging two 0.5 IoU boxes. It's unfortunate, because actually one of the yolov5 changes is increased grid sensitivity. In yolov3, only one cell per output layer could trigger on an object. In yolov5, >=3 cells per output layer always trigger per object (the nearest 3), so I'd expect many more boxes being proposed by yolov5 than by yolov3. It's frustrating that there isn't a better way to exploit all these extra statistics.

One very interesting thing I discovered during the TTA and ensembling work is that merging output grids always produced better results than appending output boxes together. If you look at the YOLOv5 ensembling module you will see that there are 3 options:
https://github.com/ultralytics/yolov5/blob/cab36f72a852ef00e8b42d3283ba9b2fc757b17f/models/experimental.py#L117-L129

mean ensemble: performs mean() of all output grids, i.e. YOLOv5s output small output grid and YOLOv5m small output grid are the same shape, this takes the mean() of the two grids. Best results.
max ensemble: same as mean(), but applies max(). Poor results.
nms ensemble: appends all output boxes together for NMS to sort out. OK results.

If there were a way to mean() TTA output grids the way that mean ensemble works, this might produce the best results, but it is very complicated due to the varying output shapes unfortunately, so I abandoned this effort.
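A toy version of the mean ensemble shows both why it is simple for same-size models and why it breaks for TTA. The function mean_ensemble and the nested-list grids below are assumptions for illustration, not the YOLOv5 ensembling module.

```python
def mean_ensemble(grids):
    """Element-wise mean of equally shaped 2D grids (nested lists)."""
    shape = (len(grids[0]), len(grids[0][0]))
    if any((len(g), len(g[0])) != shape for g in grids):
        raise ValueError("output grids differ in shape; cannot average element-wise")
    n = len(grids)
    return [
        [sum(g[r][c] for g in grids) / n for c in range(shape[1])]
        for r in range(shape[0])
    ]


# Two models run at the same image size produce grids of the same shape.
grid_a = [[0.25, 0.5], [0.75, 1.0]]
grid_b = [[0.75, 0.5], [0.25, 0.0]]
merged = mean_ensemble([grid_a, grid_b])
print(merged)  # [[0.5, 0.5], [0.5, 0.5]]

# A grid from a different inference size cannot be averaged directly.
try:
    mean_ensemble([grid_a, [[0.5]]])
    mismatch_detected = False
except ValueError:
    mismatch_detected = True
print(mismatch_detected)  # True
```

Averaging grids from different inference sizes would first require resampling them to a common shape, which is the complication described above.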

Zzh-tju · 2020-09-08T02:48:03Z

@glenn-jocher wait a second, why do TTA output grids have different output shapes?

Zzh-tju · 2020-09-08T02:53:15Z

@glenn-jocher And I did see an improvement when merging two 0.8 IoU boxes rather than two 0.65 IoU boxes.

glenn-jocher · 2020-09-08T04:28:00Z

@Zzh-tju ensemble output grids will have the same shape, for example if you run both YOLOv5s and YOLOv5m at the same image size, the 3 output grids from YOLOv5s are the same size as from YOLOv5m.

TTA uses different inference sizes as part of its augmentation, so naturally the output grids will change in size, and can no longer be directly averaged.

Hmm, interesting, 0.8 IoU is higher than I've ever tried. I think the more accurate the box regressions, the higher you can raise the IoU threshold. What was the improvement you saw using 0.8 IoU?

Zzh-tju · 2020-09-08T05:11:49Z

@glenn-jocher see the results in https://github.com/Zzh-tju/yolov5. weighted threshold is the merging threshold

Zzh-tju · 2020-09-08T05:19:38Z

@glenn-jocher

Do you mean that when the input size changes, the size of the output grid map changes too?

glenn-jocher · 2020-09-08T06:03:18Z

@Zzh-tju yes. YOLOv5 strides are 8, 16, 32 on the small, medium and large object output layers. So a 640x640 image will have 3 output grids of size 80x80, 40x40, 20x20.

The same output grids for a 320x320 image are 40x40, 20x20, 10x10.
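The grid sizes follow directly from dividing the image size by each stride. A minimal sketch, where grid_sizes is a hypothetical helper name and square inputs divisible by 32 are assumed:

```python
STRIDES = (8, 16, 32)  # small, medium and large object output layers


def grid_sizes(img_size, strides=STRIDES):
    """Output grid (rows, cols) per layer for a square img_size x img_size input."""
    return [(img_size // s, img_size // s) for s in strides]


print(grid_sizes(640))  # [(80, 80), (40, 40), (20, 20)]
print(grid_sizes(320))  # [(40, 40), (20, 20), (10, 10)]
```

Since the grids at 640 and 320 have different shapes at every layer, outputs from different TTA scales cannot be combined by an element-wise mean.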

glenn-jocher added the enhancement New feature or request label Dec 3, 2019

glenn-jocher mentioned this issue Dec 3, 2019

non_max_suppression with nms_style == 'SOFT' will output all proposal boxes #362

Closed

glenn-jocher mentioned this issue Dec 4, 2019

efficient calling test dataloader during training #688

Merged

glenn-jocher mentioned this issue Dec 20, 2019

MULTI-CLASS OUTPUT #732

Closed

glenn-jocher closed this as completed Jan 16, 2020

glenn-jocher mentioned this issue Feb 14, 2020

About Multi-Label NMS tianzhi0549/FCOS#175

Closed

glenn-jocher mentioned this issue Mar 4, 2020

FastNMS on Ultralytics YOLOv3 dbolya/yolact#366

Open

glenn-jocher reopened this Mar 4, 2020

glenn-jocher mentioned this issue Mar 5, 2020

test.py NMS ,conf, j = pred[:, 5:].max(1) #896

Closed

glenn-jocher mentioned this issue Mar 10, 2020

CSPResNeXt50-PANet-SPP #698

Closed

glenn-jocher mentioned this issue Apr 19, 2020

what's the original NMS method in yolov3？ #1069

Closed

sssmost mentioned this issue Apr 28, 2020

How to extract all class probabilities for each bounding box? #968

Closed

github-actions bot added the Stale Stale and schedule for closing soon label May 20, 2020

github-actions bot closed this as completed May 25, 2020

kirinlq mentioned this issue Apr 24, 2021

NMS’ test #1745

Closed
