This directory contains:
- `nms_kernel` (CPU/GPU)
- PyTorch bindings
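Standard greedy NMS, which is presumably what `nms_kernel` computes, can be sketched in pure Python as a CPU reference. This is an illustration, not the repository's code. The `(x1, y1, x2, y2)` box layout, the strict `>` IoU comparison, and the names `iou` and `nms_cpu_reference` are all assumptions.

```python
from typing import List, Sequence, Tuple

# Assumed box layout: (x1, y1, x2, y2) with x2 >= x1 and y2 >= y1.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms_cpu_reference(boxes: Sequence[Box], scores: Sequence[float],
                      iou_threshold: float) -> List[int]:
    """Hypothetical greedy NMS: returns kept indices in descending score order."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    suppressed = [False] * len(boxes)
    keep: List[int] = []
    for pos, i in enumerate(order):
        if suppressed[i]:
            continue
        keep.append(i)
        # Suppress every lower-scored box that overlaps box i too much.
        for j in order[pos + 1:]:
            if not suppressed[j] and iou(boxes[i], boxes[j]) > iou_threshold:
                suppressed[j] = True
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(nms_cpu_reference(boxes, scores, 0.5))  # → [0, 2]
```

Boxes 0 and 1 have IoU 81 / 119 ≈ 0.68, so box 1 is suppressed at a threshold of 0.5, while box 2 does not overlap anything and is kept.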
The NMS CUDA implementation here is the most basic version. It can be optimized further by following the official source code.
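One widely used optimization is the 64-bit bitmask scheme found in, for example, torchvision's CUDA NMS. This is not necessarily what "the official source" above refers to. Boxes are pre-sorted by score. A parallel phase records, for each box `i`, which later boxes it suppresses as one bit per box, packed into 64-bit words. A cheap serial pass then ORs together the masks of the kept boxes. The Python below only simulates the data layout of that scheme. The loops in phase 1 stand in for GPU threads, and `nms_bitmask` and `THREADS_PER_BLOCK` are hypothetical names.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # assumed (x1, y1, x2, y2)
THREADS_PER_BLOCK = 64  # one bit per box in a 64-bit word


def iou(a: Box, b: Box) -> float:
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def nms_bitmask(boxes_sorted: Sequence[Box], iou_threshold: float) -> List[int]:
    """boxes_sorted must already be in descending score order."""
    n = len(boxes_sorted)
    col_blocks = (n + THREADS_PER_BLOCK - 1) // THREADS_PER_BLOCK
    # Phase 1 (parallel on a GPU): bit k of mask[i][cb] is set when box i
    # suppresses box cb * 64 + k. Only j > i matters since boxes are sorted.
    mask = [[0] * col_blocks for _ in range(n)]
    for i in range(n):
        for cb in range(col_blocks):
            start = cb * THREADS_PER_BLOCK
            bits = 0
            for k in range(min(THREADS_PER_BLOCK, n - start)):
                j = start + k
                if j > i and iou(boxes_sorted[i], boxes_sorted[j]) > iou_threshold:
                    bits |= 1 << k
            mask[i][cb] = bits
    # Phase 2 (serial): walk boxes in score order and accumulate the
    # suppression bits of every kept box into `removed`.
    removed = [0] * col_blocks
    keep: List[int] = []
    for i in range(n):
        cb, k = divmod(i, THREADS_PER_BLOCK)
        if not (removed[cb] >> k) & 1:
            keep.append(i)
            # Words before cb hold no bits for box i (it only marks j > i).
            for c in range(cb, col_blocks):
                removed[c] |= mask[i][c]
    return keep
```

The expensive all-pairs IoU work is embarrassingly parallel and writes only `n * ceil(n / 64)` words. The data-dependent part shrinks to bitwise ORs over those words.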
```bash
# Test only the Ada architecture. If unset, all architectures are compiled by default, which takes a long time: Volta, Ampere, Ada, Hopper, ...
export TORCH_CUDA_ARCH_LIST=Ada
python3 nms.py
```
Output:

```
-------------------------------------------------------------------------------------
nboxes=1024
out_nms: ['1021 ', '1022 ', '1023 '], len of keep: 950, time:0.26456594ms
out_nms_th: ['1021 ', '1022 ', '1023 '], len of keep: 950, time:0.19218683ms
-------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------
nboxes=2048
out_nms: ['2045 ', '2046 ', '2047 '], len of keep: 1838, time:0.47256470ms
out_nms_th: ['2044 ', '2045 ', '2047 '], len of keep: 1838, time:0.39437532ms
-------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------
nboxes=4096
out_nms: ['4092 ', '4093 ', '4095 '], len of keep: 3598, time:0.89909315ms
out_nms_th: ['4093 ', '4094 ', '4095 '], len of keep: 3598, time:1.03515625ms
-------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------
nboxes=8192
out_nms: ['8189 ', '8190 ', '8191 '], len of keep: 7023, time:1.49935722ms
out_nms_th: ['8189 ', '8190 ', '8191 '], len of keep: 7023, time:3.39094877ms
-------------------------------------------------------------------------------------
```
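To make the timings easier to compare, this sketch computes the ratio `out_nms_th / out_nms` from the numbers in the log above. A ratio above 1 means `out_nms` was faster. The function name `speedups` is illustrative only.

```python
from typing import Dict

# (nboxes, out_nms time in ms, out_nms_th time in ms), copied from the log above.
TIMINGS = [
    (1024, 0.26456594, 0.19218683),
    (2048, 0.47256470, 0.39437532),
    (4096, 0.89909315, 1.03515625),
    (8192, 1.49935722, 3.39094877),
]


def speedups() -> Dict[int, float]:
    """Return out_nms_th time / out_nms time per box count, rounded to 2 places."""
    return {n: round(t_th / t_nms, 2) for n, t_nms, t_th in TIMINGS}


for n, ratio in speedups().items():
    print(f"nboxes={n}: out_nms_th / out_nms = {ratio:.2f}x")
```

By these numbers, `out_nms` is slower than `out_nms_th` at 1024 and 2048 boxes (ratios of about 0.73 and 0.83). It is faster at 4096 boxes (about 1.15x) and at 8192 boxes (about 2.26x).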