Hi,

our detector does not output scores, so we set them all to 1, which gives wrong results with the COCO metrics. We know the metrics are written under the assumption that scores exist, but I believe the docs should clearly state that the mAP is not correct if the scores are not set.
More details and an analysis of the cause follow:
[...]
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.663
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.663
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.663
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.663
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = -1.000
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = -1.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.333
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.667
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.667
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.667
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = -1.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = -1.000
[...]
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.554
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.554
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.554
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.554
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = -1.000
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = -1.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.333
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.667
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.667
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.667
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = -1.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = -1.000
The cause for this is: to compute the AP, a discrete precision-recall curve is built. This curve is constructed prediction by prediction, sorted by score. But since the score is the same for all predictions, they should actually be considered all at once, because there is no score threshold that could include one of these predictions while excluding another (the result should be independent of their order).
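To make this concrete, here is a minimal, self-contained sketch (my own illustration, not the example code referenced above; the function and variable names are made up) that mimics the per-prediction cumulative-sum construction for three detections that all share score 1.0:

    # Minimal sketch (not the author's example code) of why the cumulative-sum
    # construction is order dependent when all scores are tied.
    import numpy as np

    def pr_points_cumsum(is_tp, n_gt):
        # Per-prediction PR points, as produced by cumulative sums over the
        # score-sorted detections (the scheme used for the discrete PR curve).
        tp = np.cumsum(is_tp)
        fp = np.cumsum(~is_tp)
        recall = tp / n_gt
        precision = tp / (tp + fp)
        return list(zip(recall, precision))

    n_gt = 3
    # Three detections, all with score 1.0: two true positives and one false
    # positive. With tied scores the sort order is arbitrary, so both orderings
    # are equally "valid".
    order_a = np.array([True, True, False])   # FP processed last
    order_b = np.array([False, True, True])   # FP processed first

    print(pr_points_cumsum(order_a, n_gt))  # -> (0.33, 1.00), (0.67, 1.00), (0.67, 0.67) (rounded)
    print(pr_points_cumsum(order_b, n_gt))  # -> (0.00, 0.00), (0.33, 0.50), (0.67, 0.67) (rounded)

    # Treating the tied detections as a single score threshold instead yields
    # exactly one operating point, identical for both orderings:
    tp, fp = 2, 1
    print(tp / n_gt, tp / (tp + fp))        # 0.67 0.67 (rounded)

The per-prediction points depend entirely on the arbitrary order in which the tied detections are processed, whereas treating the tie as a single threshold produces one well-defined point.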
Thus, the resulting PR curves depend on the prediction order and are not correct:
Reference code for plotting
import matplotlib.pyplot as plt
import numpy as np

def plot_pr_curves(eval_results, cats, output_dir="."):
    """
    Function to plot Precision-Recall curves based on the accumulated results from COCOeval.
    """
    # Extract the necessary evaluation parameters
    params = eval_results['params']
    precision = eval_results['precision']
    # recall = eval_results['recall']
    iouThrs = params.iouThrs                # IoU thresholds
    catIds = params.catIds                  # Category IDs
    areaRngLbl = params.areaRngLbl          # Labels for area ranges
    recThrs = np.array(params.recThrs)      # Recall thresholds
    maxDets = params.maxDets                # Max detections

    k = 0  # category = a
    a = 0  # area range = all
    m = 2  # max detections = 100
    t = 0  # IoU threshold = 0.5

    pr = precision[t, :, k, a, m]

    # Create the plot
    plt.figure()
    plt.plot(recThrs, pr, marker='o', label=f"IoU={iouThrs[t]:.2f}")
    plt.xlabel('Recall')
    plt.ylabel('Precision')
    plt.title(f"Precision-Recall Curve\nCategory: {cats[catIds[k]]['name']}, Area: {areaRngLbl[a]}, MaxDets: {maxDets[m]}")
    plt.legend()

    # Create a unique filename based on category, IoU, area, and maxDet
    plt.savefig(f"{output_dir}/PR_Curve_cat{cats[catIds[k]]['name']}_iou{iouThrs[t]:.2f}_area{areaRngLbl[a]}_maxDet{maxDets[m]}.png")
    plt.close()

if __name__ == "__main__":
    ...
    plot_pr_curves(eval.eval, coco.cats, "./")
The cause for this issue lies in cocoapi/PythonAPI/pycocotools/cocoeval.py, lines 378 to 379 (commit 8c9bcc3), where tp_sum and fp_sum are computed as cumulative sums. This is wrong when the scores are equal: the cumulative sum should then include all tied predictions at once. It may only increment where the score changes from one prediction to the next; within a group of equal scores, all entries must take the same value (or, for efficiency, the group can be collapsed to a single entry).
Effectively, this could be added on top of the current implementation (e.g. a switch that allows for equal scores); a rough sketch of the idea is shown below.
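As a minimal sketch (my own illustration, not a patch against cocoeval.py; cumsum_with_ties and the other names are made up), a tie-aware cumulative sum could assign every member of a tie group the total of the whole group:

    # Rough sketch of a tie-aware cumulative sum; not a patch against cocoeval.py.
    import numpy as np

    def cumsum_with_ties(values, scores):
        """Cumulative sum of `values` over detections sorted by descending `scores`,
        where every member of a tie group gets the total of the whole group."""
        values = np.asarray(values, dtype=float)
        scores = np.asarray(scores, dtype=float)
        csum = np.cumsum(values)
        # Index of the last detection in each tie group (scores are sorted, so a
        # group ends where the next score differs, plus the final position).
        group_end = np.r_[np.nonzero(np.diff(scores))[0], len(scores) - 1]
        # Map every position to the end of its tie group and read the cumsum there.
        out = np.empty_like(csum)
        start = 0
        for end in group_end:
            out[start:end + 1] = csum[end]
            start = end + 1
        return out

    scores = np.array([1.0, 1.0, 1.0])     # all detections share the same score
    tps    = np.array([1, 1, 0])           # two TPs, one FP (arbitrary order)
    fps    = 1 - tps
    print(cumsum_with_ties(tps, scores))   # [2. 2. 2.] instead of [1. 2. 2.]
    print(cumsum_with_ties(fps, scores))   # [1. 1. 1.] instead of [0. 0. 1.]

With all scores tied, every detection then sees tp = 2 and fp = 1, i.e. the single operating point precision = 2/(2+1), matching the "considered all at once" treatment described above, while detectors with distinct scores would be unaffected.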