Binned implementation of PrecisionRecallCurve is extremely slow for large batch sizes #1492
Labels: bug / fix, help wanted
Priority: Critical task/issue
🐛 Bug
Since the classification metrics refactor in torchmetrics 0.10, the update step of `PrecisionRecallCurve` and of metrics based on it (such as `AveragePrecision`) is extremely slow and memory-consuming when using the binned implementation with many samples. This is, for example, the case in semantic segmentation, where every pixel is a sample. The binned implementation is intended to be both faster and more memory-efficient than computing the exact precision-recall curve; currently it is neither.
To Reproduce
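A minimal benchmark sketch that should reproduce the comparison (the sample count, threshold count, and timing scaffolding are assumptions, not the original script; needs a CUDA device):

```python
import time

import torch
from torchmetrics.functional.classification import binary_average_precision

N = 10_000_000  # on the order of the pixel count of a segmentation batch
preds = torch.rand(N, device="cuda")
target = torch.randint(0, 2, (N,), device="cuda")


def benchmark(thresholds):
    """Return (seconds, peak GPU MB) for one average-precision call."""
    torch.cuda.reset_peak_memory_stats()
    torch.cuda.synchronize()
    start = time.perf_counter()
    binary_average_precision(preds, target, thresholds=thresholds)
    torch.cuda.synchronize()
    return time.perf_counter() - start, torch.cuda.max_memory_allocated() / 2**20


print("exact  (thresholds=None):", benchmark(None))
print("binned (thresholds=100): ", benchmark(100))
```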
Output (using an RTX 3090):
Binned average precision was 1600x slower than exact average precision.
The memory consumption of exact AP on the GPU was 260 MB, while binned AP consumed 20,000 MB.
Expected behavior
The binned implementation of
PrecisionRecallCurve
, i.e., withthresholds != None
, should be faster and more memory-efficient.Environment
TorchMetrics version (and how you installed TM, e.g., `conda`, `pip`, build from source): 0.11.1 (`pip`)
Additional context
The current implementation of `_binary_precision_recall_curve_update` compares all predictions with all thresholds at once, which is very memory-consuming; see the sketch below.
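For context, the update roughly does the following (paraphrased from the torchmetrics source; the private `_bincount` import path may vary between versions):

```python
import torch

try:
    from torchmetrics.utilities.data import _bincount  # private helper
except ImportError:
    _bincount = torch.bincount


def update_vectorized(preds: torch.Tensor, target: torch.Tensor, thresholds: torch.Tensor) -> torch.Tensor:
    len_t = len(thresholds)
    # num_samples x num_thresholds matrix -> the huge intermediate
    preds_t = (preds.unsqueeze(-1) >= thresholds.unsqueeze(0)).long()
    # encode (prediction, target, threshold index) into one integer per entry;
    # same shape as preds_t, so another num_samples * num_thresholds * 8 bytes
    unique_mapping = preds_t + 2 * target.unsqueeze(-1) + 4 * torch.arange(len_t, device=target.device)
    bins = _bincount(unique_mapping.flatten(), minlength=4 * len_t)
    # one 2x2 confusion matrix per threshold: [threshold, target, prediction]
    return bins.reshape(len_t, 2, 2)
```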
Computing `preds_t` and `unique_mapping` is quite fast, but both consume large amounts of memory (number of samples × number of thresholds × 8 bytes). `_bincount`, on the other hand, does not consume much memory but takes a lot of time. Iterating over the thresholds one by one is significantly faster (0.6 s instead of 10 s) and consumes only 150 MB of memory; a sketch follows.
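A sketch of that loop-based rewrite (my version, not library code):

```python
import torch


def update_looped(preds: torch.Tensor, target: torch.Tensor, thresholds: torch.Tensor) -> torch.Tensor:
    """One threshold at a time: the largest intermediate is a single boolean mask."""
    len_t = len(thresholds)
    target = target == 1
    confmat = torch.empty(len_t, 2, 2, dtype=torch.long, device=preds.device)
    for i in range(len_t):
        preds_t = preds >= thresholds[i]
        confmat[i, 1, 1] = (target & preds_t).sum()    # true positives
        confmat[i, 0, 1] = (~target & preds_t).sum()   # false positives
        confmat[i, 1, 0] = (target & ~preds_t).sum()   # false negatives
    # true negatives follow from the totals
    confmat[:, 0, 0] = len(preds) - confmat[:, 0, 1] - confmat[:, 1, 0] - confmat[:, 1, 1]
    return confmat
```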
Using the implementation of `BinnedPrecisionRecallCurve` from torchmetrics 0.9 is even faster (0.1 s) and uses only 60 MB; see the sketch below.
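Roughly, that version kept running TP/FP/FN counters per threshold instead of building per-sample encodings; a paraphrased sketch (names and details are approximate, not the actual 0.9 code):

```python
import torch


class BinnedCountersSketch:
    """0.9-style state: running TP/FP/FN counters, one entry per threshold."""

    def __init__(self, thresholds: torch.Tensor) -> None:
        self.thresholds = thresholds
        n = len(thresholds)
        self.TPs = torch.zeros(n, dtype=torch.long, device=thresholds.device)
        self.FPs = torch.zeros(n, dtype=torch.long, device=thresholds.device)
        self.FNs = torch.zeros(n, dtype=torch.long, device=thresholds.device)

    def update(self, preds: torch.Tensor, target: torch.Tensor) -> None:
        target = target == 1
        for i, t in enumerate(self.thresholds):
            preds_t = preds >= t
            self.TPs[i] += (target & preds_t).sum()
            self.FPs[i] += (~target & preds_t).sum()
            self.FNs[i] += (target & ~preds_t).sum()
```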