
Wrong numbers for multiclass acc/prec/f1 if some class is not listed in true/pred tensors #295

Closed
notonlyvandalzzz opened this issue Jun 15, 2021 · 7 comments · Fixed by #303
Labels: bug / fix · help wanted · Priority: Critical

Comments

@notonlyvandalzzz

🐛 Bug

TorchMetrics' output for precision, accuracy, and F1 differs from sklearn's version of the same metrics when average='macro' is used and the pred/true tensors contain labels for all classes except one.

To Reproduce

Steps to reproduce the behavior:

  1. Take one tensor of class labels as y_pred, copy it to y_true, and pass both through torchmetrics.functional.f1/accuracy/precision with average="macro" and num_classes set to the number of classes you used.
  2. Convert the tensors to lists and pass them into the corresponding sklearn metrics, with average="macro" where applicable.
  3. Compare the numbers.

Code sample

import torch
import torchmetrics
from sklearn.metrics import accuracy_score, f1_score, precision_score

y_pred = y_true = torch.tensor([1, 6, 6, 6, 3, 6, 3, 6, 6, 3, 6, 3, 6, 6, 3, 6, 3, 3, 6, 6, 6, 6, 6, 6,
        6, 3, 5, 6, 6, 3, 6, 6, 6, 6, 3, 6, 6, 3, 3, 1, 3, 1, 6, 3, 3, 1, 3, 6,
        4, 6, 6, 6, 6, 6, 6, 3, 3, 6, 3, 6, 1, 0, 5, 3, 6, 6, 6, 6, 3, 0, 6, 3,
        3, 3, 6, 3, 4])  # no sample with class id=2!
print('Metrics:')
print(f'F1: {f1_score(y_true.numpy().astype(int).tolist(), y_pred.numpy().astype(int).tolist(), average="macro")}')
print(f'ACC: {accuracy_score(y_true.numpy().astype(int).tolist(), y_pred.numpy().astype(int).tolist())}')
print(f'PREC: {precision_score(y_true.numpy().astype(int).tolist(), y_pred.numpy().astype(int).tolist(), average="macro")}')
print(f'TM F1: {torchmetrics.functional.f1(y_pred, y_true, average="macro", num_classes=7)}')
print(f'TM acc: {torchmetrics.functional.accuracy(y_pred, y_true, average="macro", num_classes=7)}')
print(f'TM prec: {torchmetrics.functional.precision(y_pred, y_true, average="macro", num_classes=7)}')
Metrics:
F1: 1.0
ACC: 1.0
PREC: 1.0
TM F1: 0.8571428656578064
TM acc: 0.8571428656578064
TM prec: 0.8571428656578064

Expected behavior

They should all be 1.0, as with the sklearn versions.
Adding even a single item with id=2 makes the numbers correct, so this is definitely a bug that appears when a class is included in num_classes but has no examples in either the pred or true tensor. The sketch below illustrates where the 0.8571 comes from.
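
A minimal sketch (not the library code) of the suspected behavior: with num_classes=7 and average="macro", the absent class id=2 appears to contribute a score of 0 to the per-class average, giving 6/7 ≈ 0.8571 instead of 1.0.

import torch

# Per-class scores as torchmetrics seems to compute them for the example above;
# class 2 has no samples, so its score defaults to 0 instead of being excluded.
per_class_score = torch.tensor([1.0, 1.0, 0.0, 1.0, 1.0, 1.0, 1.0])
print(per_class_score.mean())  # tensor(0.8571) -- matches the TM output above

sklearn, by contrast, only averages over the classes that actually occur in y_true/y_pred, hence 1.0.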

Environment

  • CUDA:
    - GPU: NVIDIA Quadro RTX 5000
    - available: True
    - version: 11.0
  • Packages:
    - numpy: 1.20.3
    - pyTorch_debug: True
    - pyTorch_version: 1.7.0+cu110
    - pytorch-lightning: 1.3.4
    - torchmetrics: 0.3.2
    - tqdm: 4.51.0
  • System:
    - OS: Linux
    - architecture: 64bit
    - processor: x86_64
    - python: 3.7.10
    - version: #1 SMP Sun Feb 14 18:10:38 EST 2021
notonlyvandalzzz added the bug / fix and help wanted labels on Jun 15, 2021
@github-actions

Hi! Thanks for your contribution, great first issue!

notonlyvandalzzz changed the title from "Wrong numbers for multiclass acc/prec/f1 if some class is missing in labels" to "Wrong numbers for multiclass acc/prec/f1 if some class is not listed in true/pred tensors" on Jun 15, 2021
Borda added the Priority: Critical label on Jun 15, 2021
@vatch123
Contributor

Hi @Borda, I can have a look at this if someone is not already involved here.

@Borda
Member

Borda commented Jun 17, 2021

Hi @Borda, I can have a look at this if someone is not already involved here.

That would be great, thank you!

Borda assigned vatch123 and unassigned SkafteNicki on Jun 17, 2021
@SkafteNicki
Member

Hi @vatch123,
Thank you for wanting to contribute. Just wanted to let you know that I debugged this a bit and I think it can be solved by changing this line
https://github.com/PyTorchLightning/metrics/blob/1841cad3839f5d1907a1bb8bb6a266de5c5333f9/torchmetrics/functional/classification/stat_scores.py#L186
to also account for the zero_div_mask, so:

weights = torch.where(zero_div_mask | ignore_mask, tensor(0.0, device=weights.device), weights)
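
For context, a self-contained sketch (not the actual torchmetrics internals; the tensors below are made up to mirror the example in this issue) of how zeroing the weights for zero-division classes would change the macro reduction:

import torch
from torch import tensor

# Per-class scores for the example above: class 2 never occurs, so its
# denominator is zero and its score defaults to 0.
scores = tensor([1.0, 1.0, 0.0, 1.0, 1.0, 1.0, 1.0])
zero_div_mask = tensor([False, False, True, False, False, False, False])
ignore_mask = torch.zeros_like(zero_div_mask)  # no ignore_index in this example

# Macro averaging starts from equal weights per class.
weights = torch.ones_like(scores)

# Current behavior: only ignore_mask zeroes the weights -> 6/7 ≈ 0.8571
print((scores * weights).sum() / weights.sum())

# Suggested change: also zero the weights where zero_div_mask is set -> 6/6 = 1.0
weights = torch.where(zero_div_mask | ignore_mask, tensor(0.0, device=weights.device), weights)
print((scores * weights).sum() / weights.sum())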

@vatch123
Contributor

Hi @vatch123,
Thank you for wanting to contribute. Just wanted to let you know that I debugged this a bit and I think it can be solved by changing this line
https://github.com/PyTorchLightning/metrics/blob/1841cad3839f5d1907a1bb8bb6a266de5c5333f9/torchmetrics/functional/classification/stat_scores.py#L186

to also account for the zero_div_mask, so:

weights = torch.where(zero_div_mask | ignore_mask, tensor(0.0, device=weights.device), weights)

Hi. Thanks for this. Let me have a look at this.

@celsofranssa

Any update on this?

@SkafteNicki
Member

@celsofranssa we have an open PR #303 that should fix it.
