Fix metrics in macro average #303
Conversation
Hey @SkafteNicki, changing the weights to account for missing classes seems to fix accuracy and precision, but not the F1 score. Have a look at the following snippet:

```python
import torch
import torchmetrics
from sklearn.metrics import f1_score, accuracy_score, precision_score

y_pred = y_true = torch.tensor([1, 6, 6, 6, 3, 6, 3, 6, 6, 3, 6, 3, 6, 6, 3, 6, 3, 3, 6, 6, 6, 6, 6, 6,
                                6, 3, 5, 6, 6, 3, 6, 6, 6, 6, 3, 6, 6, 3, 3, 1, 3, 1, 6, 3, 3, 1, 3, 6,
                                4, 6, 6, 6, 6, 6, 6, 3, 3, 6, 3, 6, 1, 0, 5, 3, 6, 6, 6, 6, 3, 0, 6, 3,
                                3, 3, 6, 3, 4])  # no class with id=2!

print('Metrics:')
print(f'F1: {f1_score(y_true.numpy().astype(int).tolist(), y_pred.numpy().astype(int).tolist(), average="macro")}')
print(f'ACC: {accuracy_score(y_true.numpy().astype(int).tolist(), y_pred.numpy().astype(int).tolist())}')
print(f'PREC: {precision_score(y_true.numpy().astype(int).tolist(), y_pred.numpy().astype(int).tolist(), average="macro")}')
print(f'TM F1: {torchmetrics.functional.f1(y_pred, y_true, average="macro", num_classes=7)}')
print(f'TM acc: {torchmetrics.functional.accuracy(y_pred, y_true, average="macro", num_classes=7)}')
print(f'TM prec: {torchmetrics.functional.precision(y_pred, y_true, average="macro", num_classes=7)}')
```
I will look into the F1 score issue tomorrow. I guess we should test this.
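To make the failure mode concrete, here is a minimal sketch (not the torchmetrics internals; it reuses `y_true` from the snippet above and relies on the predictions being perfect) of how the denominator decides the result: sklearn returns 1.0 for all three metrics because it only averages over classes that actually occur, while a macro average that divides by all `num_classes=7` yields 6/7 ≈ 0.857 because class 2 is absent.

```python
import torch

# Reuses y_true from the snippet above; since y_pred == y_true, every class
# that occurs in the target has a per-class score of exactly 1.0.
num_classes = 7
present = torch.bincount(y_true, minlength=num_classes) > 0  # class 2 is False
per_class_score = present.float()  # 1.0 for observed classes, 0.0 for class 2

naive_macro = per_class_score.sum() / num_classes  # 6/7 ≈ 0.8571, the buggy value
fixed_macro = per_class_score[present].mean()      # 1.0, matching sklearn
print(naive_macro.item(), fixed_macro.item())
```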
Hi @vatch123,
Codecov Report
```diff
@@            Coverage Diff             @@
##           master     #303      +/-   ##
==========================================
+ Coverage   96.11%   96.12%   +0.01%
==========================================
  Files         124      124
  Lines        3986     4002      +16
==========================================
+ Hits         3831     3847      +16
  Misses        155      155
```
Yes, doing that seems to fix it, although some other tests then fail.
@SkafteNicki @vatch123 how is it going here? :]
Hey, sorry, I was a bit busy this weekend. However, removing the line that @SkafteNicki suggested breaks the accuracy tests, so something odd is happening that I wasn't able to figure out. Let me have another look.
Hey @SkafteNicki, I was probing the issue. In local testing, this particular test fails.
It seems like the calculated accuracy is getting rounded off somewhere, but the trace shows one more
As another comment, can you please also add the test case you used to find the issue to the test suite, to avoid regressions in the future? As for the test suite being slow: you should be able to run just individual test files in pytest while you are debugging a particular test, to avoid waiting for everything to run.
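For example (the test file path and keyword below are assumptions; adjust them to the module you are actually debugging), a single file or a keyword-filtered subset can be run directly:

```python
import pytest

# Run a single test file instead of the whole suite; equivalent to
# `pytest tests/classification/test_f_beta.py -k macro -v` on the command line.
pytest.main(["tests/classification/test_f_beta.py", "-k", "macro", "-v"])
```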
Hey @SkafteNicki, I am a little stuck here. I can't figure out which inputs are breaking the F1 tests. The tests don't run to completion locally, throwing this error:
@SkafteNicki @vatch123 any chance we get this in for tomorrow's bugfix release?
Hi, sorry for the late reply. I am a little stuck here and unable to figure out what is actually breaking. I guess I will need some more input on this.
@vatch123 how is it going here? 🐰
Hey, sorry, I've been busy recently and did not get a chance to probe deeper. I tried with a debugger but haven't been able to pinpoint the error. I will need some more time, and some help on this, I guess.
Hey @SkafteNicki, thanks for the help here. Can you let me know what the actual issue was? I see you mentioned
Yes, @SkafteNicki @Borda, can someone please explain whether or not this works for dist_sync_on_step? During testing I am getting a macro accuracy value of 0.1996 when it should be about 0.65. I calculated it manually after testing, by extracting all the data to a CSV.
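For what it's worth, here is a sketch of how one might sanity-check the accumulated macro accuracy outside the distributed setup (the data and batching are made up, and this does not exercise dist_sync_on_step itself): the manual reference below computes the mean per-class recall over classes present in the target, which is what I would expect the macro average to match once missing classes are excluded.

```python
import torch
import torchmetrics

torch.manual_seed(0)
num_classes = 7
metric = torchmetrics.Accuracy(num_classes=num_classes, average="macro")

all_preds, all_targets = [], []
for _ in range(5):  # stand-ins for dataloader batches
    target = torch.randint(num_classes, (32,))
    noise = torch.randint(num_classes, (32,))
    preds = torch.where(torch.rand(32) < 0.7, target, noise)  # ~70% correct
    metric.update(preds, target)
    all_preds.append(preds)
    all_targets.append(target)

preds, target = torch.cat(all_preds), torch.cat(all_targets)

# Manual reference: mean per-class recall over classes that occur in the target.
per_class = torch.stack([(preds[target == c] == c).float().mean()
                         for c in target.unique()])
print(metric.compute().item(), per_class.mean().item())  # expected to agree
```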
Before submitting
What does this PR do?
Fixes #295 and Fixes #300.
PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in GitHub issues, there's a high chance it will not be merged.
Did you have fun?
Make sure you had fun coding 🙃