Classification metrics overhaul: stat scores (3/n) #4839
Conversation
Hello @tadejsv! Thanks for updating this PR.
Comment last updated at 2020-12-30 18:58:06 UTC
@tadejsv @justusschock @SkafteNicki how is it going here? :]
@Borda @SkafteNicki @justusschock @teddykoker @rohitgr7 This is ready for (re)review :)
I really like it!
still reading...
LGTM... Great work!!!!
I'd recommend waiting for other reviewers before merging.
Great job as always :]
This PR is a spin-off from #4835, based on new input formatting from #4837
This will provide a basis for future PRs for recall, precision, fbeta and iou metrics.
What does this PR do?
`top_k` parameter for input formatting now also works with multi-label inputs

This was done so that StatScores can also provide a basis for Recall@K and Precision@K later, because these two metrics always take multi-label inputs and count the top K highest-probability predictions as True. For multi-class inputs this parameter works as before.

This addition was done in the input formatting function. This means that multi-label inputs can now be binarized in two ways: through the `threshold` parameter, or through the `top_k` parameter. I have decided to give the `top_k` parameter preference if both are set.

For Top-K Accuracy, multi-label inputs don't make sense (or at least I have not seen any use of it), so I have updated the Accuracy metric so that an error is raised if `top_k` is used with multi-label inputs.
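To make the two binarization routes concrete, here is a minimal, illustrative sketch (not the PR's actual implementation, and `binarize_top_k` is a hypothetical helper name) of how multi-label probabilities could be turned into 0/1 predictions, with `top_k` taking preference over `threshold` when both are set:

```python
def binarize_top_k(probs, top_k=None, threshold=0.5):
    """Turn a list of per-label probabilities into 0/1 predictions.

    If top_k is set, it takes preference over threshold (as described
    above): the top_k highest-probability labels are predicted as 1.
    """
    if top_k is not None:
        # Indices of the k largest probabilities
        top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
        return [1 if i in top else 0 for i in range(len(probs))]
    # Otherwise fall back to plain thresholding
    return [1 if p >= threshold else 0 for p in probs]

probs = [0.1, 0.8, 0.6, 0.3]
print(binarize_top_k(probs, threshold=0.5))  # [0, 1, 1, 0]
print(binarize_top_k(probs, top_k=1))        # [0, 1, 0, 0]
```

Note that with `top_k` exactly K labels are predicted positive per sample, while thresholding can yield any number of positives.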
New StatScores metric (and updated functional counterpart)
Computes stat score, i.e. true positives, false positives, true negatives, false negatives. It is used as a base for many other metrics (recall, precision, fbeta, iou). It is made to work with all types of inputs, and is very configurable. There are two main parameters here:
- `reduce`: This determines how the statistics should be counted: globally (summing across all labels), by classes, or by samples. The possible values (`micro`, `macro`, `samples`) correspond to the averaging names used for metrics such as precision. This is "inspired" by sklearn's averaging argument in such metrics.
- `mdmc_reduce`: In case of multi-dimensional multi-class (mdmc) inputs, how should the statistics be reduced? This applies on top of the `reduce` argument. The possible values are `global` (i.e. the extra dimensions are treated as sample dimensions) and `samplewise` (compute statistics for each sample, taking the extra dimensions as a sample-within-sample dimension).

Why? The reason for these two options (right now PL metrics implements the `global` option by default) is that in some "downstream" metrics, such as iou, it is, in my opinion, much more natural to compute the metric per sample and then average across samples, rather than join everything into one "blob" and compute the averages for this blob. For example, if you are doing image segmentation, it makes more sense to compute the metrics per image, as the model is trained on images, not blobs :) Also, aggregating everything may disguise some unwanted behavior (such as an inability to predict a minority class), which would be evident if averaging was done per sample (`samplewise`).

Also, this class metric (and the functional equivalent) now returns the stat scores concatenated in a single tensor, instead of returning a tuple. I did this because the standard metrics testing framework in PL does not support non-tensor returns, and the change should be minor for users.
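To illustrate what `reduce` does, here is a small pure-Python sketch (not the library implementation; `stat_scores` here is a toy stand-in) that counts tp/fp/tn/fn per class one-vs-rest and then either sums them (`micro`) or keeps them per class (`macro`):

```python
def stat_scores(preds, target, num_classes, reduce="micro"):
    """preds/target: lists of integer class labels.

    Returns [tp, fp, tn, fn] summed over classes ("micro"),
    or one [tp, fp, tn, fn] row per class ("macro").
    """
    per_class = []
    for c in range(num_classes):
        # One-vs-rest counts for class c
        tp = sum(1 for p, t in zip(preds, target) if p == c and t == c)
        fp = sum(1 for p, t in zip(preds, target) if p == c and t != c)
        fn = sum(1 for p, t in zip(preds, target) if p != c and t == c)
        tn = len(preds) - tp - fp - fn
        per_class.append([tp, fp, tn, fn])
    if reduce == "micro":
        # Sum the counts across all classes into a single set of scores
        return [sum(col) for col in zip(*per_class)]
    return per_class

preds  = [0, 1, 1, 2]
target = [0, 1, 2, 2]
print(stat_scores(preds, target, num_classes=3, reduce="micro"))  # [3, 1, 7, 1]
```

In the same spirit, `samplewise` mdmc reduction would run this counting separately for each sample (over the extra dimensions) and stack the results, rather than pooling everything globally first.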
I have deprecated the `stat_scores_multiple_classes` metric, as `stat_scores` is now perfectly capable of handling multiple classes itself.

Documentation
A second part of the "Input types" section is added, with examples of using the `is_multiclass` parameter with `StatScores`.