Add class-wise metrics logging and confusion matrix to DeepMIL #647
Conversation
@@ -53,7 +55,8 @@ def __init__(self,
verbose: bool = False,
slide_dataset: SlidesDataset = None,
tile_size: int = 224,
level: int = 1) -> None:
level: int = 1,
class_names: List[str] = None) -> None:
Type annotation has a small discrepancy - if you set the default to None, it should be "Optional[List[str]]".
Thanks, now changed
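For reference, a minimal sketch of what the corrected annotation looks like (the class name and surrounding parameters here are illustrative stand-ins, not the actual DeepMIL signature):

from typing import List, Optional

class DeepMILModuleSketch:  # illustrative stand-in for the actual module class
    def __init__(self,
                 tile_size: int = 224,
                 level: int = 1,
                 # a default of None calls for Optional[...] in the annotation
                 class_names: Optional[List[str]] = None) -> None:
        self.class_names = class_names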
'precision': Precision(),
'recall': Recall(),
'f1score': F1(),
'confusion_matrix': ConfusionMatrix(num_classes=self.n_classes+1)})
Seeing this here: It would be good to add documentation for n_classes beyond what you have now. For two classes "0" and "1", n_classes should be set to 1, correct?
Thanks, added to the docstring
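To make the n_classes convention concrete, a hedged sketch assuming the torchmetrics API quoted above (older torchmetrics versions expose F1; newer ones rename it F1Score):

from torchmetrics import ConfusionMatrix, F1, Precision, Recall

n_classes = 1  # binary classification with labels "0" and "1" uses n_classes = 1
metrics = {
    'precision': Precision(),
    'recall': Recall(),
    'f1score': F1(),
    # the confusion matrix needs the actual number of classes, hence n_classes + 1
    'confusion_matrix': ConfusionMatrix(num_classes=n_classes + 1),
}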
def log_metrics(self,
stage: str) -> None:
valid_stages = ['train', 'test', 'val']
if stage not in valid_stages:
raise Exception(f"Invalid stage. Chose one of {valid_stages}")
for metric_name, metric_object in self.get_metrics_dict(stage).items():
self.log(f'{stage}/{metric_name}', metric_object, on_epoch=True, on_step=False, logger=True, sync_dist=True)
if not metric_name == "confusion_matrix":
self.log(f'{stage}/{metric_name}', metric_object, on_epoch=True, on_step=False, logger=True, sync_dist=True)
Can you replace the self.log calls with hi-ml's "log_on_epoch" function? That gives you a simpler interface and handles sync_dist better.
(also in the else branch)
Wondering if log_on_epoch will work with torchmetrics Metric objects?
This is now changed, thanks
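A hedged sketch of the refactored logging, assuming log_on_epoch is importable from health_ml.utils and accepts torchmetrics objects the same way self.log does; the confusion-matrix branch shown here is only indicative:

from health_ml.utils import log_on_epoch  # assumed import path for the hi-ml helper

def log_metrics(self, stage: str) -> None:
    valid_stages = ['train', 'test', 'val']
    if stage not in valid_stages:
        raise Exception(f"Invalid stage. Choose one of {valid_stages}")
    for metric_name, metric_object in self.get_metrics_dict(stage).items():
        if metric_name == "confusion_matrix":
            # handled separately, e.g. computed and plotted at the end of the test epoch
            continue
        # log_on_epoch aggregates per epoch and takes care of sync_dist internally
        log_on_epoch(self, f'{stage}/{metric_name}', metric_object)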
def log_metrics(self,
stage: str) -> None:
valid_stages = ['train', 'test', 'val']
if stage not in valid_stages:
raise Exception(f"Invalid stage. Chose one of {valid_stages}")
for metric_name, metric_object in self.get_metrics_dict(stage).items():
self.log(f'{stage}/{metric_name}', metric_object, on_epoch=True, on_step=False, logger=True, sync_dist=True)
if not metric_name == "confusion_matrix":
I see you have some if statements of the form "if not A: something() else: otherthing()". If you are handling both cases anyway, the code is easier to read if you do not negate the condition and instead swap the if/else branches.
Thanks, I swapped if/else
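A small sketch of the readability change being suggested (the two helper functions are placeholders, not part of the PR):

def log_scalar_metric(name: str, value: object) -> None:
    print(f"log {name}: {value}")  # placeholder for the real logging call

def handle_confusion_matrix(value: object) -> None:
    print("confusion matrix handled separately")  # placeholder

metric_name, metric_object = "f1score", 0.9

# Before: negated condition
if not metric_name == "confusion_matrix":
    log_scalar_metric(metric_name, metric_object)
else:
    handle_confusion_matrix(metric_object)

# After: positive condition first, identical behaviour, easier to read
if metric_name == "confusion_matrix":
    handle_confusion_matrix(metric_object)
else:
    log_scalar_metric(metric_name, metric_object)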
@@ -338,6 +357,18 @@ def test_epoch_end(self, outputs: List[Dict[str, Any]]) -> None: # type: ignore
fig = plot_scores_hist(results)
self.save_figure(fig=fig, figpath=outputs_fig_path / 'hist_scores.png')

print("Computing and saving confusion matrix...")
metrics_dict = self.get_metrics_dict('test')
cf_matrix = metrics_dict["confusion_matrix"].compute()
This literal "confusion_matrix" needs to be in sync with your other uses of "confusion_matrix". This is an extremely common source of errors: you change the constant somewhere and forget to change it somewhere else (as trivial/benign as this may sound). Your code will be a lot safer (and require fewer tests) if you define those literals as constants, e.g. CONF_MATRIX_METRIC = "confusion_matrix", and re-use them.
Thanks for the suggestion, the metric names are now defined as constants
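A hedged sketch of the constant-based approach, using the reviewer's suggested name (the surrounding test_epoch_end context mirrors the diff above):

# Module-level constants so each metric key is defined in exactly one place
CONF_MATRIX_METRIC = "confusion_matrix"
F1_METRIC = "f1score"

# ...later, inside test_epoch_end, the same constant is reused instead of a literal:
metrics_dict = self.get_metrics_dict('test')
cf_matrix = metrics_dict[CONF_MATRIX_METRIC].compute()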
absolute_checkpoint_path_parent = Path(fixed_paths.repository_parent_directory(),
self.checkpoint_folder_path,
self.best_checkpoint_filename_with_suffix)
if absolute_checkpoint_path_parent.is_file():
return absolute_checkpoint_path_parent
Confused. This and the variable above are exactly the same?
I have taken this from our other config.
InnerEye-DeepLearning/InnerEye/ML/configs/histo_configs/classification/DeepSMILEPanda.py
Line 148 in fb258d5
def get_path_to_best_checkpoint(self) -> Path:
absolute_checkpoint_path = Path(fixed_paths.repository_root_directory(),
self.checkpoint_folder_path,
self.best_checkpoint_filename_with_suffix)
if absolute_checkpoint_path.is_file():
return absolute_checkpoint_path
I'm puzzled. Checkpoints are normally stored in the "outputs" folder, so that they are also available at the end of an AzureML run. Why are the checkpoints here accessed as part of repository root?
I have taken this from our other config
InnerEye-DeepLearning/InnerEye/ML/configs/histo_configs/classification/DeepSMILEPanda.py
Line 148 in fb258d5
def get_path_to_best_checkpoint(self) -> Path:
It was only added to enable the Crck config to work and may not be related to this PR. I will remove it for now and address it in another PR.
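For context, a sketch of how the two quoted lookups could fit together as a fallback, using only the identifiers visible in the snippets above (the actual config may differ, and this code is slated for removal anyway):

def get_path_to_best_checkpoint(self) -> Path:
    # First try the checkpoint folder relative to the repository root...
    absolute_checkpoint_path = Path(fixed_paths.repository_root_directory(),
                                    self.checkpoint_folder_path,
                                    self.best_checkpoint_filename_with_suffix)
    if absolute_checkpoint_path.is_file():
        return absolute_checkpoint_path
    # ...then fall back to the repository's parent directory
    absolute_checkpoint_path_parent = Path(fixed_paths.repository_parent_directory(),
                                           self.checkpoint_folder_path,
                                           self.best_checkpoint_filename_with_suffix)
    if absolute_checkpoint_path_parent.is_file():
        return absolute_checkpoint_path_parent
    raise FileNotFoundError("Could not find the best checkpoint file.")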
assert file.exists()
expected = full_ml_test_data_path("histo_heatmaps") / f"confusion_matrix_{n_classes}.png"
# To update the stored results, uncomment this line:
expected.write_bytes(file.read_bytes())
This should not be checked in - your test will always pass now!
Suggested change:
expected.write_bytes(file.read_bytes())
# expected.write_bytes(file.read_bytes())
Thanks for spotting this, changed now
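A hedged sketch of the resulting test pattern (names follow the quoted snippet; the final byte comparison is an assumption about how the test asserts equality):

assert file.exists()
expected = full_ml_test_data_path("histo_heatmaps") / f"confusion_matrix_{n_classes}.png"
# To regenerate the stored expected output, temporarily uncomment the next line,
# run the test once, then re-comment it before committing:
# expected.write_bytes(file.read_bytes())
assert file.read_bytes() == expected.read_bytes()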
Great job! No other comments 👍
In this PR: