This repository has been archived by the owner on Mar 21, 2024. It is now read-only.

Migrate to Pytorch Lightning #323

Merged 214 commits on Jan 28, 2021
Commits
0762a2f
work in progress
ant0nsc Nov 17, 2020
3c5cf48
training loop running
ant0nsc Nov 18, 2020
e1dfbf0
more changes, but training now broken
ant0nsc Nov 19, 2020
90b8db9
fix training
ant0nsc Nov 19, 2020
4b0f5be
HelloWorld running end to end
ant0nsc Nov 19, 2020
c1ab41d
small cleanup
ant0nsc Nov 19, 2020
e69b56d
Set seeds correctly
Shruthi42 Nov 20, 2020
60e6509
Switch to using our set_random_seed function
Shruthi42 Nov 20, 2020
a071cd6
work in progress: scalar models
ant0nsc Nov 20, 2020
90ea22b
Merge remote-tracking branch 'origin/shbannur/pl-patches' into antons…
ant0nsc Nov 20, 2020
12f02a7
Scalar models are training, first regression tests are passing
ant0nsc Nov 20, 2020
aa3a049
making seg run again
ant0nsc Nov 23, 2020
5130fbe
hello world is running
ant0nsc Nov 23, 2020
fd3b996
scalar inference passing
ant0nsc Nov 23, 2020
b9abb50
test_train_2d_classification_model passes
ant0nsc Nov 25, 2020
5a7afdb
Merge remote-tracking branch 'origin/master' into antonsc/pl
ant0nsc Nov 25, 2020
271ea9b
more test fixes
ant0nsc Nov 25, 2020
5b9a25a
trainer error messages
ant0nsc Nov 26, 2020
594e497
changes to enable ddp
ant0nsc Nov 27, 2020
e265762
error message when missing store
ant0nsc Nov 27, 2020
7509f7b
enable DDP via script
ant0nsc Nov 27, 2020
72be461
test on GPU
ant0nsc Nov 27, 2020
7113d09
move loggers out of config
ant0nsc Nov 30, 2020
d6bf552
Log to MLFlow, sync_dist
ant0nsc Nov 30, 2020
c06cb99
avoid blobxfer
ant0nsc Nov 30, 2020
bd6e641
setting run ID
ant0nsc Nov 30, 2020
2d34b87
Fix tests
javier-alvarez Dec 1, 2020
26600df
Writing epoch metrics works
ant0nsc Dec 2, 2020
bb066e9
Merge branch 'antonsc/pl' of https://github.com/microsoft/InnerEye-De…
ant0nsc Dec 2, 2020
fe38c5c
create output dir
ant0nsc Dec 2, 2020
f7109be
dtype fix
ant0nsc Dec 2, 2020
26f0c41
Fix temperature_scaling.py
javier-alvarez Dec 2, 2020
fa5c210
Merge branch 'antonsc/pl' of https://github.com/microsoft/InnerEye-De…
Dec 2, 2020
e7297c7
writing epoch metrics re-done
ant0nsc Dec 2, 2020
e1c9e24
Fix major flake8 issues
Dec 2, 2020
54b7658
clean up diagnostics
ant0nsc Dec 2, 2020
ead16d9
Merge branch 'antonsc/pl' of https://github.com/microsoft/InnerEye-De…
ant0nsc Dec 2, 2020
33b660f
Remove blobxfer
javier-alvarez Dec 2, 2020
68a67ae
Update CHANGELOG.md
javier-alvarez Dec 2, 2020
13d6b8a
Remove configs that are not required
javier-alvarez Dec 2, 2020
3e07584
Remove from environment.yml
javier-alvarez Dec 2, 2020
2b36fdc
Fix numba issue
javier-alvarez Dec 2, 2020
73d80eb
Improve CHANGELOG.md
javier-alvarez Dec 2, 2020
480320c
Fix tests
javier-alvarez Dec 2, 2020
7029269
training test is green
ant0nsc Dec 2, 2020
04ce7d0
fix for typo
ant0nsc Dec 2, 2020
741323a
Merge remote-tracking branch 'origin/jaalvare/remove_blobxfer' into a…
ant0nsc Dec 2, 2020
30dfe30
more import fixes
ant0nsc Dec 2, 2020
6bdb33c
more test fixes
ant0nsc Dec 2, 2020
a1fa505
enable sequence models
ant0nsc Dec 2, 2020
43eeae3
delete legacy model steps
ant0nsc Dec 2, 2020
6ffdcd4
print pythonpath
ant0nsc Dec 3, 2020
fe7456d
print cwd
ant0nsc Dec 3, 2020
779ec98
Merge remote-tracking branch 'origin/master' into antonsc/pl
Dec 3, 2020
59f1f75
Working around pickling problem
ant0nsc Dec 3, 2020
66c62cf
Merge branch 'antonsc/pl' of https://github.com/microsoft/InnerEye-De…
ant0nsc Dec 3, 2020
efe13ba
cleanup of args file
ant0nsc Dec 3, 2020
4fcd1ed
sys.path hack
ant0nsc Dec 3, 2020
ef98103
comments
ant0nsc Dec 3, 2020
b8aa84a
absolute path
ant0nsc Dec 3, 2020
9e4889f
reduce workers
ant0nsc Dec 3, 2020
ec7c98e
reduce number of GPUs for tests
ant0nsc Dec 3, 2020
a31c045
restore environment
ant0nsc Dec 4, 2020
683ceb5
import error
ant0nsc Dec 4, 2020
fb88437
wider format
ant0nsc Dec 4, 2020
5715048
adding time as metric
ant0nsc Dec 9, 2020
955f39f
docu
ant0nsc Dec 16, 2020
39eb2c6
Add PL metrics for scalar models (#340)
melanibe Dec 16, 2020
86c2de0
cleanup and tests
ant0nsc Dec 16, 2020
6d1f4e2
Merge branch 'antonsc/pl' of https://github.com/microsoft/InnerEye-De…
ant0nsc Dec 16, 2020
29afa1f
Checkpoint handling in Pytorch Lightning (#337)
Shruthi42 Dec 17, 2020
35295be
Update, but not working yet
ant0nsc Dec 17, 2020
5010021
Merge branch 'antonsc/pl' of https://github.com/microsoft/InnerEye-De…
ant0nsc Dec 17, 2020
199428d
training of segmentation model works, test_valid_model_train passes
ant0nsc Dec 17, 2020
7e5263a
logs the loss in both train and val, but not the other metrics
ant0nsc Dec 17, 2020
31279e1
metrics are working correctly
ant0nsc Dec 17, 2020
0665e9d
timing works for validation, but not for training
ant0nsc Dec 18, 2020
f5e4fec
Remove optimizer from ModelAndInfo for move to Pytorch Lightning (#341)
Shruthi42 Dec 18, 2020
33ac4df
more tests working
ant0nsc Dec 19, 2020
803bcb0
Merge branch 'antonsc/pl' of https://github.com/microsoft/InnerEye-De…
ant0nsc Dec 19, 2020
000f91d
avoid rank zero writing stats
ant0nsc Dec 19, 2020
8a7ddb3
test fix
ant0nsc Dec 19, 2020
47817c6
rename. fix voxel count test
ant0nsc Dec 21, 2020
7b353d0
more test fixes
ant0nsc Dec 21, 2020
6d0f98d
Merge remote-tracking branch 'origin/master' into antonsc/pl
ant0nsc Dec 21, 2020
f10beef
r2score test failure
ant0nsc Dec 21, 2020
fec4691
RNN tests working now
ant0nsc Dec 21, 2020
e39224e
fix blocked test
ant0nsc Dec 22, 2020
5a9c473
fix gradcam test
ant0nsc Dec 22, 2020
f2523c5
removing outdated tests
ant0nsc Dec 22, 2020
a27927c
fixing more tests
ant0nsc Dec 22, 2020
dc1cc67
Remove dependence on hardcoded run IDs in tests (#342)
Shruthi42 Dec 22, 2020
c618a62
fixing more tests
ant0nsc Dec 22, 2020
62a61d9
Pin PyJWT package, it causes auth issues
ant0nsc Dec 22, 2020
251da9e
update mlflow and other related packages to resolve hanging job
ant0nsc Dec 22, 2020
49a80ea
redoing checkpoint loading
ant0nsc Dec 22, 2020
2fac5e1
Specific AzureML logger
ant0nsc Dec 22, 2020
22de090
removing blobxfer
ant0nsc Dec 22, 2020
256cbbf
Do not refer to specific epochs in inference code, create random chec…
Shruthi42 Dec 23, 2020
163876e
improving tests
ant0nsc Dec 23, 2020
f0433ae
Merge branch 'antonsc/pl' of https://github.com/microsoft/InnerEye-De…
ant0nsc Dec 23, 2020
6eb3c71
fix more tests
ant0nsc Dec 23, 2020
bc5a3c2
fix more tests
ant0nsc Dec 23, 2020
976d3ed
fix output_to issue for test suite
ant0nsc Dec 23, 2020
830a3d0
removing weird duplication of code
ant0nsc Dec 23, 2020
98ce4ce
Avoid file upload when using single GPU training, as we do in test suite
ant0nsc Dec 23, 2020
ff7aba3
diag
ant0nsc Dec 23, 2020
90f41f9
Further cleanup
ant0nsc Dec 23, 2020
a89fb0c
Fix LR Scheduler loading from state dict, change save_step_epochs nam…
Shruthi42 Dec 24, 2020
e7d82fc
Merge branch 'master' into antonsc/pl
javier-alvarez Jan 4, 2021
7764c47
Migrate to Pytorch Lightning: Flake8 and mypy fixes (#353)
Shruthi42 Jan 6, 2021
9496107
Migrate to Pytorch Lightning - Remove ModelAndInfo and fix tests (#352)
Shruthi42 Jan 7, 2021
cfb2949
reduce batch size and workers
ant0nsc Jan 8, 2021
e5b380e
ddp
ant0nsc Jan 8, 2021
9eff3c5
remove manual dataloader initialization
ant0nsc Jan 11, 2021
c337055
run either pytest or training
ant0nsc Jan 11, 2021
b5fe93e
Merge remote-tracking branch 'origin/master' into antonsc/pl
ant0nsc Jan 11, 2021
1234466
fix mark
ant0nsc Jan 11, 2021
5e3d755
fix import error
ant0nsc Jan 11, 2021
d16a7fa
cleanup PR build, add tags
ant0nsc Jan 11, 2021
25845fb
simplify code
ant0nsc Jan 11, 2021
be16d71
fix metrics off-by-one bug
ant0nsc Jan 13, 2021
f5ae7a4
Metrics now get correctly averaged, but not yet batch weighted
ant0nsc Jan 14, 2021
051049a
fix sync issues
ant0nsc Jan 14, 2021
61f6455
fix subject count aggregation
ant0nsc Jan 14, 2021
160ca7e
fix sync issue
ant0nsc Jan 15, 2021
e27df44
remove diag
ant0nsc Jan 15, 2021
f9fa352
Merge branch 'master' into antonsc/pl
Shruthi42 Jan 15, 2021
ba05f87
Refactoring to use custom Dice computer
ant0nsc Jan 15, 2021
8adb101
Merge branch 'antonsc/pl' of https://github.com/microsoft/InnerEye-De…
ant0nsc Jan 15, 2021
a485cad
Test for TrackedMetrics
ant0nsc Jan 18, 2021
b4f128a
partitioning model
ant0nsc Jan 18, 2021
4bf304d
Refactoring of checkpoint loading
ant0nsc Jan 18, 2021
6a3af2f
adjust batch size to PL
ant0nsc Jan 18, 2021
341f933
cleanup, but still fails with OOM
ant0nsc Jan 18, 2021
88dcf85
import fix
ant0nsc Jan 19, 2021
fa7ee38
adding no_grad
ant0nsc Jan 19, 2021
03104be
Merge remote-tracking branch 'origin/master' into antonsc/pl
ant0nsc Jan 19, 2021
84663fb
fix import errors
ant0nsc Jan 19, 2021
b74ed27
test fixes
ant0nsc Jan 19, 2021
9e8b997
test fixes
ant0nsc Jan 19, 2021
d0d279d
test and flake8 fixes
ant0nsc Jan 19, 2021
dfcc49e
mypy
ant0nsc Jan 19, 2021
632026f
mypy (#364)
Shruthi42 Jan 19, 2021
605fb2d
clean up inference tests
ant0nsc Jan 19, 2021
d87ee11
Merge branch 'antonsc/pl' of https://github.com/microsoft/InnerEye-De…
ant0nsc Jan 19, 2021
5f5b84a
test fix
ant0nsc Jan 19, 2021
4b7337f
test fix
ant0nsc Jan 19, 2021
73c0b1c
mypy
ant0nsc Jan 19, 2021
d5d1ab5
mypy
ant0nsc Jan 19, 2021
0712e1a
flake
ant0nsc Jan 19, 2021
34b02b3
test fixes
ant0nsc Jan 20, 2021
a3ec586
reformatting
ant0nsc Jan 20, 2021
5b3206f
Merge branch 'antonsc/pl' of https://github.com/microsoft/InnerEye-De…
ant0nsc Jan 20, 2021
60ee117
Merge branch 'antonsc/pl' of https://github.com/microsoft/InnerEye-De…
ant0nsc Jan 20, 2021
57ee1ac
Avoid complex dependencies for TrackedMetrics
ant0nsc Jan 20, 2021
21bde3c
old ref
ant0nsc Jan 20, 2021
2e4ed42
mypy fixes
ant0nsc Jan 20, 2021
568416f
model_id fixes
ant0nsc Jan 20, 2021
19549e5
test fix and cleanup
ant0nsc Jan 20, 2021
6820f84
Merge remote-tracking branch 'origin/master' into antonsc/pl
ant0nsc Jan 20, 2021
359d746
import fix
ant0nsc Jan 20, 2021
d034597
flake
ant0nsc Jan 20, 2021
69c90f4
DRY
ant0nsc Jan 20, 2021
84daa5a
more test and mypy fixes
ant0nsc Jan 20, 2021
b01ed5e
test fixes
ant0nsc Jan 21, 2021
9e3e4ac
ensemble refactoring
ant0nsc Jan 21, 2021
5639865
crossval fixes
ant0nsc Jan 21, 2021
52211d3
better length check
ant0nsc Jan 21, 2021
9b4d099
build to nc12
ant0nsc Jan 21, 2021
0f82a7c
Removing dead code
ant0nsc Jan 22, 2021
534b6b9
Removing dead code
ant0nsc Jan 22, 2021
73b5a2f
Removing dead code
ant0nsc Jan 22, 2021
df683ee
Removing dead code
ant0nsc Jan 22, 2021
e7aca79
remove dead code
ant0nsc Jan 22, 2021
f2c13c9
test fixes: Don't run notebooks on sequence models
ant0nsc Jan 22, 2021
1335946
printing epoch diagnostics when loading
ant0nsc Jan 25, 2021
d298f14
Using last epoch checkpoint as best
ant0nsc Jan 25, 2021
c833b97
flake
ant0nsc Jan 25, 2021
6f05002
logging cleanup
ant0nsc Jan 25, 2021
7885767
Merge remote-tracking branch 'origin/master' into antonsc/pl
ant0nsc Jan 25, 2021
4cc668a
test fixes
ant0nsc Jan 25, 2021
6133747
docu
ant0nsc Jan 25, 2021
2968ec8
only 1 recovery checkpoint
ant0nsc Jan 26, 2021
36075ca
fix failing tests, refactor hardcoded run IDs
ant0nsc Jan 26, 2021
a5f5cf7
Fix recovery paths
ant0nsc Jan 26, 2021
745b224
sleep to avoid test failures
ant0nsc Jan 26, 2021
d724dd1
diagnostics
ant0nsc Jan 26, 2021
970a979
docu
ant0nsc Jan 26, 2021
bf97b4b
upload file path fix
ant0nsc Jan 26, 2021
81fd40f
syntax fix
ant0nsc Jan 26, 2021
96d7d19
flake
ant0nsc Jan 26, 2021
32a71b9
cleaning up IO logging
ant0nsc Jan 26, 2021
9bee3df
Test fixes
ant0nsc Jan 26, 2021
fe4679d
update to latest runs
ant0nsc Jan 26, 2021
6244b90
fix tests for checkpoint handling
ant0nsc Jan 26, 2021
f81ba49
fix rest of the tests
ant0nsc Jan 26, 2021
0e58589
iml file
ant0nsc Jan 26, 2021
3b3e9df
cleanup and changelog
ant0nsc Jan 26, 2021
65588c9
increase timeout
ant0nsc Jan 26, 2021
37eeaae
lightning 1.1.6
ant0nsc Jan 26, 2021
2ab12f8
Refactoring to include all time columns
ant0nsc Jan 27, 2021
a40f462
PL 1.1.6 -> 1.0.6 again
ant0nsc Jan 27, 2021
7dc62e7
mixed precision
ant0nsc Jan 28, 2021
b4da62c
remove TODOs and dead code
ant0nsc Jan 28, 2021
1776d5a
more PR comments
ant0nsc Jan 28, 2021
f38c1f3
more PR comments
ant0nsc Jan 28, 2021
7f1c257
more PR comments
ant0nsc Jan 28, 2021
1309792
more PR comments
ant0nsc Jan 28, 2021
b6a2c2b
move
ant0nsc Jan 28, 2021
8f40358
split lightning_models.py into pieces
ant0nsc Jan 28, 2021
85693a2
docu update
ant0nsc Jan 28, 2021
06a2c8b
PR updates
ant0nsc Jan 28, 2021
10eaa66
avoid 16bit
ant0nsc Jan 28, 2021
remove TODOs and dead code
ant0nsc committed Jan 28, 2021
commit b4da62c6b7443249df0fed5077bb1341842903f2
2 changes: 1 addition & 1 deletion InnerEye/Common/generic_parsing.py
Original file line number Diff line number Diff line change
@@ -56,7 +56,7 @@ def get_cuda_devices(self) -> List[Any]:
from torch.cuda import device_count
from torch import device
if self.use_gpu:
-return [device(type='cuda', index=ii) for ii in list(range(device_count()))]
+return [device(type='cuda', index=i) for i in list(range(device_count()))]
else:
return []

11 changes: 0 additions & 11 deletions InnerEye/ML/dataset/full_image_dataset.py
@@ -49,17 +49,6 @@ def collate_with_metadata(batch: List[Dict[str, Any]]) -> Dict[str, Any]:
raise TypeError(f"Unexpected batch data: Expected a dictionary, but got: {type(elem)}")


-# TODO antonsc: Remove?
-def set_random_seed_for_dataloader_worker(worker_id: int) -> None:
-"""
-Set the seed for the random number generators of python, numpy.
-"""
-# Set the seeds for numpy and python random based on the offset of the worker_id and initial seed,
-# converting the initial_seed which is a long to modulo int32 which is what numpy expects.
-random_seed = (torch.initial_seed() + worker_id) % (2 ** 32)
-ml_util.set_random_seed(random_seed, f"Data loader worker ({worker_id})")


class _RepeatSampler(BatchSampler):
"""
A batch sampler that wraps another batch sampler. It repeats the contents of that other sampler forever.
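
The helper removed in this hunk derived each data loader worker's seed from `torch.initial_seed()` plus the worker ID, wrapped modulo 2**32 because NumPy expects a 32-bit seed. A minimal sketch of that derivation, using only the standard library, might look as follows. The names `derive_worker_seed`, `seed_worker`, and `base_seed` are hypothetical stand-ins; the original code called `torch.initial_seed()` and `ml_util.set_random_seed` instead.

```python
import random


def derive_worker_seed(base_seed: int, worker_id: int) -> int:
    """Offset the base seed by the worker ID and wrap it into the 32-bit range."""
    return (base_seed + worker_id) % (2 ** 32)


def seed_worker(base_seed: int, worker_id: int) -> int:
    """Seed Python's RNG for one worker and return the seed that was used."""
    seed = derive_worker_seed(base_seed, worker_id)
    random.seed(seed)
    return seed


# Each worker gets a distinct seed, even when the base seed exceeds 32 bits.
seeds = [seed_worker(2 ** 32 + 5, worker_id) for worker_id in range(3)]
```

The modulo step matters only when the base seed is a 64-bit value, which is the case the removed comment describes.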
9 changes: 0 additions & 9 deletions InnerEye/ML/lightning_models.py
@@ -710,7 +710,6 @@ def __init__(self, config: ScalarModelBase, *args: Any, **kwargs: Any) -> None:
self.train_metric_computers = self.create_metric_computers()
self.val_metric_computers = self.create_metric_computers()

-# TODO antonsc: Work out how we handle mean teacher model
# if config.compute_grad_cam:
# model_to_evaluate = self.train_val_params.mean_teacher_model if \
# config.compute_mean_teacher_model else self.train_val_params.model
@@ -841,14 +840,6 @@ def compute_and_log_metrics(self,
LoggingColumns.Label.value: label,
LoggingColumns.DataSplit.value: data_split.value
})
-# TODO antonsc: Find a better place for this code. We can only draw plots once all results are aggregated,
-# maybe move to the report?
-# if self._should_save_regression_error_plot(self.current_epoch):
-# error_plot_name = f"error_plot_{self.train_val_params.epoch}"
-# path = str(self.config.outputs_folder / f"{error_plot_name}.png")
-# plot_variation_error_prediction(epoch_metrics.get_labels(), epoch_metrics.get_predictions(), path)
-# logger = self.config.azure_loggers_train if is_training else self.config.azure_loggers_val
-# logger.log_image(error_plot_name, path)

def training_or_validation_epoch_end(self, is_training: bool) -> None:
"""
4 changes: 0 additions & 4 deletions InnerEye/ML/model_testing.py
@@ -44,10 +44,6 @@
THUMBNAILS_FOLDER = "thumbnails"


-# TODO antonsc:
-# We need to clarify if we want to keep the ability to test on multiple checkpoints


def model_test(config: ModelConfigBase,
data_split: ModelExecutionMode,
checkpoint_handler: CheckpointHandler,
6 changes: 3 additions & 3 deletions InnerEye/ML/model_training.py
@@ -139,8 +139,7 @@ def model_train(config: ModelConfigBase,
config.read_dataset_if_needed()

# Create the trainer object. Backup the environment variables before doing that, in case we need to run a second
-# training in the unit tests.
-# TODO antonsc: Can we do in-situ cross validation with multiple GPUs still?
+# training in the unit tests.
old_environ = dict(os.environ)
trainer, storing_logger = create_lightning_trainer(config, checkpoint_path)

@@ -182,7 +181,8 @@ def model_train(config: ModelConfigBase,
logging.info("Starting training")

lightning_data = TrainingAndValidationDataLightning(config) # type: ignore
-# TODO: Why can't we do that in the constructor?
+# When trying to store the config object in the constructor, it does not appear to get stored at all; later
+# references to the object simply fail. Hence, we have to set it explicitly here.
lightning_data.config = config
trainer.fit(lightning_model,
datamodule=lightning_data)
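
The comment in the first hunk above says the environment variables are backed up before the trainer is created, in case a second training run happens in the unit tests. The hunk only shows the snapshot (`old_environ = dict(os.environ)`); the restore step is not visible here. A minimal sketch of the full pattern, assuming a clear-and-update restore, could be written as a context manager. The name `preserved_environ` and the variable `EXAMPLE_TRAINER_RANK` are hypothetical.

```python
import os
from contextlib import contextmanager
from typing import Iterator


@contextmanager
def preserved_environ() -> Iterator[None]:
    """Snapshot os.environ on entry and restore it on exit."""
    old_environ = dict(os.environ)
    try:
        yield
    finally:
        # Drop variables added inside the block, then restore the originals.
        os.environ.clear()
        os.environ.update(old_environ)


with preserved_environ():
    # Hypothetical variable, standing in for whatever a trainer might set.
    os.environ["EXAMPLE_TRAINER_RANK"] = "0"

leaked = "EXAMPLE_TRAINER_RANK" in os.environ
```

Restoring in a `finally` clause means the environment is reset even if training raises, so a later test starts from the same state.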
1 change: 0 additions & 1 deletion InnerEye/ML/pipelines/scalar_inference.py
@@ -86,7 +86,6 @@ def create_from_checkpoint(path_to_checkpoint: Path,
logging.warning(f"Could not recover model from checkpoint path {path_to_checkpoint}")
return None
if config.compute_mean_teacher_model:
-# TODO antonsc: Need to adjust that
raise NotImplementedError("Mean teacher models not supported yet.")
else:
model = load_from_checkpoint_and_adjust_for_inference(config, path_to_checkpoint)
2 changes: 0 additions & 2 deletions InnerEye/ML/run_ml.py
@@ -258,8 +258,6 @@ def run(self) -> None:
# train a new model if required
if self.azure_config.train:
with logging_section("Model training"):
-# TODO antonsc: Return the ModelCheckpoint object here, with the path to the best checkpoints,
-# or convert it into a checkpoint_handler object
model_train(self.model_config, checkpoint_handler)
else:
self.model_config.write_dataset_files()
9 changes: 3 additions & 6 deletions InnerEye/ML/utils/dataset_util.py
@@ -139,21 +139,18 @@ def __post_init__(self) -> None:


def store_and_upload_example(dataset_example: DatasetExample,
-args: Optional[SegmentationModelBase],
+args: Optional[SegmentationModelBase] = None,
images_folder: Optional[Path] = None) -> None:
"""
Stores an example input and output of the network to Nifti files.

:param dataset_example: The dataset example, with image, label and prediction, that should be written.
-:param args: configuration information to be used for normalization. TODO: This should not be optional why is this
-assigning to example_images_folder
+:param args: configuration information to be used for normalization.
:param images_folder: The folder to which the result Nifti files should be written. If args is not None,
the args.example_images_folder is used instead.
"""

-folder = Path("") if images_folder is None else images_folder
-if args is not None:
-folder = args.example_images_folder
+folder = images_folder or args.example_images_folder
if folder != "" and not os.path.exists(folder):
os.mkdir(folder)
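
With both `args` and `images_folder` optional, the expression `images_folder or args.example_images_folder` appears to raise `AttributeError` when neither is supplied, and it gives `images_folder` precedence, whereas the docstring says the config folder is used when `args` is not None. A minimal sketch of a None-safe resolution is given below; it is one possible reading, not the repository's code. `ExampleConfig` and `resolve_images_folder` are hypothetical names, with `ExampleConfig` standing in for `SegmentationModelBase`.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass
class ExampleConfig:
    """Hypothetical stand-in for SegmentationModelBase."""
    example_images_folder: Path


def resolve_images_folder(images_folder: Optional[Path] = None,
                          args: Optional[ExampleConfig] = None) -> Path:
    """Prefer the explicit folder, then the config's folder, then the current directory."""
    if images_folder is not None:
        return images_folder
    if args is not None:
        return args.example_images_folder
    return Path("")


explicit = resolve_images_folder(images_folder=Path("out"))
from_config = resolve_images_folder(args=ExampleConfig(Path("examples")))
fallback = resolve_images_folder()
```

The sketch keeps the precedence of the new expression (explicit folder first) and restores the old fallback to `Path("")` when nothing is given.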

24 changes: 0 additions & 24 deletions InnerEye/ML/utils/hdf5_util.py
@@ -127,27 +127,3 @@ def from_file(cls: Type[T], hdf5_path: Path, load_segmentation: bool) -> T:
volume=volume,
segmentation=segmentation,
acquisition_date=acquisition_date)


-def load_labels(hdf5: HDF5Object) -> np.ndarray:
-"""
-Load labels containing segmentation binary labels in one-hot-encoding.
-:return A numpy array containing ground-truth information.
-"""
-# For labels we are using the segmentation data provided in the HDF5 files.
-labels = hdf5.segmentation # 1 x N x H x W
-n_classes = int(np.amax(labels) - np.amin(labels)) + 1
-labels = multi_label_array_to_binary(labels, n_classes)
-return labels.astype(HDF5ImageDataType.SEGMENTATION.value)
-
-
-def get_mask(hdf5_object: HDF5Object) -> np.ndarray:
-"""
-TODO: Replace this with actual mask
-:param hdf5_object:
-:return:
-"""
-img_shape = hdf5_object.volume.shape
-mask = np.ones(img_shape, dtype=HDF5ImageDataType.MASK.value)
-mask[-1, -1, -1] = 0
-return mask
1 change: 0 additions & 1 deletion InnerEye/ML/utils/metrics_util.py
@@ -125,7 +125,6 @@ def get_number_of_voxels_per_class(labels: torch.Tensor) -> torch.Tensor:
if len(labels.shape) == 4:
labels = labels[None, ...]

-# TODO antonsc: Switch to Pytorch 1.7 and use torch.count_nonzero
return torch.tensor(np.count_nonzero(labels.cpu().numpy(), axis=(2, 3, 4)))


4 changes: 0 additions & 4 deletions InnerEye/ML/utils/temperature_scaling.py
@@ -82,10 +82,6 @@ def eval_criterion() -> torch.Tensor:
# zero the gradients for the next optimization step
optimizer.zero_grad()
loss, ece = criterion_fn(self.temperature_scale(logits), labels)
-# TODO antonsc: re-enable logging
-# if logger:
-# logger.log_to_azure_and_tensorboard("Temp_Scale_LOSS", loss.item())
-# logger.log_to_azure_and_tensorboard("Temp_Scale_ECE", ece.item())
loss.backward()
return loss

1 change: 0 additions & 1 deletion InnerEye/ML/visualizers/plot_cross_validation.py
@@ -322,7 +322,6 @@ def download_metrics_file(config: PlotCrossValidationConfig,
if config.model_category == ModelCategory.Segmentation:
if epoch is None:
raise ValueError("Epoch must be provided in segmentation runs")
-# TODO remove epoch arg here
src = get_epoch_results_path(mode) / SUBJECT_METRICS_FILE_NAME
else:
src = Path(mode.value) / SUBJECT_METRICS_FILE_NAME
Original file line number Diff line number Diff line change
@@ -187,7 +187,6 @@ def _get_mock_sequence_dataset(dataset_contents: Optional[str] = None) -> pd.Dat
(True, ImagingFeatureType.ImageAndSegmentation)])
@pytest.mark.parametrize("combine_hidden_state", (True, False))
@pytest.mark.parametrize("use_encoder_layer_norm", (True, False))
-# TODO antonsc: re-enable when mean teacher is back in
@pytest.mark.parametrize("use_mean_teacher_model", (False,))
@pytest.mark.gpu
def test_rnn_classifier_via_config_1(use_combined_model: bool,
@@ -388,7 +387,7 @@ def test_rnn_classifier_via_config_2(test_output_dirs: OutputFolderForTests) ->
print(f"Validation loss after {config.num_epochs} epochs: {actual_val_loss}")
assert actual_train_loss <= expected_max_train_loss, "Training loss too high"
assert actual_val_loss <= expected_max_val_loss, "Validation loss too high"
-# TODO antonsc: put back in when temperature scaling is enabled again
+# Issue #374: put back in when temperature scaling is enabled again
# assert np.allclose(results.optimal_temperature_scale_values_per_checkpoint_epoch, [0.97], rtol=0.1)


2 changes: 1 addition & 1 deletion Tests/ML/models/test_scalar_model.py
@@ -291,7 +291,7 @@ def test_scalar_metrics(has_hues: bool, is_classification: bool) -> None:
labels = [[2.0, 2.0, 2.0], [1.0, 1.0, 1.0]]
expected_accuracy = [0.25, 5, 0]
accuracy_metric_key = MetricType.MEAN_SQUARED_ERROR.value
-# TODO antonsc: We have odd values here for ExplainedVariance, and had already for r2score
+# Issue #373: We have odd values here for ExplainedVariance, and had already for r2score
expected_info_format_strs = [
"MeanSquaredError: 0.2500, MeanAbsoluteError: 0.5000, ExplainedVariance: 0.0000",
"MeanSquaredError: 5.0000, MeanAbsoluteError: 2.0000, ExplainedVariance: -19.0000",
1 change: 0 additions & 1 deletion Tests/ML/pipelines/test_forward_pass.py
@@ -54,7 +54,6 @@ def test_use_gpu_flag(use_gpu_override: bool) -> None:
assert config.use_gpu == use_gpu_override

# @pytest.mark.azureml
-# TODO antonsc: re-enable once we have mean teacher in place again
# def test_mean_teacher_model(test_output_dirs: OutputFolderForTests) -> None:
# """
# Test training and weight updates of the mean teacher model computation.
2 changes: 1 addition & 1 deletion Tests/ML/test_model_testing.py
@@ -131,7 +131,7 @@ def __init__(self) -> None:
[(SimpleUNet(), InferencePipeline, EnsemblePipeline),
(ClassificationModelForTesting(mean_teacher_model=False),
ScalarInferencePipeline, ScalarEnsemblePipeline),
-# TODO: re-enable once we have mean teacher in place again
+# Re-enable once we have mean teacher in place again
# (ClassificationModelForTesting(mean_teacher_model=True),
# ScalarInferencePipeline, ScalarEnsemblePipeline)
])
1 change: 0 additions & 1 deletion Tests/ML/test_model_train_test_and_recovery.py
@@ -19,7 +19,6 @@
from Tests.ML.util import get_default_checkpoint_handler


-# TODO: re-enable once we have mean teacher in place again
# @pytest.mark.parametrize("mean_teacher_model", [True, False])
@pytest.mark.parametrize("mean_teacher_model", [False])
def test_recover_testing_from_run_recovery(mean_teacher_model: bool,
4 changes: 1 addition & 3 deletions Tests/ML/test_model_training.py
@@ -187,9 +187,7 @@ def assert_all_close(metric: str, expected: List[float], **kwargs: Any) -> None:
# Logging the metric is called, but they never make it to the logger object.
# model_training_result.get_training_metric(MetricType.SECONDS_PER_BATCH.value)

-# TODO antonsc: Check that both Train and Val epoch_metrics.csv have all relevant columns and 2 rows
-
-# TODO antonsc: enable
+# Issue #372
# # Test for saving of example images
# assert train_config.example_images_folder.is_dir()
# example_files = list(train_config.example_images_folder.rglob("*.*"))
2 changes: 1 addition & 1 deletion Tests/ML/utils/test_io_util.py
@@ -172,7 +172,7 @@ def test_save_dataset_example(test_output_dirs: OutputFolderForTests) -> None:
labels=labels)

images_folder = test_output_dirs.root_dir
-store_and_upload_example(dataset_sample, None, images_folder)
+store_and_upload_example(dataset_sample, images_folder=images_folder)
image_from_disk = io_util.load_nifti_image(os.path.join(images_folder, "p2_e_1_image.nii.gz"))
labels_from_disk = io_util.load_nifti_image(os.path.join(images_folder, "p2_e_1_label.nii.gz"))
prediction_from_disk = io_util.load_nifti_image(os.path.join(images_folder, "p2_e_1_prediction.nii.gz"))