@@ -20,6 +20,7 @@ dependencies:
- azureml-tensorboard==1.36.0
- conda-merge==0.1.5
- cryptography==3.3.2
- cucim==21.10.1; platform_system=="Linux"
This spec prevents the Windows builds from failing, as cuCIM is not compatible with Windows.
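For context, the `platform_system` environment marker corresponds to Python's `platform.system()`. A minimal sketch of how code could mirror that marker at runtime is below; the helper names are hypothetical, and nothing in this PR shows that the repository guards its cuCIM imports this way.

```python
import importlib.util
import platform


def cucim_expected() -> bool:
    """Mirror the `platform_system=="Linux"` marker from the environment file."""
    return platform.system() == "Linux"


def cucim_importable() -> bool:
    """True only if the marker applies and the package is actually installed."""
    return cucim_expected() and importlib.util.find_spec("cucim") is not None


if cucim_importable():
    print("cuCIM can be imported here")
else:
    print("cuCIM is not available on this platform or in this environment")
```

A guard like this would let Windows code paths skip cuCIM-specific functionality instead of failing at import time.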
main(panda_dir="/tmp/datasets/PANDA",
     root_output_dir="/datadrive",
     level=1,
from InnerEye.ML.Histopathology.datasets.tcga_prad_dataset import TcgaPradDataset
If TcgaPrad is removed, this block should be removed as well. Is it a problem that we don't actually have a single dataset implementation that is compatible with this script?
Following your separate suggestion, I've decided to keep TCGA-PRAD as an example, and added a clarifying comment here.
image_path = sample[dataset.IMAGE_COLUMN]
assert isinstance(image_path, str)
assert os.path.isfile(image_path)
To avoid leaving things completely untested, do you think we could have a SlidesDataset test? Obviously we can't test the length or the number of positives, but we can test that the dataset contains the expected keys and that the content of the dict has the expected types. Looking at the dataset definition, if the path is an existing path and we pass a dataset.csv, we can run these tests without needing to mount any real data. What do you think?
Now added a `test_slides_dataset.csv` and some basic tests in `test_slides_dataset.py`.
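For readers following the thread, a test of the kind discussed could look roughly like the sketch below. It is not the content of `test_slides_dataset.py`: `ToySlidesDataset`, its column names, and the helper functions are hypothetical stand-ins, used only to show how keys and value types can be checked from a small CSV without mounting real data.

```python
import csv
import tempfile
from pathlib import Path
from typing import Dict, List


class ToySlidesDataset:
    """Hypothetical stand-in for a slides dataset backed by a dataset.csv file."""
    SLIDE_ID_COLUMN = "slide_id"
    IMAGE_COLUMN = "image"
    LABEL_COLUMN = "label"

    def __init__(self, root: Path, csv_name: str = "dataset.csv") -> None:
        self.root = Path(root)
        with open(self.root / csv_name, newline="") as f:
            self.rows: List[Dict[str, str]] = list(csv.DictReader(f))

    def __len__(self) -> int:
        return len(self.rows)

    def __getitem__(self, index: int) -> Dict[str, object]:
        row = self.rows[index]
        return {
            self.SLIDE_ID_COLUMN: row[self.SLIDE_ID_COLUMN],
            self.IMAGE_COLUMN: str(self.root / row[self.IMAGE_COLUMN]),
            self.LABEL_COLUMN: int(row[self.LABEL_COLUMN]),
        }


def make_toy_root(root: Path) -> Path:
    """Write a two-row dataset.csv; the image files are empty placeholders."""
    rows = [("slide_0", "slide_0.tiff", 0), ("slide_1", "slide_1.tiff", 1)]
    with open(root / "dataset.csv", "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["slide_id", "image", "label"])
        writer.writerows(rows)
    for _, image_name, _ in rows:
        (root / image_name).touch()
    return root


def check_samples(dataset: ToySlidesDataset) -> None:
    """Check that every sample has the expected keys and value types."""
    expected_keys = {dataset.SLIDE_ID_COLUMN, dataset.IMAGE_COLUMN, dataset.LABEL_COLUMN}
    for index in range(len(dataset)):
        sample = dataset[index]
        assert set(sample) == expected_keys
        assert isinstance(sample[dataset.SLIDE_ID_COLUMN], str)
        image_path = sample[dataset.IMAGE_COLUMN]
        assert isinstance(image_path, str)
        assert Path(image_path).is_file()
        assert isinstance(sample[dataset.LABEL_COLUMN], int)


with tempfile.TemporaryDirectory() as tmp:
    check_samples(ToySlidesDataset(make_toy_root(Path(tmp))))
```

Because the placeholder files are empty, this only exercises the indexing logic, not image loading.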
a56a599
This PR implements the following major changes in the tiling/preprocessing pipeline:
- `LoadROId` transform using foreground auto-segmentation, with an Otsu threshold by default if the threshold is unspecified.
- [...] (`create_tiles_dataset.py` and `azure_tiles_creation.py`).
- [...] `create_panda_tiles_dataset.py` and `azure_panda_tiles_creation.py` for backward-compatibility.

Additionally, I've refactored our dataset classes:
- `SlideKey` and `TileKey` schemas for indexing the respective batch dictionaries instead of hardcoded strings. Note that `TileKey` is not yet used in `TilesDataset` and `DeepMIL`; this will be addressed in a separate follow-up PR.
- `SlidesDataset`, now inherited by the simplified `PandaDataset` and `TcgaPradDataset`.
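
To illustrate the idea of key schemas replacing hardcoded strings, here is a minimal sketch using an `Enum`. The member names and values are assumptions made for illustration; the actual `SlideKey` members in this PR may differ.

```python
from enum import Enum
from typing import Any, Dict


class SlideKey(str, Enum):
    """Hypothetical subset of keys; the members defined in the PR may differ."""
    SLIDE_ID = "slide_id"
    IMAGE = "image"
    LABEL = "label"


def describe(sample: Dict[SlideKey, Any]) -> str:
    """Index the sample through the schema rather than through string literals."""
    return f"{sample[SlideKey.SLIDE_ID]}: label={sample[SlideKey.LABEL]}"


sample = {
    SlideKey.SLIDE_ID: "slide_0",
    SlideKey.IMAGE: "slide_0.tiff",
    SlideKey.LABEL: 1,
}
print(describe(sample))
```

With a schema like this, a misspelled key such as `SlideKey.IMAGEE` raises an `AttributeError` where it is written, rather than a `KeyError` later at lookup time.
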

Other:
- `.tiff` file from the PANDA dataset, added via `git-lfs`.
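
As background for the default-threshold behaviour described for `LoadROId` above, the following is a minimal NumPy sketch of Otsu thresholding used for foreground segmentation. It is not the implementation in this PR: the function names, the channel-first image layout, and the assumption that tissue is darker than the background are all illustrative.

```python
from typing import Optional, Tuple

import numpy as np


def otsu_threshold(values: np.ndarray, bins: int = 256) -> float:
    """Return the bin centre that maximises the between-class variance."""
    hist, edges = np.histogram(values.ravel(), bins=bins)
    centers = (edges[:-1] + edges[1:]) / 2
    weights = hist.astype(float)
    # Cumulative pixel counts of the low and high classes for each split.
    w_low = np.cumsum(weights)
    w_high = np.cumsum(weights[::-1])[::-1]
    # Mean intensity of each class for each split.
    mean_low = np.cumsum(weights * centers) / np.maximum(w_low, 1e-12)
    mean_high = (np.cumsum((weights * centers)[::-1])
                 / np.maximum(w_high[::-1], 1e-12))[::-1]
    between = w_low[:-1] * w_high[1:] * (mean_low[:-1] - mean_high[1:]) ** 2
    return float(centers[np.argmax(between)])


def segment_foreground(image: np.ndarray,
                       threshold: Optional[float] = None) -> Tuple[np.ndarray, float]:
    """Mask pixels darker than the threshold; fall back to Otsu if none is given.

    Assumes a channel-first (C, H, W) image with a bright background.
    """
    luminance = image.mean(axis=0)
    if threshold is None:
        threshold = otsu_threshold(luminance)
    return luminance < threshold, threshold


# Synthetic slide: bright background with a dark square of "tissue".
slide = np.full((3, 64, 64), 240, dtype=np.uint8)
slide[:, 16:48, 16:48] = 60
mask, threshold = segment_foreground(slide)
print(mask.sum(), threshold)
```

Passing an explicit `threshold` skips the Otsu step, matching the "by default if the threshold is unspecified" behaviour in the description.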